Sampled test data for evaluation of segmentation algorithms

Context

This data is taken from a project to build a dialogue corpus from movies.
Details of the project, including links to additional data and a paper, are given at:
http://i.yz.yamagata-u.ac.jp/moviedialcorpus/index.html

Content

Three-column CSV files are uploaded. Each row corresponds to one automatically extracted segment. The columns are, in order: the beginning time, the ending time, and whether the segment is dialogue or not.
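As an illustration of the layout described above, here is a minimal Python sketch for reading one of the files. Several details are assumptions not stated in this description: that the files have no header row, that times are numeric values in seconds, and that the third column uses 1 for dialogue and 0 otherwise. The names Segment, load_segments and dialogue_duration are hypothetical helpers introduced for this example only.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class Segment(NamedTuple):
    begin: float       # beginning time (assumed to be in seconds)
    end: float         # ending time (assumed to be in seconds)
    is_dialogue: bool  # True if the segment is labeled as dialogue


def load_segments(f: TextIO) -> List[Segment]:
    """Read three-column rows: begin, end, dialogue flag."""
    segments = []
    for row in csv.reader(f):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        begin, end, flag = (cell.strip() for cell in row)
        # Assumption: the flag is "1" for dialogue and "0" otherwise.
        segments.append(Segment(float(begin), float(end), flag == "1"))
    return segments


def dialogue_duration(segments: List[Segment]) -> float:
    """Total length of all segments labeled as dialogue."""
    return sum(s.end - s.begin for s in segments if s.is_dialogue)


# Made-up sample rows in the assumed format (not taken from the dataset).
SAMPLE = "0.0,2.5,1\n2.5,4.0,0\n4.0,7.5,1\n"

if __name__ == "__main__":
    segs = load_segments(io.StringIO(SAMPLE))
    print(len(segs), dialogue_duration(segs))
```

To read an actual file, pass an open file object instead of the in-memory sample, for example load_segments(open(path, newline="")). If the real files use a different flag encoding or time format, the parsing line would need to be adjusted.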

Past Research

We have conducted dialogue segment extraction from movies based on sound. We evaluated the data using our own VAD (voice activity detection) algorithm and filtering rules. The accuracy is about 90%, except for music and musical movies, where performance was much worse.

Inspiration

Although the performance is not bad, there seems to be much room for improvement. We would like to know whether there is a better algorithm for dialogue segment extraction from movies.
