Baselight

Movie Dialogue Segment Extraction

Sampled test data for evaluation of segmentation algorithms

@kaggle.inteng_moviedialcorpus


About this Dataset

Context

This data is taken from a project to create a dialogue corpus from movies.
The project is described in detail at the page below, which also links to additional data and a paper:
http://i.yz.yamagata-u.ac.jp/moviedialcorpus/index.html

Content

Three-column CSV files are uploaded. Each row corresponds to one automatically extracted segment. The columns are, in order: 'beginning time', 'ending time', and 'whether the segment is dialogue or not'.
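The layout described above can be read with a few lines of Python. This is a minimal sketch, not the project's own loader: it assumes the files have no header row, that the third column uses 1 for dialogue and 0 for non-dialogue, and that both time columns share one unit. The names `Segment`, `read_segments`, and `dialogue_duration` are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: float       # 'beginning time' column
    end: float         # 'ending time' column
    is_dialogue: bool  # third column; assumed 1 = dialogue, 0 = not


def read_segments(text: str) -> List[Segment]:
    """Parse three-column CSV text (no header row assumed) into segments."""
    segments = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        start, end, flag = row[:3]
        segments.append(Segment(float(start), float(end), int(flag) == 1))
    return segments


def dialogue_duration(segments: List[Segment]) -> float:
    """Total length of segments flagged as dialogue, in the file's time unit."""
    return sum(s.end - s.start for s in segments if s.is_dialogue)


# Made-up rows for illustration only; these are not values from the dataset.
sample = "0.0,10.0,0\n10.0,15.5,1\n15.5,20.0,0\n"
segments = read_segments(sample)
total_dialogue = dialogue_duration(segments)
```

To read a real file, pass its contents to `read_segments`, for example `read_segments(open(path).read())`, where `path` is wherever the CSV was saved.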

Past Research

We have extracted dialogue segments from movies based on their audio. We evaluated the data using our own voice activity detection (VAD) algorithm and filtering rules. The accuracy is about 90%, except for music and musical movies, where the performance was much worse.
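The text does not say how the accuracy figure is computed. One common way to score a segmentation against reference labels is time-sampled agreement, sketched below. This is an assumption for illustration, not the project's evaluation method; `label_at`, `sampled_accuracy`, and the `step` parameter are hypothetical names, and the example segments are made up.

```python
from typing import List, Tuple

Seg = Tuple[float, float, int]  # (start, end, label), label 1 = dialogue


def label_at(segments: List[Seg], t: float) -> int:
    """Return the label of the segment covering time t (0 if none covers it)."""
    for start, end, label in segments:
        if start <= t < end:
            return label
    return 0


def sampled_accuracy(predicted: List[Seg], reference: List[Seg],
                     step: float = 0.1) -> float:
    """Fraction of sample points where predicted and reference labels agree."""
    end_time = max(max(s[1] for s in predicted), max(s[1] for s in reference))
    n = int(round(end_time / step))
    if n == 0:
        return 0.0
    agree = 0
    for i in range(n):
        t = (i + 0.5) * step  # sample at the midpoint of each step
        if label_at(predicted, t) == label_at(reference, t):
            agree += 1
    return agree / n


# Made-up example: the prediction starts the dialogue 2 time units too early.
reference = [(0.0, 10.0, 0), (10.0, 20.0, 1)]
predicted = [(0.0, 8.0, 0), (8.0, 20.0, 1)]
score = sampled_accuracy(predicted, reference, step=1.0)
```

A smaller `step` gives a finer estimate at the cost of more label lookups; per-class precision and recall could be computed from the same sample points if dialogue and non-dialogue are unevenly represented.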

Inspiration

Although the performance is not bad, there seems to be much room for improvement. We would like to know whether there is a better algorithm for extracting dialogue segments from movies.

Tables

Ia 6

@kaggle.inteng_moviedialcorpus.ia_6
  • 5.73 KB
  • 232 rows
  • 3 columns

CREATE TABLE ia_6 (
  "n_0_0" DOUBLE,
  "n_46_99" DOUBLE,
  "n_0" BIGINT
);
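The column names in this schema look auto-generated rather than descriptive. Going by the column order given under Content, they can be aliased to readable names when querying. The sketch below recreates the table in an in-memory SQLite database purely for illustration; the inserted rows are made up, the alias names are hypothetical, and the reading of 1 as "dialogue" is an assumption.

```python
import sqlite3

# In-memory stand-in for the published table; the schema is copied from above.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE ia_6 ("n_0_0" DOUBLE, "n_46_99" DOUBLE, "n_0" BIGINT)'
)

# Made-up rows for illustration only; these are not values from the dataset.
conn.executemany(
    "INSERT INTO ia_6 VALUES (?, ?, ?)",
    [(0.0, 10.0, 0), (10.0, 15.5, 1), (15.5, 20.0, 0)],
)

# Alias the generated names to the meanings given in the Content section,
# keeping only rows flagged as dialogue (assumed to be the value 1).
rows = conn.execute(
    'SELECT "n_0_0" AS begin_time, "n_46_99" AS end_time, '
    '"n_0" AS is_dialogue FROM ia_6 WHERE "n_0" = 1'
).fetchall()
conn.close()
```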
