Baselight

YouTubers Saying Things

A dataset of video subtitles from popular YouTubers across different categories.

@kaggle.praneshmukhopadhyay_youtubers_saying_things


About this Dataset


Intro

Founded in 2005, YouTube is one of the internet's biggest platforms, with more than 1 billion videos watched per day. Any user can tell genres apart just by glancing at a video's thumbnail and title. My inspiration for this dataset was to find out whether it is equally easy for a computer to do so.

The Transcript column contains the subtitles for each video. Their reliability varies: auto-generated subtitles work well most of the time, but they can break down on heavy accents. Check the CC attribute to see whether a video's subtitles are auto-generated. 1381 of the videos have auto-generated subtitles, and the remaining 1134 have manual ones.
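As a rough illustration of using the CC attribute, the sketch below counts videos by subtitle source with Python's built-in `sqlite3` module. It assumes (hypothetically, since the encoding is not stated on this page) that `cc = 1` marks manual subtitles and `cc = 0` marks auto-generated ones, and it uses three made-up rows instead of the real data.

```python
import sqlite3

# Hypothetical sample rows (id, channel, cc, category); not taken from the dataset.
# ASSUMPTION: cc = 1 means manual subtitles, any other value means auto-generated.
SAMPLE = [
    ("v1", "Channel A", 1, "Gaming"),
    ("v2", "Channel B", 0, "Science"),
    ("v3", "Channel C", 0, "Comedy"),
]


def count_subtitle_sources(conn: sqlite3.Connection) -> dict:
    """Return the number of videos per subtitle source, keyed by a readable label."""
    rows = conn.execute(
        """
        SELECT CASE WHEN cc = 1 THEN 'manual' ELSE 'auto-generated' END AS source,
               COUNT(*)
        FROM data
        GROUP BY source
        """
    ).fetchall()
    return dict(rows)


# In-memory table with only the columns this sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE data ("id" VARCHAR, "channel" VARCHAR, "cc" BIGINT, "category" VARCHAR)')
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", SAMPLE)
print(count_subtitle_sources(conn))
```

If the assumption about `cc` holds, running the same query on the full table should reproduce the 1381 auto-generated and 1134 manual counts given above.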

The Subscribers and Views values reflect the time at which the dataset was generated, so take that into account when using them. The most recent version of this dataset was generated on 05-Feb-2022.
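Because `subscribers` and `views` are stored as VARCHAR in this dataset's table definition, they need converting before any numeric comparison. The sketch below is a hypothetical helper, `parse_count`, that assumes values look like `1,234,567`, `12.3M` or `450K`; the actual text format is not documented on this page, so inspect a few rows before relying on it.

```python
import re
from decimal import Decimal

# ASSUMPTION: counts are stored as text such as "1,234,567", "12.3M" or "450K".
# The real format is not documented here; adjust the pattern after checking the data.
_SUFFIX = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9}
_PATTERN = re.compile(r"^\s*([\d,]*\.?\d+)\s*([KMB]?)\s*$", re.IGNORECASE)


def parse_count(text):
    """Convert a text count to an int, or return None if the format is unrecognised."""
    if text is None:
        return None
    match = _PATTERN.match(text)
    if not match:
        return None
    number, suffix = match.groups()
    # Decimal avoids float rounding errors when scaling values like "12.3M".
    value = Decimal(number.replace(",", "")) * _SUFFIX[suffix.upper()]
    return int(value)


print(parse_count("12.3M"))
```

Unrecognised strings return `None` rather than raising, so malformed values can be counted and checked separately. Since every value is a snapshot from the same generation date, comparisons between videos are consistent with each other but not with current YouTube figures.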

Description

This dataset contains subtitles from over 91 different YouTubers across many different categories. I collected the data and cleaned it as much as necessary. The dataset currently contains 2515 unique videos and their subtitles, spread over 11 columns; the column descriptors explain the purpose of each one.

Improvements

I am open to suggestions, so please let me know of any major categories or channels that I have missed or that you would like included. I'll do my best to add them to the dataset. You can find the dataset page on my GitHub.


Tables

Data

@kaggle.praneshmukhopadhyay_youtubers_saying_things.data
  • 17.07 MB
  • 2515 rows
  • 11 columns

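CREATE TABLE data (
  "id" VARCHAR,
  "channel" VARCHAR,
  "subscribers" VARCHAR,   -- subscriber count as of dataset generation (05-Feb-2022), stored as text
  "title" VARCHAR,
  "cc" BIGINT,             -- indicates whether the subtitles are auto-generated or manual
  "url" VARCHAR,
  "released" VARCHAR,
  "views" VARCHAR,         -- view count as of dataset generation (05-Feb-2022), stored as text
  "category" VARCHAR,
  "transcript" VARCHAR,    -- the video's subtitles
  "length" VARCHAR
);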
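To explore the table locally, one option is to recreate the schema in SQLite and aggregate by category. This is a minimal sketch with Python's standard `sqlite3` module; the rows come from a hypothetical `make_row` helper with placeholder values, not from the real export, and obtaining the actual 2515 rows is left open.

```python
import sqlite3

# Same columns and types as the table definition for this dataset.
SCHEMA = """
CREATE TABLE data (
  "id" VARCHAR, "channel" VARCHAR, "subscribers" VARCHAR, "title" VARCHAR,
  "cc" BIGINT, "url" VARCHAR, "released" VARCHAR, "views" VARCHAR,
  "category" VARCHAR, "transcript" VARCHAR, "length" VARCHAR
)
"""


def make_row(video_id, channel, category):
    # Hypothetical placeholder row: only id, channel and category carry values.
    return (video_id, channel, None, None, None, None, None, None, category, None, None)


def videos_per_category(conn):
    """Return (category, video count, channel count) tuples, largest category first."""
    return conn.execute(
        """
        SELECT category, COUNT(DISTINCT id), COUNT(DISTINCT channel)
        FROM data
        GROUP BY category
        ORDER BY COUNT(DISTINCT id) DESC, category
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [make_row("a1", "Chan1", "Gaming"),
     make_row("a2", "Chan1", "Gaming"),
     make_row("a3", "Chan2", "Science")],
)
print(videos_per_category(conn))
```

On the full data, `SELECT COUNT(DISTINCT id) FROM data` is a quick check that the 2515 rows are indeed unique videos.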
