Dataset: Youtube Videos Dataset (~3400 Videos)

About this Dataset

Context 📃

I wanted to practice text classification with NLP techniques, so I decided to generate the data myself.
Along the way, I brushed up on my scraping skills with Selenium, collected the data, cleaned it, and then started working on it.
You can take a peek at my work in the GitHub repository for this dataset, which also holds the trained models and results.

Content 📰

The total number of videos scraped was 3600. I scraped the following fields from each video:

link	title	description	category
Video ID	Title of the video	Description of the video	Category for which the video was scraped

I queried the videos for 4 categories:

Travel Vlogs 🧳
Food 🥑
Art and Music 🎨 🎻
History 📜

Acknowledgements 🙏

I could have used a ready-made API, but just for the fun of it, I scraped the data from YouTube using Selenium.

Inspiration 🦋

The data is not clean (for your enjoyment of cleaning it!), has some missing values, and is imbalanced.
Practicing text classification on this dataset will push you to learn different techniques, for example how to handle imbalanced classes.
While working on it, you will learn a lot of different things and get an opportunity to apply them to this dataset.
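One common way to deal with imbalanced classes is to weight each class inversely to its frequency. The sketch below is only an illustration, not the approach used in the repository: it computes weights as n_samples / (n_classes * class_count), the same formula scikit-learn documents for `class_weight="balanced"`. The function name `balanced_class_weights` and the label values are made up for this example and are not taken from the dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight} with weight = n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical, deliberately imbalanced labels (not real dataset values).
labels = ["food"] * 6 + ["history"] * 3 + ["travel"] * 1

weights = balanced_class_weights(labels)
for label, weight in sorted(weights.items()):
    # Rare classes receive larger weights than frequent ones.
    print(f"{label}: {weight:.2f}")
```

Weights like these can usually be passed to a classifier's loss or sample-weight argument so that mistakes on rare categories count for more during training.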

Tables

Youtube

@kaggle.rajatrc1705_youtube_videos_dataset.youtube

994.6 KB
3599 rows
4 columns


CREATE TABLE youtube (
  "link" VARCHAR,
  "title" VARCHAR,
  "description" VARCHAR,
  "category" VARCHAR
);
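Since the table has four text columns and the data is described as having missing values and imbalanced categories, a quick first check is to count empty fields per column and rows per category. The following is a minimal sketch using only the Python standard library. It assumes the table has been exported to CSV with the column names from the schema above; the sample rows and the `summarize` helper are hypothetical and exist only for illustration.

```python
import csv
import io
from collections import Counter

# Column names taken from the CREATE TABLE statement.
COLUMNS = ["link", "title", "description", "category"]


def summarize(rows):
    """Count empty values per column and rows per category."""
    missing = Counter({col: 0 for col in COLUMNS})
    categories = Counter()
    total = 0
    for row in rows:
        total += 1
        for col in COLUMNS:
            # Treat None, empty strings, and whitespace-only as missing.
            if not (row.get(col) or "").strip():
                missing[col] += 1
        category = (row.get("category") or "").strip() or "<missing>"
        categories[category] += 1
    return total, dict(missing), dict(categories)


# Hypothetical sample rows standing in for the real export.
sample = io.StringIO(
    "link,title,description,category\n"
    "abc123,Street food tour,Trying local snacks,food\n"
    "def456,Weekend in the hills,,travel\n"
    "ghi789,,A short history of tea,history\n"
    "jkl012,Ten quick recipes,Easy dinners,food\n"
)

total, missing, categories = summarize(csv.DictReader(sample))
print("rows:", total)
print("missing per column:", missing)
print("rows per category:", categories)
```

To run the same check on a real export, the `io.StringIO` sample could be replaced with a file opened via `open(path, newline="", encoding="utf-8")`, where `path` is wherever you saved the CSV.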