Youtube Videos Dataset (~3400 Videos)

About 3400 videos spread across 4 categories! A testbed for learning text classification.

@kaggle.rajatrc1705_youtube_videos_dataset

About this Dataset

Context 📃

I wanted to practice text classification using NLP techniques, so I thought: why not generate the data myself?
This way, I brushed up on my scraping skills with Selenium, collected the data, cleaned it, and then started working on it.
You can take a peek at my work in the GitHub repository for this dataset and the trained models/results.

Content 📰

The total number of videos scraped was 3600. I scraped the following things from each video:

link: Video ID
title: Title of the video
description: Description of the video
category: Category for which the video was scraped
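To get a feel for the four columns listed above, here is a minimal sketch that reads rows and counts empty values per column. The sample rows are made up for illustration (they are not taken from the dataset), and the lowercase category labels and the `count_missing` helper are my own assumptions, so adjust them to whatever the real file contains.

```python
import csv
import io

# Hypothetical sample rows, invented for illustration only.
# The real file name and contents are not shown on this page.
SAMPLE_CSV = """link,title,description,category
abc123,Ten days in Japan,Our trip through Tokyo and Kyoto,travel
def456,Easy avocado toast,,food
ghi789,,A short history of Rome,history
"""

# Column names as listed in the dataset description.
COLUMNS = ["link", "title", "description", "category"]


def count_missing(rows, columns=COLUMNS):
    """Count empty or whitespace-only values per column."""
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if not (row.get(col) or "").strip():
                missing[col] += 1
    return missing


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing_counts = count_missing(rows)
print(missing_counts)
```

With the real data, you would swap the in-memory string for the downloaded CSV file and run the same check before deciding how to handle the gaps.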

I queried the videos for 4 categories:

Travel Vlogs 🧳
Food 🥑
Art and Music 🎨 🎻
History 📜

Acknowledgements 🙏

I could have used a ready-made API, but just for the fun of it, I scraped the data from YouTube using Selenium.

Inspiration 🦋

The data is not clean (for your enjoyment of cleaning it!), has some missing values, and is imbalanced.
Practice text classification on this dataset. Along the way you will have to learn different techniques, for example how to handle imbalanced classes.
While working on this dataset, you will learn a lot of different things and also get an opportunity to apply them.
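One common way to deal with imbalanced classes is to weight each class inversely to its frequency. The sketch below uses the "balanced" heuristic, weight = n_samples / (n_classes * class_count), which is the same formula scikit-learn applies for `class_weight="balanced"`. The label counts are invented for illustration and are not the dataset's real distribution; `balanced_class_weights` is a hypothetical helper name.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical label counts, for illustration only.
labels = (
    ["travel"] * 6
    + ["food"] * 3
    + ["art_music"] * 2
    + ["history"] * 1
)

weights = balanced_class_weights(labels)
for label, weight in sorted(weights.items()):
    print(f"{label}: {weight:.2f}")
```

Rare classes get larger weights, so a classifier that accepts per-class weights is penalized more for misclassifying them. Resampling (oversampling the rare classes or undersampling the common ones) is another option worth trying.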
