Dataset: TED-Ed Dataset Acquired Via YouTube API

About this Dataset

As a fan of TED-Ed, I was curious about what makes their videos unique. Using the YouTube Data API (v3), I collected the metadata associated with their videos. This dataset consists of two CSV files: one for video details and one for the comments on those videos. The two files share video_id as a common key.

video_df File:

video_id: The unique identifier assigned to each TED-Ed video on YouTube.
channelTitle: The title of the YouTube channel where the video is published, in this case "TED-Ed".
title: The title of the TED-Ed video, providing a brief overview of the video's content.
description: A detailed description of the video content, including additional information, background, and context provided by TED-Ed.
tags: Keywords or phrases associated with the video, helping to categorize and index content for search purposes.
publishedAt: The date and time when the TED-Ed video was published on YouTube.
viewCount: The number of views the video has received on YouTube.
likeCount: The count of "likes" received by the video, indicating positive audience engagement.
favouriteCount: The count of users who have marked the video as a favorite, if applicable.
commentCount: The number of comments posted by viewers on the video.
duration: The total duration of the video, presented in a human-readable format.
definition: Indicates whether the video is available in high definition ("hd") or standard definition ("sd"), as reported by the YouTube API.
caption: Boolean value indicating whether captions are available for the video.
publishDayName: The name of the day of the week when the video was published.
durationSecs: The total duration of the video in seconds.
tagsCount: The count of tags associated with the video.
likeRatio: The ratio of likes to views, providing a measure of audience appreciation.
commentRatio: The ratio of comments to views, indicating the level of audience interaction.
titleLength: The number of characters in the title of the video.
durationMinutes: The total duration of the video in minutes.
title_no_stopwords: The title of the video with common English stopwords removed, facilitating text analysis.
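
Several of the columns above (publishDayName, tagsCount, likeRatio, commentRatio, titleLength, durationMinutes) are derived from the raw API fields. The sketch below shows one way such values could be computed from the definitions given in the list. It is an illustration only: the function name derive_features, the sample record, and the exact formulas (for example, whether the ratios are scaled) are assumptions, not the method used to build this dataset.

```python
from datetime import datetime


def derive_features(video: dict) -> dict:
    """Compute derived columns from raw fields (formulas are assumed, not confirmed)."""
    views = video["viewCount"]
    tags = video.get("tags") or []  # treat missing tags as an empty list
    published = datetime.fromisoformat(video["publishedAt"])
    return {
        "publishDayName": published.strftime("%A"),
        "tagsCount": len(tags),
        # Plain likes/views and comments/views; guard against zero views.
        "likeRatio": video["likeCount"] / views if views else None,
        "commentRatio": video["commentCount"] / views if views else None,
        "titleLength": len(video["title"]),
        "durationMinutes": video["durationSecs"] / 60,
    }


# Hypothetical record, not taken from the dataset.
sample = {
    "title": "How do vaccines work?",
    "tags": ["science", "health"],
    "publishedAt": "2023-01-02T15:00:00",
    "viewCount": 1000,
    "likeCount": 50,
    "commentCount": 10,
    "durationSecs": 300,
}
features = derive_features(sample)
print(features)
```

If the ratios in the CSV do not match plain likes/views, the author may have applied a scaling factor; comparing a few rows against this calculation would confirm it.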

comment_df File:

video_id: The unique identifier linking each comment to the respective TED-Ed video.
comments: A list of comments posted by viewers on the TED-Ed video.
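
Because comments is described as a list but is stored in a text column, each cell probably needs to be parsed before analysis. The sketch below assumes the list was written out as a Python list literal (for example "['first', 'second']"); that storage format is an assumption and should be checked against the actual CSV. The helper name parse_comments is hypothetical.

```python
import ast


def parse_comments(raw: str) -> list:
    """Turn a stringified list of comments into a Python list (assumed format)."""
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a Python literal: treat the whole cell as a single comment.
        return [raw]
    if isinstance(value, (list, tuple)):
        return list(value)
    return [str(value)]


# Hypothetical cell values, not taken from the dataset.
print(parse_comments("['Great video!', 'Loved it']"))
print(parse_comments("plain text"))
```

ast.literal_eval is used rather than eval because it only accepts literals and will not execute arbitrary code found in a comment.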

Tables

Comments Df1

@kaggle.hadilhagar_ted_ed_dataset_acquired_via_youtube_api.comments_df1

943.29 KB
2100 rows
2 columns


CREATE TABLE comments_df1 (
  "video_id" VARCHAR,
  "comments" VARCHAR
);

Video Df

@kaggle.hadilhagar_ted_ed_dataset_acquired_via_youtube_api.video_df

1.48 MB
2108 rows
21 columns


CREATE TABLE video_df (
  "video_id" VARCHAR,
  "channeltitle" VARCHAR,
  "title" VARCHAR,
  "description" VARCHAR,
  "tags" VARCHAR,
  "publishedat" TIMESTAMP,
  "viewcount" BIGINT,
  "likecount" DOUBLE,
  "favouritecount" VARCHAR,
  "commentcount" DOUBLE,
  "duration" VARCHAR,
  "definition" VARCHAR,
  "caption" BOOLEAN,
  "publishdayname" VARCHAR,
  "durationsecs" BIGINT,
  "tagscount" BIGINT,
  "likeratio" DOUBLE,
  "commentratio" DOUBLE,
  "titlelength" BIGINT,
  "durationminutes" DOUBLE,
  "title_no_stopwords" VARCHAR
);
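
The two tables can be combined on video_id. The row counts differ (2108 videos versus 2100 comment rows), so not every video necessarily has a matching comments row; a LEFT JOIN keeps such videos in the result. The sketch below demonstrates this with an in-memory SQLite database, a subset of the columns, and hypothetical rows. It is not tied to any particular SQL engine used to host the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced versions of the two tables; only the columns needed for the join.
conn.executescript(
    """
    CREATE TABLE video_df (video_id VARCHAR, title VARCHAR, viewcount BIGINT);
    CREATE TABLE comments_df1 (video_id VARCHAR, comments VARCHAR);
    """
)
# Hypothetical rows, not taken from the dataset.
conn.executemany(
    "INSERT INTO video_df VALUES (?, ?, ?)",
    [("a1", "First", 100), ("b2", "Second", 200)],
)
conn.execute("INSERT INTO comments_df1 VALUES (?, ?)", ("a1", "['Nice']"))

# LEFT JOIN keeps videos that have no row in comments_df1 (comments is NULL).
rows = conn.execute(
    """
    SELECT v.video_id, v.title, c.comments
    FROM video_df AS v
    LEFT JOIN comments_df1 AS c ON c.video_id = v.video_id
    ORDER BY v.video_id
    """
).fetchall()
for row in rows:
    print(row)
conn.close()
```

Switching to an inner JOIN would drop videos without a comments row, which may or may not be desirable depending on the analysis.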