Baselight

Most Popular Github Repositories (Projects)

A GitHub dataset of 215K+ repos along with their details

@kaggle.donbarbos_github_repos

About this Dataset

About

This dataset lists more than 215K of the most-starred GitHub repositories, each with more than 167 stars. Every entry includes the attributes described under Columns.

I collected this dataset using the GitHub Search API. The API returns only the first thousand results for any query, so I looped through low/high star pairs that each return fewer than a thousand repositories for the query stars:{low}..{high}.
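
The range-splitting idea above can be sketched as follows. This is a minimal illustration and not the author's actual collection script: the function names (`star_query`, `github_total_count`, `plan_star_ranges`) and the bisection strategy are assumptions made for the example. It only plans the star ranges; paging through each range, authentication, and rate-limit handling are left out.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, List, Tuple

SEARCH_URL = "https://api.github.com/search/repositories"
RESULT_CAP = 1000  # the Search API serves only the first 1,000 matches per query


def star_query(low: int, high: int) -> str:
    """Build the range qualifier from the text: stars:{low}..{high}."""
    return f"stars:{low}..{high}"


def github_total_count(low: int, high: int) -> int:
    """Ask the Search API how many repositories have between low and high stars.

    Hypothetical helper: unauthenticated requests are tightly rate limited,
    so real use would need a token and back-off, both omitted here.
    """
    params = urllib.parse.urlencode({"q": star_query(low, high), "per_page": 1})
    request = urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["total_count"]


def plan_star_ranges(
    count: Callable[[int, int], int],
    low: int,
    high: int,
    cap: int = RESULT_CAP,
) -> List[Tuple[int, int]]:
    """Split [low, high] into star ranges that each match fewer than `cap` repos.

    Bisection is an assumption of this sketch; the source only says that the
    author looped over low/high pairs returning fewer than a thousand results.
    """
    if low > high:
        return []
    # A single star value cannot be split further, so it is returned as-is
    # even if it would exceed the cap (its results would then be truncated).
    if low == high or count(low, high) < cap:
        return [(low, high)]
    mid = (low + high) // 2
    return plan_star_ranges(count, low, mid, cap) + plan_star_ranges(count, mid + 1, high, cap)
```

Passing the counting function as a parameter keeps the planner testable offline: `plan_star_ranges(github_total_count, low, high)` would query the live API, while a stub that sums counts from a dictionary exercises the same logic without network access.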

The GitHub API Terms of Service apply.

You may not use this dataset for spamming purposes, including for the purposes of selling GitHub users' personal information, such as to recruiters, headhunters, and job boards.

Columns

| Column name | Description |
| --- | --- |
| Name | The name of the GitHub repository |
| Description | A brief text summary of the repository's purpose or focus |
| URL | The URL of the GitHub repository, which uniquely identifies it |
| Created At | The date and time when the repository was created on GitHub, in ISO 8601 format |
| Updated At | The date and time of the most recent update to the repository, in ISO 8601 format |
| Homepage | The URL of the homepage or landing page associated with the repository |
| Size | The size of the repository in bytes, indicating the total storage used by its files and data |
| Stars | The number of stars the repository has received from GitHub users, an indicator of its popularity |
| Forks | The number of times the repository has been forked by other GitHub users |
| Issues | The total number of open issues |
| Watchers | The number of GitHub users who are watching the repository for updates and changes |
| Language | The primary programming language |
| License | The software license, given as a license identifier |
| Topics | A list of topics or tags associated with the repository, which help users discover related projects |
| Has Issues | A boolean indicating whether the repository has an issue tracker enabled |
| Has Projects | A boolean indicating whether the repository uses GitHub Projects to manage and organize tasks and work items |
| Has Downloads | A boolean indicating whether the repository offers downloadable files or assets |
| Has Wiki | A boolean indicating whether the repository has an associated wiki |
| Has Pages | A boolean indicating whether the repository has GitHub Pages enabled, which allows a website to be published from the repository |
| Has Discussions | A boolean indicating whether the repository has GitHub Discussions enabled |
| Is Fork | A boolean indicating whether the repository is a fork of another repository |
| Is Archived | A boolean indicating whether the repository is archived. Archived repositories are typically read-only and no longer actively maintained |
| Is Template | A boolean indicating whether the repository is set up as a template |
| Default Branch | The name of the default branch |
