GitHub Commit Messages Dataset

4.3 million commit messages from GitHub

@kaggle.dhruvildave_github_commit_messages_dataset

About this Dataset

Image credits: https://github.com

Introduction

This dataset contains all commit messages and their related metadata from 34 popular GitHub repositories:

  • tensorflow/tensorflow
  • pytorch/pytorch
  • torvalds/linux
  • python/cpython
  • rust-lang/rust
  • microsoft/TypeScript
  • microsoft/vscode
  • golang/go
  • numpy/numpy
  • scikit-learn/scikit-learn
  • openbsd/src
  • freebsd/freebsd-src
  • pandas-dev/pandas
  • scipy/scipy
  • tidyverse/ggplot2
  • kubernetes/kubernetes
  • postgres/postgres
  • nodejs/node
  • facebook/react
  • angular/angular
  • matplotlib/matplotlib
  • apache/httpd
  • nginx/nginx
  • opencv/opencv
  • ipython/ipython
  • rstudio/rstudio
  • jupyterlab/jupyterlab
  • gcc-mirror/gcc
  • apple/swift
  • denoland/deno
  • apache/spark
  • llvm/llvm-project
  • chromium/chromium
  • v8/v8

Data as of Wed Apr 21 03:42:44 PM IST 2021

Credits

Image credits: Unsplash - plhnk

Tables

Full

@kaggle.dhruvildave_github_commit_messages_dataset.full
  • 1004.57 MB
  • 4336299 rows
  • 5 columns

CREATE TABLE full (
  "commit" VARCHAR,
  "author" VARCHAR,
  "date" VARCHAR,
  "message" VARCHAR,
  "repo" VARCHAR
);
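
As a rough illustration of how this schema might be queried, the sketch below loads the same five columns into an in-memory SQLite database and counts commits per repository. The sample rows, the helper names (build_sample_db, commits_per_repo) and the use of SQLite are all assumptions made for the example. None of them come from the dataset or from the hosting platform.

```python
import sqlite3

# The DDL mirrors the schema above. "full" and "commit" are quoted because
# both appear in SQLite's keyword list.
DDL = '''
CREATE TABLE "full" (
  "commit" VARCHAR,
  "author" VARCHAR,
  "date" VARCHAR,
  "message" VARCHAR,
  "repo" VARCHAR
)
'''

# Hypothetical rows for illustration only; they are not taken from the dataset.
SAMPLE_ROWS = [
    ("a1", "Alice <alice@example.com>", "2021-01-01", "Fix typo\n\nLonger body.", "example/repo-a"),
    ("b2", "Bob <bob@example.com>", "2021-01-02", "Add tests", "example/repo-a"),
    ("c3", "Alice <alice@example.com>", "2021-01-03", "Initial commit", "example/repo-b"),
]


def build_sample_db():
    """Create an in-memory database holding the hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.executemany('INSERT INTO "full" VALUES (?, ?, ?, ?, ?)', SAMPLE_ROWS)
    return conn


def commits_per_repo(conn):
    """Return (repo, commit_count) pairs, largest count first."""
    query = '''
        SELECT "repo", COUNT(*) AS n
        FROM "full"
        GROUP BY "repo"
        ORDER BY n DESC, "repo" ASC
    '''
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for repo, n in commits_per_repo(build_sample_db()):
        print(repo, n)
```

The same GROUP BY query should carry over to other SQL engines, although identifier quoting rules may differ.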

Oneline

@kaggle.dhruvildave_github_commit_messages_dataset.oneline
  • 325.66 MB
  • 4336299 rows
  • 3 columns

CREATE TABLE oneline (
  "commit" VARCHAR,
  "message" VARCHAR,
  "repo" VARCHAR
);
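
The page does not say how the "message" column of this table relates to the one in the full table. One plausible reading, given the table name, is that it holds only the subject line (the first line) of each commit message. That is an assumption, not something the source confirms. Under that assumption, a subject line could be derived from a full message roughly as follows. The function name subject_line is hypothetical.

```python
def subject_line(message: str) -> str:
    """Return the first non-empty line of a commit message, stripped.

    Assumption: a "oneline" message corresponds to the subject line of
    the full commit message. The dataset page does not confirm this.
    """
    for line in message.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped
    return ""


if __name__ == "__main__":
    # Hypothetical commit message, not taken from the dataset.
    print(subject_line("Fix typo\n\nLonger body explaining the change."))
```

Comparing the output of such a helper with the oneline table on a handful of commits would be a quick way to check whether the assumption holds.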
