Baselight

WikiSQL (Questions And SQL Queries)

80654 hand-annotated questions and SQL queries on 24241 Wikipedia tables

@kaggle.thedevastator_dataset_for_developing_natural_language_interfac

About this Dataset

By Huggingface Hub [source]


A large crowd-sourced dataset for developing natural language interfaces for relational databases.
WikiSQL is a dataset of 80654 hand-annotated examples of questions and SQL queries distributed across 24241 tables from Wikipedia.

How to use the dataset

This dataset can be used to develop natural language interfaces for relational databases. The data fields are the same across all splits: each file records the phase, question, table, and SQL query for every example.
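As a minimal sketch of reading one split, the snippet below parses rows with Python's standard `csv` module. It assumes each split is a CSV file whose header row names the four fields (phase, question, table, sql) and that phase holds an integer, following the BIGINT type in the table schemas further down the page. The helper name `read_split` and the in-memory sample row are hypothetical and exist only so the example runs without the real files.

```python
import csv
import io


def read_split(handle):
    """Yield one dict per CSV row with the four documented fields."""
    for row in csv.DictReader(handle):
        yield {
            "phase": int(row["phase"]),  # assumption: phase is an integer
            "question": row["question"],
            "table": row["table"],       # kept as raw text, not decoded
            "sql": row["sql"],           # kept as raw text, not decoded
        }


# Hypothetical in-memory sample standing in for a file such as train.csv.
SAMPLE = io.StringIO(
    "phase,question,table,sql\n"
    '1,How many rows are there?,"{""header"": [""a""]}",SELECT COUNT(*) FROM t\n'
)

rows = list(read_split(SAMPLE))
print(rows[0]["question"])
```

To read a real split, the same function could be called on a file opened with `open("train.csv", newline="", encoding="utf-8")`, assuming the file sits in the working directory.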

Research Ideas

  • This dataset can be used to develop natural language interfaces for relational databases.
  • This dataset can be used to develop a knowledge base of common SQL queries.
  • This dataset can be used to generate a training set for a neural network that translates natural language into SQL queries.
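The last idea can be sketched as a small preprocessing step that turns rows into (source, target) text pairs for a sequence-to-sequence model. This is an illustration only: the function name `build_pairs`, the prompt prefix, and the example row are hypothetical, and the sketch assumes the sql field already holds the query as plain text.

```python
PREFIX = "translate to SQL: "  # hypothetical prompt prefix, not part of the dataset


def build_pairs(rows):
    """Turn dataset rows into (source, target) pairs, skipping incomplete rows."""
    pairs = []
    for row in rows:
        question = row["question"].strip()
        sql = row["sql"].strip()
        if question and sql:
            pairs.append((PREFIX + question, sql))
    return pairs


# Hypothetical rows with the same four fields as the dataset files.
example_rows = [
    {"phase": 1, "question": " Who won in 2004? ", "table": "{}",
     "sql": "SELECT winner FROM t WHERE year = 2004"},
    {"phase": 1, "question": "", "table": "{}", "sql": "SELECT 1"},
]

pairs = build_pairs(example_rows)
print(pairs)
```

Rows with an empty question or query are dropped here as a design choice. A real pipeline might instead log them for inspection.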

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: validation.csv

Column name   Description
phase         The phase of the data collection. (Integer)
question      The question asked by the user. (String)
table         The table containing the data for the question. (String)
sql           The SQL query corresponding to the question. (String)

File: train.csv

Column name   Description
phase         The phase of the data collection. (Integer)
question      The question asked by the user. (String)
table         The table containing the data for the question. (String)
sql           The SQL query corresponding to the question. (String)

File: test.csv

Column name   Description
phase         The phase of the data collection. (Integer)
question      The question asked by the user. (String)
table         The table containing the data for the question. (String)
sql           The SQL query corresponding to the question. (String)

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.

Tables

Test

@kaggle.thedevastator_dataset_for_developing_natural_language_interfac.test
  • 7.36 MB
  • 15878 rows
  • 4 columns

CREATE TABLE test (
  "phase" BIGINT,
  "question" VARCHAR,
  "table" VARCHAR,
  "sql" VARCHAR
);

Train

@kaggle.thedevastator_dataset_for_developing_natural_language_interfac.train
  • 25.6 MB
  • 56355 rows
  • 4 columns

CREATE TABLE train (
  "phase" BIGINT,
  "question" VARCHAR,
  "table" VARCHAR,
  "sql" VARCHAR
);

Validation

@kaggle.thedevastator_dataset_for_developing_natural_language_interfac.validation
  • 3.51 MB
  • 8421 rows
  • 4 columns

CREATE TABLE validation (
  "phase" BIGINT,
  "question" VARCHAR,
  "table" VARCHAR,
  "sql" VARCHAR
);
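As a rough local sketch, the three schemas above can be recreated in an in-memory SQLite database using Python's standard `sqlite3` module. SQLite is an assumption made for illustration, since the page does not say which engine hosts these tables, and the inserted row is hypothetical. The snippet also checks that the listed row counts of the three splits add up to the 80654 examples stated in the description.

```python
import sqlite3

# Same column layout as the CREATE TABLE statements above.
DDL = """
CREATE TABLE {name} (
  "phase" BIGINT,
  "question" VARCHAR,
  "table" VARCHAR,
  "sql" VARCHAR
);
"""

conn = sqlite3.connect(":memory:")
for name in ("test", "train", "validation"):
    conn.execute(DDL.format(name=name))

# Hypothetical row, only to make the query below return something.
conn.execute(
    'INSERT INTO train ("phase", "question", "table", "sql") VALUES (?, ?, ?, ?)',
    (1, "How many rows are there?", "{}", "SELECT COUNT(*) FROM t"),
)

# "table" is a reserved word, so column names stay double-quoted in queries.
counts = dict(
    conn.execute('SELECT "phase", COUNT(*) FROM train GROUP BY "phase"').fetchall()
)
print(counts)

# Row counts as listed for each split on this page.
SPLIT_ROWS = {"test": 15878, "train": 56355, "validation": 8421}
total = sum(SPLIT_ROWS.values())
print(total)  # 80654
```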
