SuperGLUE by Kaggle | Other

About this Dataset

SuperGLUE

A benchmark of difficult, task-specific language understanding tasks

Sources

Hugging Face Hub

About this dataset

SuperGLUE is a benchmark styled after GLUE, with a new set of more difficult language understanding tasks, improved resources, and a new public leaderboard.

BoolQ (Boolean Questions, Clark et al., 2019a) is a QA task in which each example consists of a short passage and a yes/no question about it. The questions are anonymized, unsolicited queries from users of the Google search engine; each is paired with a paragraph from a Wikipedia article that contains the answer. Following the original work, BoolQ is evaluated with accuracy.
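Accuracy is simply the fraction of examples whose predicted label equals the gold label. The sketch below computes it for a majority-class baseline, assuming the integer encoding of the Hugging Face release (0 = False, 1 = True); the function names and the CSV rows are made up for illustration.

```python
import csv
import io
from collections import Counter

def read_labels(csv_text: str) -> list[int]:
    """Read the integer `label` column from BoolQ-style CSV text."""
    return [int(row["label"]) for row in csv.DictReader(io.StringIO(csv_text))]

def accuracy(gold: list[int], pred: list[int]) -> float:
    """Fraction of examples whose predicted label equals the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def majority_baseline(train_labels: list[int], n: int) -> list[int]:
    """Predict the most frequent training label for each of n examples."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return [most_common_label] * n

# Made-up rows standing in for boolq_train.csv and boolq_validation.csv.
train_csv = "question,passage,idx,label\nq1,p1,0,1\nq2,p2,1,1\nq3,p3,2,0\n"
val_csv = "question,passage,idx,label\nq4,p4,0,1\nq5,p5,1,0\n"

gold = read_labels(val_csv)
pred = majority_baseline(read_labels(train_csv), len(gold))
print(f"majority-class accuracy: {accuracy(gold, pred):.2f}")  # majority-class accuracy: 0.50
```

For real predictions, read boolq_validation.csv (for example via `open(path, newline="", encoding="utf-8").read()`) rather than boolq_test.csv, whose gold labels are not public.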

How to use the dataset

Research Ideas

Train a model to perform question answering.

Perform text classification.

Train a model for entity recognition.

Evaluate a model on the tasks.

And more.

Acknowledgements

License

> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: boolq_test.csv

Column name	Description
question	The yes/no question to be answered. (String)
passage	The Wikipedia passage that contains the answer to the question. (String)
idx	The index of the example within the split. (Integer)
label	The answer: 0 = False, 1 = True. Gold labels for the test split are withheld; the Hugging Face release stores them as -1. (Integer)

File: record_test.csv

Column name	Description
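passage	The news passage that provides context for the query. (String)
query	A cloze-style statement about the passage in which one entity is replaced by @placeholder. (String)
entities	The candidate entities mentioned in the passage. (List of strings, serialized as text)
answers	The entities that correctly fill @placeholder. Withheld for the test split. (List of strings, serialized as text)
idx	A composite index identifying the passage and query. (String)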
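ReCoRD is typically scored with exact match (EM) and token-level F1, each taking the maximum over the gold answers for a query. A minimal sketch, assuming SQuAD-style normalization (lowercasing, stripping punctuation and the articles a/an/the); it approximates rather than reproduces the official scorer, and the function names are illustrative. Only the train and validation files can be scored this way, since test answers are withheld.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 between two normalized strings."""
    pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred_toks), overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def record_scores(pred: str, answers: list[str]) -> tuple[float, float]:
    """Return (EM, F1) for one query, each maximized over the gold answers."""
    if not answers:
        raise ValueError("no gold answers (test-split answers are withheld)")
    em = max(float(normalize(pred) == normalize(a)) for a in answers)
    f1 = max(token_f1(pred, a) for a in answers)
    return em, f1

print(record_scores("the White House", ["White House", "Washington"]))  # (1.0, 1.0)
print(record_scores("House", ["White House"]))  # EM 0.0, F1 about 0.67
```

Dataset-level scores are the means of these per-query values.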

File: rte_train.csv

Column name	Description
premise	The premise text. (String)
hypothesis	A statement to be judged as entailed or not entailed by the premise. Like the premise, it is model input. (String)
idx	The index of the example within the split. (Integer)
label	0 = entailment, 1 = not_entailment. (Integer)

File: wic_test.csv

Column name	Description
word	The target word (typically in its base form), which appears in both sentences. (String)
sentence1	The first sentence containing the target word. (String)
sentence2	The second sentence containing the target word. (String)
start1	The character offset where the target word starts in sentence1. (Integer)
start2	The character offset where the target word starts in sentence2. (Integer)
end1	The character offset just past the end of the target word in sentence1. (Integer)
end2	The character offset just past the end of the target word in sentence2. (Integer)
idx	The index of the example within the split. (Integer)
label	1 if the target word has the same sense in both sentences, 0 otherwise. Gold labels for the test split are withheld; the Hugging Face release stores them as -1. (Integer)
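The start/end columns locate the target word in each sentence. Assuming they are character offsets with an exclusive end (so that `sentence1[start1:end1]` yields the word as written, possibly inflected rather than equal to `word`), the sketch below extracts the span and wraps it in markers, a common way to point a model at the target word. The helper names and example sentences are hypothetical.

```python
def target_span(sentence: str, start: int, end: int) -> str:
    """Return the target word as written, assuming end-exclusive character offsets."""
    if not 0 <= start < end <= len(sentence):
        raise ValueError(f"invalid span [{start}, {end}) for a sentence of length {len(sentence)}")
    return sentence[start:end]

def mark_target(sentence: str, start: int, end: int, left: str = "[", right: str = "]") -> str:
    """Wrap the target word in marker strings."""
    return sentence[:start] + left + sentence[start:end] + right + sentence[end:]

# A hypothetical WiC-style pair (not taken from the dataset).
s1, start1, end1 = "She sat on the bank of the river.", 15, 19
s2, start2, end2 = "He deposited cash at the bank.", 25, 29
print(target_span(s1, start1, end1), target_span(s2, start2, end2))  # bank bank
print(mark_target(s1, start1, end1))  # She sat on the [bank] of the river.
```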

File: record_validation.csv

Column name	Description
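passage	The news passage that provides context for the query. (String)
query	A cloze-style statement about the passage in which one entity is replaced by @placeholder. (String)
entities	The candidate entities mentioned in the passage. (List of strings, serialized as text)
answers	The entities that correctly fill @placeholder. (List of strings, serialized as text)
idx	A composite index identifying the passage and query. (String)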

File: wsc_validation.csv

Column name	Description
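text	The passage containing the pronoun and the candidate referent. (String)
span1_index	The word position of the candidate referent in text. (Integer)
span2_index	The word position of the pronoun in text. (Integer)
span1_text	The candidate referent (a noun phrase). (String)
span2_text	The pronoun. (String)
idx	The index of the example within the split. (Integer)
label	1 if the pronoun refers to the candidate referent, 0 otherwise. (Integer)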
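The table schemas for the WSC files also include span1_index and span2_index. Assuming these are positions in the whitespace-tokenized text (`text.split()`), the sketch below checks that each index lines up with its span text and then marks the candidate referent with `*` and the pronoun with `[ ]`. The helper names and the example sentence are illustrative. If the check fails on rows of the wsc_*.csv files, the wsc.fixed_* files, which appear to correspond to Hugging Face's variant with repaired spans, may be the better choice.

```python
import string

def _norm(token: str) -> str:
    """Lowercase a token and strip surrounding punctuation for comparison."""
    return token.strip(string.punctuation).lower()

def span_matches(tokens: list[str], start: int, span_text: str) -> bool:
    """Check that the whitespace tokens starting at `start` spell out `span_text`."""
    expected = span_text.split()
    window = tokens[start:start + len(expected)]
    return [_norm(t) for t in window] == [_norm(t) for t in expected]

def mark_spans(text: str, span1_index: int, span1_text: str,
               span2_index: int, span2_text: str) -> str:
    """Wrap span1 (candidate referent) in * * and span2 (pronoun) in [ ]."""
    tokens = text.split()
    for start, span, left, right in ((span1_index, span1_text, "*", "*"),
                                     (span2_index, span2_text, "[", "]")):
        if not span_matches(tokens, start, span):
            raise ValueError(f"{span!r} does not start at token {start}")
        end = start + len(span.split())
        tokens[start] = left + tokens[start]
        tokens[end - 1] = tokens[end - 1] + right
    return " ".join(tokens)

# A hypothetical WSC-style row built from the classic trophy/suitcase Winograd sentence.
text = "The trophy would not fit in the suitcase because it was too big."
print(mark_spans(text, 0, "The trophy", 9, "it"))
# *The trophy* would not fit in the suitcase because [it] was too big.
```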

File: copa_train.csv

Column name	Description
premise	The premise describing a situation. (String)
choice1	The first alternative. (String)
choice2	The second alternative. (String)
question	Either "cause" or "effect": whether to pick the more plausible cause or effect of the premise. (String)
idx	The index of the example within the split. (Integer)
label	0 if choice1 is the more plausible alternative, 1 if choice2 is. (Integer)
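COPA asks which alternative is the more plausible cause or effect of the premise. One common framing, not prescribed by the dataset, turns each alternative into a full sentence joined to the premise by a connective and lets a model score both sentences. The connectives ("because", "so") and the `score` callback are assumptions of this sketch, and the keyword scorer in the example is a toy stand-in for a real model.

```python
from typing import Callable

# Assumed mapping from COPA's question type to a joining word.
CONNECTIVE = {"cause": "because", "effect": "so"}

def candidates(premise: str, choice1: str, choice2: str, question: str) -> list[str]:
    """Join the premise to each alternative with a cause/effect connective."""
    stem = premise.rstrip(".")
    connective = CONNECTIVE[question]
    # Lowercase the first letter of each choice (note: this also lowercases proper nouns).
    return [f"{stem} {connective} {c[:1].lower()}{c[1:]}" for c in (choice1, choice2)]

def predict(premise: str, choice1: str, choice2: str, question: str,
            score: Callable[[str], float]) -> int:
    """Return the label of the higher-scoring alternative: 0 for choice1, 1 for choice2."""
    s1, s2 = (score(s) for s in candidates(premise, choice1, choice2, question))
    return 0 if s1 >= s2 else 1

# Hypothetical example (not from the dataset) and a toy scorer.
row = ("The ground was wet.", "It had rained overnight.", "The sun was shining.", "cause")
print(candidates(*row))
print(predict(*row, score=lambda s: float("rain" in s)))  # 0
```

Ties go to choice1 here; a real scorer would typically be a language-model log-likelihood, possibly length-normalized.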

File: wsc_test.csv

Column name	Description
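text	The passage containing the pronoun and the candidate referent. (String)
span1_index	The word position of the candidate referent in text. (Integer)
span2_index	The word position of the pronoun in text. (Integer)
span1_text	The candidate referent (a noun phrase). (String)
span2_text	The pronoun. (String)
idx	The index of the example within the split. (Integer)
label	1 if the pronoun refers to the candidate referent, 0 otherwise. Gold labels for the test split are withheld; the Hugging Face release stores them as -1. (Integer)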

File: multirc_train.csv

Column name	Description
paragraph	The paragraph the question is about. (String)
question	A question about the paragraph; it may have several correct answers. (String)
answer	One candidate answer to the question. (String)
idx	A composite index identifying the paragraph, question, and answer. (String)
label	1 if the candidate answer is correct, 0 otherwise. (Integer)
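Because each row holds one candidate answer, MultiRC is scored at two levels: F1a, a binary F1 over all answer options, and EM, the fraction of questions whose options are all labeled correctly. A minimal sketch; grouping rows by the (paragraph, question) text pair is an assumption made here because the composite idx column is stored as a string, and the example rows are hypothetical.

```python
from collections import defaultdict

def f1a(gold: list[int], pred: list[int]) -> float:
    """Binary F1 over all answer options, with 1 (correct answer) as the positive class."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def question_em(keys: list[tuple[str, str]], gold: list[int], pred: list[int]) -> float:
    """Fraction of questions whose answer options are all labeled correctly."""
    if not keys:
        raise ValueError("no examples")
    all_correct: dict[tuple[str, str], bool] = defaultdict(lambda: True)
    for key, g, p in zip(keys, gold, pred):
        all_correct[key] = all_correct[key] and g == p
    return sum(all_correct.values()) / len(all_correct)

# Hypothetical rows: two questions about one paragraph, with 3 and 2 candidate answers.
keys = [("P", "Q1")] * 3 + [("P", "Q2")] * 2
gold = [1, 0, 1, 0, 1]
pred = [1, 0, 0, 0, 1]
print(f"F1a={f1a(gold, pred):.2f}  EM={question_em(keys, gold, pred):.2f}")  # F1a=0.80  EM=0.50
```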

File: cb_validation.csv

Column name	Description
premise	The premise passage. (String)
hypothesis	The hypothesis to be judged against the premise. Like the premise, it is model input. (String)
idx	The index of the example within the split. (Integer)
label	0 = entailment, 1 = contradiction, 2 = neutral. (Integer)
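Because CB's three classes are imbalanced, the SuperGLUE paper reports accuracy together with unweighted (macro) F1, the plain average of the per-class F1 scores. A minimal sketch, assuming the encoding 0 = entailment, 1 = contradiction, 2 = neutral; the labels in the example are made up.

```python
def accuracy(gold: list[int], pred: list[int]) -> float:
    """Fraction of examples whose prediction equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def class_f1(gold: list[int], pred: list[int], label: int) -> float:
    """One-vs-rest F1 for a single class, written as 2TP / (2TP + FP + FN)."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def macro_f1(gold: list[int], pred: list[int], labels: tuple[int, ...] = (0, 1, 2)) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(class_f1(gold, pred, c) for c in labels) / len(labels)

# Hypothetical labels, assuming 0 = entailment, 1 = contradiction, 2 = neutral.
gold = [0, 0, 1, 2]
pred = [0, 1, 1, 2]
print(f"acc={accuracy(gold, pred):.2f}  macro-F1={macro_f1(gold, pred):.3f}")  # acc=0.75  macro-F1=0.778
```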

File: axg_test.csv

Column name	Description
premise	The premise sentence. (String)
hypothesis	The hypothesis sentence to be judged against the premise. (String)
idx	The index of the example within the split. (Integer)
label	0 = entailment, 1 = not_entailment. (Integer)

File: rte_test.csv

Column name	Description
premise	The premise text. (String)
hypothesis	A statement to be judged as entailed or not entailed by the premise. Like the premise, it is model input. (String)
idx	The index of the example within the split. (Integer)
label	0 = entailment, 1 = not_entailment. Gold labels for the test split are withheld; the Hugging Face release stores them as -1. (Integer)

File: wic_train.csv

Column name	Description
word	The target word (typically in its base form), which appears in both sentences. (String)
sentence1	The first sentence containing the target word. (String)
sentence2	The second sentence containing the target word. (String)
start1	The character offset where the target word starts in sentence1. (Integer)
start2	The character offset where the target word starts in sentence2. (Integer)
end1	The character offset just past the end of the target word in sentence1. (Integer)
end2	The character offset just past the end of the target word in sentence2. (Integer)
idx	The index of the example within the split. (Integer)
label	1 if the target word has the same sense in both sentences, 0 otherwise. (Integer)

File: wsc.fixed_train.csv

Column name	Description
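text	The passage containing the pronoun and the candidate referent. (String)
span1_index	The word position of the candidate referent in text. (Integer)
span2_index	The word position of the pronoun in text. (Integer)
span1_text	The candidate referent (a noun phrase). (String)
span2_text	The pronoun. (String)
idx	The index of the example within the split. (Integer)
label	1 if the pronoun refers to the candidate referent, 0 otherwise. (Integer)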

File: boolq_train.csv

Column name	Description
question	The yes/no question to be answered. (String)
passage	The Wikipedia passage that contains the answer to the question. (String)
idx	The index of the example within the split. (Integer)
label	The answer: 0 = False, 1 = True. (Integer)

File: record_train.csv

Column name	Description
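passage	The news passage that provides context for the query. (String)
query	A cloze-style statement about the passage in which one entity is replaced by @placeholder. (String)
entities	The candidate entities mentioned in the passage. (List of strings, serialized as text)
answers	The entities that correctly fill @placeholder. (List of strings, serialized as text)
idx	A composite index identifying the passage and query. (String)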

File: wsc_train.csv

Column name	Description
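text	The passage containing the pronoun and the candidate referent. (String)
span1_index	The word position of the candidate referent in text. (Integer)
span2_index	The word position of the pronoun in text. (Integer)
span1_text	The candidate referent (a noun phrase). (String)
span2_text	The pronoun. (String)
idx	The index of the example within the split. (Integer)
label	1 if the pronoun refers to the candidate referent, 0 otherwise. (Integer)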

File: cb_train.csv

Column name	Description
premise	The premise passage. (String)
hypothesis	The hypothesis to be judged against the premise. Like the premise, it is model input. (String)
idx	The index of the example within the split. (Integer)
label	0 = entailment, 1 = contradiction, 2 = neutral. (Integer)

File: copa_test.csv

Column name	Description
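premise	The premise describing a situation. (String)
choice1	The first alternative. (String)
choice2	The second alternative. (String)
question	Either "cause" or "effect": whether to pick the more plausible cause or effect of the premise. (String)
idx	The index of the example within the split. (Integer)
label	0 if choice1 is the more plausible alternative, 1 if choice2 is. Gold labels for the test split are withheld; the Hugging Face release stores them as -1. (Integer)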

File: rte_validation.csv

Column name	Description
premise	The premise text. (String)
hypothesis	A statement to be judged as entailed or not entailed by the premise. Like the premise, it is model input. (String)
idx	The index of the example within the split. (Integer)
label	0 = entailment, 1 = not_entailment. (Integer)

File: multirc_validation.csv

Column name	Description
paragraph	The paragraph the question is about. (String)
question	A question about the paragraph; it may have several correct answers. (String)
answer	One candidate answer to the question. (String)
idx	A composite index identifying the paragraph, question, and answer. (String)
label	1 if the candidate answer is correct, 0 otherwise. (Integer)

File: wsc.fixed_test.csv

Column name	Description
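text	The passage containing the pronoun and the candidate referent. (String)
span1_index	The word position of the candidate referent in text. (Integer)
span2_index	The word position of the pronoun in text. (Integer)
span1_text	The candidate referent (a noun phrase). (String)
span2_text	The pronoun. (String)
idx	The index of the example within the split. (Integer)
label	1 if the pronoun refers to the candidate referent, 0 otherwise. Gold labels for the test split are withheld; the Hugging Face release stores them as -1. (Integer)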

File: axb_test.csv

Column name	Description
sentence1	The premise sentence. (String)
sentence2	The hypothesis sentence to be judged against sentence1. (String)
idx	The index of the example within the split. (Integer)
label	0 = entailment, 1 = not_entailment. (Integer)
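SuperGLUE scores the broad-coverage diagnostic (AX-b) with the Matthews correlation coefficient (MCC), which, unlike accuracy, stays at 0 for a classifier that always predicts one class. A minimal sketch of binary MCC; it follows the common convention of returning 0 when the denominator is zero, and the example labels are made up.

```python
import math

def matthews_corrcoef(gold: list[int], pred: list[int]) -> float:
    """Binary MCC; returns 0.0 when a row or column of the confusion matrix is empty."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    tn = sum(g == 0 and p == 0 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical labels, assuming 0 = entailment, 1 = not_entailment.
gold = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
print(round(matthews_corrcoef(gold, pred), 3))  # 0.577
```

MCC is symmetric in the two classes, so it does not matter which label is treated as positive.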

File: wsc.fixed_validation.csv

Column name	Description
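text	The passage containing the pronoun and the candidate referent. (String)
span1_index	The word position of the candidate referent in text. (Integer)
span2_index	The word position of the pronoun in text. (Integer)
span1_text	The candidate referent (a noun phrase). (String)
span2_text	The pronoun. (String)
idx	The index of the example within the split. (Integer)
label	1 if the pronoun refers to the candidate referent, 0 otherwise. (Integer)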

File: boolq_validation.csv

Column name	Description
question	The yes/no question to be answered. (String)
passage	The Wikipedia passage that contains the answer to the question. (String)
idx	The index of the example within the split. (Integer)
label	The answer: 0 = False, 1 = True. (Integer)

File: cb_test.csv

Column name	Description
premise	The premise passage. (String)
hypothesis	The hypothesis to be judged against the premise. Like the premise, it is model input. (String)
idx	The index of the example within the split. (Integer)
label	0 = entailment, 1 = contradiction, 2 = neutral. Gold labels for the test split are withheld; the Hugging Face release stores them as -1. (Integer)

File: multirc_test.csv

Column name	Description
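paragraph	The paragraph the question is about. (String)
question	A question about the paragraph; it may have several correct answers. (String)
answer	One candidate answer to the question. (String)
idx	A composite index identifying the paragraph, question, and answer. (String)
label	1 if the candidate answer is correct, 0 otherwise. Gold labels for the test split are withheld; the Hugging Face release stores them as -1. (Integer)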

File: copa_validation.csv

Column name	Description
premise	The premise describing a situation. (String)
choice1	The first alternative. (String)
choice2	The second alternative. (String)
question	Either "cause" or "effect": whether to pick the more plausible cause or effect of the premise. (String)
idx	The index of the example within the split. (Integer)
label	0 if choice1 is the more plausible alternative, 1 if choice2 is. (Integer)

File: wic_validation.csv

Column name	Description
word	The target word (typically in its base form), which appears in both sentences. (String)
sentence1	The first sentence containing the target word. (String)
sentence2	The second sentence containing the target word. (String)
start1	The character offset where the target word starts in sentence1. (Integer)
start2	The character offset where the target word starts in sentence2. (Integer)
end1	The character offset just past the end of the target word in sentence1. (Integer)
end2	The character offset just past the end of the target word in sentence2. (Integer)
idx	The index of the example within the split. (Integer)
label	1 if the target word has the same sense in both sentences, 0 otherwise. (Integer)

Tables

Axb Test

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.axb_test

78.04 KB
1104 rows
4 columns


CREATE TABLE axb_test (
  "sentence1" VARCHAR,
  "sentence2" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Axg Test

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.axg_test

15.03 KB
356 rows
4 columns


CREATE TABLE axg_test (
  "premise" VARCHAR,
  "hypothesis" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Boolq Test

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.boolq_test

1.23 MB
3245 rows
4 columns


CREATE TABLE boolq_test (
  "question" VARCHAR,
  "passage" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Boolq Train

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.boolq_train

3.69 MB
9427 rows
4 columns


CREATE TABLE boolq_train (
  "question" VARCHAR,
  "passage" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);
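The CREATE TABLE statements use generic type names (VARCHAR, BIGINT) that SQLite accepts, so they can be reused as-is for quick sanity checks. The sketch below loads BoolQ-style CSV text into the boolq_train schema with Python's built-in sqlite3 module and counts examples per label; the three rows are made up, and for the real file you would pass the contents of boolq_train.csv instead. The helper names are illustrative.

```python
import csv
import io
import sqlite3

DDL = """
CREATE TABLE boolq_train (
  "question" VARCHAR,
  "passage" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);
"""

def load(conn: sqlite3.Connection, csv_text: str) -> None:
    """Insert BoolQ-style CSV rows; BIGINT columns get INTEGER affinity, so '1' is stored as 1."""
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        'INSERT INTO boolq_train ("question", "passage", "idx", "label") '
        "VALUES (:question, :passage, :idx, :label)",
        rows,
    )

def label_counts(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Return (label, count) pairs ordered by label."""
    return conn.execute(
        "SELECT label, COUNT(*) FROM boolq_train GROUP BY label ORDER BY label"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
load(conn, "question,passage,idx,label\nq1,p1,0,1\nq2,p2,1,1\nq3,p3,2,0\n")
print(label_counts(conn))  # [(0, 1), (1, 2)]
```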

Boolq Validation

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.boolq_validation

1.23 MB
3270 rows
4 columns


CREATE TABLE boolq_validation (
  "question" VARCHAR,
  "passage" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Cb Test

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.cb_test

63.06 KB
250 rows
4 columns


CREATE TABLE cb_test (
  "premise" VARCHAR,
  "hypothesis" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Cb Train

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.cb_train

57.7 KB
250 rows
4 columns


CREATE TABLE cb_train (
  "premise" VARCHAR,
  "hypothesis" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Cb Validation

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.cb_validation

18.66 KB
56 rows
4 columns


CREATE TABLE cb_validation (
  "premise" VARCHAR,
  "hypothesis" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Copa Test

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.copa_test

40.68 KB
500 rows
6 columns


CREATE TABLE copa_test (
  "premise" VARCHAR,
  "choice1" VARCHAR,
  "choice2" VARCHAR,
  "question" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Copa Train

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.copa_train

34.53 KB
400 rows
6 columns


CREATE TABLE copa_train (
  "premise" VARCHAR,
  "choice1" VARCHAR,
  "choice2" VARCHAR,
  "question" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Copa Validation

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.copa_validation

13.09 KB
100 rows
6 columns


CREATE TABLE copa_validation (
  "premise" VARCHAR,
  "choice1" VARCHAR,
  "choice2" VARCHAR,
  "question" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Multirc Test

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.multirc_test

488.38 KB
9693 rows
5 columns


CREATE TABLE multirc_test (
  "paragraph" VARCHAR,
  "question" VARCHAR,
  "answer" VARCHAR,
  "idx" VARCHAR,
  "label" BIGINT
);

Multirc Train

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.multirc_train

1.32 MB
27243 rows
5 columns


CREATE TABLE multirc_train (
  "paragraph" VARCHAR,
  "question" VARCHAR,
  "answer" VARCHAR,
  "idx" VARCHAR,
  "label" BIGINT
);

Multirc Validation

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.multirc_validation

252.77 KB
4848 rows
5 columns


CREATE TABLE multirc_validation (
  "paragraph" VARCHAR,
  "question" VARCHAR,
  "answer" VARCHAR,
  "idx" VARCHAR,
  "label" BIGINT
);

Record Test

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.record_test

6.3 MB
10000 rows
5 columns


CREATE TABLE record_test (
  "passage" VARCHAR,
  "query" VARCHAR,
  "entities" VARCHAR,
  "answers" VARCHAR,
  "idx" VARCHAR
);
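Although the column list describes entities and answers as lists of strings, this schema stores them as VARCHAR, so each cell holds a serialized list. The serialization format is not documented here; the sketch below assumes quoted items as in Python or JSON literals, with commas between items optional so that numpy-style text such as `['a' 'b']` is also handled. The function name is illustrative.

```python
import re

# One single- or double-quoted item, allowing backslash-escaped characters inside.
_ITEM = re.compile(r"'((?:[^'\\]|\\.)*)'" + "|" + r'"((?:[^"\\]|\\.)*)"')

def parse_list_cell(cell: str) -> list[str]:
    """Extract the quoted items from a serialized list such as "['a', 'b']".

    Assumption: items are quoted Python- or JSON-style. Escape sequences are
    returned as written, not decoded.
    """
    return [single or double for single, double in _ITEM.findall(cell)]

print(parse_list_cell("['Obama', 'White House']"))  # ['Obama', 'White House']
print(parse_list_cell("['Obama' 'White House']"))   # ['Obama', 'White House']
```

A regex is used instead of `ast.literal_eval` because the latter would silently merge comma-less items like `'Obama' 'White House'` into one string through implicit string concatenation.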

Record Train

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.record_train

59.19 MB
100730 rows
5 columns


CREATE TABLE record_train (
  "passage" VARCHAR,
  "query" VARCHAR,
  "entities" VARCHAR,
  "answers" VARCHAR,
  "idx" VARCHAR
);

Record Validation

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.record_validation

6.38 MB
10000 rows
5 columns


CREATE TABLE record_validation (
  "passage" VARCHAR,
  "query" VARCHAR,
  "entities" VARCHAR,
  "answers" VARCHAR,
  "idx" VARCHAR
);

Rte Test

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.rte_test

602.2 KB
3000 rows
4 columns


CREATE TABLE rte_test (
  "premise" VARCHAR,
  "hypothesis" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Rte Train

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.rte_train

546.11 KB
2490 rows
4 columns


CREATE TABLE rte_train (
  "premise" VARCHAR,
  "hypothesis" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Rte Validation

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.rte_validation

69.26 KB
277 rows
4 columns


CREATE TABLE rte_validation (
  "premise" VARCHAR,
  "hypothesis" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Wic Test

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.wic_test

119.53 KB
1400 rows
9 columns


CREATE TABLE wic_test (
  "word" VARCHAR,
  "sentence1" VARCHAR,
  "sentence2" VARCHAR,
  "start1" BIGINT,
  "start2" BIGINT,
  "end1" BIGINT,
  "end2" BIGINT,
  "idx" BIGINT,
  "label" BIGINT
);

Wic Train

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.wic_train

306.46 KB
5428 rows
9 columns


CREATE TABLE wic_train (
  "word" VARCHAR,
  "sentence1" VARCHAR,
  "sentence2" VARCHAR,
  "start1" BIGINT,
  "start2" BIGINT,
  "end1" BIGINT,
  "end2" BIGINT,
  "idx" BIGINT,
  "label" BIGINT
);

Wic Validation

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.wic_validation

60.75 KB
638 rows
9 columns


CREATE TABLE wic_validation (
  "word" VARCHAR,
  "sentence1" VARCHAR,
  "sentence2" VARCHAR,
  "start1" BIGINT,
  "start2" BIGINT,
  "end1" BIGINT,
  "end2" BIGINT,
  "idx" BIGINT,
  "label" BIGINT
);

Wsc Fixed Test

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.wsc_fixed_test

13.56 KB
146 rows
7 columns


CREATE TABLE wsc_fixed_test (
  "text" VARCHAR,
  "span1_index" BIGINT,
  "span2_index" BIGINT,
  "span1_text" VARCHAR,
  "span2_text" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Wsc Fixed Train

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.wsc_fixed_train

28.37 KB
554 rows
7 columns


CREATE TABLE wsc_fixed_train (
  "text" VARCHAR,
  "span1_index" BIGINT,
  "span2_index" BIGINT,
  "span1_text" VARCHAR,
  "span2_text" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Wsc Fixed Validation

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.wsc_fixed_validation

11.65 KB
104 rows
7 columns


CREATE TABLE wsc_fixed_validation (
  "text" VARCHAR,
  "span1_index" BIGINT,
  "span2_index" BIGINT,
  "span1_text" VARCHAR,
  "span2_text" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Wsc Test

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.wsc_test

13.56 KB
146 rows
7 columns


CREATE TABLE wsc_test (
  "text" VARCHAR,
  "span1_index" BIGINT,
  "span2_index" BIGINT,
  "span1_text" VARCHAR,
  "span2_text" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Wsc Train

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.wsc_train

28.26 KB
554 rows
7 columns


CREATE TABLE wsc_train (
  "text" VARCHAR,
  "span1_index" BIGINT,
  "span2_index" BIGINT,
  "span1_text" VARCHAR,
  "span2_text" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);

Wsc Validation

@kaggle.thedevastator_task_oriented_natural_language_understanding_dat.wsc_validation

11.64 KB
104 rows
7 columns


CREATE TABLE wsc_validation (
  "text" VARCHAR,
  "span1_index" BIGINT,
  "span2_index" BIGINT,
  "span1_text" VARCHAR,
  "span2_text" VARCHAR,
  "idx" BIGINT,
  "label" BIGINT
);