Baselight

SciTail (Multiple-choice Science Exams)

27,026 Multiple-choice science exams and web sentences

@kaggle.thedevastator_futuristic_natural_language_inference_with_the_s


By Huggingface Hub


About this dataset

The SciTail dataset is a Natural Language Inference (NLI) resource built from multiple-choice science exams and web sentences. Each hypothesis is formed from an exam question and its answer, and each premise is a sentence taken from the web. The dataset contains 27,026 examples and ships in four formats (DGEM, predictor, SNLI, and TSV), each with train, validation, and test files. The formats present the data with different fields and structures, and row counts differ slightly between formats (see Tables), so you can pick whichever suits your NLI tooling.

How to use the dataset

This guide explains how to use the SciTail dataset for Natural Language Inference (NLI). NLI is the task of predicting the relation between a premise and a hypothesis, typically entailment, contradiction, or neutral. SciTail uses two of these labels: a premise either entails the hypothesis or is neutral towards it. The pairs are drawn from multiple-choice science exams and web sentences and can be used to train and evaluate NLI models.

The dataset comes in four formats (DGEM, predictor, SNLI, and TSV), each split into train, validation, and test files, for twelve files in total. The formats share the same core fields under different names: a premise (or sentence1), a hypothesis (or sentence2), and a label (or gold_label). Some formats add extras such as a graph structure of the hypothesis, sentence parses, or labels assigned by annotators.

To get started, download the files in whichever format you prefer from Kaggle. All files are stored as CSVs, with each row holding one premise-hypothesis pair and a label indicating whether the premise entails the hypothesis.
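
As a minimal sketch of the loading step, the snippet below reads premise/hypothesis/label rows with Python's standard `csv` module. The function name `load_pairs` and the two sample rows are illustrative assumptions, not part of the dataset; the column names follow the TSV-format schema listed under Tables, and the exact label strings in the real files should be checked before relying on them.

```python
import csv
import io


def load_pairs(source):
    """Read premise/hypothesis/label rows from an open CSV text stream."""
    reader = csv.DictReader(source)
    pairs = []
    for row in reader:
        # Skip rows with a missing premise or hypothesis rather than guess.
        if not row.get("premise") or not row.get("hypothesis"):
            continue
        pairs.append({
            "premise": row["premise"].strip(),
            "hypothesis": row["hypothesis"].strip(),
            "label": (row.get("label") or "").strip(),
        })
    return pairs


# Invented in-memory stand-in for a file such as tsv_format_train.csv.
sample = io.StringIO(
    "premise,hypothesis,label\n"
    '"Plants make food, using sunlight.",Plants perform photosynthesis.,entails\n'
    "The moon orbits Earth.,Plants need water.,neutral\n"
)
pairs = load_pairs(sample)
print(len(pairs), pairs[0]["label"])
```

For a downloaded file, the same function can be called on a handle opened with `open("tsv_format_train.csv", newline="", encoding="utf-8")`, assuming the file sits in the working directory.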

Once you have downloaded your preferred files, prepare them for training or evaluation. The data already ships with train, validation, and test files. If you need a different split, keep both kinds of pair represented in each part: pairs where the premise entails the hypothesis, and neutral pairs where the premise does not provide enough evidence to support the hypothesis.
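
If a custom split is needed, one possible approach is a stratified split that shuffles each label group separately so that both classes appear in each part. This is a sketch under stated assumptions: `stratified_split`, its parameters, and the synthetic rows are hypothetical and only illustrate the idea.

```python
import random
from collections import defaultdict


def stratified_split(rows, holdout_fraction=0.1, seed=0):
    """Split rows into (train, holdout), keeping label proportions similar."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, holdout = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_holdout = round(len(group) * holdout_fraction)
        holdout.extend(group[:n_holdout])
        train.extend(group[n_holdout:])
    return train, holdout


# Synthetic rows: 10 labelled "entails" and 30 labelled "neutral".
rows = [
    {"premise": f"p{i}", "hypothesis": f"h{i}",
     "label": "entails" if i % 4 == 0 else "neutral"}
    for i in range(40)
]
train, holdout = stratified_split(rows, holdout_fraction=0.2, seed=13)
print(len(train), len(holdout))
```

Splitting per label rather than over the whole list avoids a holdout set that happens to contain only one class.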

Research Ideas

  • Develop and fine-tune NLI models on science text of varying linguistic complexity.
  • Use the annotator labels to build a human-in-the-loop approach to NLI.
  • Incorporate the hypothesis graph structure into existing models to improve accuracy when relating premises to hypotheses in science text.

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: dgem_format_test.csv

Column name                 Description
premise                     The premise of the statement (String).
hypothesis                  The hypothesis of the statement (String).
label                       The label of the pair, either entailment or neutral (String).
hypothesis_graph_structure  A graph structure of the hypothesis (Graph).

File: predictor_format_validation.csv

Column name             Description
answer                  The answer to the question. (String)
sentence2_structure     A graph structure of the second sentence. (Graph)
sentence1               The first sentence of the statement. (String)
sentence2               The second sentence of the statement. (String)
gold_label              The label of the pair, either entailment or neutral. (String)
question                The question that the answer belongs to. (String)

File: tsv_format_test.csv

Column name  Description
premise      The premise of the statement (String).
hypothesis   The hypothesis of the statement (String).
label        The label of the pair, either entailment or neutral (String).

File: snli_format_validation.csv

Column name             Description
sentence1               The first sentence of the statement. (String)
sentence2               The second sentence of the statement. (String)
gold_label              The label of the pair, either entailment or neutral. (String)
sentence1_binary_parse  Binary parse of first sentence. (String)
sentence1_parse         Parse of first sentence. (String)
sentence2_parse         Parse of second sentence. (String)
annotator_labels        Labels assigned by annotators. (String)

File: dgem_format_train.csv

Column name                 Description
premise                     The premise of the statement (String).
hypothesis                  The hypothesis of the statement (String).
label                       The label of the pair, either entailment or neutral (String).
hypothesis_graph_structure  A graph structure of the hypothesis (Graph).

File: snli_format_train.csv

Column name             Description
sentence1_binary_parse  Binary parse of first sentence. (String)
sentence1_parse         Parse of first sentence. (String)
sentence1               The first sentence of the statement. (String)
sentence2_parse         Parse of second sentence. (String)
sentence2               The second sentence of the statement. (String)
annotator_labels        Labels assigned by annotators. (String)
gold_label              The label of the pair, either entailment or neutral. (String)

File: predictor_format_train.csv

Column name             Description
answer                  The answer to the question. (String)
sentence2_structure     A graph structure of the second sentence. (Graph)
sentence1               The first sentence of the statement. (String)
sentence2               The second sentence of the statement. (String)
gold_label              The label of the pair, either entailment or neutral. (String)
question                The question that the answer belongs to. (String)

File: snli_format_test.csv

Column name             Description
sentence1_binary_parse  Binary parse of first sentence. (String)
sentence1_parse         Parse of first sentence. (String)
sentence1               The first sentence of the statement. (String)
sentence2_parse         Parse of second sentence. (String)
sentence2               The second sentence of the statement. (String)
annotator_labels        Labels assigned by annotators. (String)
gold_label              The label of the pair, either entailment or neutral. (String)

File: tsv_format_validation.csv

Column name  Description
premise      The premise of the statement (String).
hypothesis   The hypothesis of the statement (String).
label        The label of the pair, either entailment or neutral (String).
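
Because the formats name their core fields differently, it can help to map every row onto one shape before training. The sketch below assumes that `sentence1` plays the role of the premise and `sentence2` the role of the hypothesis, following the usual SNLI convention; that mapping and the helper name `normalize_row` are assumptions, not something the dataset card states.

```python
def normalize_row(row):
    """Map a row from any of the formats onto premise/hypothesis/label keys."""
    if "premise" in row:
        # DGEM and TSV formats already use these names.
        return {
            "premise": row["premise"],
            "hypothesis": row["hypothesis"],
            "label": row["label"],
        }
    # SNLI and predictor formats (assumed mapping, see the note above).
    return {
        "premise": row["sentence1"],
        "hypothesis": row["sentence2"],
        "label": row["gold_label"],
    }


# Invented rows, one per naming scheme.
tsv_row = {"premise": "Ice is frozen water.",
           "hypothesis": "Ice is a solid.",
           "label": "neutral"}
snli_row = {"sentence1": "Ice is frozen water.",
            "sentence2": "Ice is a solid.",
            "gold_label": "neutral",
            "annotator_labels": "neutral"}
print(normalize_row(tsv_row) == normalize_row(snli_row))
```

Extra columns such as parses or graph structures are dropped here; keep them if your model uses them.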


Tables

Dgem Format Test

@kaggle.thedevastator_futuristic_natural_language_inference_with_the_s.dgem_format_test
  • 161.77 KB
  • 2126 rows
  • 4 columns

CREATE TABLE dgem_format_test (
  "premise" VARCHAR,
  "hypothesis" VARCHAR,
  "label" VARCHAR,
  "hypothesis_graph_structure" VARCHAR
);

Dgem Format Train

@kaggle.thedevastator_futuristic_natural_language_inference_with_the_s.dgem_format_train
  • 1.53 MB
  • 23088 rows
  • 4 columns

CREATE TABLE dgem_format_train (
  "premise" VARCHAR,
  "hypothesis" VARCHAR,
  "label" VARCHAR,
  "hypothesis_graph_structure" VARCHAR
);

Dgem Format Validation

@kaggle.thedevastator_futuristic_natural_language_inference_with_the_s.dgem_format_validation
  • 106.32 KB
  • 1304 rows
  • 4 columns

CREATE TABLE dgem_format_validation (
  "premise" VARCHAR,
  "hypothesis" VARCHAR,
  "label" VARCHAR,
  "hypothesis_graph_structure" VARCHAR
);

Predictor Format Test

@kaggle.thedevastator_futuristic_natural_language_inference_with_the_s.predictor_format_test
  • 177.93 KB
  • 2126 rows
  • 6 columns

CREATE TABLE predictor_format_test (
  "answer" VARCHAR,
  "sentence2_structure" VARCHAR,
  "sentence1" VARCHAR,
  "sentence2" VARCHAR,
  "gold_label" VARCHAR,
  "question" VARCHAR
);

Predictor Format Train

@kaggle.thedevastator_futuristic_natural_language_inference_with_the_s.predictor_format_train
  • 1.6 MB
  • 23587 rows
  • 6 columns

CREATE TABLE predictor_format_train (
  "answer" VARCHAR,
  "sentence2_structure" VARCHAR,
  "sentence1" VARCHAR,
  "sentence2" VARCHAR,
  "gold_label" VARCHAR,
  "question" VARCHAR
);

Predictor Format Validation

@kaggle.thedevastator_futuristic_natural_language_inference_with_the_s.predictor_format_validation
  • 117.41 KB
  • 1304 rows
  • 6 columns

CREATE TABLE predictor_format_validation (
  "answer" VARCHAR,
  "sentence2_structure" VARCHAR,
  "sentence1" VARCHAR,
  "sentence2" VARCHAR,
  "gold_label" VARCHAR,
  "question" VARCHAR
);

Snli Format Test

@kaggle.thedevastator_futuristic_natural_language_inference_with_the_s.snli_format_test
  • 602.59 KB
  • 2126 rows
  • 7 columns

CREATE TABLE snli_format_test (
  "sentence1_binary_parse" VARCHAR,
  "sentence1_parse" VARCHAR,
  "sentence1" VARCHAR,
  "sentence2_parse" VARCHAR,
  "sentence2" VARCHAR,
  "annotator_labels" VARCHAR,
  "gold_label" VARCHAR
);

Snli Format Train

@kaggle.thedevastator_futuristic_natural_language_inference_with_the_s.snli_format_train
  • 5.86 MB
  • 23596 rows
  • 7 columns

CREATE TABLE snli_format_train (
  "sentence1_binary_parse" VARCHAR,
  "sentence1_parse" VARCHAR,
  "sentence1" VARCHAR,
  "sentence2_parse" VARCHAR,
  "sentence2" VARCHAR,
  "annotator_labels" VARCHAR,
  "gold_label" VARCHAR
);

Snli Format Validation

@kaggle.thedevastator_futuristic_natural_language_inference_with_the_s.snli_format_validation
  • 380.63 KB
  • 1304 rows
  • 7 columns

CREATE TABLE snli_format_validation (
  "sentence1_binary_parse" VARCHAR,
  "sentence1_parse" VARCHAR,
  "sentence1" VARCHAR,
  "sentence2_parse" VARCHAR,
  "sentence2" VARCHAR,
  "annotator_labels" VARCHAR,
  "gold_label" VARCHAR
);

Tsv Format Test

@kaggle.thedevastator_futuristic_natural_language_inference_with_the_s.tsv_format_test
  • 148.08 KB
  • 2126 rows
  • 3 columns

CREATE TABLE tsv_format_test (
  "premise" VARCHAR,
  "hypothesis" VARCHAR,
  "label" VARCHAR
);

Tsv Format Train

@kaggle.thedevastator_futuristic_natural_language_inference_with_the_s.tsv_format_train
  • 1.43 MB
  • 23097 rows
  • 3 columns

CREATE TABLE tsv_format_train (
  "premise" VARCHAR,
  "hypothesis" VARCHAR,
  "label" VARCHAR
);

Tsv Format Validation

@kaggle.thedevastator_futuristic_natural_language_inference_with_the_s.tsv_format_validation
  • 95.96 KB
  • 1304 rows
  • 3 columns

CREATE TABLE tsv_format_validation (
  "premise" VARCHAR,
  "hypothesis" VARCHAR,
  "label" VARCHAR
);
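
The schemas above can be tried out locally. As a rough sketch, the snippet below recreates the `tsv_format_test` table in an in-memory SQLite database through Python's `sqlite3` module and counts rows per label. SQLite is only a stand-in here, so the hosting platform's SQL dialect may differ, and the three inserted rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same columns as the tsv_format_test schema shown above.
conn.execute(
    'CREATE TABLE tsv_format_test ('
    '"premise" VARCHAR, "hypothesis" VARCHAR, "label" VARCHAR)'
)

# Invented rows; real data would come from tsv_format_test.csv.
rows = [
    ("Plants use sunlight to make food.", "Plants perform photosynthesis.", "entails"),
    ("The moon orbits Earth.", "Plants need water.", "neutral"),
    ("Water boils when heated enough.", "Heat can change the state of water.", "entails"),
]
conn.executemany("INSERT INTO tsv_format_test VALUES (?, ?, ?)", rows)

# Label distribution, a quick check of class balance before training.
counts = dict(
    conn.execute(
        'SELECT "label", COUNT(*) FROM tsv_format_test GROUP BY "label"'
    ).fetchall()
)
print(counts)
conn.close()
```

The same `GROUP BY` query can be pointed at any of the other tables by swapping in the table name and, for the SNLI and predictor formats, the `gold_label` column.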
