For training Turkish language models

NLI-TR (Turkish NLI Research)

Unleash Your NLI Research in Turkish Language!

By Huggingface Hub [source]

About this dataset

NLI-TR is a set of two datasets, SNLI-TR and MNLI-TR, that support natural language inference (NLI) research in the Turkish language. Both contain natural language inference data translated into Turkish from the English SNLI and MNLI datasets. With NLI-TR, researchers can train and evaluate models that make inferences over Turkish text. They can also investigate how models trained on data in one language perform when applied to another, which gives insight into cross-lingual generalization.

How to use the dataset

How To Use The NLI-TR Dataset to Unlock Turkish NLI Research

This guide gives an overview of how to use the NLI-TR data for natural language inference (NLI) research in Turkish.

The NLI-TR dataset contains two large-scale datasets intended for natural language inference tasks: SNLI-TR and MNLI-TR. Both frame NLI as classifying the relationship between a premise sentence and a hypothesis sentence. Models trained on them can also be explored for related tasks such as paraphrase detection and question answering.

Using the Data:

The dataset provides training and validation files for both SNLI-TR and MNLI-TR, plus a test file for SNLI-TR (snli_tr_test.csv). Use snli_tr_train.csv or multinli_tr_train.csv to train a model, and snli_tr_validation.csv to measure accuracy on unseen data. The multinli_tr_validation_matched.csv and multinli_tr_validation_mismatched.csv files provide two further validation sets: in MultiNLI, the matched set draws on the same genres as the training data, while the mismatched set draws on genres that do not appear in training.
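A minimal sketch of reading one of these files with Python's standard csv module. It assumes each CSV has a header row with premise, hypothesis, and label columns, as listed in the Columns section. The helper name read_nli_rows and the three sample rows are illustrative and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def read_nli_rows(handle):
    """Read NLI rows from an open CSV file handle into a list of dicts."""
    reader = csv.DictReader(handle)
    rows = []
    for row in reader:
        # Skip rows that lack a premise or a hypothesis.
        if not row.get("premise") or not row.get("hypothesis"):
            continue
        rows.append(row)
    return rows


# Tiny in-memory stand-in for snli_tr_train.csv (made-up rows, assumed layout).
sample_csv = io.StringIO(
    "premise,hypothesis,label\n"
    "Bir adam gitar çalıyor.,Bir adam müzik yapıyor.,entailment\n"
    "Bir adam gitar çalıyor.,Adam uyuyor.,contradiction\n"
    "Bir adam gitar çalıyor.,Adam sahnede.,neutral\n"
)

rows = read_nli_rows(sample_csv)
label_counts = Counter(row["label"] for row in rows)
print(len(rows), dict(label_counts))
```

To read a real file, open it with open("snli_tr_train.csv", newline="", encoding="utf-8") and pass the handle to read_nli_rows. Checking the label counts first is a quick way to confirm the file was parsed as expected.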

Each record includes the columns premise, hypothesis, and label, and in some cases domain. The premise column holds the context sentence from which an inference is drawn. The hypothesis column holds the statement that is judged against the premise. The label column states the relationship between the two: the premise entails the hypothesis (ENTAILMENT), contradicts it (CONTRADICTION), or does neither (NEUTRAL). A domain label has also been assigned where necessary; this mostly applies to sentence pairs drawn from different semantic domains, such as weather, sports, or finance.
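The label column usually has to be turned into integer class IDs before training. The sketch below shows one way to do that, under two assumptions: the ID order (entailment 0, neutral 1, contradiction 2) is a common convention rather than something this page specifies, and labels may arrive as upper-case or lower-case strings or already as integers. The names LABEL_TO_ID, encode_label, and to_example are hypothetical helpers.

```python
# Assumed ID order; adjust it if your training code expects a different one.
LABEL_TO_ID = {"entailment": 0, "neutral": 1, "contradiction": 2}


def encode_label(raw):
    """Map a raw label value to an integer ID, or None if it is not recognized."""
    key = str(raw).strip().lower()
    if key in LABEL_TO_ID:
        return LABEL_TO_ID[key]
    # Accept labels that are already stored as one of the integer IDs.
    if key.isdigit() and int(key) in LABEL_TO_ID.values():
        return int(key)
    return None


def to_example(row):
    """Turn a row dict into a (premise, hypothesis, label_id) tuple, or None."""
    label_id = encode_label(row.get("label", ""))
    if label_id is None:
        return None  # Drop rows without a usable label.
    return (row["premise"], row["hypothesis"], label_id)


example = to_example(
    {"premise": "Bir adam gitar çalıyor.", "hypothesis": "Adam uyuyor.", "label": "CONTRADICTION"}
)
print(example)
```

Returning None for unrecognized values lets the caller filter out unlabeled rows instead of silently assigning them a class.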

Research Ideas

Developing an NLI-based Turkish language question answering system.

Training a sentiment analysis algorithm to identify sentiment in text written in Turkish.

Building a machine learning chatbot that uses NLI to understand conversational context and respond accordingly to users conversing in Turkish.
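As a rough illustration of the first idea, question answering can be framed as NLI by treating a passage as the premise and each candidate answer statement as a hypothesis, then ranking candidates by entailment score. The sketch below is a harness only: rank_answers and overlap_score are hypothetical names, and overlap_score is a toy word-overlap stand-in for a trained NLI model, not an NLI model itself.

```python
def rank_answers(passage, candidates, entailment_score):
    """Rank candidate answer statements by how strongly the passage supports them."""
    scored = [(entailment_score(passage, candidate), candidate) for candidate in candidates]
    # Highest score first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def overlap_score(premise, hypothesis):
    """Toy stand-in scorer: fraction of hypothesis words that occur in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().split()
    if not hypothesis_words:
        return 0.0
    hits = sum(1 for word in hypothesis_words if word in premise_words)
    return hits / len(hypothesis_words)


# Made-up example sentences, not taken from the dataset.
passage = "Ankara Türkiye'nin başkentidir."
candidates = [
    "Türkiye'nin başkenti İstanbul'dur.",
    "Ankara Türkiye'nin başkentidir.",
]

ranked = rank_answers(passage, candidates, overlap_score)
best_score, best_answer = ranked[0]
print(best_answer)
```

In a real system, entailment_score would be replaced by a function that returns the entailment probability from a model trained on SNLI-TR or MNLI-TR.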

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: snli_tr_train.csv

Column name	Description
premise	Premise sentence in Turkish, translated from the English SNLI dataset. (String)
hypothesis	Hypothesis sentence in Turkish, translated from the English SNLI dataset, to be judged against the premise. (String)
label	Relationship between premise and hypothesis: 'entailment', 'contradiction', or 'neutral'. (String)

File: multinli_tr_validation_matched.csv

Column name	Description
premise	Premise sentence in Turkish, translated from the English MNLI dataset. (String)
hypothesis	Hypothesis sentence in Turkish, translated from the English MNLI dataset, to be judged against the premise. (String)
label	Relationship between premise and hypothesis: 'entailment', 'contradiction', or 'neutral'. (String)

File: snli_tr_validation.csv

Column name	Description
premise	Premise sentence in Turkish, translated from the English SNLI dataset. (String)
hypothesis	Hypothesis sentence in Turkish, translated from the English SNLI dataset, to be judged against the premise. (String)
label	Relationship between premise and hypothesis: 'entailment', 'contradiction', or 'neutral'. (String)

File: multinli_tr_validation_mismatched.csv

Column name	Description
premise	Premise sentence in Turkish, translated from the English MNLI dataset. (String)
hypothesis	Hypothesis sentence in Turkish, translated from the English MNLI dataset, to be judged against the premise. (String)
label	Relationship between premise and hypothesis: 'entailment', 'contradiction', or 'neutral'. (String)

File: multinli_tr_train.csv

Column name	Description
premise	Premise sentence in Turkish, translated from the English MNLI dataset. (String)
hypothesis	Hypothesis sentence in Turkish, translated from the English MNLI dataset, to be judged against the premise. (String)
label	Relationship between premise and hypothesis: 'entailment', 'contradiction', or 'neutral'. (String)

File: snli_tr_test.csv

Column name	Description
premise	Premise sentence in Turkish, translated from the English SNLI dataset. (String)
hypothesis	Hypothesis sentence in Turkish, translated from the English SNLI dataset, to be judged against the premise. (String)
label	Relationship between premise and hypothesis: 'entailment', 'contradiction', or 'neutral'. (String)
