XQuAD (Cross-lingual Q&A)
Cross-lingual Question Answering
By Huggingface Hub
About this dataset
XQuAD is a benchmark dataset for evaluating cross-lingual question answering. It consists of a subset of 240 paragraphs and 1190 question-answer pairs from the development set of SQuAD v1.1, together with professional translations into ten languages. As a result, the data is entirely parallel across 11 languages: English plus the ten translations.
How to use the dataset
This dataset is a great resource for researchers who want to evaluate cross-lingual question answering performance. The dataset consists of a subset of 240 paragraphs and 1190 question-answer pairs from the development set of SQuAD v1.1 (Rajpurkar et al., 2016) together with their professional translations into ten languages: Spanish, German, Greek, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, and Hindi. Consequently, the dataset is entirely parallel across 11 languages.
To use this dataset effectively, researchers should first familiarize themselves with the source data (SQuAD v1.1), which is available in English only. Once they have a good understanding of the task and how to approach it, they can then begin to explore the XQuAD dataset and compare results across languages.
This dataset provides an opportunity to apply cross-lingual learning and deep neural network techniques.
Research Ideas
- To evaluate the performance of a cross-lingual question answering system
- To compare the performance of different cross-lingual question answering systems
- To understand how well a cross-lingual question answering system works
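The evaluation ideas above can be made concrete with the two metrics commonly reported for SQuAD-style tasks, exact match and token-level F1. What follows is a minimal sketch and not the official SQuAD evaluation script. The normalization (lowercasing, stripping ASCII punctuation, whitespace tokenization) is a simplifying assumption and is likely too coarse for languages written without spaces, such as Chinese and Thai. All function names are hypothetical.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace (simplified)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-level F1 over whitespace tokens of the normalized strings."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, gold_answers):
    """Average EM and F1, taking the best score over each question's gold answers."""
    n = len(predictions)
    if n == 0:
        return {"exact_match": 0.0, "f1": 0.0}
    em_total = f1_total = 0.0
    for pred, golds in zip(predictions, gold_answers):
        em_total += max(exact_match(pred, g) for g in golds)
        f1_total += max(f1_score(pred, g) for g in golds)
    return {"exact_match": em_total / n, "f1": f1_total / n}
```

Because the question-answer pairs are parallel across languages, calling `evaluate` once per language file with the same system gives scores that can be compared directly, subject to the tokenization caveat above.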
Acknowledgements
If you use this dataset in your research, please credit the original authors.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Columns
File: xquad.vi_validation.csv
Column name | Description
context | The context of the question. (String)
answers | The answers to the question. (List of strings)
File: xquad.ro_validation.csv
Column name | Description
context | The context of the question. (String)
answers | The answers to the question. (List of strings)
File: xquad.de_validation.csv
Column name | Description
context | The context of the question. (String)
answers | The answers to the question. (List of strings)
File: xquad.el_validation.csv
Column name | Description
context | The context of the question. (String)
answers | The answers to the question. (List of strings)
File: xquad.ar_validation.csv
Column name | Description
context | The context of the question. (String)
answers | The answers to the question. (List of strings)
File: xquad.hi_validation.csv
Column name | Description
context | The context of the question. (String)
answers | The answers to the question. (List of strings)
File: xquad.zh_validation.csv
Column name | Description
context | The context of the question. (String)
answers | The answers to the question. (List of strings)
File: xquad.th_validation.csv
Column name | Description
context | The context of the question. (String)
answers | The answers to the question. (List of strings)
File: xquad.es_validation.csv
Column name | Description
context | The context of the question. (String)
answers | The answers to the question. (List of strings)
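As a rough illustration of the column layout listed above, the sketch below reads rows with Python's standard `csv` module. It assumes that each `answers` cell holds a Python-style list literal such as `['first answer', 'second answer']`. The actual serialization in the files may differ, so it is worth inspecting a few rows first. The helper name `read_xquad_rows` is hypothetical, and the inline sample stands in for a real file.

```python
import ast
import csv
import io


def read_xquad_rows(handle):
    """Yield (context, answers) pairs from an open CSV file handle.

    Assumption: the `answers` cell is a Python-style list literal.
    If it cannot be parsed as a list or tuple, the raw string is kept
    as a single-element list.
    """
    for row in csv.DictReader(handle):
        raw = row["answers"]
        try:
            parsed = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            parsed = raw
        if isinstance(parsed, (list, tuple)):
            answers = [str(a) for a in parsed]
        else:
            answers = [str(raw)]
        yield row["context"], answers


# Tiny inline sample standing in for a file such as xquad.es_validation.csv.
sample = io.StringIO(
    'context,answers\n'
    '"Example context, with a comma.","[\'first answer\', \'second answer\']"\n'
)
rows = list(read_xquad_rows(sample))
```

For a file on disk, the same helper could be called inside `with open(path, newline="", encoding="utf-8") as handle:`, where `path` points at one of the CSV files named above.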
Acknowledgements
If you use this dataset in your research, please credit Huggingface Hub.