Break (Question Decomposition Meaning)
Human annotated dataset of natural language questions and their Question Decomp
By Huggingface Hub [source]
About this dataset
BreakData
Welcome to BreakData, a dataset devoted to exploring language understanding through question decomposition. It contains information on question decompositions, operators, splits, sources, and allowed tokens, and offers insight into how humans comprehend and interpret questions. This makes it valuable for researchers developing models that break complex questions into simpler steps. Our goal is to enable the development of more capable natural language processing for applications such as automated customer support, chatbots for health care advice, or automated marketing campaigns.
How to use the dataset
This dataset provides an exciting opportunity to explore and understand the complexities of language understanding. With this dataset, you can train models for natural language processing (NLP) activities such as question answering, text analytics, automated dialog systems, and more.
To make the most effective use of the BreakData dataset, it’s important to know how it is organized and what types of data are included in each file. The BreakData dataset is broken down into nine different files:
- QDMR_train.csv
- QDMR_validation.csv
- QDMR-high-level_train.csv
- QDMR-high-level_test.csv
- logical-forms_train.csv
- logical-forms_validation.csv
- QDMR-lexicon_train.csv
- QDMR-lexicon_test.csv
- QDMR-high-level-lexicon_test.csv
Each file contains a different set of data. You can use the files to train models for natural language understanding tasks, or to analyze existing questions and commands by breaking them into their component parts, using the accompanying decompositions and operators, and examining how those parts relate to each other:
- The QDMR files include questions or statements from common domains like health care or banking that need to be interpreted according to a series of operators. The task requires identifying the keywords in the text that signal variables and their values, so a model trained on these files needs to recognize entities such as time references (dates and times), monetary amounts, and Boolean values (yes/no), as well as the relationships between those entities, while following a defined rule set. Because such questions are complex and context dependent, interpreting them requires linguistic analysis in multiple steps. Training on this kind of data is intended to improve the decisions machines make in human-facing interactions such as conversations, for example by choosing more accurate next actions.
- The LogicalForms files include logical forms containing the building blocks (elements such as operators) for linking ideas together across different sets of incoming variables.
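As a rough illustration of how the nine files are organized, the sketch below groups them by family and split. It uses the file names as they appear in the Columns section of this page; the helper names `parse_file_name` and `group_by_family` are hypothetical and exist only for this example.

```python
from collections import defaultdict

# File names as listed in the Columns section of this page.
FILES = [
    "QDMR_train.csv",
    "QDMR_validation.csv",
    "QDMR-high-level_train.csv",
    "QDMR-high-level_test.csv",
    "logical-forms_train.csv",
    "logical-forms_validation.csv",
    "QDMR-lexicon_train.csv",
    "QDMR-lexicon_test.csv",
    "QDMR-high-level-lexicon_test.csv",
]


def parse_file_name(name):
    """Split '<family>_<split>.csv' into a (family, split) pair."""
    stem = name.rsplit(".", 1)[0]
    family, split = stem.rsplit("_", 1)
    return family, split


def group_by_family(names):
    """Map each file family to the sorted list of splits available for it."""
    groups = defaultdict(list)
    for name in names:
        family, split = parse_file_name(name)
        groups[family].append(split)
    return {family: sorted(splits) for family, splits in groups.items()}


FAMILIES = group_by_family(FILES)
for family, splits in FAMILIES.items():
    print(family, splits)
```

Grouping this way makes it easy to see that not every family ships the same splits: some have a validation file, others only a test file.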
Research Ideas
- Developing advanced natural language processing models to analyze questions using decompositions, operators, and splits.
- Training a machine learning algorithm to predict the semantic meaning of questions based on their decomposition and split.
- Conducting advanced text analytics by using the allowed tokens dataset to map out how people communicate specific concepts in different contexts or topics.
Acknowledgements
If you use this dataset in your research, please credit the original authors.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Columns
File: QDMR-high-level_train.csv
| Column name | Description |
| --- | --- |
| question_text | The text of the question. (String) |
| decomposition | The decomposition of the question into its component parts. (String) |
| operators | The operators used to answer the question. (String) |
| split | The dataset split that the question belongs to. (String) |
File: QDMR-lexicon_test.csv
| Column name | Description |
| --- | --- |
| source | The source of the question. (String) |
| allowed_tokens | The allowed tokens for the question. (String) |
File: QDMR-lexicon_train.csv
| Column name | Description |
| --- | --- |
| source | The source of the question. (String) |
| allowed_tokens | The allowed tokens for the question. (String) |
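One way the lexicon columns might be used is to check a candidate decomposition against the allowed tokens for its question. The sketch below assumes that `allowed_tokens` is stored as a Python-style list literal inside a string and that tokens are single words; both are assumptions, not something this page confirms. The sample values and the helper names are hypothetical.

```python
import ast


def parse_allowed_tokens(raw):
    """Parse an allowed_tokens cell (assumed to be a list literal) into a set."""
    tokens = ast.literal_eval(raw)
    return {str(token).lower() for token in tokens}


def tokens_outside_lexicon(decomposition, allowed):
    """Return the words of a decomposition that are not in the allowed set.

    References to earlier steps such as '#1' are skipped, and ';' is treated
    as a step separator (an assumption about the decomposition format).
    """
    words = decomposition.replace(";", " ").lower().split()
    return [w for w in words if not w.startswith("#") and w not in allowed]


# Hypothetical values, invented for illustration only.
raw_tokens = "['return', 'flights', 'from', 'denver', 'to']"
candidate = "return flights ;return #1 from Denver ;return #2 to Boston"

allowed = parse_allowed_tokens(raw_tokens)
unexpected = tokens_outside_lexicon(candidate, allowed)
print(unexpected)
```

If the real lexicon contains multi-word phrases, the word-by-word check here would need to be replaced with phrase matching.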
File: logical-forms_train.csv
| Column name | Description |
| --- | --- |
| question_text | The text of the question. (String) |
| decomposition | The decomposition of the question into its component parts. (String) |
| operators | The operators used to answer the question. (String) |
| split | The dataset split that the question belongs to. (String) |
| program | The program (logical form) used to answer the question. (String) |
File: QDMR-high-level-lexicon_test.csv
| Column name | Description |
| --- | --- |
| source | The source of the question. (String) |
| allowed_tokens | The allowed tokens for the question. (String) |
File: QDMR_train.csv
| Column name | Description |
| --- | --- |
| question_text | The text of the question. (String) |
| decomposition | The decomposition of the question into its component parts. (String) |
| operators | The operators used to answer the question. (String) |
| split | The dataset split that the question belongs to. (String) |
File: logical-forms_validation.csv
| Column name | Description |
| --- | --- |
| question_text | The text of the question. (String) |
| decomposition | The decomposition of the question into its component parts. (String) |
| operators | The operators used to answer the question. (String) |
| split | The dataset split that the question belongs to. (String) |
| program | The program (logical form) used to answer the question. (String) |
File: QDMR_validation.csv
| Column name | Description |
| --- | --- |
| question_text | The text of the question. (String) |
| decomposition | The decomposition of the question into its component parts. (String) |
| operators | The operators used to answer the question. (String) |
| split | The dataset split that the question belongs to. (String) |
File: QDMR-high-level_test.csv
| Column name | Description |
| --- | --- |
| question_text | The text of the question. (String) |
| decomposition | The decomposition of the question into its component parts. (String) |
| operators | The operators used to answer the question. (String) |
| split | The dataset split that the question belongs to. (String) |
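To make the QDMR columns concrete, here is a minimal sketch that reads rows with the standard `csv` module and turns the string columns into lists. It assumes that decomposition steps are separated by `;` and that `operators` is stored as a Python-style list literal; neither is confirmed by this page. The sample row is invented for illustration, and in practice you would open one of the CSV files (for example `QDMR_train.csv`) instead of the in-memory string.

```python
import ast
import csv
import io

# Hypothetical sample data: the row below is invented for illustration and
# is not copied from the dataset.
SAMPLE_CSV = '''question_text,decomposition,operators,split
"What flights go from Denver to Boston?","return flights ;return #1 from Denver ;return #2 to Boston","['select', 'filter', 'filter']",train
'''


def parse_row(row):
    """Convert the string columns of one QDMR row into Python structures."""
    # Assumption: steps are separated by ';'.
    steps = [step.strip() for step in row["decomposition"].split(";")]
    # Assumption: operators is a list literal stored as a string.
    operators = ast.literal_eval(row["operators"])
    return {
        "question": row["question_text"],
        "steps": steps,
        "operators": operators,
        "split": row["split"],
    }


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
parsed = [parse_row(row) for row in reader]

for item in parsed:
    print(item["question"])
    for operator, step in zip(item["operators"], item["steps"]):
        print(f"  [{operator}] {step}")
```

Pairing each step with its operator, as in the final loop, only works if the two lists have the same length, so it is worth checking that before relying on it for real rows.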
Acknowledgements
If you use this dataset in your research, please credit Huggingface Hub.