Break (Question Decomposition Meaning) by Kaggle | Other

About this Dataset

Human annotated dataset of natural language questions and their Question Decomp

By Huggingface Hub [source]

BreakData
BreakData is a human-annotated dataset for studying natural language understanding. It contains question decompositions, operators, splits, sources, and allowed tokens. It is aimed at researchers building models that break complex questions down into simpler steps. Possible applications include automated customer support, chatbots for health care advice, and automated marketing campaigns.

How to use the dataset

This dataset lets you explore how complex questions are understood and broken down. You can use it to train models for natural language processing (NLP) tasks such as question answering, text analytics, and automated dialog systems.

To make the most effective use of the BreakData dataset, it is important to know how it is organized and what types of data each file contains. The column descriptions on this page cover nine files:

QDMR_train.csv

QDMR_validation.csv

QDMR-high-level_train.csv

QDMR-high-level_test.csv

logical-forms_train.csv

logical-forms_validation.csv

QDMR-lexicon_train.csv

QDMR-lexicon_test.csv

QDMR-high-level-lexicon_test.csv

Each file contains a different part of the data. You can use the files to train models for natural language understanding tasks, or to analyze how existing questions break down into component parts, which operators those parts use, and how the parts relate to each other.

The QDMR files include questions that need to be interpreted as a series of steps, each labeled with an operator (the kind of operation the step performs, such as selecting or filtering items). A model trained on these files has to identify the entities a question mentions, such as time references (dates and times), monetary amounts, and Boolean values (yes/no), as well as the relationships between those entities. It then has to express the question as a multi-step decomposition that follows a defined rule set. Training on this kind of data is intended to help models handle complex, context-dependent queries that require linguistic analysis in multiple steps.

The LogicalForms files include logical forms containing the building blocks (elements such as operators) for linking ideas together across different sets of incoming variables.
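As a minimal sketch of reading one of the QDMR files, the snippet below uses Python's standard csv module. The sample row is made up for illustration and is not taken from the dataset; the function name read_qdmr_rows is hypothetical, and the column names are assumed to match the column descriptions on this page.

```python
import csv
import io

# Columns described for the QDMR files on this page.
REQUIRED_COLUMNS = ["question_text", "decomposition", "operators", "split"]


def read_qdmr_rows(handle):
    """Yield one dict per row from an open QDMR-style CSV file."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    for row in reader:
        yield row


# Made-up sample standing in for QDMR_train.csv (not real data).
SAMPLE_CSV = '''question_id,question_text,decomposition,operators,split
EXAMPLE_0,which papers were written in 2019?,return papers ;return #1 written in 2019,"['select', 'filter']",train
'''

rows = list(read_qdmr_rows(io.StringIO(SAMPLE_CSV)))
print(rows[0]["question_text"])
```

To read a real file, the same function could be called on a handle opened with open("QDMR_train.csv", newline="", encoding="utf-8"), assuming the file sits in the working directory.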

Research Ideas

Developing advanced natural language processing models to analyze questions using decompositions, operators, and splits.

Training a machine learning algorithm to predict the semantic meaning of questions based on their decomposition and split.

Conducting advanced text analytics by using the allowed tokens dataset to map out how people communicate specific concepts in different contexts or topics.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: QDMR-high-level_train.csv

Column name	Description
question_text	The text of the question. (String)
decomposition	The decomposition of the question into its component parts. (String)
operators	The operators used to answer the question. (String)
split	The dataset split (partition) the question belongs to. (String)

File: QDMR-lexicon_test.csv

Column name	Description
source	The source question text. (String)
allowed_tokens	The allowed tokens for the question. (String)

File: QDMR-lexicon_train.csv

Column name	Description
source	The source question text. (String)
allowed_tokens	The allowed tokens for the question. (String)

File: logical-forms_train.csv

Column name	Description
question_text	The text of the question. (String)
decomposition	The decomposition of the question into its component parts. (String)
operators	The operators used to answer the question. (String)
split	The dataset split (partition) the question belongs to. (String)
program	The logical form (program) for the question. (String)

File: QDMR-high-level-lexicon_test.csv

Column name	Description
source	The source question text. (String)
allowed_tokens	The allowed tokens for the question. (String)

File: QDMR_train.csv

Column name	Description
question_text	The text of the question. (String)
decomposition	The decomposition of the question into its component parts. (String)
operators	The operators used to answer the question. (String)
split	The dataset split (partition) the question belongs to. (String)

File: logical-forms_validation.csv

Column name	Description
question_text	The text of the question. (String)
decomposition	The decomposition of the question into its component parts. (String)
operators	The operators used to answer the question. (String)
split	The dataset split (partition) the question belongs to. (String)
program	The logical form (program) for the question. (String)

File: QDMR_validation.csv

Column name	Description
question_text	The text of the question. (String)
decomposition	The decomposition of the question into its component parts. (String)
operators	The operators used to answer the question. (String)
split	The dataset split (partition) the question belongs to. (String)

File: QDMR-high-level_test.csv

Column name	Description
question_text	The text of the question. (String)
decomposition	The decomposition of the question into its component parts. (String)
operators	The operators used to answer the question. (String)
split	The dataset split (partition) the question belongs to. (String)
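Because decomposition and operators are both stored as plain strings, they usually need parsing before use. The sketch below rests on assumptions that are not stated on this page: that decomposition steps are separated by ";" and each begins with "return", and that operators is a string-encoded Python list. The example values are made up, and both function names are hypothetical.

```python
import ast


def parse_decomposition(decomposition):
    """Split a decomposition string into a list of step strings.

    Assumes steps are separated by ';' and prefixed with 'return '.
    """
    steps = []
    for raw in decomposition.split(";"):
        step = raw.strip()
        if step.startswith("return "):
            step = step[len("return "):]
        if step:
            steps.append(step)
    return steps


def parse_operators(operators):
    """Turn a string-encoded operator list into a real Python list."""
    value = ast.literal_eval(operators)
    if not isinstance(value, list):
        raise ValueError("operators field is not a list")
    return [str(op) for op in value]


# Made-up example row (not real data).
example = {
    "decomposition": "return papers ;return #1 written in 2019 ;return number of #2",
    "operators": "['select', 'filter', 'aggregate']",
}

steps = parse_decomposition(example["decomposition"])
ops = parse_operators(example["operators"])

# Pair each step with its operator; "#1" refers back to the result of step 1.
for number, (op, step) in enumerate(zip(ops, steps), start=1):
    print(f"#{number} [{op}] {step}")
```

ast.literal_eval is used rather than eval so that only literal values are accepted from the file.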

Acknowledgements

If you use this dataset in your research, please credit Huggingface Hub.

Tables

Logical Forms Test

@kaggle.thedevastator_unlock_the_mysteries_of_language_understanding_w.logical_forms_test

354.27 KB
8006 rows
6 columns


CREATE TABLE logical_forms_test (
  "question_id" VARCHAR,
  "question_text" VARCHAR,
  "decomposition" VARCHAR,
  "operators" VARCHAR,
  "split" VARCHAR,
  "program" VARCHAR
);

Logical Forms Train

@kaggle.thedevastator_unlock_the_mysteries_of_language_understanding_w.logical_forms_train

5.47 MB
44098 rows
6 columns


CREATE TABLE logical_forms_train (
  "question_id" VARCHAR,
  "question_text" VARCHAR,
  "decomposition" VARCHAR,
  "operators" VARCHAR,
  "split" VARCHAR,
  "program" VARCHAR
);

Logical Forms Validation

@kaggle.thedevastator_unlock_the_mysteries_of_language_understanding_w.logical_forms_validation

1.03 MB
7719 rows
6 columns


CREATE TABLE logical_forms_validation (
  "question_id" VARCHAR,
  "question_text" VARCHAR,
  "decomposition" VARCHAR,
  "operators" VARCHAR,
  "split" VARCHAR,
  "program" VARCHAR
);

Qdmr High Level Lexicon Test

@kaggle.thedevastator_unlock_the_mysteries_of_language_understanding_w.qdmr_high_level_lexicon_test

722.62 KB
3195 rows
2 columns


CREATE TABLE qdmr_high_level_lexicon_test (
  "source" VARCHAR,
  "allowed_tokens" VARCHAR
);

Qdmr High Level Lexicon Train

@kaggle.thedevastator_unlock_the_mysteries_of_language_understanding_w.qdmr_high_level_lexicon_train

3.84 MB
17503 rows
2 columns


CREATE TABLE qdmr_high_level_lexicon_train (
  "source" VARCHAR,
  "allowed_tokens" VARCHAR
);

Qdmr High Level Lexicon Validation

@kaggle.thedevastator_unlock_the_mysteries_of_language_understanding_w.qdmr_high_level_lexicon_validation

707.71 KB
3130 rows
2 columns


CREATE TABLE qdmr_high_level_lexicon_validation (
  "source" VARCHAR,
  "allowed_tokens" VARCHAR
);

Qdmr High Level Test

@kaggle.thedevastator_unlock_the_mysteries_of_language_understanding_w.qdmr_high_level_test

251.83 KB
3195 rows
5 columns


CREATE TABLE qdmr_high_level_test (
  "question_id" VARCHAR,
  "question_text" VARCHAR,
  "decomposition" VARCHAR,
  "operators" VARCHAR,
  "split" VARCHAR
);

Qdmr High Level Train

@kaggle.thedevastator_unlock_the_mysteries_of_language_understanding_w.qdmr_high_level_train

2.25 MB
17503 rows
5 columns


CREATE TABLE qdmr_high_level_train (
  "question_id" VARCHAR,
  "question_text" VARCHAR,
  "decomposition" VARCHAR,
  "operators" VARCHAR,
  "split" VARCHAR
);

Qdmr High Level Validation

@kaggle.thedevastator_unlock_the_mysteries_of_language_understanding_w.qdmr_high_level_validation

417.4 KB
3130 rows
5 columns


CREATE TABLE qdmr_high_level_validation (
  "question_id" VARCHAR,
  "question_text" VARCHAR,
  "decomposition" VARCHAR,
  "operators" VARCHAR,
  "split" VARCHAR
);

Qdmr Lexicon Test

@kaggle.thedevastator_unlock_the_mysteries_of_language_understanding_w.qdmr_lexicon_test

1.34 MB
8069 rows
2 columns


CREATE TABLE qdmr_lexicon_test (
  "source" VARCHAR,
  "allowed_tokens" VARCHAR
);

Qdmr Lexicon Train

@kaggle.thedevastator_unlock_the_mysteries_of_language_understanding_w.qdmr_lexicon_train

7.28 MB
44321 rows
2 columns


CREATE TABLE qdmr_lexicon_train (
  "source" VARCHAR,
  "allowed_tokens" VARCHAR
);

Qdmr Lexicon Validation

@kaggle.thedevastator_unlock_the_mysteries_of_language_understanding_w.qdmr_lexicon_validation

1.3 MB
7760 rows
2 columns


CREATE TABLE qdmr_lexicon_validation (
  "source" VARCHAR,
  "allowed_tokens" VARCHAR
);
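One possible use of the lexicon tables is to check whether a candidate decomposition step stays within the allowed tokens for its question. The sketch below assumes that allowed_tokens holds a string-encoded list of words and phrases, which this page does not confirm. The row is made up and the function names are hypothetical. A word counts as allowed if it appears anywhere in an allowed token, which is a deliberate simplification.

```python
import ast


def allowed_words(allowed_tokens):
    """Collect every individual word that occurs in the allowed tokens."""
    tokens = ast.literal_eval(allowed_tokens)
    words = set()
    for token in tokens:
        words.update(str(token).lower().split())
    return words


def words_outside_lexicon(step, allowed):
    """Return the words of a step that are not covered by the lexicon.

    References to earlier steps (such as '#1') are skipped.
    """
    return [
        word
        for word in step.lower().split()
        if not word.startswith("#") and word not in allowed
    ]


# Made-up lexicon row (not real data).
lexicon_row = {
    "source": "which papers were written in 2019?",
    "allowed_tokens": '["which", "papers", "were", "written", "in", "2019", "number of", "return"]',
}

allowed = allowed_words(lexicon_row["allowed_tokens"])
print(words_outside_lexicon("return #1 written in 2019", allowed))
print(words_outside_lexicon("return #1 published in 2019", allowed))
```

The second call reports "published", since that word does not occur in any allowed token of the sample row.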

Qdmr Test

@kaggle.thedevastator_unlock_the_mysteries_of_language_understanding_w.qdmr_test

356.16 KB
8069 rows
5 columns


CREATE TABLE qdmr_test (
  "question_id" VARCHAR,
  "question_text" VARCHAR,
  "decomposition" VARCHAR,
  "operators" VARCHAR,
  "split" VARCHAR
);

Qdmr Train

@kaggle.thedevastator_unlock_the_mysteries_of_language_understanding_w.qdmr_train

3.63 MB
44321 rows
5 columns


CREATE TABLE qdmr_train (
  "question_id" VARCHAR,
  "question_text" VARCHAR,
  "decomposition" VARCHAR,
  "operators" VARCHAR,
  "split" VARCHAR
);

Qdmr Validation

@kaggle.thedevastator_unlock_the_mysteries_of_language_understanding_w.qdmr_validation

700.24 KB
7760 rows
5 columns


CREATE TABLE qdmr_validation (
  "question_id" VARCHAR,
  "question_text" VARCHAR,
  "decomposition" VARCHAR,
  "operators" VARCHAR,
  "split" VARCHAR
);
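The row counts above differ between qdmr_train (44321 rows) and logical_forms_train (44098 rows), which suggests that some questions have no logical form. The sketch below shows one way to find such questions with Python's built-in sqlite3 module. It assumes that question_id is shared between the two tables, and the inserted rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same column layout as the table definitions listed above.
conn.executescript(
    """
    CREATE TABLE qdmr_train (
      "question_id" VARCHAR,
      "question_text" VARCHAR,
      "decomposition" VARCHAR,
      "operators" VARCHAR,
      "split" VARCHAR
    );
    CREATE TABLE logical_forms_train (
      "question_id" VARCHAR,
      "question_text" VARCHAR,
      "decomposition" VARCHAR,
      "operators" VARCHAR,
      "split" VARCHAR,
      "program" VARCHAR
    );
    """
)

# Made-up rows (not real data): only EXAMPLE_0 has a logical form.
conn.executemany(
    "INSERT INTO qdmr_train VALUES (?, ?, ?, ?, ?)",
    [
        ("EXAMPLE_0", "question zero", "return a", "['select']", "train"),
        ("EXAMPLE_1", "question one", "return b", "['select']", "train"),
    ],
)
conn.execute(
    "INSERT INTO logical_forms_train VALUES (?, ?, ?, ?, ?, ?)",
    ("EXAMPLE_0", "question zero", "return a", "['select']", "train", "placeholder program"),
)

# Questions present in qdmr_train but absent from logical_forms_train.
missing = conn.execute(
    """
    SELECT q.question_id
    FROM qdmr_train AS q
    LEFT JOIN logical_forms_train AS lf
      ON lf.question_id = q.question_id
    WHERE lf.question_id IS NULL
    ORDER BY q.question_id
    """
).fetchall()

print(missing)
conn.close()
```

With the sample rows, the query returns only EXAMPLE_1, the question that has no matching row in logical_forms_train.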