Compositional Freebase Questions
Compositional Freebase Questions dataset for measuring generalization
By cfq (from Hugging Face)
About this dataset
The Compositional Freebase Questions (CFQ) dataset is a collection of natural language questions paired with their corresponding queries against the Freebase knowledge base. It is designed to measure compositional generalization in question answering, that is, how well a model handles new combinations of components it has already seen during training. This release contains a training set and a test set for the query pattern split, plus an additional training file for the MCD2 split.
The training set, query_pattern_split_train.csv, contains pairs of questions and their corresponding query representations. These pairs are used to train question answering models on CFQ. The file mcd2_train.csv provides training data for a second split, MCD2 (one of the maximum compound divergence splits defined by the CFQ authors), which is likewise intended for measuring compositional generalization.
The test set, query_pattern_split_test.csv, is the held-out part of the query pattern split. As described by the CFQ authors, this split is built so that the query patterns in the test set do not occur in the training set. A model therefore cannot score well by memorizing patterns seen during training, and its test accuracy indicates how well it generalizes to unseen query structures.
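A quick sanity check on the two query pattern split files is to count how many query patterns they share. The sketch below is a rough approximation, not the CFQ authors' pattern definition: it assumes relations and types are written with an `ns:` prefix and entities are anonymized as `M0`, `M1`, and so on. The helper names `query_pattern` and `shared_patterns` are hypothetical.

```python
import re

# Assumed token shapes (not confirmed by this dataset card): Freebase
# relations/types start with "ns:" and anonymized entities look like M0, M1.
_RELATION = re.compile(r"ns:[\w.]+")
_ENTITY = re.compile(r"\bM\d+\b")


def query_pattern(query: str) -> str:
    """Crude pattern: mask relations and entities, collapse whitespace."""
    masked = _RELATION.sub("REL", query)
    masked = _ENTITY.sub("ENT", masked)
    return " ".join(masked.split())


def shared_patterns(train_queries, test_queries) -> set:
    """Return the masked patterns that occur in both splits."""
    train = {query_pattern(q) for q in train_queries}
    test = {query_pattern(q) for q in test_queries}
    return train & test
```

Passing the `query` columns of the train and test files to `shared_patterns` gives the overlap under this approximation. A small or empty result would be consistent with a split that holds out query patterns, but a non-empty result may simply reflect that the masking here is coarser than the original definition.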
Overall, the dataset is a resource for studying and evaluating compositional generalization in question answering systems. Because it provides both natural language questions and their corresponding query representations, researchers can develop and test models that translate complex questions into queries over structured knowledge bases such as Freebase.
Research Ideas
- Compositional Generalization Measurement: The dataset can be used to measure the compositional generalization abilities of question answering models. By testing how well a model can generalize to new combinations of query patterns and questions, researchers can evaluate the language understanding and reasoning capabilities of different models.
- Query Pattern Split Evaluation: The train and test files of the query pattern split can be used together to evaluate generalization to unseen query patterns. A model is trained on query_pattern_split_train.csv and scored on query_pattern_split_test.csv, whose query patterns are held out from training. The gap between this score and the score on a random split shows how much the model relies on patterns it has memorized.
- Question Answering Model Training: The dataset can also be used to train question answering models that handle compositional questions over the Freebase knowledge graph. Such models take a complex question as input and generate a query that retrieves the relevant information from the knowledge graph.
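For the evaluation ideas above, a common way to score a model that generates queries is exact-match accuracy against the reference query. The following is a minimal sketch under that assumption; the function names are hypothetical, and whitespace normalization is a choice made here, not something this dataset card prescribes.

```python
def normalize(query: str) -> str:
    """Collapse runs of whitespace so formatting differences are ignored."""
    return " ".join(query.split())


def exact_match_accuracy(predictions, references) -> float:
    """Fraction of predicted queries equal to the reference after normalizing."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    hits = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)
```

Exact match is strict: two queries that differ only in clause order count as a mismatch, so a semantic comparison may be preferable when clause order is not canonical.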
Acknowledgements
If you use this dataset in your research, please credit the original authors and cfq (from Hugging Face).
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
Columns
File: query_pattern_split_train.csv
| Column name | Description |
| --- | --- |
| question | The natural language question asked. (Text) |
| query | The corresponding query representation of the question in a machine-readable format. (Text) |
File: mcd2_train.csv
| Column name | Description |
| --- | --- |
| question | The natural language question asked. (Text) |
| query | The corresponding query representation of the question in a machine-readable format. (Text) |
File: query_pattern_split_test.csv
| Column name | Description |
| --- | --- |
| question | The natural language question asked. (Text) |
| query | The corresponding query representation of the question in a machine-readable format. (Text) |
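Since all three files share the same two columns, one small loader can serve them all. The sketch below uses only the Python standard library; `read_pairs` is a hypothetical helper, and the sample row is made up for illustration rather than taken from the dataset.

```python
import csv
import io

EXPECTED_COLUMNS = {"question", "query"}


def read_pairs(handle):
    """Read (question, query) pairs from an open CSV file or file-like object."""
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [(row["question"], row["query"]) for row in reader]


# In-memory stand-in for one of the CSV files (illustrative row, not real data).
sample = io.StringIO(
    "question,query\n"
    '"Did M0 direct M1","SELECT count(*) WHERE { M0 ns:film.director.film M1 }"\n'
)
pairs = read_pairs(sample)
```

For a real file, open it with `open("query_pattern_split_train.csv", newline="", encoding="utf-8")` and pass the handle to `read_pairs`. The `newline=""` argument lets the `csv` module handle any line breaks embedded in quoted query text.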