Baselight

LC-QuAD 2.0 (Question & Answering)

30,000 pairs of questions and their corresponding SPARQL queries

@kaggle.thedevastator_unlock_smarter_querying_with_lc_quad_2_0

About this Dataset

By Huggingface Hub [source]


LC-QuAD 2.0 is a large question-answering dataset for knowledge graphs. It pairs natural language questions with the SPARQL queries that answer them, about 30,000 pairs in total according to the dataset description. The two tables published here hold 19,293 training rows and 4,781 test rows.

The questions target Wikidata and DBpedia: the sparql_wikidata column holds the Wikidata query and the sparql_dbpedia18 column holds the query for the 2018 DBpedia release. Each record therefore contains both a natural language question and its solution in the form of a SPARQL query, which makes the dataset suitable for training and evaluating systems that translate questions into queries.

How to use the dataset

The LC-QuAD 2.0 dataset is useful for building question-answering systems, knowledge graphs, and search systems that need to translate questions into structured queries. Here is a guide on how to use it:

  • Understand the structure of the data: The dataset comes in two files, train (19,293 rows, used for training a system) and test (4,781 rows, used for testing it). Each row has nine columns: NNQT_question (natural language question), question (question text), paraphrased_question (paraphrased version of the question), subgraph (subgraph information for the question), sparql_wikidata (SPARQL query for Wikidata), sparql_dbpedia18 (SPARQL query for DBpedia 2018), template (template from which the SPARQL query was generated), template_index (integer index of that template), and uid (integer identifier of the row).

  • Read up on SPARQL: Before you start using this dataset, learn what SPARQL is and how it works, because every record contains SPARQL queries. This makes the content easier and quicker to understand.

  • Start exploring: After reading up on SPARQL, look at individual pairs in detail. Read the natural language question and the subgraph information, and work out how they relate to the corresponding SPARQL queries for Wikidata and DBpedia 2018. You can also run the queries yourself against a Wikidata or DBpedia endpoint to see what they return. If a query returns several results that differ from one another, check how the entities, phrases, and synonyms in the question are resolved (for example with a natural language parsing service) before you build answer modules or endpoints on top of a prepared dataset like LC-QuAD.

  • Use your own data: Once you are familiar with the available pairs and understand their relevance, consider creating your own dataset by adding more complex questions along with their associated attributes. Also evaluate whether data enrichment suited to your domain is needed, and whether feature selection or classifier selection should be adjusted to limit overfitting.
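The exploration steps above can be sketched in Python with the standard library only. This is a minimal sketch under stated assumptions: the sample row is invented for illustration and is not copied from the dataset, the column names follow the table schema listed on this page, and the helper names load_pairs and build_query_url are hypothetical. The URL is the public Wikidata SPARQL endpoint; the sketch only builds the request URL and does not send it.

```python
import csv
import io
import urllib.parse

# Hypothetical sample in the shape of train.csv (only three of the nine
# columns are shown); the row is illustrative, not taken from the dataset.
SAMPLE_CSV = '''question,sparql_wikidata,template_index
"What is the capital of France?","SELECT ?answer WHERE { wd:Q142 wdt:P36 ?answer }",1
'''

# Public Wikidata SPARQL endpoint.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def load_pairs(handle):
    """Return (question, SPARQL) tuples from an open CSV file handle."""
    reader = csv.DictReader(handle)
    return [(row["question"], row["sparql_wikidata"]) for row in reader]


def build_query_url(sparql, endpoint=WIKIDATA_ENDPOINT):
    """Build a GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return f"{endpoint}?{params}"


pairs = load_pairs(io.StringIO(SAMPLE_CSV))
question, sparql = pairs[0]
url = build_query_url(sparql)
print(question)
print(url)
```

To read the real file instead of the sample, pass a handle opened with open("train.csv", newline="", encoding="utf-8") to load_pairs. Before sending requests to a public endpoint, check its current usage policy, for example rate limits and any required User-Agent header.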

Research Ideas

  • Incorporating the LC-QuAD 2.0 dataset into intelligent systems such as chatbots, question-answering systems, and document summarization programs, so that they can retrieve the required information by transforming natural language questions into SPARQL queries.
  • Using the dataset in semantic scholarly search engines and academic digital libraries, which can accept natural language queries instead of keywords to perform more sophisticated searches and return more accurate results for researchers in diverse areas.
  • Applying the dataset to build knowledge graphs that store entities along with their attributes, categories, and relations. This supports a better understanding of complex relationships between entities and helps develop AI agents that can answer specific questions or provide personalized recommendations.
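As a rough illustration of the first idea, the sketch below maps a new question to the SPARQL query of the most similar known question, using Jaccard similarity over word tokens. It is a deliberately simple retrieval baseline, not the method of the dataset authors. The training pairs are invented for illustration, and the function names are hypothetical.

```python
import re


def tokens(text):
    """Lowercase a question and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_query(question, pairs):
    """Return the SPARQL query paired with the most similar known question."""
    q_tokens = tokens(question)
    best = max(pairs, key=lambda pair: jaccard(q_tokens, tokens(pair[0])))
    return best[1]


# Hypothetical training pairs, invented for illustration.
TRAIN_PAIRS = [
    ("What is the capital of France?",
     "SELECT ?answer WHERE { wd:Q142 wdt:P36 ?answer }"),
    ("What is the capital of Germany?",
     "SELECT ?answer WHERE { wd:Q183 wdt:P36 ?answer }"),
]

predicted = nearest_query("what's the capital city of France", TRAIN_PAIRS)
print(predicted)
```

A baseline like this can only return queries it has already seen, so it cannot handle entities missing from the training pairs. That limitation is one reason to train a model that generates queries rather than retrieving them.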

Acknowledgements

If you use this dataset in your research, please credit the original authors.
Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv

Column name            Description
NNQT_question          Natural language questions. (String)
subgraph               Subgraph of the question. (Graph)
sparql_dbpedia18       SPARQL query for the question. (Query)
template               Template for the question. (String)
paraphrased_question   Paraphrased version of the question. (String)

File: test.csv

Column name            Description
NNQT_question          Natural language questions. (String)
subgraph               Subgraph of the question. (Graph)
sparql_dbpedia18       SPARQL query for the question. (Query)
template               Template for the question. (String)
paraphrased_question   Paraphrased version of the question. (String)

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.

Tables

Test

@kaggle.thedevastator_unlock_smarter_querying_with_lc_quad_2_0.test
  • 1.08 MB
  • 4781 rows
  • 9 columns

CREATE TABLE test (
  "nnqt_question" VARCHAR,
  "uid" BIGINT,
  "subgraph" VARCHAR,
  "template_index" BIGINT,
  "question" VARCHAR,
  "sparql_wikidata" VARCHAR,
  "sparql_dbpedia18" VARCHAR,
  "template" VARCHAR,
  "paraphrased_question" VARCHAR
);

Train

@kaggle.thedevastator_unlock_smarter_querying_with_lc_quad_2_0.train
  • 4.35 MB
  • 19293 rows
  • 9 columns

CREATE TABLE train (
  "nnqt_question" VARCHAR,
  "uid" BIGINT,
  "subgraph" VARCHAR,
  "template_index" BIGINT,
  "question" VARCHAR,
  "sparql_wikidata" VARCHAR,
  "sparql_dbpedia18" VARCHAR,
  "template" VARCHAR,
  "paraphrased_question" VARCHAR
);
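To experiment with this schema locally, one option is to load it into an in-memory SQLite database from Python. This is a sketch under assumptions: SQLite accepts the VARCHAR and BIGINT type names but may handle them differently from the engine that produced the definition above, and the rows inserted here are placeholders invented for illustration.

```python
import sqlite3

# Schema copied from the table definition above.
SCHEMA = """
CREATE TABLE train (
  "nnqt_question" VARCHAR,
  "uid" BIGINT,
  "subgraph" VARCHAR,
  "template_index" BIGINT,
  "question" VARCHAR,
  "sparql_wikidata" VARCHAR,
  "sparql_dbpedia18" VARCHAR,
  "template" VARCHAR,
  "paraphrased_question" VARCHAR
);
"""


def placeholder_row(uid, template_index):
    """Build one invented row whose values match the nine columns in order."""
    query = "SELECT * WHERE { ?s ?p ?o }"
    return (
        f"nnqt {uid}", uid, "placeholder subgraph", template_index,
        f"question {uid}", query, query, "placeholder template",
        f"paraphrase {uid}",
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Three placeholder rows: two share template_index 1, one has 2.
rows = [placeholder_row(1, 1), placeholder_row(2, 1), placeholder_row(3, 2)]
conn.executemany("INSERT INTO train VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Count how many questions were generated from each template.
counts = conn.execute(
    "SELECT template_index, COUNT(*) FROM train "
    "GROUP BY template_index ORDER BY template_index"
).fetchall()
print(counts)
```

The same GROUP BY query, run against the real train table, would show how evenly the questions are spread across templates.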
