Baselight

Scicite (Classifying Citation Intents In Papers)

Classifying citation intents in academic papers

@kaggle.thedevastator_harvesting_scholarly_insight_with_scicite

By Huggingface Hub [source]


About this dataset

SciCite is a dataset of labeled scholarly citations extracted from scientific articles in fields such as computer science, biomedicine and ecology. Each entry holds the citation text (string), the name of the section the citation appears in (sectionName), the labels assigned to it (label, label2), a flag marking key citations (isKeyCitation), the source of the citation (source) and the start and end indices of the citation within the text (citeStart, citeEnd).

How to use the dataset

This dataset consists of three CSV files, train.csv, test.csv and validation.csv, each containing scholarly citations gathered from scientific articles and described by the same columns. The files can be used in several ways to gain insight into how research is cited.

  • Extracting useful information from citations: The label attached to each citation can help extract specific information about the cited sources. In addition, isKeyCitation indicates whether the cited source is a key citation that researchers or practitioners may want to examine in greater detail.

  • Identifying relationships between citations: The sectionName column records the section in which a citation appears (for example, the introduction or the abstract). This makes it possible to relate citations to the parts of a paper that use them and to better understand the connections scholars have drawn to earlier research.

  • Improving accuracy in data gathering: With the string, citeStart and citeEnd columns, along with the source labels, one can identify references that are repeated multiple times and double-check each citation's position through its start/end values.

  • Validation purposes: The dataset can also support validating scholarly documents for peer review. Similar passages found earlier in unrelated documents can serve as reference points, and a match signals correctness on the original authors' part.
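The start/end check and the repeated-reference check described in the list above could be sketched as follows. This is a minimal sketch, not the dataset authors' method. It assumes that citeStart and citeEnd are 0-based character offsets into string with citeEnd exclusive, which this page does not state, so it is worth confirming on a few real rows first. The helper names and the sample rows are invented for illustration.

```python
from collections import Counter


def citation_span(row):
    """Return the text between citeStart and citeEnd, or None if the offsets are unusable.

    Assumption: offsets are 0-based character indices and citeEnd is exclusive.
    """
    text = row.get("string") or ""
    try:
        # Go through float() so that float-style cells such as "10.0" also parse.
        start = int(float(row["citeStart"]))
        end = int(float(row["citeEnd"]))
    except (KeyError, TypeError, ValueError, OverflowError):
        return None
    if not (0 <= start < end <= len(text)):
        return None
    return text[start:end]


def repeated_strings(rows):
    """Return {string: count} for citation strings that occur more than once."""
    counts = Counter(row["string"] for row in rows)
    return {text: n for text, n in counts.items() if n > 1}


# Invented sample rows for illustration only; they are not taken from the dataset.
sample = [
    {"string": "We follow Smith et al. (2010) here.", "citeStart": "10", "citeEnd": "29"},
    {"string": "We follow Smith et al. (2010) here.", "citeStart": "10", "citeEnd": "29"},
    {"string": "No offsets were recorded for this one.", "citeStart": "", "citeEnd": ""},
]

for row in sample:
    print(repr(citation_span(row)))
print(repeated_strings(sample))
```

Rows for which citation_span returns None are the ones whose offsets are missing or fall outside the text, so they are natural candidates for manual inspection.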

Research Ideas

  • Developing a search engine to quickly find citations relevant to specific topics and research areas.
  • Creating algorithms that can predict key citations and streamline the research process by automatically including only the most important references in a paper.
  • Designing AI systems that can accurately classify, analyze and summarize scholarly works based on their citation frequency, source type and assigned label.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: validation.csv

Column name    Description
string         The string of text associated with the citation. (String)
sectionName    The name of the section the citation is found in. (String)
label          The label associated with the citation. (String)
isKeyCitation  A boolean value indicating whether the citation is a key citation. (Boolean)
label2         The second label associated with the citation. (String)
citeEnd        The end index of the citation in the text. (Integer)
citeStart      The start index of the citation in the text. (Integer)
source         The source of the citation. (String)

File: train.csv

Column name    Description
string         The string of text associated with the citation. (String)
sectionName    The name of the section the citation is found in. (String)
label          The label associated with the citation. (String)
isKeyCitation  A boolean value indicating whether the citation is a key citation. (Boolean)
label2         The second label associated with the citation. (String)
citeEnd        The end index of the citation in the text. (Integer)
citeStart      The start index of the citation in the text. (Integer)
source         The source of the citation. (String)

File: test.csv

Column name    Description
string         The string of text associated with the citation. (String)
sectionName    The name of the section the citation is found in. (String)
label          The label associated with the citation. (String)
isKeyCitation  A boolean value indicating whether the citation is a key citation. (Boolean)
label2         The second label associated with the citation. (String)
citeEnd        The end index of the citation in the text. (Integer)
citeStart      The start index of the citation in the text. (Integer)
source         The source of the citation. (String)
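As an illustration of the column list above, the following sketch reads rows into typed records using only the Python standard library. It rests on several assumptions: the CSV header uses exactly the column names listed above, booleans are serialized as text such as True/False, and index cells may be blank. The label, label2 and source values are kept as raw text, because the column descriptions call them strings while the table schemas on this page declare them as BIGINT. The Citation class, the helper names and the sample CSV are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    string: str
    section_name: str
    label: str  # kept as raw text; described as String, declared BIGINT in the schemas
    is_key_citation: bool
    label2: str
    cite_start: Optional[int]
    cite_end: Optional[int]
    source: str


def to_int(value):
    """Parse an integer cell, tolerating blanks and float-style text such as '12.0'."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None


def to_bool(value):
    """Assumption: booleans are written as 'True'/'False' or '1'/'0'."""
    return str(value).strip().lower() in {"true", "1"}


def read_citations(handle):
    """Read rows from an open CSV file object into Citation records."""
    return [
        Citation(
            string=row["string"],
            section_name=row["sectionName"],
            label=row["label"],
            is_key_citation=to_bool(row["isKeyCitation"]),
            label2=row["label2"],
            cite_start=to_int(row["citeStart"]),
            cite_end=to_int(row["citeEnd"]),
            source=row["source"],
        )
        for row in csv.DictReader(handle)
    ]


# Invented two-row sample standing in for a real file; values are not from the dataset.
SAMPLE_CSV = (
    "string,sectionName,label,isKeyCitation,label2,citeEnd,citeStart,source\n"
    '"As shown previously [3], the effect holds.",Introduction,0,False,,23,20,1\n'
    '"Following the protocol of (Lee, 2015).",Methods,1,True,,,,2\n'
)

records = read_citations(io.StringIO(SAMPLE_CSV))
for record in records:
    print(record.section_name, record.is_key_citation, record.cite_start, record.cite_end)
```

To read a real file, the same function can be given a handle opened with open("validation.csv", newline="", encoding="utf-8"). Columns beyond the eight listed here are simply ignored.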

Acknowledgements

If you use this dataset in your research, please credit Huggingface Hub.

Tables

Test

@kaggle.thedevastator_harvesting_scholarly_insight_with_scicite.test
  • 603.81 KB
  • 1859 rows
  • 14 columns

CREATE TABLE test (
  "string" VARCHAR,
  "sectionname" VARCHAR,
  "label" BIGINT,
  "citingpaperid" VARCHAR,
  "citedpaperid" VARCHAR,
  "excerpt_index" BIGINT,
  "iskeycitation" BOOLEAN,
  "label2" BIGINT,
  "citeend" BIGINT,
  "citestart" BIGINT,
  "source" BIGINT,
  "label_confidence" DOUBLE,
  "label2_confidence" DOUBLE,
  "id" VARCHAR
);

Train

@kaggle.thedevastator_harvesting_scholarly_insight_with_scicite.train
  • 2.04 MB
  • 8194 rows
  • 14 columns

CREATE TABLE train (
  "string" VARCHAR,
  "sectionname" VARCHAR,
  "label" BIGINT,
  "citingpaperid" VARCHAR,
  "citedpaperid" VARCHAR,
  "excerpt_index" BIGINT,
  "iskeycitation" BOOLEAN,
  "label2" BIGINT,
  "citeend" BIGINT,
  "citestart" BIGINT,
  "source" BIGINT,
  "label_confidence" DOUBLE,
  "label2_confidence" DOUBLE,
  "id" VARCHAR
);

Validation

@kaggle.thedevastator_harvesting_scholarly_insight_with_scicite.validation
  • 294.49 KB
  • 916 rows
  • 14 columns

CREATE TABLE validation (
  "string" VARCHAR,
  "sectionname" VARCHAR,
  "label" BIGINT,
  "citingpaperid" VARCHAR,
  "citedpaperid" VARCHAR,
  "excerpt_index" BIGINT,
  "iskeycitation" BOOLEAN,
  "label2" BIGINT,
  "citeend" BIGINT,
  "citestart" BIGINT,
  "source" BIGINT,
  "label_confidence" DOUBLE,
  "label2_confidence" DOUBLE,
  "id" VARCHAR
);
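To show how the schemas above might be queried, here is a minimal sketch that loads the validation table definition into an in-memory SQLite database and counts rows per label. SQLite is only a stand-in: the SQL dialect of the hosting platform is not stated on this page, and other engines may treat these type names differently. The inserted rows are invented placeholders, and label_counts is a hypothetical helper.

```python
import sqlite3

# Schema copied from the validation table definition above (trailing semicolon dropped).
DDL = """
CREATE TABLE validation (
  "string" VARCHAR,
  "sectionname" VARCHAR,
  "label" BIGINT,
  "citingpaperid" VARCHAR,
  "citedpaperid" VARCHAR,
  "excerpt_index" BIGINT,
  "iskeycitation" BOOLEAN,
  "label2" BIGINT,
  "citeend" BIGINT,
  "citestart" BIGINT,
  "source" BIGINT,
  "label_confidence" DOUBLE,
  "label2_confidence" DOUBLE,
  "id" VARCHAR
)
"""


def label_counts(conn, table="validation"):
    """Return (label, row_count) pairs for one table, ordered by label."""
    # The table name is interpolated into the query, so restrict it to the known tables.
    if table not in {"train", "test", "validation"}:
        raise ValueError(f"unknown table: {table}")
    query = f'SELECT "label", COUNT(*) FROM {table} GROUP BY "label" ORDER BY "label"'
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# Invented placeholder rows; they are not taken from the dataset.
rows = [
    ("placeholder text a", "Introduction", 0, False),
    ("placeholder text b", "Methods", 0, True),
    ("placeholder text c", "Results", 1, False),
]
conn.executemany(
    'INSERT INTO validation ("string", "sectionname", "label", "iskeycitation") '
    "VALUES (?, ?, ?, ?)",
    rows,
)

print(label_counts(conn))
```

Because the three tables share one schema, the same query shape applies to train and test once those tables exist.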
