Baselight

Named Entity Recognition (NER) Corpus

Ready-to-use Named Entity Recognition corpus

@kaggle.naseralqaydeh_named_entity_recognition_ner_corpus

About this Dataset

Task

Named Entity Recognition (NER) is the task of identifying the entities in a text and classifying them into categories such as names of persons, locations, and organizations.

Dataset

Each row in the CSV file holds a complete sentence, a list of POS tags (one per word in the sentence), and a list of NER tags (one per word in the sentence).

You can use a pandas DataFrame to read and manipulate this dataset.

Since each row in the CSV file contains lists, reading the file with pandas.read_csv() and then indexing into the tag column returns a string, not a list:

>>> data['tag'][0]
"['O', 'O', 'O', 'O', 'O', 'O', 'B-geo', 'O', 'O', 'O', 'O', 'O', 'B-geo', 'O', 'O', 'O', 'O', 'O', 'B-gpe', 'O', 'O', 'O', 'O', 'O']"
>>> type(data['tag'][0])
<class 'str'>

You can use the following to convert it back to list type:

>>> from ast import literal_eval
>>> literal_eval(data['tag'][0])
['O', 'O', 'O', 'O', 'O', 'O', 'B-geo', 'O', 'O', 'O', 'O', 'O', 'B-geo', 'O', 'O', 'O', 'O', 'O', 'B-gpe', 'O', 'O', 'O', 'O', 'O']
>>> type(literal_eval(data['tag'][0]))
<class 'list'>

Acknowledgements

This dataset was taken from the Annotated Corpus for Named Entity Recognition dataset by Abhinav Walia and then processed.

Annotated Corpus for Named Entity Recognition is an annotated corpus built on the GMB (Groningen Meaning Bank) corpus for entity classification, with enhanced and popular features produced by applying natural language processing to the data set.

Essential info about entities:

  • geo = Geographical Entity
  • org = Organization
  • per = Person
  • gpe = Geopolitical Entity
  • tim = Time indicator
  • art = Artifact
  • eve = Event
  • nat = Natural Phenomenon
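The tag codes above can be used to turn a tagged sentence into readable entities. The sketch below is illustrative only: it assumes the tags follow the usual B-/I-/O scheme (the sample rows on this page show only B- and O tags, so the I- prefix is an assumption), and the names ENTITY_LABELS and extract_entities, as well as the example sentence, are hypothetical.

```python
# Tag suffixes and their meanings, copied from the list above.
ENTITY_LABELS = {
    "geo": "Geographical Entity",
    "org": "Organization",
    "per": "Person",
    "gpe": "Geopolitical Entity",
    "tim": "Time indicator",
    "art": "Artifact",
    "eve": "Event",
    "nat": "Natural Phenomenon",
}


def extract_entities(words, tags):
    """Group words tagged B-x followed by I-x into (text, label) pairs.

    Assumes a B-/I-/O tagging scheme; a stray I- tag starts a new entity.
    """
    entities = []
    current_words, current_type = [], None

    def flush():
        # Store the entity collected so far, if there is one.
        if current_words:
            label = ENTITY_LABELS.get(current_type, current_type)
            entities.append((" ".join(current_words), label))

    for word, tag in zip(words, tags):
        prefix, _, ent_type = tag.partition("-")
        if prefix == "I" and current_words and ent_type == current_type:
            # Continuation of the entity in progress.
            current_words.append(word)
            continue
        flush()
        if prefix in ("B", "I"):
            current_words, current_type = [word], ent_type
        else:
            current_words, current_type = [], None
    flush()
    return entities


# Made-up example sentence, already split into words.
words = ["Barack", "Obama", "visited", "London"]
tags = ["B-per", "I-per", "O", "B-geo"]
print(extract_entities(words, tags))
# [('Barack Obama', 'Person'), ('London', 'Geographical Entity')]
```

The function expects the word list and the tag list to have the same length, which matches the one-tag-per-word layout described in the Dataset section.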

Tables

Ner

@kaggle.naseralqaydeh_named_entity_recognition_ner_corpus.ner
  • 6.66 MB
  • 47959 rows
  • 4 columns

CREATE TABLE ner (
  "sentence" VARCHAR,
  "sentence_3e3809" VARCHAR,
  "pos" VARCHAR,
  "tag" VARCHAR
);
