Baselight

Annotated NER For Indian Language

This dataset will contain annotated NER data for multiple languages.

@kaggle.vpkprasanna_annotated_indian_language_ner

About this Dataset

Context

The main idea of this dataset is to perform NER on regional languages as well, such as Tamil, Telugu, Kannada, Malayalam, Hindi, and more.

Content

For now I have added data for the Tamil language, and I will add more languages in the coming days.

Inspiration

I have done some Named Entity Recognition (NER) on English data, so why not do the same for our regional languages? That is how I started this.

Tables

Hindi Ner

@kaggle.vpkprasanna_annotated_indian_language_ner.hindi_ner
  • 1.45 MB
  • 493603 rows
  • 2 columns

CREATE TABLE hindi_ner (
  "tokens" VARCHAR,
  "tags" VARCHAR
);

Kannada Ner

@kaggle.vpkprasanna_annotated_indian_language_ner.kannada_ner
  • 692.83 KB
  • 100479 rows
  • 2 columns

CREATE TABLE kannada_ner (
  "tokens" VARCHAR,
  "tags" VARCHAR
);

Malayalam Ner

@kaggle.vpkprasanna_annotated_indian_language_ner.malayalam_ner
  • 2.39 MB
  • 280130 rows
  • 2 columns

CREATE TABLE malayalam_ner (
  "tokens" VARCHAR,
  "tags" VARCHAR
);

Tamil Ner

@kaggle.vpkprasanna_annotated_indian_language_ner.tamil_ner
  • 4.07 MB
  • 542225 rows
  • 2 columns

CREATE TABLE tamil_ner (
  "name" VARCHAR,
  "tags" VARCHAR
);
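Note that this table names its first column "name", whereas the other language tables use "tokens". The following is a minimal sketch of one way to query it under the same column names as the rest, assuming the table has been loaded into a local SQLite database. The view name tamil_ner_tokens, the helper function, and the sample rows are hypothetical and not part of the dataset.

```python
import sqlite3


def create_tamil_tokens_view(conn):
    """Expose tamil_ner under the column names used by the other tables."""
    # tamil_ner_tokens is a hypothetical view name chosen for this sketch.
    conn.execute(
        'CREATE VIEW IF NOT EXISTS tamil_ner_tokens AS '
        'SELECT "name" AS "tokens", "tags" FROM tamil_ner'
    )


# In-memory database that mirrors the schema shown above.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tamil_ner ("name" VARCHAR, "tags" VARCHAR)')

# Placeholder rows invented for illustration only; not taken from the dataset.
conn.executemany(
    "INSERT INTO tamil_ner VALUES (?, ?)",
    [("tok_a", "TAG_X"), ("tok_b", "TAG_Y")],
)

create_tamil_tokens_view(conn)
rows = conn.execute(
    "SELECT tokens, tags FROM tamil_ner_tokens ORDER BY tokens"
).fetchall()
print(rows)
```

A view leaves the original table untouched, so queries written against "tokens" and "tags" can be reused across all five languages without altering the source data.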

Telugu Ner

@kaggle.vpkprasanna_annotated_indian_language_ner.telugu_ner
  • 1.54 MB
  • 259458 rows
  • 2 columns
Loading...


CREATE TABLE telugu_ner (
  "tokens" VARCHAR,
  "tags" VARCHAR
);
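A common first step with tables like these is to look at how often each value of the "tags" column occurs. The sketch below shows one possible way to do that with SQLite, assuming the tables have been loaded locally with the schemas listed here. The source does not state which tag set is used or whether each row holds exactly one token, so the sample rows and tag values are placeholders, and the function name tag_counts is hypothetical.

```python
import sqlite3

# Table names as listed in the schemas of this dataset.
KNOWN_TABLES = {
    "hindi_ner", "kannada_ner", "malayalam_ner", "tamil_ner", "telugu_ner",
}


def tag_counts(conn, table):
    """Return {tag value: row count} for one table, most frequent first."""
    # Table names cannot be bound as SQL parameters, so check them explicitly.
    if table not in KNOWN_TABLES:
        raise ValueError(f"unknown table: {table}")
    query = (
        f'SELECT "tags", COUNT(*) FROM {table} '
        'GROUP BY "tags" ORDER BY COUNT(*) DESC, "tags"'
    )
    return dict(conn.execute(query).fetchall())


# In-memory database that mirrors one of the schemas.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE hindi_ner ("tokens" VARCHAR, "tags" VARCHAR)')

# Placeholder rows invented for illustration only; not taken from the dataset.
conn.executemany(
    "INSERT INTO hindi_ner VALUES (?, ?)",
    [("tok_a", "TAG_X"), ("tok_b", "TAG_Y"), ("tok_c", "TAG_X")],
)

counts = tag_counts(conn, "hindi_ner")
print(counts)
```

Because every table exposes a "tags" column, the same function applies to each language unchanged, including tamil_ner, whose first column is named differently.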
