Tamil NLP
Datasets for Natural Language Processing in Tamil
@kaggle.sudalairajkumar_tamil_nlp
Indic NLP - Natural Language Processing for Indian Languages.
This dataset is a step towards the same for the Tamil language. Thanks to Malaikannan for the initiative and to Selva for collecting the data from websites. The idea is to gather more datasets related to Tamil NLP in a single place.
The dataset has the following files.
Tamil News Classification
This dataset has 14,521 rows for training and 3,631 rows for testing. It has 6 news categories: "tamilnadu", "india", "cinema", "sports", "politics", and "world". The data is obtained from this link.
Tamil Movie Review Dataset
This dataset has 480 training samples and 121 testing samples. It contains the review text in Tamil and ratings from 1 to 5. The data is obtained from this link.
Thirukkural Dataset
According to Wikipedia, the Tirukkural, or the Kural for short, is a classic Tamil text consisting of 1,330 couplets, or kurals, dealing with the everyday virtues of an individual. It is one of the two oldest works now extant in Tamil literature.
I have split the data into train and test sets. The kural and/or its explanation can be used to predict which of the three parts a couplet belongs to: aram (virtue), porul (polity), or inbam (love). The dataset is obtained from this link.
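The prediction task described above can be sketched with a small baseline. The following is a minimal character n-gram Naive Bayes classifier written with the Python standard library only; it is an illustration, not the method used by the dataset authors. The training rows are invented English placeholders standing in for the `explanation` column, and the label strings `aram`, `porul`, and `inbam` are assumed for illustration (the actual label values stored in the dataset are not shown here). Character n-grams are used because they need no tokenizer and work the same way on Tamil script.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams; script-agnostic, so it also works on Tamil."""
    text = " ".join(text.lower().split())
    return [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]


class NaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-alpha smoothing."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha
        self.label_counts = Counter()
        self.gram_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.label_counts[label] += 1
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented placeholder rows (NOT taken from the dataset): (explanation, part).
train_rows = [
    ("A virtuous life of kindness and charity brings lasting honour", "aram"),
    ("Charity and kindness are the marks of virtue", "aram"),
    ("A king must guard his kingdom with a strong army", "porul"),
    ("The minister advises the king on wealth and the army", "porul"),
    ("The lover longs for the beloved through the sleepless night", "inbam"),
    ("Her beloved's glance fills the lover with joy", "inbam"),
]

model = NaiveBayes().fit(
    [text for text, _ in train_rows],
    [label for _, label in train_rows],
)
prediction = model.predict("The king leads his army")
print(prediction)
```

With real data, the same class could be fitted on the `kural` or `explanation` column of the train table and scored against the test table; a stronger model would likely be needed for competitive accuracy.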
More datasets will be added in future versions.
My sincere thanks to:
Some questions that can be answered are:
And a lot more interesting questions remain to be answered.
Check out this link to find similar and dissimilar words for Tamil.
CREATE TABLE tamil_movie_reviews_test (
"reviewid" BIGINT,
"reviewintamil" VARCHAR,
"rating" DOUBLE
);

CREATE TABLE tamil_movie_reviews_train (
"reviewid" BIGINT,
"reviewintamil" VARCHAR,
"rating" DOUBLE
);

CREATE TABLE tamil_news_test (
"newsinenglish" VARCHAR,
"newsintamil" VARCHAR,
"category" VARCHAR,
"categoryintamil" VARCHAR
);

CREATE TABLE tamil_news_train (
"newsinenglish" VARCHAR,
"newsintamil" VARCHAR,
"category" VARCHAR,
"categoryintamil" VARCHAR
);

CREATE TABLE tamil_thirukkural_test (
"number" BIGINT,
"kural" VARCHAR,
"explanation" VARCHAR,
"adikaram_name" VARCHAR,
"iyal_name" VARCHAR,
"paul_name" VARCHAR,
"paul_translation" VARCHAR,
"mk" VARCHAR,
"mv" VARCHAR,
"sp" VARCHAR
);

CREATE TABLE tamil_thirukkural_train (
"number" BIGINT,
"kural" VARCHAR,
"explanation" VARCHAR,
"adikaram_name" VARCHAR,
"iyal_name" VARCHAR,
"paul_name" VARCHAR,
"paul_translation" VARCHAR,
"mk" VARCHAR,
"mv" VARCHAR,
"sp" VARCHAR
);
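
As a rough illustration of how the schema above might be queried, the sketch below recreates the tamil_news_train table in an in-memory SQLite database and counts rows per category. SQLite is an assumption made only so the example runs with the Python standard library; the rows are invented placeholders, not data from the dataset, and the helper name `category_counts` is hypothetical.

```python
import sqlite3

# Table definition copied from the tamil_news_train schema above.
SCHEMA = """
CREATE TABLE tamil_news_train (
    "newsinenglish" VARCHAR,
    "newsintamil" VARCHAR,
    "category" VARCHAR,
    "categoryintamil" VARCHAR
);
"""

# Invented placeholder rows; the real training table has 14,521 rows.
TOY_ROWS = [
    ("placeholder headline 1", "placeholder tamil text 1", "sports", "placeholder"),
    ("placeholder headline 2", "placeholder tamil text 2", "sports", "placeholder"),
    ("placeholder headline 3", "placeholder tamil text 3", "cinema", "placeholder"),
    ("placeholder headline 4", "placeholder tamil text 4", "world", "placeholder"),
]


def category_counts(conn):
    """Return (category, row_count) pairs, most frequent first, ties broken by name."""
    cur = conn.execute(
        'SELECT "category", COUNT(*) AS n FROM tamil_news_train '
        'GROUP BY "category" ORDER BY n DESC, "category"'
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO tamil_news_train VALUES (?, ?, ?, ?)", TOY_ROWS)
counts = category_counts(conn)
print(counts)
```

Running the same query against the real train and test tables is a quick way to check whether the six news categories are balanced before training a classifier.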