Yektanet (Dataset for Text Classification)
Text Classification!
@kaggle.pkdarabi_yektanet
The Yektanet Dataset is a real-world collection of Persian web documents, gathered and refined by the Yektanet platform. It serves as the basis for an industrial case study on applying machine learning to Natural Language Processing (NLP). The goal is to build a machine learning model that predicts the topic category of a document from its available features, such as the title, description, full text content, URL, and domain.
Each instance in the dataset describes one document through several features. The target variable for the prediction task is the category column, which indicates the document's topic.
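Additional columns include title, description, text_content, h1, h2, url, and domain, plus a numeric id for each record, as defined in the table schema below.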
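Before training, the target has to be separated from the input features, and the text columns usually need to be combined into a single input string. The following is a minimal sketch, assuming each row arrives as a Python dict keyed by the column names from the schema; the helper name row_to_example, the set of text columns used, their order, and the newline separator are all hypothetical choices, not part of the dataset.

```python
# Hypothetical helper: turn one yektanet_train row into (input_text, label).
# Which columns count as text, their order, and the separator are assumptions.
from typing import Mapping, Optional, Tuple

TEXT_FIELDS = ("title", "h1", "h2", "description", "text_content")  # assumed choice


def row_to_example(row: Mapping[str, Optional[str]]) -> Tuple[str, Optional[str]]:
    """Return (input_text, label); the label is the 'category' column."""
    parts = []
    for field in TEXT_FIELDS:
        value = row.get(field)
        # Skip missing or blank fields (VARCHAR columns may be NULL or empty).
        if value and value.strip():
            parts.append(value.strip())
    return "\n".join(parts), row.get("category")


if __name__ == "__main__":
    # Made-up row for illustration only; not a record from the dataset.
    row = {
        "id": 1,
        "category": "sport",
        "title": "Match report",
        "h1": None,
        "h2": "",
        "description": "Summary of the final",
        "text_content": "Full article text ...",
        "url": "https://example.com/a",
        "domain": "example.com",
    }
    print(row_to_example(row))
```

Here url and domain are left out of the text on purpose; they could instead be used as separate categorical features.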
With the provided features and the target variable, the Yektanet dataset offers a valuable resource for training and evaluating machine learning models in the domain of NLP. It enables researchers and practitioners to explore and develop effective approaches for document categorization and topic prediction tasks.
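To make "training and evaluating" concrete, here is a hedged baseline sketch: a multinomial Naive Bayes classifier with Laplace smoothing, written with only the Python standard library, plus an accuracy metric. The tokenizer (a Unicode-aware \w+ regex), the smoothing value alpha=1.0, and the class and function names are assumptions for illustration; a real pipeline for Persian text would likely need extra normalization (for example, handling the zero-width non-joiner, which this regex treats as a word boundary).

```python
# Minimal multinomial Naive Bayes baseline for predicting `category` from text.
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Set

TOKEN_RE = re.compile(r"\w+")  # \w matches Persian letters in Python 3 str patterns


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


class NaiveBayesClassifier:
    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing strength (assumed default)
        self.class_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "NaiveBayesClassifier":
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            tokens = tokenize(text)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_one(self, text: str) -> Optional[str]:
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts: Sequence[str]) -> List[Optional[str]]:
        return [self.predict_one(t) for t in texts]


def accuracy(y_true: Sequence[str], y_pred: Sequence[Optional[str]]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy, made-up examples; real inputs would come from yektanet_train.
    train_texts = ["football match goal", "league goal score",
                   "stock market price", "bank interest rate"]
    train_labels = ["sport", "sport", "economy", "economy"]
    model = NaiveBayesClassifier().fit(train_texts, train_labels)
    preds = model.predict(["goal in the match", "market interest"])
    print(preds, accuracy(["sport", "economy"], preds))  # ['sport', 'economy'] 1.0
```

Naive Bayes is chosen here only as a cheap, dependency-free reference point; stronger models can then be compared against it on the same held-out split.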
CREATE TABLE yektanet_train (
"category" VARCHAR,
"description" VARCHAR,
"text_content" VARCHAR,
"title" VARCHAR,
"h1" VARCHAR,
"h2" VARCHAR,
"url" VARCHAR,
"domain" VARCHAR,
"id" BIGINT
);
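One way to sanity-check the schema and the class balance of the target is to load it into a database and count rows per category. The sketch below assumes SQLite via Python's built-in sqlite3 module (the dataset does not specify a database engine; SQLite accepts the VARCHAR and BIGINT type names through type affinity). The function name category_distribution and the inserted rows are made-up placeholders, not dataset records.

```python
# Load the yektanet_train schema into in-memory SQLite and count rows per category.
import sqlite3

SCHEMA = """
CREATE TABLE yektanet_train (
"category" VARCHAR,
"description" VARCHAR,
"text_content" VARCHAR,
"title" VARCHAR,
"h1" VARCHAR,
"h2" VARCHAR,
"url" VARCHAR,
"domain" VARCHAR,
"id" BIGINT
);
"""


def category_distribution(conn: sqlite3.Connection):
    # Most frequent categories first; ties broken alphabetically.
    return conn.execute(
        'SELECT "category", COUNT(*) AS n FROM yektanet_train '
        'GROUP BY "category" ORDER BY n DESC, "category"'
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder rows for illustration only.
    conn.executemany(
        'INSERT INTO yektanet_train ("id", "category", "title") VALUES (?, ?, ?)',
        [(1, "sport", "t1"), (2, "sport", "t2"), (3, "economy", "t3")],
    )
    print(category_distribution(conn))  # [('sport', 2), ('economy', 1)]
```

A strongly skewed distribution would suggest reporting a per-class metric such as macro-averaged F1 alongside plain accuracy.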