Text Classification!

The Yektanet Dataset is a collection of real Persian web pages gathered and refined by the Yektanet platform. It serves as the basis for an industrial case study on applying machine learning to Natural Language Processing (NLP). The objective of this exercise is to develop a machine-learning model that predicts the topic category of a document from its available text features, such as the title, description, and complete text content.

The dataset consists of multiple instances, each describing one document through several features. The target variable for the prediction task is the category column, which indicates the topic of the content.

Additional features in the dataset include:

Description: This column describes the document.
Text_content: This column contains the complete text content of the document.
Title: This column represents the title of the document.
h1 and h2: These columns contain the content within the HTML tags h1 and h2, respectively.
URL: This column specifies the link address associated with the document.
Domain: This column indicates the domain or website from which the document originates.
Id: This column represents the unique identifier for each link.
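
The text columns listed above are often merged into a single input string before modelling. The following is a minimal sketch of that step, not the dataset's official preprocessing. It assumes each row is available as a dictionary keyed by the column names above (the exact casing in the real file may differ), and the helper names build_document and domain_from_url, as well as the sample row, are hypothetical.

```python
from urllib.parse import urlparse

# Column names follow the feature list; their exact casing in the real file is an assumption.
TEXT_FIELDS = ("Title", "Description", "h1", "h2", "Text_content")


def build_document(row, fields=TEXT_FIELDS):
    """Concatenate the non-empty text fields of one row into a single string."""
    parts = []
    for field in fields:
        value = row.get(field)
        # Skip missing values and blank strings so they add no empty lines.
        if isinstance(value, str) and value.strip():
            parts.append(value.strip())
    return "\n".join(parts)


def domain_from_url(url):
    """Return the lowercase host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


# Made-up row for illustration only; it is not taken from the dataset.
row = {
    "Id": 1,
    "Title": "Sample title",
    "Description": "Short summary",
    "h1": "Main heading",
    "h2": None,
    "Text_content": "Full body text.",
    "URL": "https://www.example.com/news/1",
}
document = build_document(row)
domain = domain_from_url(row["URL"])
```

Putting the title first is a design choice rather than a requirement; some models benefit from weighting short fields such as the title or h1 separately instead of concatenating everything.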

With the provided features and the target variable, the Yektanet dataset offers a valuable resource for training and evaluating machine learning models in the domain of NLP. It enables researchers and practitioners to explore and develop effective approaches for document categorization and topic prediction tasks.
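
As a starting point for the categorization task, a simple bag-of-words baseline can be useful before trying larger models. The sketch below implements a small multinomial Naive Bayes classifier with add-one smoothing using only the Python standard library. It is an illustrative baseline, not a method prescribed by the dataset; the class name NaiveBayesBaseline, the toy Persian snippets, and the category names "sport" and "economy" are invented for the example. Whitespace tokenization is also an assumption, and real Persian text would likely need proper normalization.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesBaseline:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for doc, label in zip(documents, labels):
            self.token_counts[label].update(doc.split())
        self.vocabulary = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, document):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for token in document.split():
                if token in self.vocabulary:  # tokens unseen in training are ignored
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, documents, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(model.predict(d) == y for d, y in zip(documents, labels))
    return hits / len(labels)


# Toy snippets and category names invented for illustration only.
train_docs = ["فوتبال تیم ملی", "لیگ برتر فوتبال", "قیمت دلار بازار", "بازار بورس سهام"]
train_labels = ["sport", "sport", "economy", "economy"]
model = NaiveBayesBaseline().fit(train_docs, train_labels)
prediction = model.predict("نتایج لیگ فوتبال")
```

In practice, accuracy should be measured on held-out documents rather than on the training set, and a per-category metric is worth reporting if the categories are imbalanced.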
