Yektanet (Dataset for Text Classification)

Text Classification!

@kaggle.pkdarabi_yektanet

About this Dataset

The Yektanet dataset is a collection of real Persian web data gathered and refined by the Yektanet platform. It serves as the basis for an industrial case study on applying machine learning to Natural Language Processing (NLP). The objective is to build a machine learning model that predicts the topic category of a document from its available text features, such as the title, description, and full text content.

The dataset consists of multiple instances, each with several features describing a document. The target variable for the prediction task is the category column, which indicates the topic of the content.

Additional features in the dataset include:

  • Description: This column describes the document.
  • Text_content: This column contains the complete text content of the document.
  • Title: This column represents the title of the document.
  • h1 and h2: These columns contain the content within the HTML tags h1 and h2, respectively.
  • URL: This column specifies the link address associated with the document.
  • Domain: This column indicates the domain or website from which the document originates.
  • Id: This column represents the unique identifier for each link.
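As a rough illustration of how these columns might be turned into model input, the sketch below joins the text fields of each row into one string and pairs it with the category label. The lowercase column names, the `combine_text` and `load_examples` helpers, and the sample row are assumptions made for this example; the actual file may use different names or casing.

```python
import csv
import io

# Assumed column names, based on the feature list above; the real file
# may spell or capitalize them differently.
TEXT_COLUMNS = ["title", "description", "h1", "h2", "text_content"]
TARGET_COLUMN = "category"


def combine_text(row, columns=TEXT_COLUMNS):
    """Join the non-empty text fields of one row into a single string."""
    parts = [(row.get(col) or "").strip() for col in columns]
    return " ".join(p for p in parts if p)


def load_examples(csv_file):
    """Return (text, label) pairs from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return [(combine_text(row), row[TARGET_COLUMN]) for row in reader]


# Tiny in-memory stand-in for the real file (placeholder values, not real data).
sample_csv = io.StringIO(
    "id,title,description,h1,h2,text_content,category\n"
    "1,Sample title,Sample description,,Sub heading,Body text,placeholder_topic\n"
)
examples = load_examples(sample_csv)
print(examples)
```

Concatenating the fields is only one option; keeping the title or domain as separate features may work better, depending on the model.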

With the provided features and the target variable, the Yektanet dataset offers a valuable resource for training and evaluating machine learning models in the domain of NLP. It enables researchers and practitioners to explore and develop effective approaches for document categorization and topic prediction tasks.
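To make the training and evaluation step concrete, here is a minimal baseline sketch: a multinomial Naive Bayes classifier over whitespace tokens, written with the standard library only. It is not the method used in the case study. The class name, the `accuracy` helper, and the toy documents and labels are all invented for illustration, and real Persian text would likely need proper normalization and tokenization before a model like this is useful.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.split())
        self.vocabulary = {
            tok for counts in self.token_counts.values() for tok in counts
        }
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            score = math.log(doc_count / total_docs)  # log prior
            for tok in text.split():
                if tok in self.vocabulary:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Placeholder documents and labels, invented for illustration only.
train_texts = ["goal match team", "team league goal",
               "phone chip screen", "screen battery phone"]
train_labels = ["sports", "sports", "technology", "technology"]
test_texts = ["team goal", "battery chip"]
test_labels = ["sports", "technology"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("league match"))
print(accuracy(model, test_texts, test_labels))
```

A baseline of this kind mainly gives a reference score; stronger approaches can then be compared against it on a held-out split of the data.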
