Baselight

ArabSummaries

Arabic Summaries: a database of Arabic texts with their summaries

@kaggle.tahaalselwii_arabsummaries

About this Dataset

ArabSummaries is a curated dataset for Arabic text summarization. It contains 7,000 Arabic documents from several domains, each paired with a summary generated by AI models. The dataset is intended as a resource for natural language processing (NLP) research, particularly text summarization.

Key Features:
- Domain Coverage: 1,000 documents per domain, covering Culture, Medical, Politics, Religion, Sports, and Technology.

- Data Source: the original documents are drawn from the SANAD Dataset (https://www.kaggle.com/datasets/haithemhermessi/sanad-dataset/data).

- Summaries: generated using state-of-the-art AI summarization models and intended as high-quality, concise representations of the original texts.
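The per-domain figures above can be checked directly from the `type` column. The following is a minimal Python sketch. It assumes the table has been exported to a UTF-8 CSV file whose header uses the column names from the table schema. The `load_rows` helper and any filename you pass it are hypothetical. The exact label strings stored in `type` are not documented, so the sample rows use illustrative values.

```python
import csv
from collections import Counter
from typing import Iterable, Mapping


def domain_counts(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count documents per domain label in the `type` column."""
    return Counter(row["type"] for row in rows)


def load_rows(path: str) -> list[dict[str, str]]:
    """Read a (hypothetical) CSV export of the table; UTF-8 keeps the Arabic text intact."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Toy rows standing in for real data; labels are illustrative only.
    sample = [
        {"type": "Sports"},
        {"type": "Sports"},
        {"type": "Medical"},
    ]
    for label, n in domain_counts(sample).most_common():
        print(label, n)
```

On a real export, `domain_counts(load_rows(path))` returns one count per label. These counts can then be compared with the per-domain figure and the total row count listed for the table.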

Potential Use Cases:

  • Training and evaluation of Arabic text summarization models.
  • Domain-specific summarization research and analysis.
  • Fine-tuning and benchmarking NLP models for Arabic-language applications.
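For the evaluation use case, generated summaries are commonly scored against references with ROUGE. The sketch below is a simplified ROUGE-1 F1 (clipped unigram overlap), written from scratch for illustration. It is not the reference ROUGE implementation. It splits on whitespace only and applies no Arabic-specific normalization (for example of alef or hamza variants) or stemming, and each of these can change scores on Arabic text. The example sentences are made up.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary.

    Tokens are whitespace-split; no normalization or stemming is applied.
    """
    ref = Counter(reference.split())
    cand = Counter(candidate.split())
    overlap = sum((ref & cand).values())  # matches clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "فاز الفريق بالمباراة أمس"  # "The team won the match yesterday"
    candidate = "فاز الفريق أمس"
    print(round(rouge1_f1(reference, candidate), 3))  # 0.857
```

For published results, a maintained ROUGE package with an Arabic-aware tokenizer is preferable. The sketch only shows what the metric measures.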

ArabSummaries helps bridge the gap in Arabic summarization resources by providing a diverse, multi-domain dataset for researchers and developers.

Tables

Arabsummaries

@kaggle.tahaalselwii_arabsummaries.arabsummaries
  • 15.29 MB
  • 7000 rows
  • 5 columns

CREATE TABLE arabsummaries (
  "text" VARCHAR,
  "summary" VARCHAR,
  "type" VARCHAR,
  "text_length" BIGINT,
  "summary_length" BIGINT
);
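The two length columns make it straightforward to measure how strongly each document was compressed. The sketch below loads the schema above into an in-memory SQLite database and computes the average `summary_length / text_length` ratio per `type`. SQLite stands in here for whatever engine hosts the table, and the inserted rows are made up. The schema does not say whether lengths count characters or words, so the ratio is meaningful only if both columns use the same unit.

```python
import sqlite3

# The documented schema; SQLite accepts VARCHAR and BIGINT as type names.
SCHEMA = """
CREATE TABLE arabsummaries (
  "text" VARCHAR,
  "summary" VARCHAR,
  "type" VARCHAR,
  "text_length" BIGINT,
  "summary_length" BIGINT
);
"""

# Average compression ratio per domain; NULLIF turns a zero text_length into NULL
# so the division yields NULL instead of failing, and AVG skips NULLs.
RATIO_QUERY = """
SELECT "type",
       COUNT(*) AS n_docs,
       AVG(CAST("summary_length" AS REAL) / NULLIF("text_length", 0)) AS avg_ratio
FROM arabsummaries
GROUP BY "type"
ORDER BY "type";
"""


def compression_by_domain(conn: sqlite3.Connection) -> list[tuple]:
    """Return (type, n_docs, avg_ratio) rows, ordered by type."""
    return conn.execute(RATIO_QUERY).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up rows for illustration only.
    conn.executemany(
        "INSERT INTO arabsummaries VALUES (?, ?, ?, ?, ?)",
        [
            ("...", "...", "Sports", 400, 100),
            ("...", "...", "Sports", 200, 100),
            ("...", "...", "Medical", 500, 50),
        ],
    )
    for row in compression_by_domain(conn):
        print(row)
```

The query itself is plain SQL. Only the `CAST ... AS REAL` may need adjusting (for example to `DOUBLE`) on another engine.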
