ArabSummaries
Arabic Summaries: a database of Arabic texts with their summaries
@kaggle.tahaalselwii_arabsummaries
ArabSummaries is a curated dataset for Arabic text summarization. It consists of 7,000 Arabic documents spanning several domains, each paired with a summary generated by AI models. It is intended as a resource for natural language processing (NLP) research, particularly on text summarization.
Key Features:
- Domain Coverage:
1,000 documents per domain, covering Culture, Medical, Politics, Religion, Sports, and Technology.
- Data Source:
The original documents are derived from the SANAD Dataset: https://www.kaggle.com/datasets/haithemhermessi/sanad-dataset/data
- Summaries:
Generated with AI summarization models to give concise representations of the original texts.
Potential Use Cases:
ArabSummaries bridges the gap in Arabic summarization resources, providing a diverse and comprehensive dataset for researchers and developers.
CREATE TABLE arabsummaries (
"text" VARCHAR,
"summary" VARCHAR,
"type" VARCHAR,
"text_length" BIGINT,
"summary_length" BIGINT
);
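As a rough illustration of how the schema above might be queried, the following sketch loads it into an in-memory SQLite database and computes, per domain, the document count and the mean ratio of summary length to text length. The sample rows are placeholders and are not taken from the dataset. The helper names `build_demo_db` and `per_domain_stats` are hypothetical, and treating the length columns as character counts is an assumption, since the card does not state the unit.

```python
import sqlite3

# Schema copied from the dataset card. SQLite accepts VARCHAR and BIGINT
# through its type-affinity rules, so the statement runs unchanged.
SCHEMA = """
CREATE TABLE arabsummaries (
    "text" VARCHAR,
    "summary" VARCHAR,
    "type" VARCHAR,
    "text_length" BIGINT,
    "summary_length" BIGINT
);
"""

# Placeholder rows for illustration only; they are NOT real dataset records.
SAMPLE_ROWS = [
    ("placeholder document one", "summary one", "Sports"),
    ("placeholder document two, a bit longer", "summary two", "Sports"),
    ("placeholder document three", "summary 3", "Medical"),
]


def build_demo_db():
    """Create an in-memory database with the card's schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Assumption: lengths are character counts (the card does not say).
    conn.executemany(
        "INSERT INTO arabsummaries VALUES (?, ?, ?, ?, ?)",
        [(t, s, d, len(t), len(s)) for t, s, d in SAMPLE_ROWS],
    )
    return conn


def per_domain_stats(conn):
    """Return (type, document count, mean summary/text length ratio) rows."""
    query = """
        SELECT "type",
               COUNT(*) AS n_docs,
               AVG(CAST("summary_length" AS REAL) / "text_length") AS mean_ratio
        FROM arabsummaries
        WHERE "text_length" > 0
        GROUP BY "type"
        ORDER BY "type"
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for domain, n_docs, ratio in per_domain_stats(conn):
        print(f"{domain}: {n_docs} docs, mean summary/text ratio {ratio:.2f}")
```

Run against the real table, the same query gives a quick check that each domain holds the expected number of documents and shows how aggressively the summaries compress their source texts.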