ArabSummaries is a curated dataset designed for Arabic text summarization tasks. It consists of 7,000 Arabic documents spanning various domains, paired with their corresponding summaries generated using advanced AI models. The dataset provides a rich resource for natural language processing (NLP) research, especially in the field of text summarization.
Key Features:
- Domain Coverage:
1,000 documents per domain, covering Culture, Finance, Medical, Politics, Religion, Sports, and Technology.
- Data Source:
The original documents are derived from the SANAD Dataset: https://www.kaggle.com/datasets/haithemhermessi/sanad-dataset/data
- Summaries:
Generated using AI summarization models, with the aim of producing concise, high-quality representations of the original texts.
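The per-domain balance described above can be checked after loading the data. The following is a minimal sketch, not part of the dataset's tooling: it assumes each record is a dictionary, and the field names `domain`, `text`, and `summary` are hypothetical, since the actual file layout is not described here.

```python
from collections import Counter


def count_by_domain(records, domain_key="domain"):
    """Count how many records carry each domain label.

    `domain_key` is an assumed field name; adjust it to the real schema.
    """
    return Counter(record[domain_key] for record in records)


def unbalanced_domains(records, expected_per_domain=1000, domain_key="domain"):
    """Return the domains whose record count differs from the expected size."""
    counts = count_by_domain(records, domain_key)
    return {d: n for d, n in counts.items() if n != expected_per_domain}


# Toy records standing in for the real data (placeholder strings only).
sample = [
    {"domain": "Sports", "text": "document 1", "summary": "summary 1"},
    {"domain": "Sports", "text": "document 2", "summary": "summary 2"},
    {"domain": "Medical", "text": "document 3", "summary": "summary 3"},
]

print(count_by_domain(sample))
print(unbalanced_domains(sample, expected_per_domain=2))
```

On the full dataset, calling `unbalanced_domains(records)` with the default of 1,000 should return an empty dictionary if every domain has the stated size.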
Potential Use Cases:
- Training and evaluation of Arabic text summarization models.
- Domain-specific summarization research and analysis.
- Fine-tuning and benchmarking NLP models for Arabic-language applications.
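For the training and evaluation use cases listed above, one reasonable option is to split the data so that every domain is represented in the same proportion in both sets. The sketch below uses only the Python standard library and again assumes a hypothetical `domain` field on each record; the 80/20 ratio and the fixed seed are illustrative choices, not recommendations from the dataset authors.

```python
import random
from collections import defaultdict


def stratified_split(records, test_fraction=0.2, domain_key="domain", seed=0):
    """Split records into (train, test), sampling each domain separately.

    `domain_key` is an assumed field name; adjust it to the real schema.
    """
    rng = random.Random(seed)  # local generator keeps the split reproducible

    # Group the records by their domain label.
    by_domain = defaultdict(list)
    for record in records:
        by_domain[record[domain_key]].append(record)

    train, test = [], []
    for domain in sorted(by_domain):  # sorted for a deterministic order
        items = list(by_domain[domain])
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```

With 1,000 documents per domain and `test_fraction=0.2`, this would hold out 200 documents from each domain, which keeps domain-specific evaluation scores comparable.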
ArabSummaries bridges the gap in Arabic summarization resources, providing a diverse and comprehensive dataset for researchers and developers.