
High-Quality Financial News Dataset For NLP Tasks

@kaggle.sayelabualigah_high_quality_financial_news_for_nlp_tasks

Financial Dataset for Supervised Fine-Tuning (SFT) Tasks

High-Quality Financial News Dataset

Description

This repository contains a dataset scraped from various financial websites. The extraction process captures text from both the web pages and the PDFs embedded in them, with the aim of producing clean, accurate text.

Dataset Features

  • Date: The date of the announcement.
  • Subject: The subject of the financial news.
  • Content: The full content of the announcement, including text from the website and PDFs.
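Below is a minimal sketch of reading the three core fields into typed records with Python's standard library. It assumes the data is exported as a CSV file whose column headers match the feature names above (Date, Subject, Content), and that dates use ISO-8601 (YYYY-MM-DD) format. Both are assumptions, so check them against the actual file. The sample rows are placeholder text, not data from the dataset.

```python
import csv
import datetime
import io
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Announcement:
    """Core fields of one announcement (column names assumed to match the feature list)."""
    date: datetime.date
    subject: str
    content: str


def load_announcements(rows: Iterable[dict]) -> List[Announcement]:
    """Convert csv.DictReader rows into Announcement records."""
    # Assumption: dates are stored as ISO-8601 strings such as 2024-01-15.
    return [
        Announcement(
            date=datetime.date.fromisoformat(row["Date"].strip()),
            subject=row["Subject"].strip(),
            content=row["Content"].strip(),
        )
        for row in rows
    ]


# Placeholder rows; the quoted Content field spans two lines to show
# that the csv module keeps embedded newlines inside a single field.
SAMPLE_CSV = '''Date,Subject,Content
2024-01-15,Placeholder subject A,"First line of content.
Second line of content."
2024-02-03,Placeholder subject B,Short placeholder content.
'''

records = load_announcements(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for record in records:
    print(record.date, record.subject)
```

Because long announcements that include PDF text can contain line breaks, a CSV reader that handles quoted multi-line fields is needed. A naive split on newlines would break those rows apart.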

Additional Processed Fields

We used Mistral AI's Mixtral 8x7B model to generate the following additional fields:

  • ParaphrasedSubject: A paraphrased version of the original subject.
  • CompactedSummary: A concise summary limited to 1.5 lines.
  • DetailedSummary: A detailed summary of the content.
  • Impact: The impact of the announcement, summarized in 2 lines.
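It can help to check the four generated fields before use. The sketch below makes a hypothetical assumption: that the model was asked to reply with a JSON object keyed by the field names listed above. The description does not say how the model output was formatted or parsed, and parse_generated_fields is an illustrative helper, not part of the dataset's tooling.

```python
import json
from dataclasses import dataclass

# Field names taken from the list above; the JSON reply format is an assumption.
GENERATED_FIELDS = ("ParaphrasedSubject", "CompactedSummary", "DetailedSummary", "Impact")


@dataclass
class GeneratedFields:
    paraphrased_subject: str
    compacted_summary: str
    detailed_summary: str
    impact: str


def parse_generated_fields(reply: str) -> GeneratedFields:
    """Parse a (hypothetical) JSON model reply and reject missing or empty fields."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [
        key for key in GENERATED_FIELDS
        if not (isinstance(data.get(key), str) and data[key].strip())
    ]
    if missing:
        raise ValueError(f"missing or empty fields: {missing}")
    return GeneratedFields(
        paraphrased_subject=data["ParaphrasedSubject"].strip(),
        compacted_summary=data["CompactedSummary"].strip(),
        detailed_summary=data["DetailedSummary"].strip(),
        impact=data["Impact"].strip(),
    )


# Placeholder reply, not an actual model output from the dataset.
reply = json.dumps({
    "ParaphrasedSubject": "Placeholder paraphrase",
    "CompactedSummary": "Placeholder short summary.",
    "DetailedSummary": "Placeholder detailed summary.",
    "Impact": "Placeholder impact statement.",
})
fields = parse_generated_fields(reply)
print(fields.impact)
```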

Methodology

The prompt used to generate the additional fields was developed through extensive discussion and collaboration with the Mistral AI team. The resulting fields are intended to make the dataset ready for further analysis and model training.
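This description does not publish the actual prompt. As a purely hypothetical illustration, the sketch below shows one way a prompt for these fields could be assembled by filling a template with an announcement's date, subject, and content. The wording, PROMPT_TEMPLATE, and build_prompt are all assumptions, not the authors' prompt.

```python
from string import Template

# Hypothetical prompt wording; not the prompt used to build the dataset.
PROMPT_TEMPLATE = Template(
    "Read the financial announcement below and reply with a JSON object "
    "containing exactly these keys:\n"
    '- "ParaphrasedSubject": a paraphrased version of the subject.\n'
    '- "CompactedSummary": a concise summary of at most 1.5 lines.\n'
    '- "DetailedSummary": a detailed summary of the content.\n'
    '- "Impact": the impact of the announcement in at most 2 lines.\n'
    "\n"
    "Date: $date\n"
    "Subject: $subject\n"
    "Content:\n"
    "$content\n"
)


def build_prompt(date: str, subject: str, content: str) -> str:
    """Fill the template; substituted values are not re-scanned, so '$' in the text is safe."""
    return PROMPT_TEMPLATE.substitute(date=date, subject=subject, content=content)


prompt = build_prompt("2024-01-15", "Placeholder subject", "Costs rose by $5 million.")
print(prompt)
```

The sketch uses string.Template rather than str.format. With Template, literal braces in the prompt text (for example, an inline JSON example) need no escaping.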

Usage

This dataset can be used for various applications, including but not limited to:

  • Financial news analysis
  • Abstractive and extractive summarization tasks
  • Machine learning model training
  • Natural language processing tasks
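For the summarization and model-training uses listed above, one option is to pair each announcement with one of the generated fields. The sketch below writes Alpaca-style instruction/input/output records as JSON Lines, with Subject and Content as the input and CompactedSummary as the target. The record schema, the instruction wording, and the helper names are assumptions chosen for illustration. Column names are assumed to match the field lists above.

```python
import json
from typing import Dict, Iterable, Iterator

# Hypothetical instruction text; any of the generated fields could serve as the target.
INSTRUCTION = "Summarize the following financial announcement concisely."


def to_sft_examples(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Map each row to an instruction/input/output record (Alpaca-style schema)."""
    for row in rows:
        yield {
            "instruction": INSTRUCTION,
            "input": f"Subject: {row['Subject']}\n\n{row['Content']}",
            "output": row["CompactedSummary"],
        }


def to_jsonl(examples: Iterable[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)


# Placeholder row, not data from the dataset.
rows = [
    {
        "Subject": "Placeholder subject",
        "Content": "Placeholder announcement text.",
        "CompactedSummary": "Placeholder summary.",
    }
]
jsonl = to_jsonl(to_sft_examples(rows))
print(jsonl)
```

Swapping the target to DetailedSummary or Impact produces a different training task from the same rows.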
