Baselight

High-Quality Financial News Dataset For NLP Tasks

Financial Dataset for SFT (Supervised Fine-Tuning) Tasks

@kaggle.sayelabualigah_high_quality_financial_news_for_nlp_tasks


About this Dataset


High-Quality Financial News Dataset

Description

This repository contains a dataset scraped from various financial websites. The extraction process captures text from both the web pages and the PDFs embedded in them, with the aim of producing high-quality, accurate text.

Dataset Features

  • Date: The date of the announcement.
  • Subject: The subject of the financial news.
  • Content: The full content of the announcement, including text from the website and PDFs.

Additional Processed Fields

We used Mistral AI's Mixtral 8x7B model to generate the following additional fields:

  • ParaphrasedSubject: A paraphrased version of the original subject.
  • CompactedSummary: A concise summary limited to 1.5 lines.
  • DetailedSummary: A detailed summary of the content.
  • Impact: The impact of the announcement, summarized in 2 lines.
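The seven fields above map onto one row of the table. The following is a minimal sketch of a typed row record, assuming the lowercase column names used in the table schema on this page. The class name `FinancialNewsRecord` and the helper `from_row` are hypothetical, not part of the dataset, and every field is kept as a plain string because all seven columns are stored as VARCHAR (the date format is not documented, so no date parsing is attempted).

```python
from dataclasses import dataclass, fields
from typing import Dict


@dataclass(frozen=True)
class FinancialNewsRecord:
    """Hypothetical record type for one dataset row; all columns are text."""

    # Original scraped fields
    date: str
    subject: str
    content: str
    # Fields generated with Mixtral 8x7B
    paraphrasedsubject: str
    compactedsummary: str
    detailedsummary: str
    impact: str

    @classmethod
    def from_row(cls, row: Dict[str, str]) -> "FinancialNewsRecord":
        # Match keys case-insensitively, so "ParaphrasedSubject" (as written
        # in the field list) and "paraphrasedsubject" (as in the schema) both work.
        normalized = {key.lower(): value for key, value in row.items()}
        missing = [f.name for f in fields(cls) if f.name not in normalized]
        if missing:
            raise KeyError(f"missing columns: {missing}")
        return cls(**{f.name: normalized[f.name] for f in fields(cls)})
```

Normalizing key case lets the same helper accept rows whether the column names follow the capitalized field list or the lowercase schema.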

Methodology

The prompt used to generate the additional fields was developed through extensive discussion and collaboration with the Mistral AI team. The resulting fields are intended to make the dataset ready for further analysis and model training.

Usage

This dataset can be used for various applications, including but not limited to:

  • Financial news analysis
  • Abstractive/Extractive Summarization tasks
  • Machine learning model training
  • Natural language processing tasks
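For supervised fine-tuning, each row can be turned into a prompt/completion pair whose target is one of the generated columns. The sketch below is only an illustration: the instruction wording in `INSTRUCTIONS`, the function name `to_sft_example`, and the `{"prompt": ..., "completion": ...}` layout are all assumptions, not the format used by the dataset authors or required by any particular training library. It assumes rows keyed by the lowercase schema column names.

```python
from typing import Dict

# Hypothetical instruction templates; the dataset does not prescribe any.
INSTRUCTIONS: Dict[str, str] = {
    "compactedsummary": "Summarize the following financial announcement in one or two sentences.",
    "detailedsummary": "Write a detailed summary of the following financial announcement.",
    "impact": "Describe the likely impact of the following financial announcement.",
    "paraphrasedsubject": "Paraphrase the following financial news subject line.",
}


def to_sft_example(row: Dict[str, str], target: str = "compactedsummary") -> Dict[str, str]:
    """Build one prompt/completion pair from a row keyed by lowercase column names."""
    if target not in INSTRUCTIONS:
        raise ValueError(f"unsupported target column: {target!r}")
    instruction = INSTRUCTIONS[target]
    if target == "paraphrasedsubject":
        # Paraphrasing only needs the original subject line as input.
        prompt = f"{instruction}\n\n{row['subject']}"
    else:
        # Summaries and impact are generated from the full announcement text.
        prompt = f"{instruction}\n\nSubject: {row['subject']}\n\n{row['content']}"
    return {"prompt": prompt, "completion": row[target]}
```

Switching `target` produces separate tasks (short summary, detailed summary, impact, paraphrase) from the same scraped content.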

Tables

Dataset

@kaggle.sayelabualigah_high_quality_financial_news_for_nlp_tasks.dataset
  • 2.71 MB
  • 1839 rows
  • 7 columns

CREATE TABLE dataset (
  "date" VARCHAR,
  "subject" VARCHAR,
  "content" VARCHAR,
  "paraphrasedsubject" VARCHAR,
  "compactedsummary" VARCHAR,
  "detailedsummary" VARCHAR,
  "impact" VARCHAR
);
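Because the schema is plain SQL with VARCHAR columns, it can be recreated locally for querying. The sketch below assumes an in-memory SQLite database via Python's standard `sqlite3` module; the helper names `load_rows` and `summaries_for_subject` are hypothetical, and how you obtain the rows (for example, from an exported file) is left to you.

```python
import sqlite3
from typing import Iterable, List, Sequence, Tuple

# Table definition copied from the schema above; SQLite accepts VARCHAR columns.
SCHEMA = """
CREATE TABLE dataset (
  "date" VARCHAR,
  "subject" VARCHAR,
  "content" VARCHAR,
  "paraphrasedsubject" VARCHAR,
  "compactedsummary" VARCHAR,
  "detailedsummary" VARCHAR,
  "impact" VARCHAR
);
"""

COLUMNS = ("date", "subject", "content", "paraphrasedsubject",
           "compactedsummary", "detailedsummary", "impact")


def load_rows(rows: Iterable[Sequence[str]]) -> sqlite3.Connection:
    """Create an in-memory database with the dataset schema and insert rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(f"INSERT INTO dataset VALUES ({placeholders})", rows)
    conn.commit()
    return conn


def summaries_for_subject(conn: sqlite3.Connection, keyword: str) -> List[Tuple[str, str]]:
    """Return (subject, compactedsummary) pairs whose subject contains `keyword`."""
    cursor = conn.execute(
        'SELECT "subject", "compactedsummary" FROM dataset WHERE "subject" LIKE ?',
        (f"%{keyword}%",),
    )
    return cursor.fetchall()
```

Since `date` is stored as text, any chronological sorting would depend on the date format actually used in the data, so the query above does not order by it.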
