Context
This dataset contains around 41,500 French news articles published from 11/2018 to 03/2021, scraped from a well-known financial media website.
For ease of use, I've added an English translation (Helsinki-NLP/opus-mt-fr-en) and a sentiment analysis (VADER).
Analysis
The picture below shows the effect of the COVID crisis on news sentiment (purple) and on the CAC40 (blue).
We can clearly see a link between news sentiment and the stock market.
(Note: March 2020: the COVID crisis breaks out / November 2020: release of the Pfizer vaccine)
CAC40 next-day open prediction (works a little)
CAC40 next-20-day prediction (multi-day prediction gives imprecise results...)
Comparison of the sentiment of the news title, the article text, and the text in the URL.
We can conclude that titles are often more dramatic, to attract attention.
-> See linked notebook.
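As a rough illustration of the next-day setup mentioned above, here is a minimal sketch that pairs the sentiment of day t with the CAC40 open-to-open return from day t to day t+1, then measures their Pearson correlation. The values and the names `daily_sentiment`, `cac40_open`, `next_day_pairs` and `pearson` are made up for the example; this is not the method used in the linked notebook.

```python
from math import sqrt

# Hypothetical daily values ordered by trading day (NOT real data).
daily_sentiment = [0.10, -0.30, -0.50, 0.20, 0.40]
cac40_open = [6000.0, 5950.0, 5700.0, 5500.0, 5650.0]


def next_day_pairs(sentiment, opens):
    """Pair the sentiment of day t with the open-to-open return from t to t+1."""
    returns = [(opens[i + 1] - opens[i]) / opens[i] for i in range(len(opens) - 1)]
    # The last day has no "next day", so its sentiment is dropped.
    return list(zip(sentiment[:-1], returns))


def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither series is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


pairs = next_day_pairs(daily_sentiment, cac40_open)
xs, ys = zip(*pairs)
r = pearson(xs, ys)
print(f"{len(pairs)} day pairs, correlation = {r:.3f}")
```

Using the return rather than the raw open price keeps the target comparable across periods where the index level differs a lot. A correlation on a handful of days says very little, so this only shows how the alignment could be done.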
Content
-> FrenchNews.csv
Around 41,500 French news articles from 11/2018 to 03/2021, scraped from a well-known financial media website.
-> FrenchNewsDayConcat.csv
The FrenchNews.csv dataset, post-processed to resample it by day so it can be compared with the CAC40.
The number of news articles varies from day to day (see the NbrNewsJour column in FrenchNewsDayConcat.csv).
The number of news articles increases over time.
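The daily resampling described above could look roughly like the sketch below: articles are grouped by publication date, and each day gets an article count (the role played by NbrNewsJour) and a mean sentiment. The sample rows and the name `concat_by_day` are hypothetical, and the real post-processing and column layout may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (publication date, sentiment score of one article).
# These are NOT real values from FrenchNews.csv.
articles = [
    ("2020-03-09", -0.6),
    ("2020-03-09", -0.2),
    ("2020-03-10", 0.1),
    ("2020-03-10", 0.3),
    ("2020-03-10", -0.1),
]


def concat_by_day(rows):
    """Group articles by day and return {date: (article count, mean sentiment)}."""
    scores_per_day = defaultdict(list)
    for date, score in rows:
        scores_per_day[date].append(score)
    # ISO dates sort chronologically as plain strings.
    return {
        date: (len(scores), mean(scores))
        for date, scores in sorted(scores_per_day.items())
    }


daily = concat_by_day(articles)
for date, (nbr_news, avg_sentiment) in daily.items():
    print(date, nbr_news, round(avg_sentiment, 3))
```

Keeping the per-day count next to the mean matters because a day with only a few articles gives a much noisier sentiment value than a day with dozens.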
Inspiration
Could we directly use the text of the scraped news to predict the CAC40 (NLP)?
Use the news text to find the main news topics over time.
Feel free to play with the dataset.