Stock Market News Data in Portuguese
The Financial Phrase Bank is a dataset originally developed for the paper Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts [1], made available by researchers from Aalto University and the Indian Institute of Management. The dataset serves as a useful benchmark for fine-tuning language models on sentiment analysis tasks.
As the amount of annotated text data in Portuguese is limited (especially for the financial market), I went ahead and translated the entire dataset so that people can try out sentiment analysis tasks in Portuguese.
Content
The dataset originally contains about 4,840 manually annotated financial news texts in English. This version consists of three columns:
y: the annotated label for the sentiment of the news text (neutral, positive, negative);
text: the original English text for each record;
text_pt: the Portuguese translation of the original record, which I manually validated.
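The column layout above can be checked with a short loading routine. This is a minimal sketch using only the Python standard library. It assumes the data is stored as a comma-separated file with a header row, which the description does not state. The function name `load_records` and the three sample rows are hypothetical and made up for illustration; they are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Column names and label values as listed in the description above.
EXPECTED_COLUMNS = ["y", "text", "text_pt"]
VALID_LABELS = {"neutral", "positive", "negative"}

# Hypothetical sample rows standing in for the real file (illustration only).
SAMPLE_CSV = '''y,text,text_pt
positive,"Sales rose in the quarter.","As vendas subiram no trimestre."
neutral,"The company is based in Helsinki.","A empresa tem sede em Helsinque."
negative,"Operating profit fell.","O lucro operacional caiu."
'''


def load_records(handle):
    """Read rows from an open CSV handle and validate columns and labels."""
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    missing = [c for c in EXPECTED_COLUMNS if c not in fieldnames]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    records = []
    for row in reader:
        if row["y"] not in VALID_LABELS:
            raise ValueError(f"unexpected label: {row['y']!r}")
        # Keep only the three documented columns.
        records.append({c: row[c] for c in EXPECTED_COLUMNS})
    return records


records = load_records(io.StringIO(SAMPLE_CSV))

# Count how many records carry each sentiment label.
label_counts = Counter(r["y"] for r in records)
print(label_counts)
```

To use the real data, replace `io.StringIO(SAMPLE_CSV)` with a handle opened on the downloaded file, for example `open(path, encoding="utf-8", newline="")`, where `path` is wherever you saved it. Checking the label distribution first is useful because sentiment datasets are often imbalanced, which affects how a fine-tuned model should be evaluated.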
Acknowledgments
[1] Malo, P., Sinha, A., Korhonen, P., Wallenius, J., & Takala, P. (2014). Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology, 65(4), 782-796.
Photo by Markus Winkler on Unsplash