This dataset is a collection of news articles about cryptocurrencies, mostly published in 2022 and 2023. It covers 140 different cryptocurrencies, including BTC, ETH, USDT, BNB, USDC, XRP, ADA, DOGE, LTC, SOL, TRX, DOT, MATIC, SHIB, BUSD, XMR, ATOM, NEAR, ALGO, EOS, LUNC, LUNA, and DASH.
Its main columns are cryptocurrency, URL, title, text, date, predicted label, sentiment, polarity, and subjectivity.
The predicted labels were produced by a machine learning model that classifies each news article as potentially related (1) or unrelated (0) to criminal activity (ML model: https://github.com/LaraCodesmith/crypto_news_illicit). These labels let researchers investigate how prevalent illicit practices are within the cryptocurrency domain.
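As a rough illustration of how the predicted labels might be used, the sketch below computes the share of articles labelled 1 for each cryptocurrency. The header names `cryptocurrency` and `predicted_label`, the helper `flagged_share`, and the sample rows are all assumptions made for this example; check the actual file headers before using it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical placeholder rows, NOT taken from the dataset.
# The column names are assumed and may differ in the real file.
SAMPLE_CSV = """cryptocurrency,title,predicted_label
BTC,placeholder title A,1
BTC,placeholder title B,0
ETH,placeholder title C,0
ETH,placeholder title D,0
"""


def flagged_share(csv_file, coin_col="cryptocurrency", label_col="predicted_label"):
    """Return {coin: fraction of its articles labelled 1} for a CSV file object."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for row in csv.DictReader(csv_file):
        coin = row[coin_col]
        totals[coin] += 1
        # Accept both "1" and "1.0" in case labels were stored as floats.
        if int(float(row[label_col])) == 1:
            flagged[coin] += 1
    return {coin: flagged[coin] / totals[coin] for coin in totals}


shares = flagged_share(io.StringIO(SAMPLE_CSV))
print(shares)
```

To run this on the real data, pass an opened file instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, where `path` points to your local copy.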
The sentiment, polarity, and subjectivity values were generated using TextBlob.
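For orientation, TextBlob exposes polarity (from -1.0 to 1.0) and subjectivity (from 0.0 to 1.0) through its `sentiment` property. The sketch below shows one way such values could be computed. The helper names and the rule that turns polarity into a sentiment class are assumptions for illustration only; the dataset's actual rule is not documented here.

```python
def polarity_and_subjectivity(text):
    """Return (polarity, subjectivity) for a text using TextBlob's default analyzer."""
    # Imported here so sentiment_class() below can be used without TextBlob installed.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def sentiment_class(polarity, threshold=0.0):
    """Map a polarity score to a coarse class.

    ASSUMPTION: this thresholding rule is hypothetical and is not confirmed
    to be the rule used to build the dataset's sentiment column.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"
```

With TextBlob installed, calling `polarity_and_subjectivity(article_text)` on an article's text and passing the resulting polarity to `sentiment_class` gives values comparable in form to the dataset's columns, though not necessarily identical to them.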
This example dataset was collected using a news web scraper: https://github.com/LaraCodesmith/crypto-news-scraper.