India News Headlines Dataset

Twenty-one years of headlines focusing on India

@kaggle.therohk_india_headlines_news_dataset

About this Dataset

Context

This news dataset is a persistent historical archive of notable events in the Indian subcontinent from the start of 2001 to Q2 2023, recorded in real time by the journalists of India. It contains approximately 3.8 million events published by Times of India.

The majority of the data focuses on Indian local news, including national, city-level and entertainment coverage.

Prepared by Rohit Kulkarni

Content

Time Range : Start Date: 2001-01-01 ; End Date: 2023-06-30

CSV Rows: 3,876,557

Columns:

  1. publish_date: Date the article was published online, in yyyyMMdd format
  2. headline_category: Category of the headline; ASCII, dot-delimited, lowercase values
  3. headline_text: Text of the headline in English; ASCII characters only
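
As a minimal sketch of how these three columns might be read, the snippet below parses the yyyyMMdd date and splits the dot-delimited category into its parts using only the Python standard library. The sample rows are made up for illustration and are not taken from the dataset; the helper names `parse_row` and `read_headlines` are hypothetical.

```python
import csv
import io
from datetime import date, datetime

# Illustrative rows only: made-up values that follow the documented column
# layout, not actual records from the dataset.
SAMPLE_CSV = """publish_date,headline_category,headline_text
20010102,city.bengaluru,Example headline about a city event
20140516,india,Example headline about an election
20230630,entertainment.hindi.bollywood,Example headline about a film
"""


def parse_row(row):
    """Convert one raw CSV row (a dict of strings) into typed fields."""
    return {
        # publish_date is stored as yyyyMMdd, e.g. 20010102
        "publish_date": datetime.strptime(row["publish_date"], "%Y%m%d").date(),
        # headline_category is dot-delimited, e.g. city.bengaluru
        "category_path": row["headline_category"].split("."),
        "headline_text": row["headline_text"],
    }


def read_headlines(fileobj):
    """Read every row from an open CSV file object."""
    return [parse_row(row) for row in csv.DictReader(fileobj)]


rows = read_headlines(io.StringIO(SAMPLE_CSV))
print(rows[0]["publish_date"], rows[0]["category_path"])
```

To read the real file, the same function could be called with `open(path, newline="", encoding="ascii")` in place of the in-memory sample, assuming the CSV header matches the column names listed above.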

Inspiration

Times Group, as a news agency, reaches a very wide audience across Asia and dwarfs every other agency in the quantity of English articles published per day. Due to the heavy daily volume (avg. 600 articles) over multiple years, this data offers deep insight into Indian society, its priorities, events, issues and talking points, and how they have unfolded over time.

The dataset can be sliced into smaller subsets based on one or more facets:

  • Time Range: Headlines during the 2006 Mumbai bombings, the 2014 election, or the ongoing health crisis
  • One or more Categories: like Citywise, Bollywood, ICC updates, Magazine, Middle East
  • One or more Keywords: like crime- or ecology-related tokens, names of political parties, celebrities, corporations
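
The three facets above could be combined in a single filter, sketched below under a few assumptions. Rows are plain dicts keyed by the documented column names, a category matches when it equals a given prefix or sits beneath it in the dot-delimited hierarchy, and keywords are matched as case-insensitive substrings rather than as tokens. The function name `matches` and the sample rows are hypothetical and made up for illustration.

```python
from datetime import date, datetime


def matches(row, start=None, end=None, category_prefixes=(), keywords=()):
    """Return True if a row satisfies every facet that was supplied."""
    published = datetime.strptime(row["publish_date"], "%Y%m%d").date()
    if start is not None and published < start:
        return False
    if end is not None and published > end:
        return False

    # A prefix such as "city" matches "city" itself and "city.mumbai",
    # but not an unrelated category that merely starts with the same letters.
    category = row["headline_category"]
    if category_prefixes and not any(
        category == prefix or category.startswith(prefix + ".")
        for prefix in category_prefixes
    ):
        return False

    # Case-insensitive substring match; any one keyword is enough.
    text = row["headline_text"].lower()
    if keywords and not any(keyword.lower() in text for keyword in keywords):
        return False
    return True


# Illustrative rows only: made-up values, not actual records from the dataset.
sample_rows = [
    {"publish_date": "20060712", "headline_category": "city.mumbai",
     "headline_text": "Example headline about train services"},
    {"publish_date": "20140516", "headline_category": "india",
     "headline_text": "Example headline about election results"},
    {"publish_date": "20140517", "headline_category": "city.mumbai",
     "headline_text": "Example headline about crime in the city"},
]

in_2014 = [r for r in sample_rows
           if matches(r, start=date(2014, 1, 1), end=date(2014, 12, 31))]
city_crime = [r for r in sample_rows
              if matches(r, category_prefixes=("city",), keywords=("crime",))]
print(len(in_2014), len(city_crime))
```

Substring matching is the simplest choice but will also hit longer words that contain the keyword; a tokenized comparison would be stricter if that matters for the analysis.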

Similar news datasets exploring other attributes, countries and topics can be seen on my profile.