Combining productivity and influence in a blog community

Context

This dataset is a crawl of the blog posts of the Techcrunch technology blog, conducted in April 2010. It was used as the experimental dataset for the research paper:

L. Akritidis, D. Katsaros, P. Bozanis, "Identifying the Productive and Influential Bloggers in a Community", IEEE Transactions on Systems, Man, and Cybernetics-Part C: Applications and Reviews, vol. 41, no. 5, pp. 759-764, 2011.

The primary purpose of this dataset was to provide data from an active blogging community, in order to identify members who are both productive and influential. Because the full text of the posts is included, it can also be used for a wide variety of text mining tasks, such as sentiment analysis, opinion retrieval, and other NLP tasks. A (My)SQL version of the dataset is also available.

Researchers who use this dataset are kindly asked to cite the above article in their work.

If you find this dataset useful, you may also want to check out my TUAW dataset for identifying influential bloggers.

Content

The repository consists of four files:

A list of the Techcrunch bloggers, along with their (unique) IDs and some statistics.
A database of the retrieved blog posts.
The incoming links to the Techcrunch blog posts, automatically retrieved using the Google Blog Search service.
The comments submitted to the posts.

Precise descriptions and record counts for each file are provided below.
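Because each post belongs to one blogger, and each incoming link and comment belongs to one post, the four files can be joined through post IDs. The sketch below shows that join in memory. It is only a sketch under stated assumptions: the record layout (a hypothetical Post with post_id and blogger_id fields, and links and comments reduced to the post IDs they target) is not taken from the actual files, and the raw counts it produces are simple proxies. They are not the productivity and influence measures defined in the cited paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    # Hypothetical record layout; the real file columns may differ.
    post_id: int
    blogger_id: int


def per_blogger_counts(posts, link_post_ids, comment_post_ids):
    """Return {blogger_id: (posts, incoming_links, comments)}.

    link_post_ids and comment_post_ids contain one post ID per
    incoming link or comment. IDs that match no known post are skipped.
    """
    # Map each post to its author so links and comments can be attributed.
    owner = {p.post_id: p.blogger_id for p in posts}
    n_posts = Counter(owner.values())
    n_links = Counter(owner[pid] for pid in link_post_ids if pid in owner)
    n_comments = Counter(owner[pid] for pid in comment_post_ids if pid in owner)
    return {b: (n_posts[b], n_links[b], n_comments[b]) for b in n_posts}


if __name__ == "__main__":
    # Toy records for illustration only; not taken from the dataset.
    posts = [Post(1, 10), Post(2, 10), Post(3, 20)]
    links = [1, 1, 3, 99]  # post 99 does not exist and is ignored
    comments = [2, 3, 3]
    print(per_blogger_counts(posts, links, comments))
```

On the toy input, blogger 10 gets two posts, two incoming links and one comment, and blogger 20 gets one post, one link and two comments. Building the post-to-blogger map once keeps each pass over the links and comments linear.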
