Baselight

Identifying Influential Bloggers: Techcrunch

Combining productivity and influence in a blog community

@kaggle.lakritidis_identifying_influential_bloggers_techcrunch

About this Dataset

Context

This dataset is a crawl of the blog posts of the Techcrunch technology blog, conducted in April 2010. It served as the experimental dataset for the following research paper:

L. Akritidis, D. Katsaros, P. Bozanis, "Identifying the Productive and Influential Bloggers in a Community", IEEE Transactions on Systems, Man, and Cybernetics-Part C: Applications and Reviews, vol. 41, no 5, pp. 759-764, 2011.

The dataset was primarily intended to provide an active community in which to identify members who are both productive and influential. However, because the full text of the posts is included, it can also be used for a wide variety of text mining tasks, such as sentiment analysis, opinion retrieval, and other NLP applications. A (My)SQL version is also available from here.

Researchers who use this dataset are kindly asked to cite the aforementioned article in their work.

If you find this dataset useful, you may also want to check my TUAW dataset for identifying influential bloggers.

Content

The repository consists of four files:

  • A list of the bloggers of Techcrunch, along with their (unique) IDs and some statistics.
  • A database of the retrieved blog posts.
  • The incoming links to the blog posts of Techcrunch, automatically retrieved by using the Google Blog Search service.
  • The comments submitted to the posts.

Precise descriptions and record counts for each file are provided below.

Tables

Authors

@kaggle.lakritidis_identifying_influential_bloggers_techcrunch.authors
  • 9.2 kB
  • 106 rows
  • 6 columns
CREATE TABLE authors (
  "n_1" BIGINT,  -- 1
  "jason_kincaid" VARCHAR,
  "n_43" BIGINT,  -- 43
  "n_43_1" BIGINT,  -- 43.1
  "n_4_257910" DOUBLE,  -- 4.257910
  "n_5_858018" DOUBLE  -- 5.858018
);
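
The column names above (n_1, jason_kincaid, n_43, ...) look like values from a first data row rather than field labels, which suggests that the source file has no header row. The following is a minimal sketch of reading such a file with explicit field names so that no row is consumed as a header. The field names (author_id, name, stat_1 to stat_4) and the one-row sample are hypothetical: this page does not document them, and the sample is only reconstructed from the sample values shown in the schema.

```python
import csv
import io

# Hypothetical field names: the dataset page does not document the columns
# beyond "IDs and some statistics".
AUTHOR_FIELDS = ["author_id", "name", "stat_1", "stat_2", "stat_3", "stat_4"]


def read_headerless(stream, fieldnames):
    """Read a CSV stream that has no header row, keeping every row as data."""
    return list(csv.DictReader(stream, fieldnames=fieldnames))


# Illustrative one-row sample reconstructed from the schema's sample values;
# the real file layout and delimiter are assumptions.
sample = io.StringIO("1,Jason Kincaid,43,43,4.257910,5.858018\n")
rows = read_headerless(sample, AUTHOR_FIELDS)
print(rows[0]["author_id"], rows[0]["name"])
```

Passing fieldnames explicitly tells csv.DictReader to treat the first line as data, which is the behaviour wanted here if the assumption about the missing header holds.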

Comments

@kaggle.lakritidis_identifying_influential_bloggers_techcrunch.comments
  • 135.19 MB
  • 746,560 rows
  • 6 columns
CREATE TABLE comments (
  "n_1" BIGINT,  -- 1
  "n_1_1" BIGINT,  -- 1.1
  "seemed_to_work_fine" VARCHAR,  -- Seemed To Work Fine.
  "bj_cook" VARCHAR,
  "n_2010_04_01" TIMESTAMP,  -- 2010-04-01
  "n_1_2" BIGINT  -- 1.2
);

Inlinks

@kaggle.lakritidis_identifying_influential_bloggers_techcrunch.inlinks
  • 14.66 MB
  • 193,807 rows
  • 6 columns
CREATE TABLE inlinks (
  "n_319895" BIGINT,  -- 319895
  "n_19464" BIGINT,  -- 19464
  "popular_blogs_and_their_first_post" VARCHAR,
  "alex" VARCHAR,
  "n_2009_06_17" TIMESTAMP,  -- 2009-06-17
  "http_feeds_notaniche_com_r_afrison_3_hxarxfintem" VARCHAR  -- Http://feeds.notaniche.com/~r/Afrison/~3/hXARXfINteM/
);

Posts

@kaggle.lakritidis_identifying_influential_bloggers_techcrunch.posts
  • 28.57 MB
  • 19,463 rows
  • 16 columns
CREATE TABLE posts (
  "n_1" BIGINT,  -- 1
  "we_just_tested_twitter_8217_s_anywhere_platform_screenshots" VARCHAR,  -- We Just Tested Twitter&#8217;s @anywhere Platform (Screenshots)
  "jason_kincaid" VARCHAR,
  "n_1_1" BIGINT,  -- 1.1
  "n_14" BIGINT,  -- 14
  "during_his_keynote_at_sxsw_last_month_twitter_ceo_evan_dc85c240" VARCHAR,  -- During His Keynote At SXSW Last Month, Twitter CEO Evan Wiliams Announced An Upcoming New Platform Called @anywhere, Which Would Allow Third Party Sites To Integrate Twitter Features (he Also Showed Off Some Of The Partners Who Would Be Featuring The Platform, Which You Can See In The Image At Right). Twitter Didn&#8217;t Give A Launch Date For When Sites Would Start Integrating The New Platform, But It Looks Like We&#8217;ve Just Come Across The First Site To Feature @anywhere. Meet Eggboiling.com.The Site, Which Will Almost Certainly Be Pulled Down Soon After This Post Is Published, Is Clearly A Testing Environment For @anywhere, But It&#8217;s Currently Open To The Public. Update: Twitter Has Taken The Site Down. It Features The Following (all Shown In The Screenshots Below): Various Variable States; A Button To &#8216;Connect With Twitter&#8217;; Buttons To Follow Twitter Users @jack, @biz, And @ev; A Test Hovercard That Allows Me To See @wendyverse&#8217;s Latest Tweets And Follow Counts At A Glance, And A Test Box That Lets Me Tweet. It Isn&#8217;t Particularly Easy On The Eyes, But It Works Well Enough.Hitting &#8220;Connect To Twitter&#8221; Pulled In My Twitter Profile Photo And Gave Me The Option To Log Out. Clicking On Each Of The &#8216;follow&#8217; Buttons Appropriately Changed The Status From &#8220;Follow @jack&#8221; To &#8220;Following @jack&#8221; The Next Time I Refreshed The Page. (it Just Showed &#8216;pending&#8217; Until I Refreshed). And Sending A Tweet From The Tweet Box Worked Properly (it Says That My Tweet Was Sent Via Egg Boiling).If You&#8217;re Fast, You May Be Able To Try It Out For Yourself.Before Logging InOAuth To LoginConnected To Twitter (but Before Image/logout Link Have Loaded)Image/logout Link Appear After Refreshing The PageTesting The HovercardAfter Clicking The &#8216;more&#8217; Button On The HovercardThanks To Spencer Transier For The Tip!CrunchBase Information
  "http_techcrunch_com_2010_04_01_we_just_tested_twitters_711e7eeb" VARCHAR,  -- Http://techcrunch.com/2010/04/01/we-just-tested-twitters-anywhere-platform-screenshots/
  "n_2010_04_01" TIMESTAMP,  -- 2010-04-01
  "n_0" BIGINT,  -- 0
  "n_14_1" BIGINT,  -- 14.1
  "n_314" BIGINT,  -- 314
  "n_223" BIGINT,  -- 223
  "n_4_130293" DOUBLE,  -- 4.130293
  "n_5_686099" DOUBLE,  -- 5.686099
  "n_0_00000" DOUBLE,  -- 0.00000
  "n_0_00000_1" DOUBLE  -- 0.00000.1
);
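
The sample title and body values in the posts schema contain HTML numeric character references (for example #8217, the right single quotation mark), so the stored text probably needs decoding before any of the text mining tasks mentioned earlier. The following is a minimal sketch using Python's standard html.unescape. The helper name clean_text is hypothetical, and whether every row stores its text this way is an assumption based only on the single sample row shown.

```python
import html


def clean_text(raw: str) -> str:
    """Decode HTML character references (e.g. &#8217;) into Unicode characters."""
    return html.unescape(raw)


# Title taken from the sample row in the posts schema, with its character
# reference left undecoded.
title = "We Just Tested Twitter&#8217;s @anywhere Platform (Screenshots)"
decoded = clean_text(title)
print(decoded)
```

Decoding before tokenisation keeps words such as "Twitter’s" intact instead of splitting them around the reference.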
