Baselight

DBPedia Classes

Hierarchical Taxonomy of Wikipedia article classes

@kaggle.danofer_dbpedia_classes

About this Dataset

DBpedia (from "DB" for "database") is a project that extracts structured content from the information created in Wikipedia.
This dataset is an extract of that data (cleaned; the cleaning kernel is included) that provides hierarchical, taxonomic categories ("classes") for 342,782 Wikipedia articles. There are 3 levels, with 9, 70 and 219 classes respectively.
A 14-class version of this dataset is a popular baseline for NLP/text classification tasks. This version is much tougher, especially if the L2/L3 levels are used as the targets.

This is an excellent benchmark for hierarchical multiclass/multilabel text classification.
Some example approaches are included as code snippets.
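
As an illustration of what the three levels imply for evaluation, here is a minimal sketch. It is not one of the dataset author's snippets. It assumes the taxonomy is a strict tree (each l3 class has exactly one l2 parent, and each l2 class exactly one l1 parent), and the label values and helper names (build_parent_maps, expand_l3, per_level_accuracy) are hypothetical placeholders chosen for the example.

```python
# Toy rows in (l1, l2, l3) order. The label values are illustrative
# placeholders and are not guaranteed to match the real data.
rows = [
    ("Agent", "Athlete", "Cyclist"),
    ("Agent", "Athlete", "Swimmer"),
    ("Place", "Building", "Hospital"),
]


def build_parent_maps(rows):
    """Map each l3 label to its l2 parent and each l2 label to its l1 parent.

    Assumes a strict tree: a label never has two different parents.
    """
    l3_to_l2, l2_to_l1 = {}, {}
    for l1, l2, l3 in rows:
        l3_to_l2[l3] = l2
        l2_to_l1[l2] = l1
    return l3_to_l2, l2_to_l1


def expand_l3(l3, l3_to_l2, l2_to_l1):
    """Recover the full (l1, l2, l3) path from a predicted l3 label."""
    l2 = l3_to_l2[l3]
    return l2_to_l1[l2], l2, l3


def per_level_accuracy(true_paths, pred_paths):
    """Return the fraction of correct labels at l1, l2 and l3."""
    total = len(true_paths)
    hits = [0, 0, 0]
    for true, pred in zip(true_paths, pred_paths):
        for level in range(3):
            hits[level] += true[level] == pred[level]
    return tuple(h / total for h in hits)


l3_to_l2, l2_to_l1 = build_parent_maps(rows)
true_paths = [rows[0], rows[2]]
# A flat classifier that only predicts l3; the parents are derived from the tree.
pred_paths = [expand_l3(l3, l3_to_l2, l2_to_l1) for l3 in ["Swimmer", "Hospital"]]
print(per_level_accuracy(true_paths, pred_paths))
```

In this toy case the wrong l3 prediction still lands under the correct l1 and l2 parents, which is why reporting accuracy per level can be more informative than a single l3 score.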

Content

DBPedia dataset with multiple levels of hierarchy/classes, as a multiclass dataset.
Original DBPedia ontology (triples data): https://wiki.dbpedia.org/develop/datasets
Listing of the class tree/taxonomy: http://mappings.dbpedia.org/server/ontology/classes/

Acknowledgements

Thanks to the Wikimedia Foundation for creating Wikipedia, DBPedia and associated open-data goodness!

Thanks to my colleagues at SparkBeyond (https://www.sparkbeyond.com) for pointing me towards the taxonomic version of this dataset (as opposed to the classic 14-class version).

Tables

Dbpedia Test

@kaggle.danofer_dbpedia_classes.dbpedia_test
  • 23.8 MB
  • 60794 rows
  • 4 columns

CREATE TABLE dbpedia_test (
  "text" VARCHAR,
  "l1" VARCHAR,
  "l2" VARCHAR,
  "l3" VARCHAR
);

Dbpedia Train

@kaggle.danofer_dbpedia_classes.dbpedia_train
  • 94.3 MB
  • 240942 rows
  • 4 columns

CREATE TABLE dbpedia_train (
  "text" VARCHAR,
  "l1" VARCHAR,
  "l2" VARCHAR,
  "l3" VARCHAR
);

Dbpedia Val

@kaggle.danofer_dbpedia_classes.dbpedia_val
  • 14.03 MB
  • 36003 rows
  • 4 columns

CREATE TABLE dbpedia_val (
  "text" VARCHAR,
  "l1" VARCHAR,
  "l2" VARCHAR,
  "l3" VARCHAR
);
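
The test, train and validation tables above share the same four columns, so one reader can serve all three. The following is a rough sketch that assumes a split has been exported as CSV with a header row of text, l1, l2, l3. The in-memory sample, its label values and the helper name count_classes are assumptions made for illustration, not part of the dataset.

```python
import csv
import io

# Stand-in for a CSV export of one split. The rows and label values are
# made up for the example; only the column names come from the schema.
sample_csv = io.StringIO(
    "text,l1,l2,l3\n"
    '"A rider who won several road races.",Agent,Athlete,Cyclist\n'
    '"A swimmer, known for freestyle events.",Agent,Athlete,Swimmer\n'
    '"A teaching hospital in the city centre.",Place,Building,Hospital\n'
)


def count_classes(handle):
    """Count the distinct labels seen at each level of the hierarchy."""
    seen = {"l1": set(), "l2": set(), "l3": set()}
    for row in csv.DictReader(handle):
        for level in seen:
            seen[level].add(row[level])
    return {level: len(labels) for level, labels in seen.items()}


counts = count_classes(sample_csv)
print(counts)
```

Run on a full split, and provided every class occurs in it, the counts should line up with the 9, 70 and 219 classes given in the description.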

Dbp Wiki Data

@kaggle.danofer_dbpedia_classes.dbp_wiki_data
  • 124.18 MB
  • 342781 rows
  • 6 columns

CREATE TABLE dbp_wiki_data (
  "text" VARCHAR,
  "l1" VARCHAR,
  "l2" VARCHAR,
  "l3" VARCHAR,
  "wiki_name" VARCHAR,
  "word_count" BIGINT
);
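
To show how the extra word_count column might be used, the sketch below recreates the dbp_wiki_data schema in an in-memory SQLite database and summarises articles per l1 class. SQLite is only a local stand-in here, the inserted rows are invented, and reading word_count as the number of words in text is an assumption based on the column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same columns as the dbp_wiki_data schema listed above.
conn.execute(
    "CREATE TABLE dbp_wiki_data ("
    '"text" VARCHAR, "l1" VARCHAR, "l2" VARCHAR, "l3" VARCHAR, '
    '"wiki_name" VARCHAR, "word_count" BIGINT)'
)

# Invented rows for illustration only.
rows = [
    ("text a", "Agent", "Athlete", "Cyclist", "Rider_A", 40),
    ("text b", "Agent", "Athlete", "Swimmer", "Swimmer_B", 60),
    ("text c", "Place", "Building", "Hospital", "Hospital_C", 30),
]
conn.executemany("INSERT INTO dbp_wiki_data VALUES (?, ?, ?, ?, ?, ?)", rows)

# Article count and mean word_count per top-level class.
query = (
    'SELECT "l1", COUNT(*) AS n, AVG("word_count") AS avg_words '
    'FROM dbp_wiki_data GROUP BY "l1" ORDER BY "l1"'
)
result = conn.execute(query).fetchall()
for l1, n, avg_words in result:
    print(l1, n, avg_words)
conn.close()
```

The same GROUP BY can be pointed at l2 or l3 to check how unevenly article length is spread across the finer classes.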
