Concept dictionary and entity (geo) reference tables for joining with per-indicator datasets
Dataset Description
What’s in @gapminder and how it’s organized
Gapminder data is split into two kinds of datasets you combine via simple joins:
-
@gapminder.metadata ← (this dataset)
- concepts (
@gapminder.metadata.concepts): one row per indicator, identified by a concept id.
Use it to turn ids into human-friendly name, description (and sometimes unit/format, source, tags). - entities (
@gapminder.metadata.entities_*): reference tables for codes like geo (countries/areas)
and groupings (regions, income groups, etc.). Key tables you’ll use:entities_geo_country(key:geo) — country/area names and attributes. Often includes columns like
world_4region,world_6region,income_groups, which point to group tables below.- Group tables keyed by a group code, e.g.:
entities_geo_world_4region,entities_geo_world_6region,entities_geo_income_groups,
entities_geo_income_3groups,entities_geo_unicef_region,entities_geo_unhcr_region,
entities_geo_g77_and_oecd_countries,entities_geo_landlocked,entities_geo_global, etc. - Non-geo entities such as
entities_gender, and topical vocabularies likeentities_tag.
- concepts (
-
Per-indicator datasets (one per concept in
@gapminder.metadata.concepts)- Named after the concept id, e.g.
@gapminder.age_at_1st_marriage_women,
@gapminder.adults_with_hiv_percent_age_15_49. - Each typically has a table named
<concept_id>_by_<dimensions>(most often<concept_id>_by_geo_time)
with columns geo, time, and a value column named exactly as the concept id.
- Named after the concept id, e.g.
─────────────────────────────────────────────────────────────────────────────
HOW TO USE
-
Pick an indicator
- Search @gapminder.metadata.concepts for your topic/keywords; note the
conceptid. - Read
name,description(andunit/formatif present) to confirm it matches your intent.
- Search @gapminder.metadata.concepts for your topic/keywords; note the
-
Retrieve the datapoints
- Open the dataset named after the concept id, e.g. @gapminder.aged_25_54_unemployment_rate_percent.
- Query its main table (usually
<concept_id>_by_geo_time) to get values bygeoandtime.
-
Add readable labels and groupings
- Join countries: datapoints.
geo→ @gapminder.metadata.entities_geo_country.geo
to getname,name_short,name_long, and attributes likeworld_4region,income_groups, etc. - Join region labels: first take the region code you found on the country row
(e.g.,world_4region), then join to the matching region table
(@gapminder.metadata.entities_geo_world_4region onworld_4region) to get region names/colors/shapes.
- Join countries: datapoints.
─────────────────────────────────────────────────────────────────────────────
COMMON QUERIES
-- Concept metadata (labels/units)
SELECT name, description
FROM @gapminder.metadata.concepts
WHERE concept = '<concept_id>';
-- Country names and region membership
SELECT d.time,
c.name AS country,
c.world_4region AS region_code,
r.name AS region_name,
d.<concept_id> AS value
FROM @gapminder.<concept_id>.<concept_id>_by_geo_time d
JOIN @gapminder.metadata.entities_geo_country c ON d.geo = c.country
LEFT JOIN @gapminder.metadata.entities_geo_world_4region r ON c.world_4region = r.world_4region;
-- Income groups example
SELECT d.time, c.name AS country, ig.name AS income_group, d.<concept_id> AS value
FROM @gapminder.<concept_id>.<concept_id>_by_geo_time d
JOIN @gapminder.metadata.entities_geo_country c ON d.geo = c.country
LEFT JOIN @gapminder.metadata.entities_geo_income_groups ig ON c.income_groups = ig.income_groups;
─────────────────────────────────────────────────────────────────────────────
TIPS & CAVEATS
geocodes may include aggregates (regions/world); treat aggregates differently from countries.- Time is usually an integer
time(year). Some concepts may use other time forms—check the table schema. - Grouping can be derived via a two-step join: (country → group code) then (group code → group label table).
TL;DR: Use @gapminder.metadata.concepts to interpret indicators and @gapminder.metadata.entities_*
to label countries and groupings; then query the per-concept datasets (e.g., @gapminder.age_at_1st_marriage_women) for the actual values and join on geo (and group keys) as shown above.
Related Datasets
-
Precumputed Shapes For Mountain Chart
@gapminder
-
Income Per Person, Long Series
@gapminder
-
Personal Computers (total)
@gapminder
-
Fixed Line Subscribers (per 100 People)
@gapminder
-
GDP Total, Yearly Growth
@gapminder