EDGAR-FOOD: A global emission inventory of GHGs from the food systems

Context

EDGAR is the Emissions Database for Global Atmospheric Research, and EDGAR-FOOD is a global emission inventory of GHGs from food systems. I found it very useful when conducting exploratory research for a project analyzing greenhouse gas emissions across the lifecycle of food production and consumption. Check out my project notebook, GlobalFoodEmissions, and the more in-depth analysis and insights in the interactive dashboard I created on Tableau Public, Low Hanging Fruit: Accessible opportunities to reduce emissions and grow sustainable food systems.

The data in the .csv file I've uploaded is the result of cleaning and pivoting the data from a ‘Supplementary Data’ tab labeled ‘Table S7-FOOD semi by sector’ in the master EDGAR-FOOD file available on the EDGAR link below. The transformation makes this data much more usable for exploratory data analysis. There are over 200 countries with GHG emissions from the food sector; each country has one value per GHG type and food system stage for each year from 1990 to 2015.

Data cleaning and transformation

Notes on cleaning and general transformation

GHG values: kt to t. In Table S7, a note at the top says “Supplementary Table 7 - GHG emissions from food system (each GHG is expressed in kt CO2eq, GWP-100 AR5) by sector and country”, while my other data sources expressed GHGs in tonnes of CO2eq. I converted all values from kilotonnes to tonnes by multiplying by 1,000.
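As a minimal sketch of that unit conversion in pandas (the column names `value_kt` and `value_t` and the sample numbers are made up for illustration, not taken from Table S7):

```python
import pandas as pd

# Hypothetical sample rows; the real values come from Table S7 (kt CO2eq).
df = pd.DataFrame({
    "Country_code_A3": ["BOL", "CHN"],
    "value_kt": [1.5, 20.0],
})

KT_TO_T = 1_000  # 1 kilotonne = 1,000 tonnes

# Convert kilotonnes to tonnes with a simple multiplication.
df["value_t"] = df["value_kt"] * KT_TO_T
```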

Year: pivoting. The original file I downloaded from EDGAR-FOOD contains 26 year columns, one for each year from 1990 to 2015, named ‘Y_1990’, ‘Y_1991’, all the way to ‘Y_2015’. I removed the ‘Y_’ prefix and pivoted the data so that ‘year’ is a single column and each year is a value within that column (‘1990’, ‘1991’, ... ‘2015’).
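One way to sketch this wide-to-long pivot is with pandas `melt`. The sample rows and the output column name `value` are assumptions for illustration, and converting the year to an integer is a choice made here rather than something the notes above specify:

```python
import pandas as pd

# Hypothetical two-year slice of the wide layout (the real table runs Y_1990..Y_2015).
wide = pd.DataFrame({
    "Country_code_A3": ["BOL", "CHN"],
    "Substance": ["GWP_100_CH4", "GWP_100_CO2"],
    "Y_1990": [1.0, 10.0],
    "Y_1991": [2.0, 20.0],
})

# Every column starting with 'Y_' is a year column; the rest identify the row.
year_cols = [c for c in wide.columns if c.startswith("Y_")]
id_cols = [c for c in wide.columns if c not in year_cols]

# Unpivot: one row per (identifier columns, year).
long = wide.melt(
    id_vars=id_cols,
    value_vars=year_cols,
    var_name="year",
    value_name="value",
)

# Strip the 'Y_' prefix so that 'Y_1990' becomes 1990.
long["year"] = long["year"].str.replace("Y_", "", regex=False).astype(int)
```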

Unit: I added the Unit column with ‘metric tons CO2e (GWP-100, AR5)’ so that I didn’t need to have an extraneous note at the top that explained what the values were measuring and why. So, let’s break it down. GWP-100 (AR5) refers to…
GWP = Global Warming Potential
GWP-100 = GWP over 100 years
AR5 = The GWP-100 value as calculated in the AR5, shorthand for the 5th Assessment Report from the IPCC. There are many acronyms, but if you don’t know what the IPCC is, go find out and come back later :-)

GHG names: cleaning. In S7 there is a column named ‘Substance’ with four values: ‘GWP_100_CH4’, ‘GWP_100_CO2’, ‘GWP_100_F-gases’, ‘GWP_100_N2O’. If you don’t know what these values refer to, they identify the most common greenhouse gases that end up in our atmosphere as a result of human activities, mostly by chemical formula. You already know what the first part of each value, ‘GWP_100’, means from my documentation above (see: Unit), so I removed that duplicate info from this column. So the value ‘GWP_100_CH4’ is ‘Methane (CH4)’ in my dataset, ‘GWP_100_CO2’ is now ‘Carbon dioxide (CO2)’, etc.
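A sketch of the renaming with a lookup dictionary follows. Only the CH4 and CO2 labels are stated above; the N2O and F-gases labels, and the output column name `GHG`, are assumptions for illustration:

```python
import pandas as pd

substance_labels = {
    "GWP_100_CH4": "Methane (CH4)",         # stated in the notes above
    "GWP_100_CO2": "Carbon dioxide (CO2)",  # stated in the notes above
    "GWP_100_N2O": "Nitrous oxide (N2O)",   # assumed label
    "GWP_100_F-gases": "F-gases",           # assumed label
}

df = pd.DataFrame({
    "Substance": ["GWP_100_CH4", "GWP_100_CO2", "GWP_100_F-gases", "GWP_100_N2O"],
})

# Map each raw code to a readable label; unmapped codes would become NaN.
df["GHG"] = df["Substance"].map(substance_labels)

# Fail loudly if the source ever contains a code missing from the lookup.
assert df["GHG"].notna().all()
```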

Country names: Country name and Country code should be 1:1. In initial data exploration I realized that the dataset from EDGAR-FOOD listed some country names inconsistently, leading to duplicate values. I noticed this when checking the names against the ‘Country_code_A3’ column and finding that some country codes had multiple country names attached to them. A few examples...
Country code BOL = Bolivia. In ‘Table S7’ the ‘Name’ of the country for the country code BOL is both ‘Bolivia (Plurinational State of)’ and ‘Bolivia’. I changed all to the most commonly used version, ‘Bolivia’.
CHN is both ‘China, Mainland’ and ‘China’
CIV is both ‘Côte d'Ivoire’ and ‘Cote d'Ivoire’
COD is both ‘Democratic Republic of the Congo’ and ‘Congo_the Democratic Republic of the’
CZE is both ‘Czechia’ and ‘Czech Republic’
FSM is both ‘Micronesia, Federated States of’ and ‘Micronesia (Federated States of)’
And that’s just to name a few! I’ve found this to be a not-uncommon occurrence when working with global data over multiple years: country names can add or lose a comma or accent through data processing, sometimes a country name actually changes, and sometimes it’s entered differently depending on the locale where the data entry occurs. For this reason and more, I always prefer to use the country codes and ensure each one is matched to a single name value (as opposed to the other way around, e.g. creating new country codes for each duplicate spelling). For the purposes of mapping data this is crucial. Simplicity and consistency FTW!
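The check and the fix described above can be sketched roughly as follows. The sample rows are illustrative, the ‘France’ row is made up, and picking ‘Czechia’ as the single name for CZE is an assumption (only the Bolivia choice is stated above):

```python
import pandas as pd

df = pd.DataFrame({
    "Country_code_A3": ["BOL", "BOL", "CZE", "CZE", "FRA"],
    "Name": [
        "Bolivia (Plurinational State of)", "Bolivia",
        "Czechia", "Czech Republic",
        "France",  # made-up consistent row for contrast
    ],
})

# Step 1: find country codes that have more than one distinct name.
names_per_code = df.groupby("Country_code_A3")["Name"].nunique()
ambiguous = names_per_code[names_per_code > 1].index.tolist()

# Step 2: choose one name per ambiguous code.
canonical = {
    "BOL": "Bolivia",  # choice stated in the notes above
    "CZE": "Czechia",  # assumed choice
}

# Step 3: overwrite names for those codes, keeping all other names unchanged.
df["Name"] = df["Country_code_A3"].map(canonical).fillna(df["Name"])

# Every code should now map to exactly one name.
assert (df.groupby("Country_code_A3")["Name"].nunique() == 1).all()
```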

Country code: duplicative. Since I wasn’t using the dataset for mapping purposes, I removed the column ‘Country_code_A3’ after ensuring there was a 1:1 match between country code and name.

Country group and type: unnecessary. I removed the S7 column named ‘C_group_IM24_sh’ because it was not pertinent to my analysis. (It’s a country-grouping column with values like ‘11:_OECD_Europe’.) Additionally, I removed the column named ‘dev_country’, which indicates whether a country is ‘developing’. In S7 there were three possible values in this column: ‘I’, ‘D’ and ‘0’.
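Dropping the three columns named in these notes could look like this in pandas. The single sample row and the `value` column are placeholders for illustration:

```python
import pandas as pd

# Placeholder row using the S7 column names mentioned in the notes.
df = pd.DataFrame({
    "Country_code_A3": ["BOL"],
    "Name": ["Bolivia"],
    "C_group_IM24_sh": ["placeholder group"],
    "dev_country": ["D"],
    "value": [1.0],
})

# Remove the columns that are not needed for the analysis.
df = df.drop(columns=["Country_code_A3", "C_group_IM24_sh", "dev_country"])
```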

I also added a column to indicate the order of food system stages, and cleaned the 'Stage' names to match my other datasets. For example, I labeled the first stage 'Land' while in S7 it is 'LULUC (Production)'. I found this helpful for maintaining the clarity, consistency, and order of stages in a line graph (see the Code connected to this dataset).
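A rough sketch of the stage renaming and ordering follows. Only the 'LULUC (Production)' to 'Land' rename comes from the notes above; the second stage name, the column name `Stage_order`, and the sample values are hypothetical placeholders:

```python
import pandas as pd

df = pd.DataFrame({
    "Stage": ["Hypothetical stage B", "LULUC (Production)"],
    "value": [2.0, 1.0],
})

# Rename raw S7 stage names to the cleaned names.
renames = {"LULUC (Production)": "Land"}  # the one rename stated in the notes
df["Stage"] = df["Stage"].replace(renames)

# Explicit position of each stage in the food system (placeholder beyond 'Land').
stage_order = {"Land": 1, "Hypothetical stage B": 2}
df["Stage_order"] = df["Stage"].map(stage_order)

# Sorting by the order column keeps stages in sequence for plotting.
df = df.sort_values("Stage_order").reset_index(drop=True)
```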

Content

From the website for EDGAR-FOOD where I downloaded the data files to analyze for a project exploring opportunities to reduce GHGs and grow sustainable food systems:

Introduction
The global food system involves all economic sectors and makes a significant contribution to total anthropogenic greenhouse gas (GHGs) emissions. Understanding the various components is a necessary precursor to the design and implementation of actionable and efficient mitigation measures for the system. Food systems' GHG emissions are far more than emissions from the land-based sector (agriculture and food-relevant emissions from land use and land use change). Food needs to be farmed, harvested or caught, transported, processed, packaged, distributed and cooked, and the residuals disposed of. The energy required for all those processes needs to be produced and made available at the right time and location.

EDGAR-FOOD has been developed to aid the understanding of the activities underlying the energy demand and use, agriculture and land use change emissions associated with the production, distribution, consumption and disposal of food through the various stages and sectors of the composite global food system. These data were complemented with data from the FAOSTAT database on GHG emissions from land use related to agriculture (FAO, 2020). EDGAR-FOOD represents the first database consistently covering each stage of the food chain for all countries with yearly frequency for the period 1990-2015. Details regarding the methodology applied are available in Crippa et al. (2021).

Sources and references

Crippa, M., Solazzo, E., Guizzardi, D. et al. Food systems are responsible for a third of global anthropogenic GHG emissions. Nat Food (2021). doi:10.1038/s43016-021-00225-9.

Dataset: Crippa, M., Solazzo, E., Guizzardi, D., Monforti-Ferrario, F., Tubiello, F.N. and Leip, A. EDGAR-FOOD data. Figshare, doi:10.6084/m9.figshare.13476666 (2021).

Related Datasets

FoodEmissions (@kaggle)
GHG Emissions From Food (Crippa Et Al. 2021) (@owid)
Global Forest Resources Assessment (@owid)
Long-term Food And Agriculture Trends (@owid)
Nuclear Weapons Proliferation (@owid)
Food Expenditure (USDA/ERS, 2023) (@owid)
