Baselight

Global Food Nutrition Database (10k Products)

9,804 Global Products with Nutri-Score, Complete Macronutrients and Ingredients

@kaggle.kanchana1990_global_food_nutrition_database10k_products

About this Dataset

9,804 Global Products with Nutri-Score, Eco-Score, NOVA Groups, Complete Macronutrients & Ingredients

Dataset Overview

This dataset contains a curated, high-quality snapshot of 9,804 global food products, aggregated from the Open Food Facts database as of December 2025. It provides a comprehensive multi-dimensional view of food health through three validated scoring systems:

  • Nutri-Score (A-E): Front-of-pack nutritional quality rating developed by Santé publique France and used in several European countries
  • Eco-Score (A+ to F): Environmental sustainability assessment
  • NOVA Classification (1-4): Food processing intensity scale

Key Statistics

  • Total Rows: 9,804
  • Total Columns: 20
  • Nutri-Score Coverage: 99.97% (9,801 products, including "unknown" and "not-applicable" values)
  • Eco-Score Coverage: 99.98% (9,802 products)
  • NOVA Group Coverage: 91.38% (8,959 products)
  • Ingredients Text Coverage: 95.94% (9,406 products)
  • Core Macronutrients Coverage: 93-95% (energy, fat, carbs, protein, sugars)
  • Unique Brands: 3,501 (Nestlé, Danone, Heinz, Hacendado, Tesco, Carrefour, etc.)
  • Geographic Scope: Predominantly European (France 20%, UK 6.5%, Spain 1.9%)
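
The coverage figures above are shares of rows with a value present. A minimal sketch of that arithmetic, assuming missing values show up as None or empty strings once the table is loaded; the helper name `coverage` and the sample rows are illustrative and not part of the dataset:

```python
def coverage(rows, column):
    """Return the percentage of rows whose `column` value is present."""
    if not rows:
        return 0.0
    present = sum(1 for row in rows if row.get(column) not in (None, ""))
    return 100.0 * present / len(rows)

# Hypothetical sample rows, not taken from the dataset.
sample = [
    {"code": 1, "nova_group": 4.0},
    {"code": 2, "nova_group": None},
    {"code": 3, "nova_group": 1.0},
    {"code": 4, "nova_group": 3.0},
]
nova_coverage = coverage(sample, "nova_group")  # 3 of 4 rows present

# The published NOVA figure follows the same arithmetic: 8,959 of 9,804 rows.
published_nova = round(100.0 * 8959 / 9804, 2)
```

Placeholder strings such as "unknown" count as present under this definition, which is why Nutri-Score coverage is higher than the share of products with an actual A-E grade.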

Data Science Applications

  1. Nutri-Score Prediction (Multi-Class Classification)
    Predict nutritional quality grades (A-E) based on macronutrient composition (fat_100g, sugars_100g, saturated-fat_100g, salt_100g).

  2. Ultra-Processed Food Detection (Binary Classification)
    Identify NOVA Group 4 products (53.5% of dataset) using ingredient lists and nutritional patterns.

  3. Ingredient NLP & Allergen Detection
    Extract allergens, E-numbers, and food additives from ingredients_text using Named Entity Recognition.

  4. Environmental Impact Clustering
    Group products by ecoscore_grade and nova_group to identify sustainable food categories.

  5. Calorie Regression
    Predict energy-kcal_100g from macronutrient breakdown.

  6. Brand Health Profiling
    Rank 3,501 brands by average Nutri-Score and percentage of ultra-processed products.

  7. Category-Based Benchmarking
    Analyze nutritional performance across 7,408 unique category combinations.
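
Applications 2 and 6 reduce to simple aggregations once the grades are encoded. A minimal sketch, assuming rows are loaded as dictionaries keyed by column name, that the `brands` string is used as-is (multi-brand values are not split), and that mapping A-E to 1-5 is an acceptable ordinal encoding. `profile_brands`, `GRADE_TO_SCORE` and the sample rows are illustrative names and data, not part of the dataset:

```python
from collections import defaultdict

# Ordinal encoding of Nutri-Score grades (an illustrative choice: A=1 ... E=5).
GRADE_TO_SCORE = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}

def profile_brands(rows):
    """Return {brand: (mean Nutri-Score ordinal, % of products in NOVA group 4)}."""
    scores = defaultdict(list)
    nova = defaultdict(list)
    for row in rows:
        brand = row.get("brands")
        if not brand:
            continue  # skip rows with no brand
        grade = str(row.get("nutriscore_grade") or "").strip().lower()
        if grade in GRADE_TO_SCORE:  # drops "unknown" and "not-applicable"
            scores[brand].append(GRADE_TO_SCORE[grade])
        if row.get("nova_group") is not None:
            nova[brand].append(row["nova_group"] == 4)  # binary ultra-processed flag
    profile = {}
    for brand in set(scores) | set(nova):
        s, n = scores.get(brand, []), nova.get(brand, [])
        mean_score = sum(s) / len(s) if s else None
        pct_ultra = 100.0 * sum(n) / len(n) if n else None
        profile[brand] = (mean_score, pct_ultra)
    return profile

# Hypothetical rows for illustration only.
rows = [
    {"brands": "BrandX", "nutriscore_grade": "A", "nova_group": 1.0},
    {"brands": "BrandX", "nutriscore_grade": "C", "nova_group": 4.0},
    {"brands": "BrandY", "nutriscore_grade": "unknown", "nova_group": 4.0},
    {"brands": None, "nutriscore_grade": "E", "nova_group": 4.0},
]
brand_profile = profile_brands(rows)
```

With 3,501 brands over 9,804 rows, many brands will have only a handful of products, so a minimum product count per brand is worth applying before ranking.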

Column Descriptors

Identification & Metadata (5 columns)

| Column | Type | Coverage | Description |
| --- | --- | --- | --- |
| code | int64 | 100.00% | Unique barcode/EAN identifier (e.g., 6111035000430). Primary key. |
| product_name | object | 97.57% | Commercial product name as displayed on packaging. |
| brands | object | 96.19% | Manufacturer/Brand name (e.g., Nestlé, Danone, Heinz, Hacendado, Tesco). |
| countries | object | 99.98% | Countries where product is sold (may include multiple, e.g., "France, United Kingdom"). |
| quantity | object | 89.92% | Package size/weight (e.g., "500 ml", "100 g", "2 L"). Mixed formatting. |

Category & Labels (2 columns)

| Column | Type | Coverage | Description |
| --- | --- | --- | --- |
| categories | object | 98.77% | Hierarchical product categories (comma-separated), e.g., "Beverages,Waters,Mineral waters". 7,408 unique combinations. |
| labels | object | 77.03% | Certification/Quality labels (e.g., "Organic", "Fair Trade", "Green Dot"). Multiple labels per product. 5,687 unique combinations. |

Health & Environmental Scores (3 columns)

| Column | Type | Coverage | Description |
| --- | --- | --- | --- |
| nutriscore_grade | object | 99.97% | Nutri-Score grade (A=Healthiest to E=Least Healthy). Coverage includes the placeholder values "not-applicable" (2.98%) and "unknown" (8.20%). Distribution: A=16.11%, B=12.70%, C=22.48%, D=18.23%, E=19.28%. |
| ecoscore_grade | object | 99.98% | Environmental impact score (A+=Best to F=Worst). Considers carbon footprint, water use, packaging recyclability. Distribution: A+=5.41%, A=15.46%, B=18.59%, C=13.39%, D=14.29%, E=10.18%, F=2.52%. |
| nova_group | float64 | 91.38% | NOVA processing classification (1=Unprocessed or minimally processed, 2=Processed culinary ingredients, 3=Processed foods, 4=Ultra-processed). Distribution: Group 1=11.02%, Group 2=4.15%, Group 3=22.69%, Group 4=53.52%. |
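
Because nutriscore_grade mixes real grades with the "unknown" and "not-applicable" placeholders, a modeling pipeline would usually separate the two before training. A minimal sketch, assuming grades may arrive in either letter case and with stray whitespace; `clean_nutriscore` and the sample values are illustrative:

```python
VALID_GRADES = {"a", "b", "c", "d", "e"}

def clean_nutriscore(value):
    """Return an upper-case grade A-E, or None for placeholders and missing values."""
    if value is None:
        return None
    grade = str(value).strip().lower()
    return grade.upper() if grade in VALID_GRADES else None

# Hypothetical raw values for illustration only.
raw = ["A", "e", "unknown", "not-applicable", None, " c "]
cleaned = [clean_nutriscore(v) for v in raw]

# Share of values that carry a usable grade, as opposed to merely being non-null.
graded_share = 100.0 * sum(g is not None for g in cleaned) / len(cleaned)
```

Applied to the distribution listed above, the graded share comes to the sum of the A-E percentages (88.80%) rather than the 99.97% coverage figure.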

Ingredient Data (1 column)

| Column | Type | Coverage | Description |
| --- | --- | --- | --- |
| ingredients_text | object | 95.94% | Full ingredient list (unstructured text). Multi-language (English, French, Spanish, Arabic). Contains allergens, E-numbers, additives. Perfect for NLP/NER tasks. |
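
Before reaching for a trained NER model, E-numbers can be pulled out of ingredients_text with a regular expression. A minimal sketch, assuming additives are written as "E" plus three or four digits with an optional letter suffix and an optional space or hyphen; the pattern, the helper `extract_e_numbers` and the example string are illustrative:

```python
import re

# "E" + 3-4 digits + optional letter suffix, e.g. E330, E 150d, e-471.
E_NUMBER = re.compile(r"\bE[\s-]?(\d{3,4}[a-z]?)\b", re.IGNORECASE)

def extract_e_numbers(ingredients_text):
    """Return the sorted, de-duplicated E-numbers found in an ingredient string."""
    if not ingredients_text:
        return []
    found = {"E" + m.group(1).lower() for m in E_NUMBER.finditer(ingredients_text)}
    return sorted(found)

# Hypothetical ingredient list for illustration only.
example = "Sugar, water, acidifier (E330), colour: E 150d, emulsifier e-471, E330"
e_numbers = extract_e_numbers(example)
```

A pattern this loose will also match strings such as "vitamin E 100 mg", so results are best treated as candidates to validate against a list of approved additives.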

Macronutrients per 100g (9 columns)

| Column | Type | Coverage | Mean | Median | Description |
| --- | --- | --- | --- | --- | --- |
| energy-kcal_100g | float64 | 94.66% | 292.75 | 280.0 | Energy in kilocalories per 100g. Waters = 0 kcal, oils ≈900 kcal. |
| fat_100g | float64 | 94.95% | 15.32 | 7.02 | Total fat content per 100g. Low-fat: <3g, high-fat: >17.5g (EU thresholds). |
| saturated-fat_100g | float64 | 93.23% | 5.25 | 1.50 | Saturated fatty acids per 100g. Key penalty factor in Nutri-Score calculation. |
| carbohydrates_100g | float64 | 94.70% | 30.19 | 18.0 | Total carbohydrates per 100g (sugars + starch; EU labels list fiber separately). |
| sugars_100g | float64 | 93.51% | 11.98 | 4.20 | Total sugars per 100g (glucose, fructose, lactose, sucrose). High-sugar threshold: >22.5g. |
| fiber_100g | float64 | 69.74% | 4.09* | 2.90 | Dietary fiber per 100g. *Note: 2 outlier products exceed 100g/100g (data quality issue). Mean calculated after excluding outliers >50g. |
| proteins_100g | float64 | 94.90% | 7.33 | 6.20 | Protein content per 100g. High-protein products: >20g/100g. |
| salt_100g | float64 | 93.24% | 1.24 | 0.39 | Sodium chloride (salt) per 100g. WHO guideline: <5g/day total intake. |
| sodium_100g | float64 | 93.24% | 0.50 | 0.16 | Elemental sodium per 100g. Conversion: Salt (g) = Sodium (g) × 2.5. |

Note: in the table schema under Tables, energy-kcal_100g and saturated-fat_100g appear as energy_kcal_100g and saturated_fat_100g.
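
The salt/sodium relation and the fiber outlier rule described above translate directly into sanity checks. A minimal sketch, assuming missing values are None and that a 0.05 g tolerance is reasonable for rounding differences; the function names, the tolerance and the sample values are illustrative:

```python
from statistics import mean

SALT_PER_SODIUM = 2.5  # conversion factor stated in the sodium_100g description

def salt_from_sodium(sodium_g):
    """Convert elemental sodium (g) to its salt equivalent (g)."""
    return sodium_g * SALT_PER_SODIUM

def is_consistent(salt_g, sodium_g, tolerance=0.05):
    """Check that a row's salt and sodium values agree within `tolerance` grams."""
    return abs(salt_g - salt_from_sodium(sodium_g)) <= tolerance

def trimmed_fiber_mean(values, cutoff=50.0):
    """Mean of fiber values, ignoring missing entries and outliers above `cutoff`."""
    kept = [v for v in values if v is not None and v <= cutoff]
    return mean(kept) if kept else None

# The published column means agree with the conversion: 0.50 g sodium ~ 1.24 g salt.
means_agree = is_consistent(1.24, 0.50)

# Hypothetical fiber values, including one impossible entry above 100 g per 100 g.
fiber_values = [2.0, 4.0, None, 120.0, 6.0]
fiber_mean = trimmed_fiber_mean(fiber_values)
```

Rows that fail `is_consistent` are candidates for manual review rather than automatic deletion, since either column could hold the entry error.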

Ethically Mined Data

  • Source: Open Food Facts (https://world.openfoodfacts.org)
  • License: ODbL (Open Database License) – Free to share and adapt with attribution
  • Collection Method: Community-contributed data + Official manufacturer information
  • API Retrieval: December 2025 snapshot via Open Food Facts API v2
  • Privacy: No personal data – Product-level information only
  • Transparency: All data traceable to Open Food Facts database with full revision history

Acknowledgements

  • Data Source: Open Food Facts collaborative database
  • Curator: Kanchana Karunarathna (Kanchana1990)
  • Community: 300,000+ Open Food Facts contributors worldwide
  • Scoring Systems: Santé Publique France (Nutri-Score), Open Food Facts Foundation (Eco-Score), NOVA Classification (University of São Paulo)

Tables

Openfoodfacts Nutrition Final 2025-12-10

@kaggle.kanchana1990_global_food_nutrition_database10k_products.openfoodfacts_nutrition_final_2025_12_10
  • 2.51 MB
  • 9,804 rows
  • 20 columns
CREATE TABLE openfoodfacts_nutrition_final_2025_12_10 (
  "code" BIGINT,
  "product_name" VARCHAR,
  "brands" VARCHAR,
  "countries" VARCHAR,
  "quantity" VARCHAR,
  "categories" VARCHAR,
  "labels" VARCHAR,
  "nutriscore_grade" VARCHAR,
  "ecoscore_grade" VARCHAR,
  "nova_group" DOUBLE,
  "ingredients_text" VARCHAR,
  "energy_kcal_100g" DOUBLE,
  "fat_100g" DOUBLE,
  "saturated_fat_100g" DOUBLE,
  "carbohydrates_100g" DOUBLE,
  "sugars_100g" DOUBLE,
  "fiber_100g" DOUBLE,
  "proteins_100g" DOUBLE,
  "salt_100g" DOUBLE,
  "sodium_100g" DOUBLE
);
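
To try queries against this schema locally, a subset of the columns can be recreated in an in-memory SQLite database. This is a sketch under assumptions: SQLite stands in for the hosting platform's SQL engine, only six of the twenty columns are declared, and the inserted rows are hypothetical placeholders rather than dataset records:

```python
import sqlite3

TABLE = "openfoodfacts_nutrition_final_2025_12_10"

conn = sqlite3.connect(":memory:")
conn.execute(f"""
    CREATE TABLE {TABLE} (
      "code" BIGINT, "product_name" VARCHAR, "nova_group" DOUBLE,
      "nutriscore_grade" VARCHAR, "energy_kcal_100g" DOUBLE, "saturated_fat_100g" DOUBLE
    )
""")
# Hypothetical rows for illustration only.
conn.executemany(
    f"INSERT INTO {TABLE} VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Sample water", 1.0, "a", 0.0, 0.0),
        (2, "Sample biscuit", 4.0, "d", 480.0, 9.5),
        (3, "Sample soda", 4.0, "e", 42.0, 0.0),
        (4, "Sample unknown", None, "unknown", None, None),
    ],
)
# Share of all rows in each NOVA group; rows with no NOVA group are excluded
# from the groups but still counted in the denominator.
nova_share = conn.execute(f"""
    SELECT "nova_group",
           ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM {TABLE}), 2)
    FROM {TABLE}
    WHERE "nova_group" IS NOT NULL
    GROUP BY "nova_group"
    ORDER BY "nova_group"
""").fetchall()
conn.close()
```

Keeping the total row count as the denominator mirrors how the NOVA distribution is reported in the column descriptors, where the four group percentages add up to the 91.38% coverage figure.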
