Dataset: White Wine Quality

About this Dataset

Description

This dataset contains the physicochemical properties and sensory quality ratings of white wines. It comes from the work of Cortez et al. (2009), who modeled wine quality by data mining these properties, and it is suitable for both regression and classification tasks in machine learning.

Features

Fixed Acidity: Amount of fixed acids in the wine (g/L).
Volatile Acidity: Amount of volatile acids in the wine (g/L).
Citric Acid: Amount of citric acid in the wine (g/L).
Residual Sugar: Amount of residual sugar in the wine (g/L).
Chlorides: Amount of chlorides in the wine (g/L).
Free Sulfur Dioxide: Amount of free sulfur dioxide in the wine (mg/L).
Total Sulfur Dioxide: Total amount of sulfur dioxide in the wine (mg/L).
Density: Density of the wine (g/cm³).
pH: pH level of the wine.
Sulphates: Amount of sulphates in the wine (g/L).
Alcohol: Alcohol content in the wine (percentage).
Quality: Quality rating of the wine, on a scale from 0 (very bad) to 10 (very excellent).

Statistics

The source page reports a frequency distribution for each feature. The overall range (minimum to maximum) of each input feature is:

Fixed Acidity: Ranges from 3.80 to 14.20.
Volatile Acidity: Ranges from 0.08 to 1.10.
Citric Acid: Ranges from 0.00 to 1.66.
Residual Sugar: Ranges from 0.60 to 65.80.
Chlorides: Ranges from 0.01 to 0.35.
Free Sulfur Dioxide: Ranges from 2.00 to 289.00.
Total Sulfur Dioxide: Ranges from 9.00 to 440.00.
Density: Ranges from 0.99 to 1.04.
pH: Ranges from 2.72 to 3.82.
Sulphates: Ranges from 0.22 to 1.08.
Alcohol: Ranges from 8.00 to 14.20.
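
The ranges listed above can be recomputed from the table itself. The following is a minimal sketch in Python, assuming the rows have already been loaded as dictionaries keyed by column name. The `column_ranges` helper and the two sample rows are hypothetical and serve only as illustration; the sample values are made up and are not records from the dataset.

```python
from typing import Dict, Iterable, Tuple


def column_ranges(rows: Iterable[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Return {column: (minimum, maximum)} over all rows."""
    ranges: Dict[str, Tuple[float, float]] = {}
    for row in rows:
        for name, value in row.items():
            if name in ranges:
                low, high = ranges[name]
                ranges[name] = (min(low, value), max(high, value))
            else:
                # First time this column is seen: min and max are the same value.
                ranges[name] = (value, value)
    return ranges


# Made-up rows for illustration only (not taken from the dataset).
sample_rows = [
    {"ph": 3.00, "alcohol": 9.5},
    {"ph": 3.30, "alcohol": 12.0},
]

for column, (low, high) in column_ranges(sample_rows).items():
    print(f"{column}: ranges from {low:.2f} to {high:.2f}")
```

Running the same helper over all 4898 rows would be one way to check the published ranges against the data.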

Citation Request

This dataset is publicly available for research purposes. Please include the following citation if you use this dataset:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis. (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4), 547-553. ISSN: 0167-9236.

Sources

Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos, and Jose Reis (CVRVV), 2009.

Past Usage

The dataset was used by Cortez et al. (2009) to model wine preferences from physicochemical properties. The study applied several data mining methods, including support vector machines, to predict wine quality, and evaluated them with metrics such as the mean absolute deviation (MAD) and a confusion matrix computed for a fixed error tolerance.
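
As a rough illustration of these metrics, the sketch below computes MAD and a tolerance-based accuracy in Python. It assumes that MAD is the mean of the absolute differences between actual and predicted quality, and that a prediction counts as correct when it lies within a tolerance `T` of the actual score. The accuracy is a simpler relative of the tolerance-based confusion matrix, not a reproduction of the paper's evaluation. The function names and sample values are hypothetical.

```python
from typing import Sequence


def _check_lengths(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def mean_absolute_deviation(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted scores."""
    _check_lengths(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def accuracy_within_tolerance(
    actual: Sequence[float], predicted: Sequence[float], tolerance: float
) -> float:
    """Fraction of predictions whose absolute error is at most `tolerance`."""
    _check_lengths(actual, predicted)
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


# Made-up scores for illustration only (not results from the paper).
actual_quality = [5, 6, 7, 6]
predicted_quality = [5.5, 6.0, 5.5, 7.0]

print(mean_absolute_deviation(actual_quality, predicted_quality))
print(accuracy_within_tolerance(actual_quality, predicted_quality, tolerance=0.5))
print(accuracy_within_tolerance(actual_quality, predicted_quality, tolerance=1.0))
```

A larger tolerance accepts more predictions as correct, so accuracy can only stay the same or rise as `T` grows.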

Relevant Information

The dataset covers the white variant of the Portuguese "Vinho Verde" wine. For privacy and logistical reasons, only the physicochemical attributes (inputs) and the sensory quality rating (output) are available.
The dataset can be used for classification or regression. The quality classes are ordered and imbalanced, so outlier detection can help identify the rare wines at either end of the scale, and feature selection is worth testing because not all input attributes are necessarily relevant.
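
Before training a classifier, it can help to look at how the quality scores are distributed. The sketch below groups scores into coarse bands and counts them. The band thresholds (`<= 4` poor, `5` to `6` normal, `>= 7` excellent), the `quality_band` helper, and the sample scores are all assumptions made for illustration; none of them come from the dataset or from Cortez et al.

```python
from collections import Counter


def quality_band(score: int) -> str:
    """Map a 0-10 quality score to a coarse band (thresholds are an assumption)."""
    if score <= 4:
        return "poor"
    if score <= 6:
        return "normal"
    return "excellent"


# Made-up scores for illustration only (not taken from the dataset).
sample_scores = [5, 6, 6, 7, 6, 5, 4, 6, 8, 6]

band_counts = Counter(quality_band(score) for score in sample_scores)
for band, count in band_counts.most_common():
    print(f"{band}: {count}")
```

A heavily skewed count of this kind is a cue to consider stratified splits or class weighting rather than plain accuracy.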

Number of Instances

White wine: 4898 instances

Number of Attributes

11 input attributes + 1 output attribute (quality)
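
The split into inputs and output can be sketched in Python as follows. The column names are those of the table schema in this document; the `split_row` helper and the example row are hypothetical, and the row values are made up for illustration.

```python
from typing import Dict, List, Tuple

# The 11 input attributes, named as in the table schema.
FEATURE_COLUMNS: List[str] = [
    "fixed_acidity",
    "volatile_acidity",
    "citric_acid",
    "residual_sugar",
    "chlorides",
    "free_sulfur_dioxide",
    "total_sulfur_dioxide",
    "density",
    "ph",
    "sulphates",
    "alcohol",
]

# The single output attribute.
TARGET_COLUMN = "quality"


def split_row(row: Dict[str, float]) -> Tuple[List[float], float]:
    """Separate one record into (feature vector, target value)."""
    features = [row[name] for name in FEATURE_COLUMNS]
    return features, row[TARGET_COLUMN]


# Made-up record for illustration only (not taken from the dataset).
example_row = {name: 1.0 for name in FEATURE_COLUMNS}
example_row[TARGET_COLUMN] = 6

features, target = split_row(example_row)
print(len(features), target)
```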

Attribute Information

For detailed information, refer to Cortez et al., 2009.

Missing Attribute Values

None

Tables

Winequality White

@kaggle.dakshbhalala_uci_white_wine.winequality_white

74.81 KB
4898 rows
12 columns


CREATE TABLE winequality_white (
  "fixed_acidity" DOUBLE,
  "volatile_acidity" DOUBLE,
  "citric_acid" DOUBLE,
  "residual_sugar" DOUBLE,
  "chlorides" DOUBLE,
  "free_sulfur_dioxide" DOUBLE,
  "total_sulfur_dioxide" DOUBLE,
  "density" DOUBLE,
  "ph" DOUBLE,
  "sulphates" DOUBLE,
  "alcohol" DOUBLE,
  "quality" BIGINT
);
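
To experiment with this schema locally, one option is to load it into an in-memory SQLite database from Python. This is only a sketch: SQLite is an assumption made here for convenience (the hosted table may run on a different engine), and the three inserted rows are made up for illustration rather than taken from the dataset.

```python
import sqlite3

# Same column names and declared types as the schema above.
SCHEMA = """
CREATE TABLE winequality_white (
  "fixed_acidity" DOUBLE,
  "volatile_acidity" DOUBLE,
  "citric_acid" DOUBLE,
  "residual_sugar" DOUBLE,
  "chlorides" DOUBLE,
  "free_sulfur_dioxide" DOUBLE,
  "total_sulfur_dioxide" DOUBLE,
  "density" DOUBLE,
  "ph" DOUBLE,
  "sulphates" DOUBLE,
  "alcohol" DOUBLE,
  "quality" BIGINT
);
"""

# Made-up rows for illustration only (not taken from the dataset).
toy_rows = [
    (7.0, 0.30, 0.30, 5.0, 0.05, 30.0, 130.0, 0.994, 3.2, 0.5, 10.0, 6),
    (6.5, 0.25, 0.35, 2.0, 0.04, 25.0, 110.0, 0.992, 3.1, 0.4, 11.0, 6),
    (8.0, 0.40, 0.20, 9.0, 0.06, 40.0, 170.0, 0.997, 3.0, 0.6, 9.0, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO winequality_white VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    toy_rows,
)

# Count how many wines received each quality score.
quality_counts = conn.execute(
    "SELECT quality, COUNT(*) FROM winequality_white "
    "GROUP BY quality ORDER BY quality"
).fetchall()
print(quality_counts)
conn.close()
```

The same `GROUP BY quality` query, run against the full table, would show the distribution of quality ratings across all 4898 rows.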