Description
This dataset contains detailed information about the chemical properties and quality ratings of white wines. It is intended for use in predictive modeling and data analysis. Derived from the work of Cortez et al. (2009), the dataset explores wine quality through data mining of physicochemical properties, making it suitable for both regression and classification tasks in machine learning.
Features
- Fixed Acidity: Amount of fixed acids in the wine (g/L).
- Volatile Acidity: Amount of volatile acids in the wine (g/L).
- Citric Acid: Amount of citric acid in the wine (g/L).
- Residual Sugar: Amount of residual sugar in the wine (g/L).
- Chlorides: Amount of chlorides in the wine (g/L).
- Free Sulfur Dioxide: Amount of free sulfur dioxide in the wine (mg/L).
- Total Sulfur Dioxide: Total amount of sulfur dioxide in the wine (mg/L).
- Density: Density of the wine (g/cm³).
- pH: pH level of the wine.
- Sulphates: Amount of sulphates in the wine (g/L).
- Alcohol: Alcohol content of the wine (% by volume).
- Quality: Sensory quality rating of the wine, on a scale from 0 (very bad) to 10 (excellent).
Statistics
The observed range (minimum to maximum) of each input feature, rounded to two decimals, is:
- Fixed Acidity: Ranges from 3.80 to 14.20.
- Volatile Acidity: Ranges from 0.08 to 1.10.
- Citric Acid: Ranges from 0.00 to 1.66.
- Residual Sugar: Ranges from 0.60 to 65.80.
- Chlorides: Ranges from 0.01 to 0.35.
- Free Sulfur Dioxide: Ranges from 2.00 to 289.00.
- Total Sulfur Dioxide: Ranges from 9.00 to 440.00.
- Density: Ranges from 0.99 to 1.04.
- pH: Ranges from 2.72 to 3.82.
- Sulphates: Ranges from 0.22 to 1.08.
- Alcohol: Ranges from 8.00 to 14.20.
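Ranges like these can be re-derived from the raw file. The following is a minimal sketch using only the Python standard library. It assumes a semicolon-delimited CSV with a header row; the delimiter, the column names, and the three sample rows are illustrative assumptions, so adjust them to match the file you actually have.

```python
import csv
import io

# Illustrative stand-in for the real file; delimiter and header names are assumptions.
SAMPLE_CSV = """fixed acidity;pH;alcohol;quality
7.0;3.00;8.8;6
6.3;3.30;9.5;6
8.1;3.26;10.1;5
"""


def column_ranges(handle, delimiter=";"):
    """Return {column: (min, max)} for every column in a numeric CSV stream."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    ranges = {}
    for row in reader:
        for name, raw in row.items():
            value = float(raw)
            # Start from the first value seen, then widen the interval as needed.
            low, high = ranges.get(name, (value, value))
            ranges[name] = (min(low, value), max(high, value))
    return ranges


ranges = column_ranges(io.StringIO(SAMPLE_CSV))
for name, (low, high) in ranges.items():
    print(f"{name}: ranges from {low:.2f} to {high:.2f}")
```

To run this against the actual data, pass an opened file object (for example `open(path, newline="")`, where `path` is wherever you saved the file) instead of the `io.StringIO` wrapper.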
Citation Request
This dataset is publicly available for research purposes. Please include the following citation if you use this dataset:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis. (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4), 547-553. ISSN: 0167-9236.
Sources
- Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos, and Jose Reis (CVRVV), 2009.
Past Usage
The dataset was used by Cortez et al. (2009) to model wine preferences from physicochemical properties. The study applied several data mining methods, including support vector machines, to predict wine quality, and reported metrics such as the mean absolute deviation (MAD) and confusion matrices for a fixed error tolerance.
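For readers who want to compute comparable metrics on their own predictions, here is a minimal sketch. It uses the usual definition of mean absolute deviation and treats a prediction as correct when it falls within a tolerance of the true score; whether this matches the paper's exact procedure is an assumption and should be checked against Cortez et al. (2009). The function names and sample values are illustrative.

```python
def mean_absolute_deviation(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def tolerance_accuracy(actual, predicted, tolerance=0.5):
    """Share of predictions within `tolerance` of the true quality score."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


def confusion_matrix(actual, predicted, labels):
    """Count (true label, rounded prediction) pairs; rows are true labels.

    Assumes `labels` is a sorted list of consecutive integer scores.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        # int(p + 0.5) rounds halves up (valid for non-negative scores);
        # clamp so out-of-range predictions still land in a cell.
        guess = min(max(int(p + 0.5), labels[0]), labels[-1])
        matrix[index[a]][index[guess]] += 1
    return matrix


# Illustrative true scores and regression outputs.
actual = [5, 6, 6, 7]
predicted = [5.2, 5.4, 6.3, 7.4]

mad = mean_absolute_deviation(actual, predicted)
acc = tolerance_accuracy(actual, predicted, tolerance=0.5)
matrix = confusion_matrix(actual, predicted, labels=[5, 6, 7])
print(mad, acc, matrix)
```

A larger tolerance (for example 1.0) counts near misses as correct, which is often a fairer measure for ordinal ratings than exact-match accuracy.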
Relevant Information
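- The dataset covers white variants of the Portuguese "Vinho Verde" wine. For privacy and logistical reasons, only physicochemical attributes and sensory quality ratings are available.
- The dataset can be used for classification or regression tasks. The quality classes are ordered and imbalanced, with many more normal wines than excellent or poor ones, so outlier detection may help identify the few excellent or poor wines. Feature selection is also worth testing, since not all input variables may be relevant.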
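Before choosing between regression and classification, it helps to look at how the quality scores are distributed. The sketch below counts the share of each score and optionally groups scores into coarse bands. The band names and cutoffs (`low_cutoff`, `high_cutoff`) are hypothetical choices for illustration, not something defined by the dataset, and the sample scores are made up.

```python
from collections import Counter


def class_shares(quality_scores):
    """Return {score: fraction of samples}, ordered by score."""
    counts = Counter(quality_scores)
    total = len(quality_scores)
    return {score: counts[score] / total for score in sorted(counts)}


def to_band(score, low_cutoff=4, high_cutoff=7):
    """Map a 0-10 score to a coarse band; the cutoffs are hypothetical."""
    if score <= low_cutoff:
        return "poor"
    if score >= high_cutoff:
        return "excellent"
    return "normal"


# Illustrative scores only.
scores = [6, 6, 5, 6, 7, 5, 6, 4, 6, 8]
shares = class_shares(scores)
bands = Counter(to_band(s) for s in scores)
print(shares)
print(bands)
```

If the extreme bands turn out to be very small, stratified splitting or an outlier-detection framing may be more appropriate than a plain multi-class classifier.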
Number of Instances
- White wine: 4898 instances
Number of Attributes
- 11 input attributes + 1 output attribute (quality)
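A quick sanity check on the 11 + 1 layout can catch a wrong delimiter or a dropped column early. The sketch below splits one record into inputs and target. The snake_case column names and the values in `record` are illustrative assumptions; the real file may name its columns differently.

```python
EXPECTED_INPUTS = 11  # per the attribute count stated above


def split_features_target(row, target_name="quality"):
    """Split one record (a dict of strings) into (inputs, target)."""
    if target_name not in row:
        raise KeyError(f"missing target column {target_name!r}")
    inputs = {k: float(v) for k, v in row.items() if k != target_name}
    if len(inputs) != EXPECTED_INPUTS:
        raise ValueError(f"expected {EXPECTED_INPUTS} inputs, got {len(inputs)}")
    return inputs, int(row[target_name])


# Illustrative record; keys are hypothetical column names.
record = {
    "fixed_acidity": "7.0", "volatile_acidity": "0.27", "citric_acid": "0.36",
    "residual_sugar": "20.7", "chlorides": "0.045",
    "free_sulfur_dioxide": "45", "total_sulfur_dioxide": "170",
    "density": "1.001", "ph": "3.0", "sulphates": "0.45",
    "alcohol": "8.8", "quality": "6",
}

inputs, target = split_features_target(record)
print(len(inputs), target)
```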
Attribute Information
For detailed attribute descriptions, refer to Cortez et al. (2009).
Missing Attribute Values