Red Wine DataSet
wine-quality-measure
@kaggle.soorajgupta7_red_wine_dataset
wine-quality-measure
@kaggle.soorajgupta7_red_wine_dataset
Datasets Description:
The datasets under discussion pertain to the red and white variants of Portuguese "Vinho Verde" wine. Detailed information is available in the reference by Cortez et al. (2009). These datasets encompass physicochemical variables as inputs and sensory variables as outputs. Notably, specifics regarding grape types, wine brand, and selling prices are absent due to privacy and logistical concerns.
Classification and Regression Tasks:
One can interpret these datasets as being suitable for both classification and regression analyses. The classes are ordered, albeit imbalanced. For instance, the dataset contains a more significant number of normal wines compared to excellent or poor ones.
Dataset Contents:
For a comprehensive understanding, readers are encouraged to review the work by Cortez et al. (2009). The input variables, derived from physicochemical tests, include:
The output variable, based on sensory data, is denoted by:
12. Quality (score ranging from 0 to 10)
Usage Tips:
A practical suggestion involves setting a threshold for the dependent variable, defining wines with a quality score of 7 or higher as 'good/1' and the rest as 'not good/0.' This facilitates meaningful experimentation with hyperparameter tuning using decision tree algorithms and analyzing ROC curves and AUC values.
Operational Workflow:
To efficiently utilize the dataset, the following steps are recommended:
Tools and Acknowledgments:
For an efficient analysis, consider using KNIME, a valuable graphical user interface (GUI) tool. Additionally, the dataset is available on the UCI machine learning repository, and proper acknowledgment and citation of the dataset source by Cortez et al. (2009) are essential for use.
Anyone who has the link will be able to view this.