Wine Quality
Analyzing "Vinho Verde" Wine Quality: A Data Science Approach
@kaggle.abdelazizsami_wine_quality
Input Variables: Physicochemical properties (e.g., pH, alcohol content, acidity).
Output Variable: Sensory ratings (quality), which are ordered categories.
Classification or Regression:
Treat the output as a categorical variable (classification) or as a continuous score (regression).
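A minimal sketch of the two framings, using only the standard library. The scores and the cut-offs (`low_max`, `high_min`) are hypothetical choices for illustration, not values taken from the dataset.

```python
# Hypothetical quality scores; real values would come from the "quality" column.
scores = [5, 6, 7, 4, 8, 5, 6]

# Regression framing: keep the score as a number.
y_regression = [float(s) for s in scores]


def to_class(score, low_max=4, high_min=7):
    """Map an integer score to an ordered class label (thresholds are assumptions)."""
    if score <= low_max:
        return "low"
    if score >= high_min:
        return "high"
    return "medium"


# Classification framing: bin the score into ordered categories.
y_classification = [to_class(s) for s in scores]

print(y_regression)
print(y_classification)
```

Binning is optional: the raw integer scores can also be used directly as class labels, at the cost of very small classes at the extremes.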
Outlier Detection:
Identify outliers (e.g., excellent or poor wines) using techniques like Isolation Forest or Local Outlier Factor (LOF).
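The two detectors named above could be applied roughly as follows with scikit-learn. The feature matrix here is synthetic random data standing in for the 11 physicochemical columns, and `contamination=0.05` and `n_neighbors=20` are assumed settings, not recommendations from the source.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Synthetic stand-in for the 11 physicochemical columns (not real wine data).
X = rng.normal(size=(200, 11))
X[:5] += 8.0  # shift a few rows far from the rest so they look anomalous

# Isolation Forest: points that are easy to isolate with random splits score as outliers.
iso = IsolationForest(contamination=0.05, random_state=0)
iso_labels = iso.fit_predict(X)  # -1 = outlier, 1 = inlier

# Local Outlier Factor: compares each point's local density with that of its neighbours.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
lof_labels = lof.fit_predict(X)  # -1 = outlier, 1 = inlier

iso_outliers = np.where(iso_labels == -1)[0]
lof_outliers = np.where(lof_labels == -1)[0]
print("Isolation Forest flagged rows:", iso_outliers)
print("LOF flagged rows:", lof_outliers)
```

LOF is distance-based, so on the real columns, which are measured on different scales, standardizing the features first is usually advisable.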
Feature Selection:
Apply methods such as Recursive Feature Elimination (RFE), LASSO, or tree-based feature importance to identify relevant features.
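As a rough illustration, the three approaches can be run side by side with scikit-learn. The data below is synthetic: the target is built to depend only on the `alcohol` and `volatile_acidity` columns, so this shows the mechanics rather than any finding about real wines. The `alpha` value and the number of features to keep are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler

FEATURES = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide", "density",
    "ph", "sulphates", "alcohol",
]

rng = np.random.default_rng(0)
X = rng.normal(size=(300, len(FEATURES)))  # synthetic, not real wine data
# Synthetic target: driven by "alcohol" (col 10) and "volatile_acidity" (col 1) only.
y = 1.0 * X[:, 10] - 0.8 * X[:, 1] + rng.normal(scale=0.1, size=300)

X_scaled = StandardScaler().fit_transform(X)

# 1) RFE: repeatedly drop the feature with the smallest coefficient magnitude.
rfe = RFE(LinearRegression(), n_features_to_select=2).fit(X_scaled, y)
rfe_selected = [f for f, keep in zip(FEATURES, rfe.support_) if keep]

# 2) LASSO: the L1 penalty pushes coefficients of weak features to exactly zero.
lasso = Lasso(alpha=0.1).fit(X_scaled, y)
lasso_selected = [f for f, c in zip(FEATURES, lasso.coef_) if c != 0.0]

# 3) Tree-based importance: rank features by impurity reduction in a forest.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
ranked = sorted(zip(FEATURES, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)

print("RFE:", rfe_selected)
print("LASSO:", lasso_selected)
print("Forest top 3:", [name for name, _ in ranked[:3]])
```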
Classification Models:
Try models like Logistic Regression, Decision Trees, Random Forest, or Gradient Boosting.
Regression Models:
Use Linear Regression, SVR, or tree-based models like Random Forest Regressor.
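A small sketch of fitting one classifier and one regressor on the same target, assuming scikit-learn. The features and the integer "quality" target are synthetic placeholders, and the hyperparameters are arbitrary defaults rather than tuned values.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the 11 input columns (not real wine data).
X = rng.normal(size=(400, 11))
# Synthetic integer "quality" clipped to 3..8, driven by one column plus noise.
raw = 5.5 + X[:, 10] + rng.normal(scale=0.5, size=400)
y = np.clip(np.rint(raw), 3, 8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Classification: each quality score is treated as a separate class.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))

# Regression: the quality score is treated as a continuous value.
reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
mae = mean_absolute_error(y_test, reg.predict(X_test))

print(f"classifier accuracy: {acc:.3f}")
print(f"regressor MAE: {mae:.3f}")
```

Accuracy treats every misclassification equally, whereas mean absolute error reflects how far a prediction is from the true score, which may suit an ordered target better.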
Feature Importance:
Analyze which features contribute the most to the predictions to aid in understanding the data.
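One possible way to measure feature contributions is permutation importance, sketched here with scikit-learn. This technique is a swapped-in illustration, not one named by the source. The data is synthetic, with the target constructed to depend mostly on the `alcohol` column, so the ranking says nothing about real wines.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

FEATURES = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide", "density",
    "ph", "sulphates", "alcohol",
]

rng = np.random.default_rng(0)
X = rng.normal(size=(400, len(FEATURES)))  # synthetic, not real wine data
# Synthetic target: mostly "alcohol" (col 10), weakly "volatile_acidity" (col 1).
y = 1.0 * X[:, 10] - 0.3 * X[:, 1] + rng.normal(scale=0.1, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Shuffle one column at a time on held-out data and record the drop in score.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
ranking = sorted(zip(FEATURES, result.importances_mean),
                 key=lambda pair: pair[1], reverse=True)

for name, score in ranking[:3]:
    print(f"{name}: {score:.3f}")
```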
CREATE TABLE winequality_red (
"fixed_acidity" DOUBLE,
"volatile_acidity" DOUBLE,
"citric_acid" DOUBLE,
"residual_sugar" DOUBLE,
"chlorides" DOUBLE,
"free_sulfur_dioxide" DOUBLE,
"total_sulfur_dioxide" DOUBLE,
"density" DOUBLE,
"ph" DOUBLE,
"sulphates" DOUBLE,
"alcohol" DOUBLE,
"quality" BIGINT
);
CREATE TABLE winequality_white (
"fixed_acidity" DOUBLE,
"volatile_acidity" DOUBLE,
"citric_acid" DOUBLE,
"residual_sugar" DOUBLE,
"chlorides" DOUBLE,
"free_sulfur_dioxide" DOUBLE,
"total_sulfur_dioxide" DOUBLE,
"density" DOUBLE,
"ph" DOUBLE,
"sulphates" DOUBLE,
"alcohol" DOUBLE,
"quality" BIGINT
);
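To show how the two tables might be queried together, the sketch below recreates the schema in an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is only a stand-in for whatever engine actually hosts the tables, and the two inserted rows are invented placeholders, not records from the dataset.

```python
import sqlite3

COLUMNS = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide", "density",
    "ph", "sulphates", "alcohol", "quality",
]

conn = sqlite3.connect(":memory:")

# Recreate both tables with the column names and types from the schema.
column_defs = ", ".join(f'"{c}" DOUBLE' for c in COLUMNS[:-1]) + ', "quality" BIGINT'
for table in ("winequality_red", "winequality_white"):
    conn.execute(f"CREATE TABLE {table} ({column_defs})")

# Placeholder rows, invented for illustration only.
red_row = (7.0, 0.5, 0.1, 2.0, 0.08, 10.0, 30.0, 0.997, 3.4, 0.6, 9.5, 5)
white_row = (6.5, 0.3, 0.3, 6.0, 0.05, 30.0, 120.0, 0.994, 3.2, 0.5, 10.5, 6)
params = ", ".join("?" * len(COLUMNS))
conn.execute(f"INSERT INTO winequality_red VALUES ({params})", red_row)
conn.execute(f"INSERT INTO winequality_white VALUES ({params})", white_row)

# Summarize both tables in one result, tagging each row with its wine type.
query = """
SELECT 'red' AS wine_type, quality, COUNT(*) AS n, AVG(alcohol) AS mean_alcohol
FROM winequality_red GROUP BY quality
UNION ALL
SELECT 'white', quality, COUNT(*), AVG(alcohol)
FROM winequality_white GROUP BY quality
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the two tables share identical columns, adding a `wine_type` tag in the query is enough to analyze them as a single dataset.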