Coffee Quality database
These datasets were gathered from the Coffee Quality Institute (CQI) in January 2018.
I am not the owner of the datasets, and I did not perform the scraping. It was done in this GitHub repo (kudos to the author); see there for further details.
What about the data files?
Three CSV files are provided:
- An Arabica coffee pre-cleaned dataset;
- A Robusta coffee pre-cleaned dataset;
- A dataset constructed by merging the two datasets above.
The file names clearly indicate which dataset is which.
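To illustrate how a merged file can relate to the two species files, here is a minimal sketch using only the Python standard library. It is not the repo author's actual procedure. The header spellings ("Country of Origin", "Aroma", "Species"), the helper names, and the sample rows are made-up placeholders, so check the real CSV headers before adapting it.

```python
import csv
import io

# Placeholder CSV text, NOT taken from the real files.
# Column names are assumptions based on the field list in this description.
ARABICA_CSV = "Country of Origin,Aroma\nEthiopia,8.0\n"
ROBUSTA_CSV = "Country of Origin,Aroma\nUganda,7.5\n"


def read_rows(text, species):
    """Parse CSV text and tag every row with its species (hypothetical helper)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Species"] = species
        rows.append(row)
    return rows


def merge_datasets(arabica_text, robusta_text):
    """Stack the Arabica rows and the Robusta rows into one list."""
    return read_rows(arabica_text, "Arabica") + read_rows(robusta_text, "Robusta")


merged = merge_datasets(ARABICA_CSV, ROBUSTA_CSV)
for row in merged:
    print(row["Species"], row["Country of Origin"], row["Aroma"])
```

With the real files, the in-memory strings would be replaced by the contents of the Arabica and Robusta CSVs. Note that `csv.DictReader` returns every value as a string, so scores need converting to `float` before any arithmetic.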
And what is inside?
As explained in the repo, the datasets contain reviews from specialized reviewers for both coffee species, Arabica and Robusta. Each dataset provides the information below.
Quality Measures
- Aroma
- Flavor
- Aftertaste
- Acidity
- Body
- Balance
- Uniformity
- Cup Cleanliness
- Sweetness
- Moisture
- Defects
Bean Metadata
- Processing Method
- Color
- Species (arabica / robusta)
Farm Metadata
- Owner
- Country of Origin
- Farm Name
- Lot Number
- Mill
- Company
- Altitude
- Region
Related datasets
There is one related dataset on Kaggle; please check here. It is very similar to the datasets presented here, but it does not include Robusta coffee data.