Predict Droughts Using Weather & Soil Data
Predicting continental US drought levels using meteorological & soil data.
@kaggle.cdminix_us_drought_meteorological_data
Update (14/12/21): Kaggle Tasks are being deprecated, so I moved the current results on this dataset here:
User | Model/Notebook | Macro F1 Mean | MAE Mean |
---|---|---|---|
@cdminix | LSTM Baseline | 0.639 | 0.277 |
@epistoteles | Ridge Regression (default features) | 0.579 | 0.255 |
@epistoteles | Ridge Regression (MiniROCKET features) | 0.444 | 0.372 |
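The two metric columns can be reproduced along the following lines. This is only a sketch under stated assumptions: it takes "Mean" to be an average over the forecast weeks, it assumes models emit continuous scores that are rounded to the nearest drought level before computing F1, and the helper names (`macro_f1`, `mae`, `to_class`) and toy numbers are made up for illustration rather than taken from the notebooks.

```python
import math


def macro_f1(y_true, y_pred, n_classes=6):
    """Unweighted mean of per-class F1 over the classes that actually occur."""
    f1_scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        if tp + fp + fn == 0:
            continue  # class absent from both truth and predictions: skip it
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(f1_scores) / len(f1_scores)


def mae(y_true, y_pred):
    """Mean absolute error on the raw (unrounded) predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def to_class(score, n_classes=6):
    """Round half-up to the nearest level, clipped to 0..n_classes-1 (assumption)."""
    return min(max(int(math.floor(score + 0.5)), 0), n_classes - 1)


# Toy data (invented): rows are samples, columns are forecast weeks.
# Two weeks are used here for brevity.
true_scores = [[0, 1], [2, 2], [1, 0], [0, 0]]
pred_scores = [[0.2, 0.8], [1.4, 2.1], [1.1, 0.6], [0.1, 0.3]]

n_weeks = 2
f1_per_week, mae_per_week = [], []
for w in range(n_weeks):
    t = [row[w] for row in true_scores]
    p = [row[w] for row in pred_scores]
    f1_per_week.append(macro_f1(t, [to_class(x) for x in p]))
    mae_per_week.append(mae(t, p))

# Average each metric over the forecast weeks
macro_f1_mean = sum(f1_per_week) / n_weeks
mae_mean = sum(mae_per_week) / n_weeks
```

Macro averaging weights every drought level equally, so rare severe levels count as much as the common "no drought" level.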
On NaN values: The drought scores are available weekly, while the meteorological data points are available daily. To make it easier to use previous drought scores for prediction (e.g. by interpolating), I merged both into one file and set the drought score to NaN on days where no score is available.
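A minimal pandas sketch of working with this layout follows. The column names (`date`, `PRECTOT`, `score`) and the toy values are assumptions for illustration; check them against the actual file before reusing the code.

```python
import numpy as np
import pandas as pd

# Toy frame: 15 daily rows for one county, with a drought score only every 7th day.
df = pd.DataFrame({
    "date": pd.date_range("2000-01-04", periods=15, freq="D"),
    "PRECTOT": np.linspace(0.0, 1.4, 15),
    "score": np.nan,
})
df.loc[[0, 7, 14], "score"] = [1.0, 2.0, 1.0]

# Rows that carry a real weekly label
labelled = df.dropna(subset=["score"])

# Option 1: linear interpolation between neighbouring weekly scores
df["score_interp"] = df["score"].interpolate(method="linear")

# Option 2: carry the last known score forward
df["score_ffill"] = df["score"].ffill()
```

Linear interpolation uses the next weekly score as well as the previous one, so when the filled values serve as model inputs, forward filling is the variant that relies on past information only.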
The US Drought Monitor is a measure of drought across the US, created manually by experts using a wide range of data.
This dataset's aim is to help investigate whether droughts can be predicted using only meteorological data, which could allow US predictions to be generalized to other areas of the world.
This is a classification dataset over six levels: no drought (None in the dataset) and the five drought levels shown below.
Each entry is a drought level at a specific point in time in a specific US county, accompanied by the last 90 days of the 18 meteorological indicators listed at the bottom of this description.
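To make the shape of one entry concrete, here is a small sketch that cuts a 90-day input window out of a county's daily rows. The function name `window_ending_at` is hypothetical, and whether the window includes the label day itself is an assumption; the description does not say.

```python
WINDOW_DAYS = 90
N_INDICATORS = 18


def window_ending_at(daily_rows, label_index, window_days=WINDOW_DAYS):
    """Return the `window_days` daily rows up to and including `label_index`.

    Returns None when there is not enough history before the label day.
    """
    start = label_index - window_days + 1
    if start < 0:
        return None
    return daily_rows[start:label_index + 1]


# Toy data: 120 days, each row holding 18 placeholder indicator values.
daily_rows = [[float(day)] * N_INDICATORS for day in range(120)]

window = window_ending_at(daily_rows, label_index=100)   # 90 rows x 18 values
too_early = window_ending_at(daily_rows, label_index=50)  # None: under 90 days of history
```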
(image source: https://droughtmonitor.unl.edu)
To avoid data leakage, the data has been split into the following subsets.
Split | Year Range (inclusive) | Percentage (approximate) |
---|---|---|
Train | 2000-2009 | 47% |
Validation | 2010-2011 | 10% |
Test | 2012-2020 | 43% |
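The split above can be reproduced by assigning each sample to a subset by calendar year. In this sketch the function name `split_for` is hypothetical, and it assumes the label date decides the subset; the table does not say whether the label date or the start of the input window is used.

```python
from datetime import date

# Inclusive year ranges, taken from the table above
SPLITS = {
    "train": (2000, 2009),
    "validation": (2010, 2011),
    "test": (2012, 2020),
}


def split_for(day: date) -> str:
    """Return the subset name for a label date, based on its year."""
    for name, (first_year, last_year) in SPLITS.items():
        if first_year <= day.year <= last_year:
            return name
    raise ValueError(f"{day} is outside the 2000-2020 range covered by the splits")


last_train_day = split_for(date(2009, 12, 31))
first_validation_day = split_for(date(2010, 1, 1))
first_test_day = split_for(date(2012, 1, 1))
```

Splitting by time rather than at random keeps later years out of training, which is what prevents the leakage mentioned above.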
The dataset is imbalanced, as can be seen in the following graph.
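One common way to account for such imbalance during training is to weight classes inversely to their frequency. The sketch below is an illustration only: the function name is hypothetical, the label counts are invented rather than the dataset's real distribution, and nothing here implies that the notebooks on this dataset use class weights.

```python
from collections import Counter


def inverse_frequency_weights(labels, n_classes=6):
    """Weight each class by total / (classes_present * class_count).

    Classes that never occur get weight 0.0.
    """
    counts = Counter(labels)
    total = len(labels)
    present = sum(1 for c in range(n_classes) if counts[c] > 0)
    weights = {}
    for c in range(n_classes):
        if counts[c] == 0:
            weights[c] = 0.0
        else:
            weights[c] = total / (present * counts[c])
    return weights


# Invented toy labels: level 0 dominates, level 2 is rare, levels 3-5 are absent.
toy_labels = [0] * 6 + [1] * 3 + [2] * 1
weights = inverse_frequency_weights(toy_labels)
```

With these weights, every class that occurs contributes the same total weight to a loss, so rare drought levels are not drowned out by the majority class.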
This dataset would not exist without the open data offered by the NASA POWER Project and the authors of the US Drought Monitor.
Indicator | Description |
---|---|
WS10M_MIN | Minimum Wind Speed at 10 Meters (m/s) |
QV2M | Specific Humidity at 2 Meters (g/kg) |
T2M_RANGE | Temperature Range at 2 Meters (C) |
WS10M | Wind Speed at 10 Meters (m/s) |
T2M | Temperature at 2 Meters (C) |
WS50M_MIN | Minimum Wind Speed at 50 Meters (m/s) |
T2M_MAX | Maximum Temperature at 2 Meters (C) |
WS50M | Wind Speed at 50 Meters (m/s) |
TS | Earth Skin Temperature (C) |
WS50M_RANGE | Wind Speed Range at 50 Meters (m/s) |
WS50M_MAX | Maximum Wind Speed at 50 Meters (m/s) |
WS10M_MAX | Maximum Wind Speed at 10 Meters (m/s) |
WS10M_RANGE | Wind Speed Range at 10 Meters (m/s) |
PS | Surface Pressure (kPa) |
T2MDEW | Dew/Frost Point at 2 Meters (C) |
T2M_MIN | Minimum Temperature at 2 Meters (C) |
T2MWET | Wet Bulb Temperature at 2 Meters (C) |
PRECTOT | Precipitation (mm/day) |
Update (23/07/21): The prediction task is now finalised. The [starter](https://www.kaggle.com/cdminix/starter-us-drought-meteorological-data) and baseline notebooks have been updated. We now use a 180-day window of past data for predictions, and include previous drought values, static data, and meteorological data from the year prior. We also now evaluate on 6 future weeks of predictions. While the baseline model is still very simple, it performs much better using this additional input data.
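A rough sketch of how one sample for the finalised task could be assembled: 180 past days as input and the next six weekly scores as targets. The function `make_sample` and the toy data are hypothetical and do not reproduce the starter notebook; the previous drought values, static data, and prior-year data mentioned above are left out for brevity.

```python
import math

WINDOW_DAYS = 180
N_WEEKS_AHEAD = 6


def make_sample(features, scores, end, window_days=WINDOW_DAYS, n_weeks=N_WEEKS_AHEAD):
    """Build one (inputs, targets) pair.

    features: list of daily indicator rows.
    scores:   list of daily drought scores, NaN on days without a label.
    end:      index of the last day in the input window.
    Returns None if there is too little history or fewer than `n_weeks` future labels.
    """
    start = end - window_days + 1
    if start < 0:
        return None
    # The next `n_weeks` available (non-NaN) scores after the input window
    future = [s for s in scores[end + 1:] if not math.isnan(s)][:n_weeks]
    if len(future) < n_weeks:
        return None
    return features[start:end + 1], future


# Toy data: 260 days, one placeholder feature per day, a score every 7th day.
features = [[float(i)] for i in range(260)]
scores = [float(i // 7) if i % 7 == 0 else float("nan") for i in range(260)]

sample = make_sample(features, scores, end=182)
x, y = sample  # x: 180 daily rows, y: six weekly scores after day 182
```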
Update (03/03/21): the new version adds features from the harmonized world soil database.