Update (14/12/21): Kaggle Tasks are being deprecated, so I moved the current results on this dataset here:
On NaN values: The drought scores are available weekly, while the meteorological data points are available daily. To make it easier to use previous drought scores for prediction (e.g. by interpolating), I merged them into one file and set the drought scores to NaN where they were not available.
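As a rough illustration of what filling those NaN values could look like, the sketch below handles one county's daily score series in plain Python. The function names and the toy series are assumptions made for this example and are not part of the dataset. Forward filling only uses past observations. Linear interpolation also looks at the next weekly score, so it is probably better suited to analysis than to building model inputs.

```python
import math


def forward_fill(scores):
    """Carry the last observed weekly score forward over the NaN days."""
    filled, last = [], math.nan
    for s in scores:
        if not math.isnan(s):
            last = s
        filled.append(last)
    return filled


def interpolate_linear(scores):
    """Linearly interpolate NaN days between two observed scores.

    Leading and trailing NaNs are left untouched, because there is no
    observation on one side to interpolate from.
    """
    filled = list(scores)
    known = [i for i, s in enumerate(scores) if not math.isnan(s)]
    for left, right in zip(known, known[1:]):
        step = (scores[right] - scores[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = scores[left] + step * (i - left)
    return filled


# Made-up example: eight consecutive days for one county,
# with scores observed on day 0 and day 7 only.
daily = [1.0] + [math.nan] * 6 + [2.0]
ffilled = forward_fill(daily)
interp = interpolate_linear(daily)
```

With pandas, `Series.ffill()` and `Series.interpolate()` applied per county should give comparable results.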
Context
The US Drought Monitor is a measure of drought across the US, created manually by experts using a wide range of data.
This dataset's aim is to help investigate whether droughts could be predicted using only meteorological data, potentially allowing US predictions to generalize to other areas of the world.
Content
This is a classification dataset over six levels of drought: no drought (None in the dataset) and the five drought levels shown below.
Each entry is a drought level at a specific point in time in a specific US county, accompanied by the last 90 days of 18 meteorological indicators listed at the bottom of this description.
(image source: https://droughtmonitor.unl.edu)
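To make the shape of one entry concrete, here is a minimal sketch of pairing a scored date with its preceding 90 days of indicators. It assumes the daily data for one county is held in a dictionary keyed by date and that the window ends on the scored date itself. Both are assumptions for illustration, as are the names `window_for` and `rows`.

```python
from datetime import date, timedelta

WINDOW_DAYS = 90    # window length stated in the description
N_INDICATORS = 18   # number of meteorological indicators


def window_for(daily_rows, score_date, window_days=WINDOW_DAYS):
    """Return the `window_days` daily rows ending on `score_date` (inclusive).

    `daily_rows` maps a date to that day's list of indicator values.
    Returns None when any day in the window is missing.
    """
    start = score_date - timedelta(days=window_days - 1)
    days = [start + timedelta(days=i) for i in range(window_days)]
    if any(d not in daily_rows for d in days):
        return None
    return [daily_rows[d] for d in days]


# Made-up data: 120 days of dummy indicator values for one county.
first = date(2000, 1, 1)
rows = {first + timedelta(days=i): [float(i)] * N_INDICATORS for i in range(120)}

window = window_for(rows, date(2000, 4, 4))     # full 90-day history available
too_early = window_for(rows, date(2000, 2, 1))  # history incomplete, so None
```

Each complete window is a 90 x 18 block of values that is paired with the drought level on the scored date.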
To avoid data leakage, the data has been split into the following subsets.
| Split | Year Range (inclusive) | Percentage (approximate) |
|---|---|---|
| Train | 2000-2009 | 47% |
| Validation | 2010-2011 | 10% |
| Test | 2012-2020 | 43% |
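The year-based split can be sketched as a small lookup. The year ranges come from the table above, while `SPLIT_YEARS` and `split_for` are hypothetical names used only in this example.

```python
from datetime import date

# Inclusive year ranges taken from the split table.
SPLIT_YEARS = {
    "train": (2000, 2009),
    "validation": (2010, 2011),
    "test": (2012, 2020),
}


def split_for(day):
    """Return the subset name for a date, based only on its year."""
    for name, (first_year, last_year) in SPLIT_YEARS.items():
        if first_year <= day.year <= last_year:
            return name
    raise ValueError(f"{day} falls outside the years covered by the dataset")


last_train_day = split_for(date(2009, 12, 31))
first_validation_day = split_for(date(2010, 1, 1))
```

Because the subsets are separated by time rather than sampled at random, no day in the validation or test data precedes any day in the training data.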
Dataset Imbalance
The dataset is imbalanced, as can be seen in the following graph.
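One common way to compensate for imbalance during training is to weight each class inversely to its frequency. The sketch below is only an illustration of that idea, not something the dataset prescribes. The label list is made up and does not reflect the real class distribution.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This is the same heuristic scikit-learn applies for
    class_weight="balanced".
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Made-up toy labels: the majority class gets the smallest weight.
labels = ["None"] * 6 + ["D0"] * 3 + ["D1"] * 1
weights = inverse_frequency_weights(labels)
```

The resulting dictionary can be passed to a loss function or sampler that accepts per-class weights.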
Acknowledgements
This dataset would not exist without the open data offered by the NASA POWER Project and the authors of the US Drought Monitor.
- These data were obtained from the NASA Langley Research Center (LaRC) POWER Project funded through the NASA Earth Science/Applied Science Program.
- The U.S. Drought Monitor is produced through a partnership between the National Drought Mitigation Center at the University of Nebraska-Lincoln, the United States Department of Agriculture, and the National Oceanic and Atmospheric Administration.
- This dataset utilizes the Harmonized World Soil Database by Fischer, G., F. Nachtergaele, S. Prieler, H.T. van Velthuizen, L. Verelst, D. Wiberg, 2008. Global Agro-ecological Zones Assessment for Agriculture (GAEZ 2008). IIASA, Laxenburg, Austria and FAO, Rome, Italy.
Meteorological Indicators
| Indicator | Description |
|---|---|
| WS10M_MIN | Minimum Wind Speed at 10 Meters (m/s) |
| QV2M | Specific Humidity at 2 Meters (g/kg) |
| T2M_RANGE | Temperature Range at 2 Meters (°C) |
| WS10M | Wind Speed at 10 Meters (m/s) |
| T2M | Temperature at 2 Meters (°C) |
| WS50M_MIN | Minimum Wind Speed at 50 Meters (m/s) |
| T2M_MAX | Maximum Temperature at 2 Meters (°C) |
| WS50M | Wind Speed at 50 Meters (m/s) |
| TS | Earth Skin Temperature (°C) |
| WS50M_RANGE | Wind Speed Range at 50 Meters (m/s) |
| WS50M_MAX | Maximum Wind Speed at 50 Meters (m/s) |
| WS10M_MAX | Maximum Wind Speed at 10 Meters (m/s) |
| WS10M_RANGE | Wind Speed Range at 10 Meters (m/s) |
| PS | Surface Pressure (kPa) |
| T2MDEW | Dew/Frost Point at 2 Meters (°C) |
| T2M_MIN | Minimum Temperature at 2 Meters (°C) |
| T2MWET | Wet Bulb Temperature at 2 Meters (°C) |
| PRECTOT | Precipitation (mm/day) |
Previous Updates
Update (23/07/21): The prediction task is now finalised. The [starter](https://www.kaggle.com/cdminix/starter-us-drought-meteorological-data) and baseline notebooks have been updated. We now use a 180-day window of past data for predictions, and include previous drought values, static data, and meteorological data from the year prior. We also now evaluate on 6 future weeks of predictions. While the baseline model is still very simple, it performs much better using this additional input data.
Update (03/03/21): The new version adds features from the Harmonized World Soil Database.