Predict Droughts Using Weather & Soil Data by Kaggle | Environmental and Climate Sciences

About this Dataset

Predict Droughts Using Weather & Soil Data

Update (14/12/21): Kaggle Tasks are being deprecated, so I moved the current results on this dataset here:

User	Model/Notebook	Macro F1 Mean	MAE Mean
@cdminix	LSTM Baseline	0.639	0.277
@epistoteles	Ridge Regression (default features)	0.579	0.255
@epistoteles	Ridge Regression (MiniROCKET features)	0.444	0.372

On NaN values: The drought scores are available weekly while the meteorological data points are available daily. To make using previous drought scores for prediction easier (e.g. by interpolating), I merged them into one file and set the drought scores to NaN were not available.

Context

The US drought monitor is a measure of drought across the US manually created by experts using a wide range of data.
This datasets' aim is to help investigate if droughts could be predicted using only meteorological data, potentially leading to generalization of US predictions to other areas of the world.

Content

This is a classification dataset over six levels of drought, which is no drought (None in the dataset), and five drought levels shown below.
Each entry is a drought level at a specific point in time in a specific US county, accompanied by the last 90 days of 18 meteorological indicators shown in the bottom of this description.

(image source: https://droughtmonitor.unl.edu)

To avoid data leakage, the data has been split into the following subsets.

Split	Year Range (inclusive)	Percentage (approximate)
Train	2000-2009	47%
Validation	2010-2011	10%
Test	2012-2020	43%

Dataset Imbalance

The dataset is imbalanced, as can be seen in the following graph.

Acknowledgements

This dataset would not exist without the open data offered by the NASA POWER Project and the authors of the US Drought Monitor.

These data were obtained from the NASA Langley Research Center (LaRC) POWER Project funded through the NASA Earth Science/Applied Science Program.
The U.S. Drought Monitor is produced through a partnership between the National Drought Mitigation Center at the University of Nebraska-Lincoln, the United States Department of Agriculture, and the National Oceanic and Atmospheric Administration.
This dataset utilizes the Harmonized World Soil Database by Fischer, G., F. Nachtergaele, S. Prieler, H.T. van Velthuizen, L. Verelst, D. Wiberg, 2008. Global Agro-ecological Zones Assessment for Agriculture (GAEZ 2008). IIASA, Laxenburg, Austria and FAO, Rome, Italy.

Meteorological Indicators

Indicator	Description
WS10M_MIN	Minimum Wind Speed at 10 Meters (m/s)
QV2M	Specific Humidity at 2 Meters (g/kg)
T2M_RANGE	Temperature Range at 2 Meters (C)
WS10M	Wind Speed at 10 Meters (m/s)
T2M	Temperature at 2 Meters (C)
WS50M_MIN	Minimum Wind Speed at 50 Meters (m/s)
T2M_MAX	Maximum Temperature at 2 Meters (C)
WS50M	Wind Speed at 50 Meters (m/s)
TS	Earth Skin Temperature (C)
WS50M_RANGE	Wind Speed Range at 50 Meters (m/s)
WS50M_MAX	Maximum Wind Speed at 50 Meters (m/s)
WS10M_MAX	Maximum Wind Speed at 10 Meters (m/s)
WS10M_RANGE	Wind Speed Range at 10 Meters (m/s)
PS	Surface Pressure (kPa)
T2MDEW	Dew/Frost Point at 2 Meters (C)
T2M_MIN	Minimum Temperature at 2 Meters (C)
T2MWET	Wet Bulb Temperature at 2 Meters (C)
PRECTOT	Precipitation (mm day-1)

Previous Updates

Update (23/07/21): The prediction task is now finalised. The [starter]((https://www.kaggle.com/cdminix/starter-us-drought-meteorological-data) and baseline notebooks have been updated. We now use a 180-day window of past data for predictions, and include previous drought values, static data, and meteorological data from the year prior. We also now evaluate on 6 future weeks of predictions. While the baseline model is still very simple, it performs much better using this additional input data.

Update (03/03/21): the new version adds features from the harmonized world soil database.

Tables

Soil Data

@kaggle.cdminix_us_drought_meteorological_data.soil_data

423.1 KB
3109 rows
32 columns


CREATE TABLE soil_data (
  "fips" BIGINT,
  "lat" DOUBLE,
  "lon" DOUBLE,
  "elevation" BIGINT,
  "slope1" DOUBLE,
  "slope2" DOUBLE,
  "slope3" DOUBLE,
  "slope4" DOUBLE,
  "slope5" DOUBLE,
  "slope6" DOUBLE,
  "slope7" DOUBLE,
  "slope8" DOUBLE,
  "aspectn" DOUBLE,
  "aspecte" DOUBLE,
  "aspects" DOUBLE,
  "aspectw" DOUBLE,
  "aspectunknown" DOUBLE,
  "wat_land" DOUBLE,
  "nvg_land" DOUBLE,
  "urb_land" DOUBLE,
  "grs_land" DOUBLE,
  "for_land" DOUBLE,
  "cultrf_land" DOUBLE,
  "cultir_land" DOUBLE,
  "cult_land" DOUBLE,
  "sq1" BIGINT,
  "sq2" BIGINT,
  "sq3" BIGINT,
  "sq4" BIGINT,
  "sq5" BIGINT,
  "sq6" BIGINT,
  "sq7" BIGINT
);