Weather Data For Renewable Generation Prediction
Three years of weather data from the National Solar Radiation Database (Tours, FR)
@kaggle.adri1g_nsrdb_tours
This dataset contains comprehensive weather data recorded over three years, from 2017 to 2019, by NASA in Tours, France. It is sourced from the National Solar Radiation Database (NSRDB) and is specifically designed for predicting solar and wind generation. The dataset includes various meteorological measurements and conditions.
Files and Structure
The dataset consists of three CSV files, one for each year from 2017 to 2019. Each file starts with metadata describing the recording site, followed by the detailed weather observations. The site metadata is:
Location ID | Latitude | Longitude | Time Zone | Elevation | Local Time Zone |
---|---|---|---|---|---|
361685 | 47.41 | 0.78 | 1 | 54 | 1 |
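The site metadata can be read separately from the observations. The sketch below assumes the first line of each file holds the metadata column names and the second line holds their values, which is consistent with the `skiprows=2` used in the loading code further down. The helper name `read_site_metadata` and the in-memory sample are hypothetical, used only for illustration.

```python
import io
import pandas as pd

def read_site_metadata(source):
    """Return the leading metadata row of an NSRDB-style CSV as a dict.

    Assumption: line 1 holds metadata names and line 2 holds their values.
    """
    # nrows=1 reads the header line plus the single row of values below it
    meta = pd.read_csv(source, nrows=1)
    return meta.iloc[0].to_dict()

# Hypothetical in-memory sample standing in for one of the yearly files
sample = io.StringIO(
    "Location ID,Latitude,Longitude,Time Zone,Elevation,Local Time Zone\n"
    "361685,47.41,0.78,1,54,1\n"
    "Year,Month,Day,Hour,Minute,GHI\n"
    "2017,1,1,0,30,0\n"
)
site = read_site_metadata(sample)
```

With a real file, a path such as `'361685_47.41_0.78_2017.csv'` can be passed in place of the in-memory sample.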
The Cloud Type column contains identifiers representing different types of clouds observed:
-15 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
N/A | Clear | Probably Clear | Fog | Water | Super-Cooled Water | Mixed | Opaque Ice | Cirrus | Overlapping | Overshooting | Unknown | Dust | Smoke |
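For exploration it can help to turn the numeric cloud type codes into readable labels. The following is a minimal sketch built from the table above; the names `CLOUD_TYPE_LABELS` and `label_cloud_types` are hypothetical. The codes are cast to integers first because the loading code in this document converts every column to `float32`.

```python
import pandas as pd

# Code-to-label mapping copied from the Cloud Type table above
CLOUD_TYPE_LABELS = {
    -15: "N/A", 0: "Clear", 1: "Probably Clear", 2: "Fog", 3: "Water",
    4: "Super-Cooled Water", 5: "Mixed", 6: "Opaque Ice", 7: "Cirrus",
    8: "Overlapping", 9: "Overshooting", 10: "Unknown", 11: "Dust",
    12: "Smoke",
}

def label_cloud_types(codes):
    """Map numeric cloud type codes to their text labels."""
    # Cast to integers so that float-encoded codes (e.g. 7.0) match the keys
    as_int = pd.Series(codes).astype("int64")
    return as_int.map(CLOUD_TYPE_LABELS)

labels = label_cloud_types([0.0, 7.0, -15.0])
```

On a loaded DataFrame this would be called as `label_cloud_types(df['Cloud Type'])`.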
The Fill Flag column contains identifiers indicating data quality and the presence of any issues:
0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
N/A | Missing Image | Low Irradiance | Exceeds Clearsky | Missing Cloud Properties | Rayleigh Violation |
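Before training a model, it may be worth checking how many observations carry each fill flag. The sketch below counts rows per flag and labels them using the table above; `FILL_FLAG_LABELS` and `fill_flag_summary` are hypothetical names, and the column is assumed to be called `Fill Flag` as in the description.

```python
import pandas as pd

# Code-to-label mapping copied from the Fill Flag table above
FILL_FLAG_LABELS = {
    0: "N/A", 1: "Missing Image", 2: "Low Irradiance",
    3: "Exceeds Clearsky", 4: "Missing Cloud Properties",
    5: "Rayleigh Violation",
}

def fill_flag_summary(df, column="Fill Flag"):
    """Count observations per fill flag, indexed by the flag's label."""
    # Cast to integers so that float-encoded flags match the mapping keys
    counts = df[column].astype("int64").value_counts().sort_index()
    counts.index = counts.index.map(FILL_FLAG_LABELS)
    return counts

# Hypothetical toy data for illustration only
demo = pd.DataFrame({"Fill Flag": [0, 0, 1, 4]})
summary = fill_flag_summary(demo)
```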
This dataset is valuable for researchers and analysts working on solar and wind energy generation prediction, weather forecasting models, climate change research, and other meteorological applications. It provides detailed and granular data over a span of three years, allowing for in-depth analysis and model training.
Acknowledgments
The data is provided by the National Solar Radiation Database (NSRDB). Proper citation and acknowledgment should be given when using this dataset for research and publication purposes.
import pandas as pd
import numpy as np
# Import the NSRDB files, skipping the two leading metadata rows
df2017 = pd.read_csv('361685_47.41_0.78_2017.csv', skiprows=2)
df2018 = pd.read_csv('361685_47.41_0.78_2018.csv', skiprows=2)
df2019 = pd.read_csv('361685_47.41_0.78_2019.csv', skiprows=2)
# Concatenate the three years into a single dataset
df = pd.concat([df2017, df2018, df2019])
# Generate a datetime column from the date and time components
df['datetime'] = pd.to_datetime(
    df['Year'].astype(str) + '-'
    + df['Month'].astype(str)
    + '-' + df['Day'].astype(str)
    + ' ' + df['Hour'].astype(str)
    + ':' + df['Minute'].astype(str) + ':00'
)
# Drop the now-redundant date and time columns
df = df.drop(['Year', 'Month', 'Day', 'Hour', 'Minute'], axis=1)
# Set datetime as the index
df = df.set_index('datetime')
# Speed up processing by downcasting to 32-bit floats (reduces precision)
df = df.astype(np.float32)
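Once the DataFrame has a datetime index, time-based aggregation becomes straightforward. The following is a small sketch of resampling one column to daily means. The helper `daily_mean` is hypothetical, and the column name `GHI` is an assumption, since the observation columns are not listed in this description. The toy frame stands in for the loaded data.

```python
import numpy as np
import pandas as pd

def daily_mean(df, column):
    """Average one column per calendar day; requires a DatetimeIndex."""
    return df[column].resample("D").mean()

# Toy hourly data standing in for the loaded dataset ("GHI" is an assumed name)
idx = pd.date_range("2017-01-01 00:30", periods=48, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({"GHI": np.arange(48, dtype=np.float32)}, index=idx)
daily = daily_mean(demo, "GHI")
```

Daily or hourly aggregates like this are a common starting point for building generation-prediction features.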