Weather Data For Renewable Generation Prediction

Three years of weather data from the National Solar Radiation Database (Tours, FR)

@kaggle.adri1g_nsrdb_tours

About this Dataset

Overview

This dataset contains comprehensive weather data recorded over three years, from 2017 to 2019, by NASA in Tours, France. It is sourced from the National Solar Radiation Database (NSRDB) and is specifically designed for predicting solar and wind generation. The dataset includes various meteorological measurements and conditions.

Files and Structure

The dataset consists of three CSV files, one for each year from 2017 to 2019. Each file contains detailed weather observations with the following columns:

  • Year: Year of the observation
  • Month: Month of the observation
  • Day: Day of the observation
  • Hour: Hour of the observation
  • Minute: Minute of the observation
  • Temperature (°C): Temperature in Celsius
  • Clearsky DHI (W/m²): Clearsky diffuse horizontal irradiance in watts per square meter
  • Clearsky DNI (W/m²): Clearsky direct normal irradiance in watts per square meter
  • Clearsky GHI (W/m²): Clearsky global horizontal irradiance in watts per square meter
  • Cloud Type: Cloud type identifier

Location Details

  • Location ID: 361685
  • Latitude: 47.41
  • Longitude: 0.78
  • Time Zone: 1
  • Elevation: 54
  • Local Time Zone: 1
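
The Time Zone field can matter when aligning this data with generation records kept in UTC. The sketch below assumes that a Time Zone value of 1 means the timestamps carry a fixed UTC+1 offset; that reading is an assumption, not something stated on this page, and the sample timestamps are made up for illustration.

```python
from datetime import timedelta, timezone

import pandas as pd

# Assumption: "Time Zone 1" is a fixed UTC+1 offset with no daylight saving.
TIME_ZONE_OFFSET_HOURS = 1
fixed_offset = timezone(timedelta(hours=TIME_ZONE_OFFSET_HOURS))

# Hypothetical naive timestamps standing in for the dataset's datetime index.
local_index = pd.date_range("2017-01-01 00:00", periods=3, freq="60min")

# Attach the fixed offset, then convert to UTC.
utc_index = local_index.tz_localize(fixed_offset).tz_convert("UTC")
```

A fixed offset is used rather than a named zone such as Europe/Paris so that no daylight-saving shifts are introduced.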

Cloud Type

The Cloud Type column contains identifiers representing different types of clouds observed:

  • -15: N/A
  • 0: Clear
  • 1: Probably Clear
  • 2: Fog
  • 3: Water
  • 4: Super-Cooled Water
  • 5: Mixed
  • 6: Opaque Ice
  • 7: Cirrus
  • 8: Overlapping
  • 9: Overshooting
  • 10: Unknown
  • 11: Dust
  • 12: Smoke
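
As a minimal sketch, the identifiers above can be turned into readable labels with a lookup dictionary. The column name "Cloud Type" and the sample values are assumptions for illustration; check them against the header row of the CSV files.

```python
import pandas as pd

# Code-to-label mapping taken from the Cloud Type table above.
CLOUD_TYPE_LABELS = {
    -15: "N/A",
    0: "Clear",
    1: "Probably Clear",
    2: "Fog",
    3: "Water",
    4: "Super-Cooled Water",
    5: "Mixed",
    6: "Opaque Ice",
    7: "Cirrus",
    8: "Overlapping",
    9: "Overshooting",
    10: "Unknown",
    11: "Dust",
    12: "Smoke",
}

# Hypothetical sample of cloud type codes.
cloud_codes = pd.Series([0, 7, 3, -15], name="Cloud Type")

# Replace each code with its label; codes missing from the mapping become NaN.
cloud_labels = cloud_codes.map(CLOUD_TYPE_LABELS)
```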

Fill Flag

The Fill Flag column contains identifiers indicating data quality and the presence of any issues:

  • 0: N/A
  • 1: Missing Image
  • 2: Low Irradiance
  • 3: Exceeds Clearsky
  • 4: Missing Cloud Properties
  • 5: Rayleigh Violation
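
One possible use of the flag is to count how often each issue occurs and to keep only unflagged records before training a model. The sketch below assumes that flag 0 (N/A) marks records with no issue, and that the column is named "Fill Flag"; the sample frame is invented for illustration.

```python
import pandas as pd

# Code-to-label mapping taken from the Fill Flag table above.
FILL_FLAG_LABELS = {
    0: "N/A",
    1: "Missing Image",
    2: "Low Irradiance",
    3: "Exceeds Clearsky",
    4: "Missing Cloud Properties",
    5: "Rayleigh Violation",
}

# Hypothetical sample rows; real values come from the CSV files.
sample = pd.DataFrame(
    {"Temperature": [4.0, 5.5, 6.1, 7.0], "Fill Flag": [0, 1, 0, 3]}
)

# Count how many records carry each flag.
flag_counts = sample["Fill Flag"].map(FILL_FLAG_LABELS).value_counts()

# Keep only records assumed to be unflagged (flag 0).
unflagged = sample[sample["Fill Flag"] == 0]
```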

Data Columns Description

  • Year, Month, Day: These columns specify the date of the observation.
  • Hour, Minute: These columns specify the time of the observation.
  • Temperature (°C): The ambient temperature at the time of observation in degrees Celsius.
  • Clearsky DHI (W/m²): Diffuse horizontal irradiance under clear-sky (cloud-free) conditions: the solar radiation received per unit area by a horizontal surface from the sky, excluding direct sunlight, in watts per square meter.
  • Clearsky DNI (W/m²): Direct normal irradiance under clear-sky conditions: the solar radiation received per unit area by a surface held perpendicular to the rays arriving in a straight line from the sun, in watts per square meter.
  • Clearsky GHI (W/m²): Global horizontal irradiance under clear-sky conditions: the total solar radiation received per unit area by a horizontal surface, in watts per square meter.
  • Cloud Type: An identifier representing the type of clouds present at the time of observation.
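
The three irradiance quantities are linked by the standard relation GHI = DHI + DNI × cos(solar zenith angle). The sketch below illustrates it with made-up numbers; the function name is hypothetical, and the solar zenith angle is not among the columns listed above, so it would have to come from the source files or a solar position calculation.

```python
import math


def ghi_from_components(dni, dhi, solar_zenith_deg):
    """Combine direct and diffuse irradiance into global horizontal irradiance.

    The direct term is projected onto the horizontal plane with the cosine of
    the solar zenith angle and clamped to zero when the sun is below the horizon.
    """
    cos_zenith = max(math.cos(math.radians(solar_zenith_deg)), 0.0)
    return dhi + dni * cos_zenith


# Illustrative values only: 800 W/m² direct, 100 W/m² diffuse, sun 60° from vertical.
example_ghi = ghi_from_components(dni=800.0, dhi=100.0, solar_zenith_deg=60.0)
```

With these inputs the direct term contributes about half of the DNI, giving roughly 500 W/m².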

Usage

This dataset is valuable for researchers and analysts working on solar and wind energy generation prediction, weather forecasting models, climate change research, and other meteorological applications. It provides detailed and granular data over a span of three years, allowing for in-depth analysis and model training.

Acknowledgments

The data is provided by the National Solar Radiation Database (NSRDB). Proper citation and acknowledgment should be given when using this dataset for research and publication purposes.

Convert NSRDB CSV to pandas

import pandas as pd
import numpy as np

# Import NSRDB files, skipping the two metadata rows above the column header
df2017 = pd.read_csv('361685_47.41_0.78_2017.csv', skiprows=2)
df2018 = pd.read_csv('361685_47.41_0.78_2018.csv', skiprows=2)
df2019 = pd.read_csv('361685_47.41_0.78_2019.csv', skiprows=2)

# Concatenate the three years into a single dataset
df = pd.concat([df2017, df2018, df2019])

# Generate a datetime column from the date and time parts
df['datetime'] = pd.to_datetime(df['Year'].astype(str) + '-'
                                + df['Month'].astype(str)
                                + '-' + df['Day'].astype(str)
                                + ' ' + df['Hour'].astype(str)
                                + ':' + df['Minute'].astype(str) + ':00'
                                )

# Drop the date and time columns, now redundant with 'datetime'
df = df.drop(['Year', 'Month', 'Day', 'Hour', 'Minute'], axis=1)

# Set datetime as the index
df = df.set_index('datetime')

# Speed up processing and reduce memory use by casting to 32-bit floats
df = df.astype(np.float32)
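
Once the frame has a datetime index, time-based aggregation becomes straightforward. As a rough sketch, the snippet below computes daily clear-sky irradiation from a synthetic stand-in for `df`; the column name "Clearsky GHI", the 60-minute step, and the irradiance profile are all assumptions for illustration and should be replaced by the real data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the indexed frame: two days at an assumed
# 60-minute step, with a made-up clear-sky GHI profile in W/m².
index = pd.date_range("2017-06-01 00:30", periods=48, freq="60min")
hours = index.hour.to_numpy() + index.minute.to_numpy() / 60
ghi = np.clip(900 * np.sin(np.pi * (hours - 6) / 12), 0, None)
df = pd.DataFrame({"Clearsky GHI": ghi.astype(np.float32)}, index=index)

# Time step in hours, inferred from the first two timestamps.
step_hours = (df.index[1] - df.index[0]).total_seconds() / 3600

# Daily irradiation in kWh/m²: sum of the W/m² samples times the step length.
daily_kwh = df["Clearsky GHI"].resample("D").sum() * step_hours / 1000
```

Inferring the step from the index avoids hard-coding the sampling interval, which this page does not state.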
