Weather Data For Renewable Generation Prediction by Kaggle | Environmental and Climate Sciences

About this Dataset

Overview

This dataset contains weather data for Tours, France, recorded by NASA over three years, from 2017 to 2019. It is sourced from the National Solar Radiation Database (NSRDB) and is intended for predicting solar and wind generation. Alongside temperature and irradiance (DHI, DNI, GHI and their clear-sky counterparts), the file metadata lists units for dew point, pressure, relative humidity, precipitable water, wind speed and direction, solar zenith angle, and surface albedo.

Files and Structure

The dataset consists of three CSV files, one for each year from 2017 to 2019. Each file contains weather observations, including the following columns:

Year: Year of the observation
Month: Month of the observation
Day: Day of the observation
Hour: Hour of the observation
Minute: Minute of the observation
Temperature (°C): Temperature in Celsius
Clearsky DHI (W/m²): Clearsky diffuse horizontal irradiance in watts per square meter
Clearsky DNI (W/m²): Clearsky direct normal irradiance in watts per square meter
Clearsky GHI (W/m²): Clearsky global horizontal irradiance in watts per square meter
Cloud Type: Cloud type identifier

Location Details

Location ID	Latitude	Longitude	Time Zone	Elevation	Local Time Zone
361685	47.41	0.78	1	54	1
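
The Time Zone value above can be used to make timestamps timezone-aware. The sketch below assumes that the value 1 is an offset in hours from UTC, which the source does not state explicitly; the three timestamps and their 15-minute spacing are placeholders, not values from the files.

```python
from datetime import timedelta, timezone

import pandas as pd

# Assumption: the "Time Zone" value of 1 is an hour offset from UTC.
SITE_TZ = timezone(timedelta(hours=1))

# Three naive placeholder timestamps standing in for the dataset's datetime index.
naive_index = pd.date_range("2017-01-01 00:00", periods=3, freq="15min")

# Attach the fixed offset, then express the same instants in UTC.
local_index = naive_index.tz_localize(SITE_TZ)
utc_index = local_index.tz_convert("UTC")

print(local_index[0])
print(utc_index[0])
```

A fixed offset is used rather than a named zone such as Europe/Paris because the table gives a single number and says nothing about daylight saving time.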

Cloud Type

The Cloud Type column contains identifiers representing different types of clouds observed:

-15	0	1	2	3	4	5	6	7	8	9	10	11	12
N/A	Clear	Probably Clear	Fog	Water	Super-Cooled Water	Mixed	Opaque Ice	Cirrus	Overlapping	Overshooting	Unknown	Dust	Smoke
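
The code table above can be turned into a lookup for readable labels. This is a minimal sketch: the helper name `label_cloud_types` and the fallback label "Unrecognized" are illustrative choices, not part of the dataset.

```python
import pandas as pd

# Code-to-label mapping copied from the Cloud Type table above.
CLOUD_TYPE_LABELS = {
    -15: "N/A", 0: "Clear", 1: "Probably Clear", 2: "Fog", 3: "Water",
    4: "Super-Cooled Water", 5: "Mixed", 6: "Opaque Ice", 7: "Cirrus",
    8: "Overlapping", 9: "Overshooting", 10: "Unknown", 11: "Dust",
    12: "Smoke",
}


def label_cloud_types(codes: pd.Series) -> pd.Series:
    """Map numeric codes to labels; codes missing from the table become 'Unrecognized'."""
    return codes.astype(int).map(CLOUD_TYPE_LABELS).fillna("Unrecognized")


# Hypothetical sample values; 99 is deliberately not a valid code.
sample = pd.Series([0, 7, -15, 3, 99], name="Cloud Type")
labels = label_cloud_types(sample)
print(labels.tolist())
```

The cast to `int` lets the lookup work even if the column was previously converted to floats.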

Fill Flag

The Fill Flag column contains identifiers indicating data quality and the presence of any issues:

0	1	2	3	4	5
N/A	Missing Image	Low Irradiance	Exceeds Clearsky	Missing Cloud Properties	Rayleigh Violation
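
One possible use of the flag is to count flagged observations or set them aside before training a model. The sketch below assumes that flag 0 ("N/A") marks rows where no fill was applied, and it uses a small invented frame with a hypothetical `GHI` column in place of the real files.

```python
import pandas as pd

# Code-to-label mapping copied from the Fill Flag table above.
FILL_FLAG_LABELS = {
    0: "N/A",
    1: "Missing Image",
    2: "Low Irradiance",
    3: "Exceeds Clearsky",
    4: "Missing Cloud Properties",
    5: "Rayleigh Violation",
}

# Invented rows for illustration only; the real data comes from the CSV files.
sample = pd.DataFrame(
    {"GHI": [0.0, 120.0, 310.0, 295.0], "Fill Flag": [0, 2, 0, 4]}
)

# Count how often each flag occurs, using readable labels.
flag_counts = sample["Fill Flag"].map(FILL_FLAG_LABELS).value_counts()

# Keep only rows with flag 0 (assumed here to mean "no fill applied").
unflagged = sample[sample["Fill Flag"] == 0]

print(flag_counts)
print(unflagged)
```

Whether flagged rows should be dropped or kept depends on the analysis; the filter only shows the mechanics.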

Data Columns Description

Year, Month, Day: These columns specify the date of the observation.
Hour, Minute: These columns specify the time of the observation.
Temperature (°C): The ambient temperature at the time of observation in degrees Celsius.
Clearsky DHI (W/m²): The solar radiation received per unit area by a horizontal surface from the sky, excluding direct sunlight, measured in watts per square meter.
Clearsky DNI (W/m²): The amount of solar radiation received per unit area by a surface that is always held perpendicular to the rays that come in a straight line from the direction of the sun, measured in watts per square meter.
Clearsky GHI (W/m²): The total amount of solar radiation received per unit area by a horizontal surface, measured in watts per square meter.
Cloud Type: An identifier representing the type of clouds present at the time of observation.
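
The three irradiance components described above are commonly related by GHI = DHI + DNI × cos(solar zenith angle). The sketch below illustrates that relation on invented numbers; the function name is hypothetical, and the dataset's own values may not satisfy the identity exactly.

```python
import numpy as np


def ghi_from_components(dhi, dni, zenith_deg):
    """Return DHI + DNI * cos(zenith), with the beam term zeroed below the horizon."""
    cos_zenith = np.cos(np.radians(zenith_deg))
    # A zenith angle above 90 degrees means the sun is below the horizon,
    # so the direct beam contributes nothing to a horizontal surface.
    cos_zenith = np.clip(cos_zenith, 0.0, None)
    return np.asarray(dhi) + np.asarray(dni) * cos_zenith


# Invented values: one daytime sample and one with the sun below the horizon.
dhi = np.array([100.0, 20.0])
dni = np.array([800.0, 50.0])
zenith = np.array([60.0, 95.0])

ghi = ghi_from_components(dhi, dni, zenith)
print(ghi)
```

With a zenith angle of 60 degrees the cosine is 0.5, so the first sample comes to roughly 100 + 800 × 0.5 = 500 W/m².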

Usage

This dataset is valuable for researchers and analysts working on solar and wind energy generation prediction, weather forecasting models, climate change research, and other meteorological applications. It provides detailed and granular data over a span of three years, allowing for in-depth analysis and model training.

Acknowledgments

The data is provided by the National Solar Radiation Database (NSRDB). Proper citation and acknowledgment should be given when using this dataset for research and publication purposes.

Convert NSRDB CSV to pandas

import pandas as pd
import numpy as np

## Import NSRDB files; the first two rows hold site metadata, so skip them
df2017 = pd.read_csv('361685_47.41_0.78_2017.csv', skiprows=2)
df2018 = pd.read_csv('361685_47.41_0.78_2018.csv', skiprows=2)
df2019 = pd.read_csv('361685_47.41_0.78_2019.csv', skiprows=2)

## concatenate the three years into a single dataset
df = pd.concat([df2017, df2018, df2019])

## generate a datetime column from the date and time parts
df['datetime'] = pd.to_datetime(df['Year'].astype(str) + '-'
                                + df['Month'].astype(str)
                                + '-' + df['Day'].astype(str)
                                + ' ' + df['Hour'].astype(str)
                                + ':' + df['Minute'].astype(str) + ':00'
                                )

## drop the date and time columns, which are now redundant
df = df.drop(['Year', 'Month', 'Day', 'Hour', 'Minute'], axis=1)

## set datetime as index
df = df.set_index('datetime')

## reduce memory use by downcasting to 32-bit floats
## (assumes every remaining column is numeric)
df = df.astype(np.float32)
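
The conversion above skips the first two rows of each file. Judging by the table schemas in this document, those rows appear to hold the site metadata (names on the first row, values on the second). A minimal sketch of reading them separately follows, using a small in-memory stand-in whose layout is an assumption and whose temperature values are invented.

```python
import io

import pandas as pd

# Tiny stand-in for one NSRDB file (assumed layout): metadata names,
# metadata values, then the data table with its own header row.
SAMPLE_CSV = """Location ID,Latitude,Longitude,Time Zone,Elevation,Local Time Zone
361685,47.41,0.78,1,54,1
Year,Month,Day,Hour,Minute,Temperature
2017,1,1,0,0,1.5
2017,1,1,0,15,1.4
"""

# Read only the metadata: row 1 becomes the header, row 2 the single record.
meta = pd.read_csv(io.StringIO(SAMPLE_CSV), nrows=1).iloc[0]

# Read the observations by skipping the two metadata rows.
data = pd.read_csv(io.StringIO(SAMPLE_CSV), skiprows=2)

print(meta["Latitude"], meta["Longitude"])
print(data.head())
```

With a real file, the `io.StringIO(...)` argument would be replaced by the CSV path.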

Tables

N 359516–47–41–0–70–2017

@kaggle.adri1g_nsrdb_tours.n_359516_47_41_0_70_2017

564.27 KB
35042 rows
46 columns


CREATE TABLE n_359516_47_41_0_70_2017 (
  "source" VARCHAR,
  "location_id" VARCHAR,
  "city" VARCHAR,
  "state" VARCHAR,
  "country" VARCHAR,
  "latitude" VARCHAR,
  "longitude" VARCHAR,
  "time_zone" VARCHAR,
  "elevation" VARCHAR,
  "local_time_zone" VARCHAR,
  "clearsky_dhi_units" VARCHAR,
  "clearsky_dni_units" VARCHAR,
  "clearsky_ghi_units" VARCHAR,
  "dew_point_units" VARCHAR,
  "dhi_units" VARCHAR,
  "dni_units" VARCHAR,
  "ghi_units" VARCHAR,
  "solar_zenith_angle_units" VARCHAR,
  "temperature_units" VARCHAR,
  "pressure_units" VARCHAR,
  "relative_humidity_units" VARCHAR,
  "precipitable_water_units" VARCHAR,
  "wind_direction_units" VARCHAR,
  "wind_speed_units" VARCHAR,
  "cloud_type_15" VARCHAR,
  "cloud_type_0" VARCHAR,
  "cloud_type_1" VARCHAR,
  "cloud_type_2" VARCHAR,
  "cloud_type_3" VARCHAR,
  "cloud_type_4" VARCHAR,
  "cloud_type_5" VARCHAR,
  "cloud_type_6" VARCHAR,
  "cloud_type_7" VARCHAR,
  "cloud_type_8" VARCHAR,
  "cloud_type_9" VARCHAR,
  "cloud_type_10" VARCHAR,
  "cloud_type_11" VARCHAR,
  "cloud_type_12" VARCHAR,
  "fill_flag_0" VARCHAR,
  "fill_flag_1" VARCHAR,
  "fill_flag_2" VARCHAR,
  "fill_flag_3" VARCHAR,
  "fill_flag_4" VARCHAR,
  "fill_flag_5" VARCHAR,
  "surface_albedo_units" VARCHAR,
  "version" VARCHAR
);

N 359516–47–41–0–70–2018

@kaggle.adri1g_nsrdb_tours.n_359516_47_41_0_70_2018

564.82 KB
35042 rows
46 columns


CREATE TABLE n_359516_47_41_0_70_2018 (
  "source" VARCHAR,
  "location_id" VARCHAR,
  "city" VARCHAR,
  "state" VARCHAR,
  "country" VARCHAR,
  "latitude" VARCHAR,
  "longitude" VARCHAR,
  "time_zone" VARCHAR,
  "elevation" VARCHAR,
  "local_time_zone" VARCHAR,
  "clearsky_dhi_units" VARCHAR,
  "clearsky_dni_units" VARCHAR,
  "clearsky_ghi_units" VARCHAR,
  "dew_point_units" VARCHAR,
  "dhi_units" VARCHAR,
  "dni_units" VARCHAR,
  "ghi_units" VARCHAR,
  "solar_zenith_angle_units" VARCHAR,
  "temperature_units" VARCHAR,
  "pressure_units" VARCHAR,
  "relative_humidity_units" VARCHAR,
  "precipitable_water_units" VARCHAR,
  "wind_direction_units" VARCHAR,
  "wind_speed_units" VARCHAR,
  "cloud_type_15" VARCHAR,
  "cloud_type_0" VARCHAR,
  "cloud_type_1" VARCHAR,
  "cloud_type_2" VARCHAR,
  "cloud_type_3" VARCHAR,
  "cloud_type_4" VARCHAR,
  "cloud_type_5" VARCHAR,
  "cloud_type_6" VARCHAR,
  "cloud_type_7" VARCHAR,
  "cloud_type_8" VARCHAR,
  "cloud_type_9" VARCHAR,
  "cloud_type_10" VARCHAR,
  "cloud_type_11" VARCHAR,
  "cloud_type_12" VARCHAR,
  "fill_flag_0" VARCHAR,
  "fill_flag_1" VARCHAR,
  "fill_flag_2" VARCHAR,
  "fill_flag_3" VARCHAR,
  "fill_flag_4" VARCHAR,
  "fill_flag_5" VARCHAR,
  "surface_albedo_units" VARCHAR,
  "version" VARCHAR
);

N 359516–47–41–0–70–2019

@kaggle.adri1g_nsrdb_tours.n_359516_47_41_0_70_2019

566.54 KB
35042 rows
46 columns


CREATE TABLE n_359516_47_41_0_70_2019 (
  "source" VARCHAR,
  "location_id" VARCHAR,
  "city" VARCHAR,
  "state" VARCHAR,
  "country" VARCHAR,
  "latitude" VARCHAR,
  "longitude" VARCHAR,
  "time_zone" VARCHAR,
  "elevation" VARCHAR,
  "local_time_zone" VARCHAR,
  "clearsky_dhi_units" VARCHAR,
  "clearsky_dni_units" VARCHAR,
  "clearsky_ghi_units" VARCHAR,
  "dew_point_units" VARCHAR,
  "dhi_units" VARCHAR,
  "dni_units" VARCHAR,
  "ghi_units" VARCHAR,
  "solar_zenith_angle_units" VARCHAR,
  "temperature_units" VARCHAR,
  "pressure_units" VARCHAR,
  "relative_humidity_units" VARCHAR,
  "precipitable_water_units" VARCHAR,
  "wind_direction_units" VARCHAR,
  "wind_speed_units" VARCHAR,
  "cloud_type_15" VARCHAR,
  "cloud_type_0" VARCHAR,
  "cloud_type_1" VARCHAR,
  "cloud_type_2" VARCHAR,
  "cloud_type_3" VARCHAR,
  "cloud_type_4" VARCHAR,
  "cloud_type_5" VARCHAR,
  "cloud_type_6" VARCHAR,
  "cloud_type_7" VARCHAR,
  "cloud_type_8" VARCHAR,
  "cloud_type_9" VARCHAR,
  "cloud_type_10" VARCHAR,
  "cloud_type_11" VARCHAR,
  "cloud_type_12" VARCHAR,
  "fill_flag_0" VARCHAR,
  "fill_flag_1" VARCHAR,
  "fill_flag_2" VARCHAR,
  "fill_flag_3" VARCHAR,
  "fill_flag_4" VARCHAR,
  "fill_flag_5" VARCHAR,
  "surface_albedo_units" VARCHAR,
  "version" VARCHAR
);