UCI Air Quality Dataset

Comprehensive Data on Pollutant Concentrations Over Time

@kaggle.dakshbhalala_uci_air_quality_dataset

About this Dataset

Air Quality Measurements Dataset

Description

This dataset contains air quality measurements collected over several months, covering several pollutants. It is intended for predictive modeling and data analysis in environmental science and public health. Because it records concentration levels of different gases over time, it suits both regression and classification tasks in machine learning.

Features

Feature          Description
Date             The date of the measurement.
Time             The time of the measurement.
CO(GT)           Concentration of carbon monoxide (CO) in the air (mg/m³).
PT08.S1(CO)      Sensor measurement for CO concentration.
NMHC(GT)         Concentration of non-methane hydrocarbons (NMHC) (µg/m³).
C6H6(GT)         Concentration of benzene (C6H6) in the air (µg/m³).
PT08.S2(NMHC)    Sensor measurement for NMHC concentration.
NOx(GT)          Concentration of nitrogen oxides (NOx) in the air (ppb).
PT08.S3(NOx)     Sensor measurement for NOx concentration.
NO2(GT)          Concentration of nitrogen dioxide (NO2) in the air (µg/m³).
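
Because Date and Time are stored as separate features, time-series work usually starts by combining them into a single timestamp. The sketch below shows one way to do that with the Python standard library. The function name parse_timestamp is hypothetical, and the default formats (day/month/year dates, dot-separated times) are an assumption about the file, not something this page states; adjust them to match the actual data.

```python
from datetime import datetime


def parse_timestamp(date_str: str, time_str: str,
                    date_fmt: str = "%d/%m/%Y",
                    time_fmt: str = "%H.%M.%S") -> datetime:
    """Combine separate Date and Time strings into one datetime.

    The default formats are assumptions; pass different ones if the
    file stores dates or times differently.
    """
    return datetime.strptime(f"{date_str} {time_str}", f"{date_fmt} {time_fmt}")


# Hypothetical row values, used only to illustrate the call.
ts = parse_timestamp("10/03/2004", "18.00.00")
print(ts.isoformat())  # 2004-03-10T18:00:00
```

A single timestamp makes it straightforward to sort records or group them by hour or day when looking at trends.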

Statistical Overview

The dataset includes frequency distributions for each feature, categorized into specified ranges. Key statistics include:

  • CO(GT): The lowest recorded values are around -200. This is the marker for missing or invalid readings (see Missing Attribute Values), not a measured concentration.
  • NOx(GT): Values span a wide range, with some readings exceeding 2000.

Citation Request

This dataset is publicly available for research purposes. If you use this dataset, please cite it as follows:

[Insert citation details based on the original source of the dataset].

Sources

Created by: [Include authors or organizations responsible for the dataset].

Past Usage

The dataset has been utilized in numerous studies focusing on air quality analysis and its implications for public health. It serves as a foundational resource for applying various data mining techniques to explore pollutant concentrations and their correlations with health outcomes.

Relevant Information

The dataset features temporal measurements related to air quality, enabling the assessment of pollution trends over time. It can be leveraged for both classification and regression tasks, with a focus on data normalization and strategies for handling missing values.
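
As a minimal sketch of the normalization step mentioned above, the following scales one column to the range [0, 1] while leaving missing entries untouched. It assumes missing readings have already been converted to None; the function name min_max_normalize and the sample values are illustrative and not part of the dataset.

```python
from typing import Optional


def min_max_normalize(values: list[Optional[float]]) -> list[Optional[float]]:
    """Scale non-missing values to [0, 1]; missing entries (None) stay None."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    lo, hi = min(present), max(present)
    if hi == lo:
        # A constant column has no spread; map every present value to 0.0.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - lo) / (hi - lo) for v in values]


# Illustrative values only, not taken from the dataset.
no2 = [50.0, None, 100.0, 150.0]
print(min_max_normalize(no2))  # [0.0, None, 0.5, 1.0]
```

Computing the minimum and maximum only over present values matters here: a missing-value marker left in the column would otherwise distort the scale.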

Number of Instances

  • Total Records: 951 (across specified time frames)

Number of Attributes

  • Input Attributes: 10 attributes related to air quality measurements.

Missing Attribute Values

  • Some measurements may be recorded as -200, indicating missing or invalid data points.
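
Since -200 is a marker rather than a measurement, it should be masked before computing any statistics. The following is a minimal sketch using only the Python standard library; the helper names mask_missing and missing_fraction and the sample readings are hypothetical.

```python
from typing import Optional

MISSING_SENTINEL = -200.0  # value the dataset uses for missing or invalid readings


def mask_missing(values: list[float],
                 sentinel: float = MISSING_SENTINEL) -> list[Optional[float]]:
    """Return a copy of `values` with sentinel readings replaced by None."""
    return [None if v == sentinel else v for v in values]


def missing_fraction(values: list[Optional[float]]) -> float:
    """Fraction of entries that are None (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


# Illustrative readings only, not taken from the dataset.
co_readings = [2.6, -200.0, 1.3, -200.0]
cleaned = mask_missing(co_readings)
print(cleaned)                    # [2.6, None, 1.3, None]
print(missing_fraction(cleaned))  # 0.5
```

Checking the missing fraction per column first helps decide whether to drop a feature, drop rows, or impute.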
