Dataset
Machine Learning for Hourly Air Pollution Prediction – Global (ML-HAPPG)
Abstract
This dataset contains estimates of air pollution levels across the globe for every hour of the year 2022. It covers five major air pollutants that can affect human health and the environment: Nitrogen Dioxide (NO2), Ozone (O3), Particulate Matter smaller than 10 micrometres (PM10) and smaller than 2.5 micrometres (PM2.5), and Sulphur Dioxide (SO2). For each pollutant, concentrations are predicted as a mean value and as lower (5th percentile), median (50th percentile), and upper (95th percentile) estimates, to show both typical and potentially extreme pollution scenarios. The dataset covers the entire globe on an evenly spaced grid in which each cell measures 0.25 degrees x 0.25 degrees, and data points correspond to the centres of these grid cells. The dataset also includes the training data used for the model, drawn from real-world monitoring stations.
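The five pollutants and four summary statistics described above combine into twenty prediction fields. As a minimal sketch, the snippet below enumerates them, assuming the `var_id` naming pattern that appears in this record's variable list (for example `no2_Prediction_0p05_Quantile`). The helper name `prediction_var_ids` is hypothetical and is not part of the dataset.

```python
from itertools import product

# Pollutant prefixes and statistic suffixes as they appear in the record's var_id list.
POLLUTANTS = ("no2", "o3", "pm10", "pm2p5", "so2")
STATISTICS = ("Mean", "0p05_Quantile", "0p5_Quantile", "0p95_Quantile")


def prediction_var_ids(pollutants=POLLUTANTS, statistics=STATISTICS):
    """Return every '<pollutant>_Prediction_<statistic>' name (hypothetical helper)."""
    return [f"{p}_Prediction_{s}" for p, s in product(pollutants, statistics)]


if __name__ == "__main__":
    names = prediction_var_ids()
    print(len(names))   # number of prediction fields
    print(names[:4])    # the four NO2 fields
```

Building the names programmatically may help when looping over pollutants, but the names present in the files themselves remain the authority.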
Details
| Previous Info: | No news update for this record |
|---|---|
| Previously used record identifiers: | No related previous identifiers. |
| Access rules: | Public data: access to these data is available to both registered and non-registered users. Use of these data is covered by the following licence(s): http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/ When using these data you must cite them correctly using the citation given on the CEDA Data Catalogue record. |
| Data lineage: | These pollution estimates were produced using a supervised machine learning method, which is a computational approach where algorithms are trained to identify patterns in historical data and apply these learned patterns to predict new data points. The predictions incorporated various environmental factors, including weather conditions (e.g., temperature, wind, precipitation), satellite measurements, and emission inventories (datasets detailing pollutants released into the atmosphere). Additionally, the dataset provides uncertainty intervals through percentile-based estimates, giving users insights into the reliability of the predictions. |
| Data Quality: | The dataset was created by Liam J. Berrisford at the University of Exeter during his PhD studies, supported by the UK Research and Innovation (UKRI) Centre for Doctoral Training in Environmental Intelligence. Full methodological details and data validation information are available in the associated open-access scientific publication. For more information about the data, see the README.md archived alongside this dataset. This dataset provides hourly predictions of air pollution concentrations for the globe throughout the year 2022. These values are not direct measurements from monitoring stations but rather model-based estimates generated using a supervised machine-learning approach. The model was trained using real observations from the OpenAQ monitoring network, but it is capable of making predictions even in areas without any nearby monitoring stations. This means that the dataset offers complete spatial and temporal coverage, filling in gaps where no sensor data exists. |
| File Format: | NetCDF, txt, json |
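The percentile-based estimates mentioned under data lineage can be read as a 90% interval running from the 5th to the 95th percentile. The sketch below shows one way a user might compare that interval with station measurements. The sample values are invented purely for illustration, the field names follow the `var_id` entries in this record, and `interval_width` and `interval_coverage` are hypothetical helpers rather than part of the dataset or its pipeline.

```python
def interval_width(row, pollutant):
    """Width of the 5th-95th percentile interval for one record."""
    lower = row[f"{pollutant}_Prediction_0p05_Quantile"]
    upper = row[f"{pollutant}_Prediction_0p95_Quantile"]
    return upper - lower


def interval_coverage(rows, pollutant):
    """Fraction of available measurements that fall inside [5th, 95th] percentile."""
    hits = 0
    total = 0
    for row in rows:
        measured = row.get(f"{pollutant}_Measurement")
        if measured is None:  # no station observation for this record
            continue
        lower = row[f"{pollutant}_Prediction_0p05_Quantile"]
        upper = row[f"{pollutant}_Prediction_0p95_Quantile"]
        total += 1
        if lower <= measured <= upper:
            hits += 1
    if total == 0:
        raise ValueError("no measurements available to compare against")
    return hits / total


# Invented illustrative records (µg/m³); not values taken from the dataset.
SAMPLE = [
    {"no2_Prediction_0p05_Quantile": 4.0, "no2_Prediction_0p95_Quantile": 20.0, "no2_Measurement": 12.5},
    {"no2_Prediction_0p05_Quantile": 10.0, "no2_Prediction_0p95_Quantile": 35.0, "no2_Measurement": 41.0},
    {"no2_Prediction_0p05_Quantile": 2.0, "no2_Prediction_0p95_Quantile": 9.0, "no2_Measurement": 2.0},
    {"no2_Prediction_0p05_Quantile": 6.0, "no2_Prediction_0p95_Quantile": 18.0, "no2_Measurement": None},
]

if __name__ == "__main__":
    print(interval_width(SAMPLE[0], "no2"))
    print(interval_coverage(SAMPLE, "no2"))
```

On a large enough sample, coverage well below 0.9 would suggest the intervals are too narrow for that pollutant or region, and coverage well above 0.9 that they are wider than needed.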
Related Documents
Process overview
| Title | Machine-Learning-Based Prediction of Air Pollution Estimates |
|---|---|
| Abstract | The dataset was created using a supervised machine-learning pipeline. The pipeline generates air pollution concentration predictions across a global 0.25 x 0.25 degree grid. |
| Input Description | None |
| Output Description | None |
| Software Reference | None |
Variables
- units: µg/m³
- var_id: no2_Prediction_0p05_Quantile
- long_name: 5th percentile of modeled NO₂
- units: µg/m³
- var_id: o3_Prediction_0p05_Quantile
- long_name: 5th percentile of modeled O₃
- units: µg/m³
- var_id: pm10_Prediction_0p05_Quantile
- long_name: 5th percentile of modeled PM₁₀
- units: µg/m³
- var_id: pm2p5_Prediction_0p05_Quantile
- long_name: 5th percentile of modeled PM₂.₅
- units: µg/m³
- var_id: so2_Prediction_0p05_Quantile
- long_name: 5th percentile of modeled SO₂
- units: µg/m³
- var_id: no2_Prediction_0p95_Quantile
- long_name: 95th percentile of modeled NO₂
- units: µg/m³
- var_id: o3_Prediction_0p95_Quantile
- long_name: 95th percentile of modeled O₃
- units: µg/m³
- var_id: pm10_Prediction_0p95_Quantile
- long_name: 95th percentile of modeled PM₁₀
- units: µg/m³
- var_id: pm2p5_Prediction_0p95_Quantile
- long_name: 95th percentile of modeled PM₂.₅
- units: µg/m³
- var_id: so2_Prediction_0p95_Quantile
- long_name: 95th percentile of modeled SO₂
- var_id: Sentinel_5P_AAI
- long_name: Absorbing Aerosol Index
- units: K
- var_id: Temperature_2m
- long_name: Air temperature at 2 m
- units: m
- var_id: Boundary_Layer_Height
- long_name: Atmospheric boundary layer height
- var_id: Year
- long_name: Calendar year
- var_id: Continent
- long_name: Continent name of the monitoring station
- var_id: Country
- long_name: Country name of the monitoring station
- var_id: Day_of_Week_Number
- long_name: Day of week (0=Monday, 6=Sunday)
- units: K
- var_id: Dewpoint_Temperature_2m
- long_name: Dewpoint temperature at 2 m
- units: W/m²
- var_id: Downward_UV_Radiation_At_Surface
- long_name: Downward UV radiation at surface
- units: m/s
- var_id: U_Component_of_Wind_10m
- long_name: East–west wind at 10 m
- units: m/s
- var_id: U_Component_of_Wind_100m
- long_name: East–west wind at 100 m
- var_id: Hour_Number
- long_name: Hour of day (0–23)
- var_id: Week_Number
- long_name: ISO week number (1–53)
- units: m/s
- var_id: Instantaneous_10m_Wind_Gust
- long_name: Instantaneous wind gust at 10 m
- units: hours
- var_id: Timestamp_Local
- long_name: Local timestamp
- units: µg/m³
- var_id: no2_Prediction_Mean
- long_name: Mean of modeled NO₂
- units: µg/m³
- var_id: o3_Prediction_Mean
- long_name: Mean of modeled O₃
- units: µg/m³
- var_id: pm10_Prediction_Mean
- long_name: Mean of modeled PM₁₀
- units: µg/m³
- var_id: pm2p5_Prediction_Mean
- long_name: Mean of modeled PM₂.₅
- units: µg/m³
- var_id: so2_Prediction_Mean
- long_name: Mean of modeled SO₂
- units: µg/m³
- var_id: pm10_Measurement
- long_name: Measured PM₁₀ (particulate matter <10 µm) concentration
- units: µg/m³
- var_id: pm2p5_Measurement
- long_name: Measured PM₂.₅ (particulate matter <2.5 µm) concentration
- units: µg/m³
- var_id: no2_Measurement
- long_name: Measured nitrogen dioxide (NO₂) concentration
- units: µg/m³
- var_id: o3_Measurement
- long_name: Measured ozone (O₃) concentration
- units: µg/m³
- var_id: so2_Measurement
- long_name: Measured sulfur dioxide (SO₂) concentration
- units: µg/m³
- var_id: no2_Prediction_0p5_Quantile
- long_name: Median (50th pct) of modeled NO₂
- units: µg/m³
- var_id: o3_Prediction_0p5_Quantile
- long_name: Median (50th pct) of modeled O₃
- units: µg/m³
- var_id: pm10_Prediction_0p5_Quantile
- long_name: Median (50th pct) of modeled PM₁₀
- units: µg/m³
- var_id: pm2p5_Prediction_0p5_Quantile
- long_name: Median (50th pct) of modeled PM₂.₅
- units: µg/m³
- var_id: so2_Prediction_0p5_Quantile
- long_name: Median (50th pct) of modeled SO₂
- var_id: Month_Number
- long_name: Month (1=January to 12=December)
- units: m/s
- var_id: V_Component_of_Wind_10m
- long_name: North–south wind at 10 m
- units: m/s
- var_id: V_Component_of_Wind_100m
- long_name: North–south wind at 100 m
- units: hPa
- var_id: Surface_Pressure
- long_name: Surface atmospheric pressure
- units: kilotonne
- var_id: Anthropogenic_Emissions_Sum_Sectors_CO
- long_name: Total anthropogenic CO emissions from all sectors
- units: kilotonne
- var_id: Anthropogenic_Emissions_Sum_Sectors_NMVOCs
- long_name: Total anthropogenic NMVOCs emissions from all sectors
- units: kilotonne
- var_id: Anthropogenic_Emissions_Sum_Sectors_NOX
- long_name: Total anthropogenic NOₓ emissions from all sectors
- units: kilotonne
- var_id: Anthropogenic_Emissions_Sum_Sectors_Other_VOCs
- long_name: Total anthropogenic Other VOCs emissions from all sectors
- units: kilotonne
- var_id: Anthropogenic_Emissions_Sum_Sectors_SO2
- long_name: Total anthropogenic SO₂ emissions from all sectors
- units: kilotonne
- var_id: Biogenic_Emissions_Biogenic_CO
- long_name: Total biogenic CO emissions
- units: mol/m²
- var_id: Sentinel_5P_CO
- long_name: Total column CO
- units: mol/m²
- var_id: Sentinel_5P_O3
- long_name: Total column O₃
- units: kg/m²
- var_id: Total_Column_Rain_Water
- long_name: Total column rain water
- units: mol/m²
- var_id: Sentinel_5P_NO2
- long_name: Tropospheric NO₂
- var_id: UTC_Offset
- long_name: UTC offset in hours
- var_id: Global_Model_Grid_ID
- long_name: Unique identifier for each global grid cell
- var_id: Monitoring_Station_ID
- long_name: Unique identifier for each monitoring station
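The wind variables in the list above are given as separate east-west (U) and north-south (V) components in m/s. Users who need wind speed or direction can derive them with the standard conversion sketched below. This assumes the usual meteorological convention that positive U blows towards the east and positive V towards the north, which this record does not state explicitly, and the function names are hypothetical.

```python
import math


def wind_speed(u, v):
    """Horizontal wind speed (m/s) from its U and V components."""
    return math.hypot(u, v)


def wind_direction_from(u, v):
    """Direction the wind blows FROM, in degrees clockwise from north.

    Assumes positive u is eastward and positive v is northward.
    For calm conditions (u == v == 0) the direction is undefined; 0.0 is returned.
    """
    return math.degrees(math.atan2(-u, -v)) % 360.0


if __name__ == "__main__":
    # Hypothetical component values, e.g. from U_Component_of_Wind_10m and V_Component_of_Wind_10m.
    u, v = 3.0, 4.0
    print(wind_speed(u, v))
    print(wind_direction_from(u, v))
```

The same functions apply to the 100 m components.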
Co-ordinate Variables
- units: degrees_north
- standard_name: latitude
- long_name: Latitude
- var_id: Latitude
- units: degrees_east
- standard_name: longitude
- long_name: Longitude
- var_id: Longitude
- standard_name: time
- units: hours
- var_id: Timestamp_UTC
- long_name: Time (UTC)
- standard_name: time
- units: days
- var_id: Timestamp_UTC
- long_name: Time (UTC, hourly)
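The record lists a UTC timestamp, a `UTC_Offset` in hours, a local timestamp, and several calendar fields (`Year`, `Month_Number`, `Week_Number`, `Day_of_Week_Number`, `Hour_Number`). The sketch below shows how such fields could be derived from one another using Python's standard library. It assumes the calendar fields are computed from local time, which the record does not confirm, and `calendar_features` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def calendar_features(timestamp_utc, utc_offset_hours):
    """Derive local time and calendar fields from a naive UTC datetime.

    Assumption: calendar fields are based on local time (not stated in the record).
    """
    local = timestamp_utc + timedelta(hours=utc_offset_hours)
    return {
        "Timestamp_Local": local,
        "Year": local.year,
        "Month_Number": local.month,                # 1 = January to 12 = December
        "Week_Number": local.isocalendar()[1],      # ISO week number, 1-53
        "Day_of_Week_Number": local.weekday(),      # 0 = Monday, 6 = Sunday
        "Hour_Number": local.hour,                  # 0-23
    }


if __name__ == "__main__":
    first_hour = datetime(2022, 1, 1, 0, 0)
    print(calendar_features(first_hour, 0))
    print(calendar_features(first_hour, -5))
```

Note that the first day of 2022 falls in ISO week 52 of the previous ISO year, so `Week_Number` values near the year boundary may look surprising.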
Temporal Range
Start time: 2022-01-01T00:00:00
End time: 2022-12-31T00:00:00
Geographic Extent
| North | 83.5091° |
|---|---|
| West | -179.8750° |
| East | 179.8750° |
| South | -89.9909° |
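Because data points sit at the centres of 0.25 degree grid cells, a common task is to find the cell centre nearest to an arbitrary location. The sketch below does this under the assumption that centres are regularly spaced between the extent values listed in this record (latitude -89.9909 to 83.5091, longitude -179.875 to 179.875). That spacing assumption should be checked against the coordinate arrays in the files, and the function names are hypothetical.

```python
STEP = 0.25  # grid spacing in degrees, from the dataset abstract

# First and last cell centres, assumed equal to the record's geographic extent.
LON_FIRST, LON_LAST = -179.875, 179.875
LAT_FIRST, LAT_LAST = -89.9909, 83.5091


def axis_size(first, last, step=STEP):
    """Number of cell centres on an axis running from first to last inclusive."""
    return round((last - first) / step) + 1


def nearest_centre(value, first, last, step=STEP):
    """Return (index, centre) of the grid cell centre closest to value.

    Values outside the axis are clamped to the first or last cell. A value
    lying exactly on a cell edge is equidistant from two centres, and either
    may be returned.
    """
    index = round((value - first) / step)
    index = max(0, min(index, axis_size(first, last, step) - 1))
    return index, first + index * step


if __name__ == "__main__":
    print(axis_size(LON_FIRST, LON_LAST), axis_size(LAT_FIRST, LAT_LAST))
    print(nearest_centre(-0.1, LON_FIRST, LON_LAST))
    print(nearest_centre(50.72, LAT_FIRST, LAT_LAST))
```

Under these assumptions the grid has 1440 longitude centres and 695 latitude centres.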