DOXA AI
Copyright © 2025 DOXA AI 🚀
ClimateHack.AI 2023: Qualifiers

An international student machine learning competition to develop state-of-the-art solar PV forecasting models 🌍

Data

All of the data – including the solar PV, satellite imagery, numerical weather prediction and aerosol data – for this competition can be accessed on Hugging Face. 🤗

Explore and download the data.

There is a large volume of data available for this competition (600 GB in total!), so we suggest starting with smaller-scale experiments (for example, ones that use only a month of data) before scaling up.

You do not have to use all of the data, or even all of the data sources; it is more impressive to have a smaller, more performant model that only uses HRV satellite imagery, weather forecasts and PV data but nevertheless matches the performance of a significantly larger model that uses all of the data sources.

Solar PV Generation data

  • Compressed solar PV dataset size: 1.31 GB

This is a dataset collected from 1311 live PV systems in the UK containing solar PV generation data from 2018 to 2021 at temporal resolutions ranging from 2 minutes to 30 minutes. For ClimateHack.AI 2023, we are using 5-minutely solar PV generation data from 993 of these sites across Great Britain.

The original dataset hosted by Open Climate Fix in 5min.parquet contains generation_wh values: the amount of energy, in watt-hours, generated in each 5-minute period. In the ClimateHack.AI 2023 version of this dataset, these values have been transformed into the equivalent average power (in watts) expressed as a proportion of installed capacity (also in watts).
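As a rough illustration of that transformation, the sketch below converts a 5-minute energy reading into average power and then into a fraction of installed capacity. The function name and the example figures are hypothetical, and the capacity is assumed to be supplied in kilowatts (as in the kwp metadata field); the exact preprocessing used for the competition dataset may differ.

```python
def normalised_power(generation_wh: float, kwp: float, period_minutes: float = 5.0) -> float:
    """Average power over the period as a fraction of installed capacity.

    Hypothetical helper: generation_wh is the energy in watt-hours for one
    period, and kwp is assumed to be the installed capacity in kilowatts.
    """
    if kwp <= 0:
        raise ValueError("installed capacity must be positive")
    # Energy (Wh) divided by duration (h) gives average power (W).
    average_power_w = generation_wh * 60.0 / period_minutes
    capacity_w = kwp * 1000.0  # kW -> W
    return average_power_w / capacity_w


# A hypothetical 4 kWp site that generates 100 Wh in five minutes
# averages 1,200 W over that period, which is 30% of its capacity.
example = normalised_power(100.0, 4.0)
```

Normalising by capacity in this way puts small and large sites on the same scale, which is presumably why the competition data is stored as a proportion rather than in absolute units.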

Site metadata

  • latitude: the latitude of the solar site
  • longitude: the longitude of the solar site
  • orientation: the orientation of the solar site in degrees
  • tilt: the tilt of the solar site in degrees
  • kwp: the installed generation capacity of the solar site in kilowatts

EUMETSAT Satellite Imagery (v4)

  • Compressed HRV dataset size: 73 GB
  • Compressed non-HRV dataset size: 189 GB

This is satellite imagery originally from the EUMETSAT Spinning Enhanced Visible and InfraRed Imager (SEVIRI) rapid scanning service (RSS). It is composed of 12 channels: a single high-resolution visible (HRV) channel; and 11 non-HRV channels of visible, infrared and water vapour satellite imagery (IR_016, IR_039, IR_087, IR_097, IR_108, IR_120, IR_134, VIS006, VIS008, WV_062 and WV_073).

Values in this dataset have been scaled to be between zero and one. Unlike the other datasets, which use geodetic coordinates, this dataset is based on a geostationary coordinate grid. Cartopy provides tools to convert between different coordinate systems.
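To give a feel for what such a conversion involves, here is a minimal pure-Python sketch of the forward geostationary projection (longitude and latitude to x and y in metres), written out by hand using the standard formulation with a y-axis sweep. The ellipsoid (WGS84), the satellite longitude (9.5°E) and the satellite height are assumptions that are not stated above, and the function name is hypothetical; check the dataset's own metadata before relying on these values, and prefer Cartopy for real work.

```python
import math

# Assumed parameters (not taken from the dataset documentation):
SEMI_MAJOR_M = 6378137.0          # WGS84 semi-major axis
SEMI_MINOR_M = 6356752.314245     # WGS84 semi-minor axis
SATELLITE_HEIGHT_M = 35785831.0   # nominal height above the ellipsoid
SATELLITE_LONGITUDE_DEG = 9.5     # assumed sub-satellite longitude


def geodetic_to_geostationary(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to geostationary x/y in metres."""
    ratio = SEMI_MINOR_M / SEMI_MAJOR_M
    lam = math.radians(lon_deg - SATELLITE_LONGITUDE_DEG)
    # Geocentric latitude of the point on the ellipsoid.
    phi_c = math.atan(ratio ** 2 * math.tan(math.radians(lat_deg)))
    # Distance from the Earth's centre, in units of the semi-major axis.
    r = ratio / math.hypot(ratio * math.cos(phi_c), math.sin(phi_c))
    vx = r * math.cos(lam) * math.cos(phi_c)
    vy = r * math.sin(lam) * math.cos(phi_c)
    vz = r * math.sin(phi_c)
    # Satellite distance from the Earth's centre, in the same units.
    sat_dist = 1.0 + SATELLITE_HEIGHT_M / SEMI_MAJOR_M
    # Reject points that the satellite cannot see.
    if (sat_dist - vx) * vx - vy ** 2 - (vz / ratio) ** 2 < 0:
        raise ValueError("point is not visible from the satellite")
    tmp = sat_dist - vx
    # Scan angles scaled by the satellite height give metres.
    x = SATELLITE_HEIGHT_M * math.atan(vy / tmp)
    y = SATELLITE_HEIGHT_M * math.atan(vz / math.hypot(vy, tmp))
    return x, y


# Central London lies north-west of the assumed sub-satellite point,
# so x comes out negative and y positive.
london_x, london_y = geodetic_to_geostationary(-0.1276, 51.5)
```

Projecting each solar site's latitude and longitude in this way is one route to locating the satellite pixels that surround it.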

The HRV data has a spatial resolution of 1 km (per pixel) and the non-HRV data has a spatial resolution of 3 km (although these can vary slightly due to the curvature of the Earth). More technical information is available from EUMETSAT (including the data format description).

Hourly DWD ICON-EU weather forecasts

  • Compressed weather dataset size: 78 GB

This numerical weather prediction (NWP) dataset comes from the DWD ICON-EU model.

It contains the following data variables for various altitudes:

  • alb_rad: Surface albedo (%)
  • aswdifd_s: Downward diffusive short wave radiation flux at surface (mean over forecast time) (W / m2)
  • aswdir_s: Downward direct short wave radiation flux at surface (mean over forecast time) (W / m2)
  • cape_con: Convective available potential energy (J / kg)
  • clch: Cloud cover at high levels (0 - 400 hPa) (%)
  • clcl: Cloud cover at low levels (800 hPa - Soil) (%)
  • clcm: Cloud cover at mid levels (400 - 800 hPa) (%)
  • clct: Total cloud cover (%)
  • h_snow: Snow depth (m)
  • omega: Vertical velocity of air pressure (Pa / s) – available for four different pressure levels
    • omega_1000 for 1000 hPa (110 m)
    • omega_950 for 950 hPa (500 m)
    • omega_850 for 850 hPa (1,500 m)
    • omega_700 for 700 hPa (3,000 m)
  • pmsl: Pressure reduced to mean sea level (Pa)
  • relhum_2m: Relative humidity at 2 metres above ground (%)
  • runoff_g: Water runoff (kg / m2)
  • runoff_s: Water runoff from soil (kg / m2)
  • t: Air temperature (K) – available for three different pressure levels
    • t_500 for 500 hPa (5,600 m)
    • t_850 for 850 hPa (1,500 m)
    • t_950 for 950 hPa (500 m)
  • t_2m: Air temperature at 2 metres above ground (K)
  • t_g: Air temperature at ground level (K)
  • td_2m: Dewpoint temperature at 2 metres above ground (K)
  • tot_prec: Total precipitation (kg / m2)
  • u: U component of wind (m / s) – available for four different pressure levels
    • u_50 for 50 hPa (19,300 m)
    • u_500 for 500 hPa (5,600 m)
    • u_850 for 850 hPa (1,500 m)
    • u_950 for 950 hPa (500 m)
  • u_10m: U component of wind at 10 metres above ground (m / s)
  • v: V component of wind (m / s) – available for four different pressure levels
    • v_50 for 50 hPa (19,300 m)
    • v_500 for 500 hPa (5,600 m)
    • v_850 for 850 hPa (1,500 m)
    • v_950 for 950 hPa (500 m)
  • v_10m: V component of wind at 10 metres above ground (m / s)
  • vmax_10m: Maximum wind at 10 metres above ground (m / s)
  • w_snow: Snow depth water equivalent (kg / m2)
  • ww: Weather interpretation (WMO) (numerical values)
  • z0: Surface roughness (m)
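The wind variables above are stored as separate U and V components, so a model that wants wind speed or direction has to derive them. The sketch below shows one common way to do so, assuming U is the eastward component and V the northward component, and reporting the direction the wind blows from (the usual meteorological convention). The function name is hypothetical.

```python
import math


def wind_speed_and_direction(u, v):
    """Return (speed in m/s, direction in degrees) from U and V components.

    Assumes u is eastward and v is northward. The direction is where the
    wind blows FROM, measured clockwise from north.
    """
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


# A westerly wind (blowing towards the east) at 5 m/s comes from 270 degrees.
speed, direction = wind_speed_and_direction(5.0, 0.0)
```

The same function can be applied to any matching pair, such as u_10m with v_10m or u_850 with v_850.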

The ClimateHack.AI 2023 version of this dataset has been cropped to Great Britain and always takes the latest available predictions (accounting for the three-hour ICON-EU model initialisation time) between 4am and 10pm of each day.
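One way to read "latest available predictions" is sketched below: for a given moment, pick the most recent model run whose output would already have been published. The sketch assumes that runs are initialised every three hours starting at 00:00 UTC and that a run becomes usable three hours after its initialisation time; both figures and the function name are assumptions for illustration, not a description of how the dataset was actually built.

```python
from datetime import datetime, timedelta, timezone

RUN_INTERVAL_HOURS = 3                   # assumed spacing between model runs
AVAILABILITY_DELAY = timedelta(hours=3)  # assumed delay before a run is usable


def latest_available_run(now):
    """Initialisation time of the newest run that is usable at `now`."""
    usable = now - AVAILABILITY_DELAY
    # Round down to the most recent multiple of the run interval.
    run_hour = (usable.hour // RUN_INTERVAL_HOURS) * RUN_INTERVAL_HOURS
    return usable.replace(hour=run_hour, minute=0, second=0, microsecond=0)


# At 10:30 UTC the 09:00 run is not yet usable under these assumptions,
# so the 06:00 run is selected instead.
run = latest_available_run(datetime(2021, 6, 1, 10, 30, tzinfo=timezone.utc))
```

Selecting runs this way avoids leaking forecasts that would not yet have existed at prediction time.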

ECMWF CAMS Air Quality forecasts

  • Compressed air quality forecast dataset size: 259 GB

This dataset contains physicochemical test forecast data pertaining to different aerosol types at the following altitudes (in metres): 0.0, 50.0, 250.0, 500.0, 1000.0, 2000.0, 3000.0 and 5000.0.

The following data variables are available in this dataset:

  • co_conc: Mass concentration of carbon monoxide in air (µg / m3)
  • dust: Mass concentration of dust in air (µg / m3)
  • nh3_conc: Mass concentration of ammonia in air (µg / m3)
  • nmvoc_conc: Mass concentration of non-methane volatile organic compounds expressed as carbon in air (µg / m3)
  • no2_conc: Mass concentration of nitrogen dioxide in air (µg / m3)
  • no_conc: Mass concentration of nitrogen monoxide in air (µg / m3)
  • o3_conc: Mass concentration of ozone in air (µg / m3)
  • pans_conc: Mass concentration of acyl peroxy nitrates in air (µg / m3)
  • pm10_conc: Mass concentration of PM10 ambient aerosol in air (µg / m3)
  • pm2p5_conc: Mass concentration of PM2.5 ambient aerosol in air (µg / m3)
  • pmwf_conc: Mass concentration of PM10 aerosol from wildfires in air (µg / m3)
  • sia_conc: Mass concentration of secondary inorganic aerosol in air (µg / m3)
  • so2_conc: Mass concentration of sulfur dioxide in air (µg / m3)
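Because the forecasts are given only at the eight altitude levels listed earlier, estimating a concentration at some other height requires interpolation. The sketch below uses simple linear interpolation between the two surrounding levels; the function name and the example profile are hypothetical, and linear interpolation is just one reasonable choice.

```python
from bisect import bisect_right

# Altitude levels (in metres) at which the forecasts are provided.
ALTITUDES_M = [0.0, 50.0, 250.0, 500.0, 1000.0, 2000.0, 3000.0, 5000.0]


def interpolate_to_altitude(profile, altitude_m):
    """Linearly interpolate a per-level profile to an arbitrary altitude."""
    if len(profile) != len(ALTITUDES_M):
        raise ValueError("profile must have one value per altitude level")
    if not ALTITUDES_M[0] <= altitude_m <= ALTITUDES_M[-1]:
        raise ValueError("altitude is outside the available levels")
    # Index of the level just above the requested altitude.
    upper = min(bisect_right(ALTITUDES_M, altitude_m), len(ALTITUDES_M) - 1)
    lower = upper - 1
    span = ALTITUDES_M[upper] - ALTITUDES_M[lower]
    weight = (altitude_m - ALTITUDES_M[lower]) / span
    return profile[lower] + weight * (profile[upper] - profile[lower])


# Made-up dust concentrations (µg / m3) for the eight levels.
dust_profile = [10.0, 8.0, 6.0, 5.0, 4.0, 2.0, 1.0, 0.5]
dust_at_150m = interpolate_to_altitude(dust_profile, 150.0)
```

Here 150 m sits halfway between the 50 m and 250 m levels, so the estimate is the midpoint of the two neighbouring values.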

The ClimateHack.AI 2023 version of this dataset has been cropped to Great Britain and always takes the latest available forecasts between 4am and 10pm of each day.

Still need more data?

Depending on your approach, you may need more training data beyond the 600 GB of data that we have made available specifically for ClimateHack.AI 2023. Fortunately, some of the original datasets contain data for years other than 2020 and 2021. If this would be of use to you, feel free to use the original data published by Open Climate Fix. You may just have to crop the data to match the territorial extent of the solar PV sites and perform some additional preprocessing. Note that you may only use data from between 2018 and 2021.
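The cropping step could look something like the sketch below, which derives a latitude/longitude bounding box from the site locations and keeps only records that fall inside both the box and the permitted years. The helper names, the padding value and the example coordinates are hypothetical, and the 2018 to 2021 range is assumed to be inclusive at both ends.

```python
from datetime import datetime


def bounding_box(sites, padding_deg=0.5):
    """Bounding box (south, north, west, east) around (lat, lon) pairs."""
    lats = [lat for lat, _ in sites]
    lons = [lon for _, lon in sites]
    return (
        min(lats) - padding_deg,
        max(lats) + padding_deg,
        min(lons) - padding_deg,
        max(lons) + padding_deg,
    )


def keep_record(timestamp, lat, lon, box):
    """True if a record is inside the box and within the permitted years."""
    south, north, west, east = box
    in_years = 2018 <= timestamp.year <= 2021  # assumed inclusive range
    in_box = south <= lat <= north and west <= lon <= east
    return in_years and in_box


# Two made-up site locations spanning much of Great Britain.
box = bounding_box([(50.0, -5.0), (58.0, 1.0)])
inside = keep_record(datetime(2019, 6, 1), 52.0, -1.0, box)
```

For gridded data in geostationary coordinates, the box corners would first need converting to that coordinate system.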