ClimateHack.AI 2023: Qualifiers
An international student machine learning competition to develop state-of-the-art solar PV forecasting models 🌍
All of the data – including the solar PV, satellite imagery, numerical weather prediction and aerosol data – for this competition can be accessed on Hugging Face. 🤗
Explore and download the data.
There is a large volume of data available for this competition (600 GB in total!), so we suggest that you start with some smaller-scale experiments (for example, ones that only use a month of data) before scaling up.
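As a rough illustration of restricting an experiment to a single month, the sketch below builds a half-open time window and filters a list of timestamps with it. The helper names (`month_window`, `in_window`) and the sample timestamps are hypothetical stand-ins for the time index of whichever data source you load; they are not part of the competition tooling.

```python
from datetime import datetime
from typing import Tuple


def month_window(year: int, month: int) -> Tuple[datetime, datetime]:
    """Return the half-open interval [start, end) covering one calendar month."""
    start = datetime(year, month, 1)
    # Roll over to January of the next year when the month is December.
    end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return start, end


def in_window(timestamp: datetime, window: Tuple[datetime, datetime]) -> bool:
    """True if the timestamp falls inside the half-open window."""
    start, end = window
    return start <= timestamp < end


# Hypothetical timestamps standing in for a real dataset's time index.
timestamps = [
    datetime(2020, 6, 30, 23, 55),
    datetime(2020, 7, 1, 0, 0),
    datetime(2020, 7, 31, 23, 55),
    datetime(2020, 8, 1, 0, 0),
]

window = month_window(2020, 7)
july = [t for t in timestamps if in_window(t, window)]
```

The same start and end bounds could be passed to the slicing mechanism of whatever data library you use, so that only one month is ever read into memory.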
You do not have to use all of the data, or even all of the data sources; it is more impressive to have a smaller, more performant model that only uses HRV satellite imagery, weather forecasts and PV data but nevertheless matches the performance of a significantly larger model that uses all of the data sources.
This is a dataset collected from 1311 live PV systems in the UK containing solar PV generation data from 2018 to 2021 at temporal resolutions ranging from 2 minutes to 30 minutes. For ClimateHack.AI 2023, we are using 5-minutely solar PV generation data from 993 of these sites across Great Britain.
The original dataset hosted by Open Climate Fix in 5min.parquet contains generation_wh values for the amount of energy generated in a 5-minute period in watt-hours. In the ClimateHack.AI 2023 version of this dataset, this has been transformed into the equivalent average power (in watts) as a proportion of installed capacity (also in watts).
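The transformation described above can be sketched as follows: energy in watt-hours over a 5-minute period is converted to average power in watts, then divided by installed capacity. This is a minimal sketch of the arithmetic, not the organisers' preprocessing code; the function name `normalised_power` and the example figures are made up for illustration.

```python
def normalised_power(generation_wh: float, capacity_w: float,
                     period_minutes: float = 5.0) -> float:
    """Convert energy generated in one period (Wh) into average power
    expressed as a fraction of installed capacity."""
    if capacity_w <= 0:
        raise ValueError("capacity_w must be positive")
    # Average power (W) = energy (Wh) / period length (h).
    average_power_w = generation_wh * 60.0 / period_minutes
    return average_power_w / capacity_w


# A hypothetical 4 kW system that generated 100 Wh in five minutes
# averaged 1200 W, which is 30% of its installed capacity.
example = normalised_power(100.0, 4000.0)
```

If you bring in extra data from the original dataset, applying the same conversion should keep it on the same scale as the competition version.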
This is satellite imagery originally from the EUMETSAT Spinning Enhanced Visible and InfraRed Imager (SEVIRI) rapid scanning service (RSS). It is composed of 12 channels: a single high-resolution visible (HRV) channel; and 11 non-HRV channels of visible, infrared and water vapour satellite imagery (IR_016, IR_039, IR_087, IR_097, IR_108, IR_120, IR_134, VIS006, VIS008, WV_062 and WV_073).
Values in this dataset have been scaled to be between zero and one. Unlike the other datasets, which use geodetic coordinates, this dataset is based on a geostationary coordinate grid. Cartopy provides tools to convert between different coordinate systems.
The HRV data has a spatial resolution of 1 km (per pixel) and the non-HRV data has a spatial resolution of 3 km (although these can vary slightly due to the curvature of the Earth). More technical information is available from EUMETSAT (including the data format description).
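Because the two resolutions differ by a factor of three, a crop of a given size in HRV pixels covers the same ground distance as a much smaller crop in non-HRV pixels. The sketch below works this out using the nominal 1 km and 3 km figures and ignores the slight variation with the Earth's curvature; the function name is hypothetical.

```python
import math

HRV_KM_PER_PIXEL = 1.0      # nominal HRV resolution
NON_HRV_KM_PER_PIXEL = 3.0  # nominal non-HRV resolution


def non_hrv_pixels_for(hrv_pixels: int) -> int:
    """Number of non-HRV pixels needed to span the same ground distance
    as the given number of HRV pixels (rounded up)."""
    distance_km = hrv_pixels * HRV_KM_PER_PIXEL
    return math.ceil(distance_km / NON_HRV_KM_PER_PIXEL)


# A 128-pixel-wide HRV crop spans roughly 128 km,
# which needs 43 non-HRV pixels to cover.
width = non_hrv_pixels_for(128)
```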
This numerical weather prediction (NWP) dataset comes from the DWD ICON-EU model.
It contains the following data variables for various altitudes:
The ClimateHack.AI 2023 version of this dataset has been cropped to Great Britain and always takes the latest available predictions (accounting for the three-hour ICON-EU model initialisation time) between 4am and 10pm of each day.
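One way to picture "latest available predictions" is shown below. It rests on two assumptions that are not confirmed by this page: that a new model run starts every three hours, and that a run only becomes usable three hours after it starts. Both constants and the function name are illustrative, so adjust them if the real schedule differs.

```python
from datetime import datetime, timedelta

RUN_INTERVAL_HOURS = 3                   # assumption: a run starts every 3 hours
AVAILABILITY_DELAY = timedelta(hours=3)  # assumption: usable 3 hours after start


def latest_available_run(now: datetime) -> datetime:
    """Return the start time of the most recent run already available at `now`."""
    # Any run that started at or before this cutoff has finished initialising.
    cutoff = now - AVAILABILITY_DELAY
    run_hour = (cutoff.hour // RUN_INTERVAL_HOURS) * RUN_INTERVAL_HOURS
    return cutoff.replace(hour=run_hour, minute=0, second=0, microsecond=0)


# At 10:30, the cutoff is 07:30, so the 06:00 run is the latest usable one.
run = latest_available_run(datetime(2021, 6, 1, 10, 30))
```

Pairing each target time with a run chosen this way avoids leaking forecasts that would not yet have existed at prediction time.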
Compressed air quality forecast dataset size: 259 GB
This dataset contains physicochemical test forecast data pertaining to different aerosol types at the following altitudes (in metres): 0.0, 50.0, 250.0, 500.0, 1000.0, 2000.0, 3000.0 and 5000.0.
The following data variables are available in this dataset:
The ClimateHack.AI 2023 version of this dataset has been cropped to Great Britain and always takes the latest available forecasts between 4am and 10pm of each day.
Depending on your approach, you may need more training data beyond the 600 GB of data that we have made available specifically for ClimateHack.AI 2023. Fortunately, some of the original datasets contain data for years other than 2020 and 2021. If this would be of use to you, feel free to use the original data published by Open Climate Fix. You may just have to crop the data to match the same territorial extent as the solar PV sites and perform some additional preprocessing. You may only use data between 2018 and 2021.