ClimateHack.AI 2023: Qualifiers
An international student machine learning competition to develop state-of-the-art solar PV forecasting models 🌍
Join the official ClimateHack.AI Discord server to become part of an international community of AI enthusiasts and receive competition announcements. 🚀
Your challenge—should you choose to accept it—is to develop a cutting-edge machine learning model for predicting near-term site-level solar power production using satellite imagery, weather forecasts and air quality data better than the current state of the art before submissions close on Friday 8th March 2024.
Your contributions could directly help cut carbon emissions in Great Britain by up to 100 kilotonnes per year by advancing Open Climate Fix's solar PV nowcasting research for the National Grid Electricity System Operator.
The top three eligible individual participants from each university at the end of the qualifying round will be invited to present as a team at in-person finals at University College London and Harvard University in April 2024.
While submissions are individual in the qualifying round, you are free to work with others, particularly if you seek to be on your university's team. We encourage collaboration; after all, we are ultimately one large community looking to tackle climate change with AI. 😎
To get started with this challenge, read through the information posted here and then check out the starter resources on GitHub! You will be guided through loading and examining the data, training your first model using PV and HRV data, and submitting your model to the platform for evaluation.
https://github.com/climatehackai/getting-started-2023
The Jupyter notebooks that form part of the starter resources can be run online using Google Colab, which can be a great way to get started with the challenge.
https://github.com/climatehackai/getting-started-2023/blob/main/1_data.ipynb
https://github.com/climatehackai/getting-started-2023/blob/main/2_training.ipynb
To account for the variability of solar photovoltaic (PV) power production, the National Grid Electricity System Operator (ESO) schedules a spinning reserve of natural gas generators, which can take hours to ramp up from a cold start. These generators run below their maximum capacity so that there is headroom on the grid that can be ramped up rapidly to cover any shortfalls.
Not only is this expensive, but it contributes to ~100 kilotonnes in excess carbon emissions each year in Great Britain alone. As such, better solar PV forecasting techniques would allow the National Grid ESO to cut their spinning reserve, thereby reducing emissions and helping to improve the deployability of cheaper, greener solar power.
Cloud coverage (especially in regions with variable meteorological conditions, such as the United Kingdom) can have an outsized impact on solar photovoltaic power yields. By incorporating satellite imagery into near-term forecasting (or "nowcasting") models for solar power generation, it is possible to significantly improve the minute-to-minute accuracy of machine learning-based solar PV models beyond what is achievable using numerical weather predictions alone.
Take a look at this video animation from Open Climate Fix to see this effect in practice:
Another under-researched source of variability could be the presence of aerosols—suspended particulates that can affect the path of sunlight—at different altitudes in the atmosphere. As such, we are making 259 GB of air quality data available (related to dust, NO2, ozone and more), and you will have the option of integrating this data into the models you train. If this approach proves successful, this could be a key research contribution to come out of this competition!
If you are interested in Open Climate Fix's work so far on solar photovoltaic nowcasting, check out their research report.
The aim: develop a model for site-level PV forecasting over the next four hours that is both accurate and performant.
Available features of which you can make use:
- Site-level PV power readings for the previous hour at five-minute intervals ([12]) along with associated metadata (latitude, longitude, orientation, tilt and installed capacity in kilowatts)
- HRV satellite imagery for the previous hour centred on the site ([12, 128, 128])
- Non-HRV satellite imagery for the previous hour centred on the site ([12, 128, 128, 11])
- Hourly weather forecasts for the timesteps [T - 1h, T, T + 1h, T + 2h, T + 3h, T + 4h] centred on the site ([6, 128, 128] per variable)
- Hourly air quality forecasts for the same timesteps centred on the site ([6, 8, 128, 128] per variable)

Note that since the weather and air quality forecasts are hourly, T represents the start of each hour, so as an example, when generating predictions for 11:20–15:15, you will be given weather and air quality forecast data for timesteps 10:00, 11:00, 12:00, 13:00, 14:00 and 15:00 (see the short sketch below).

Target: the expected site-level solar PV power to be generated over the next four hours as a proportion of installed capacity ([48])
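To make that timestep alignment concrete, here is a minimal standard-library sketch (the date and variable names are purely illustrative):

from datetime import datetime, timedelta

# Illustrative four-hour forecast window starting at 11:20.
window_start = datetime(2021, 7, 1, 11, 20)

# T is the start of the hour containing the window start (11:00 here),
# and the six hourly forecast timesteps run from T - 1h to T + 4h.
T = window_start.replace(minute=0, second=0, microsecond=0)
timesteps = [T + timedelta(hours=h) for h in range(-1, 5)]

print([t.strftime("%H:%M") for t in timesteps])
# ['10:00', '11:00', '12:00', '13:00', '14:00', '15:00']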
Evaluation metric: mean absolute error (over all four hours)
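Concretely, the metric is the mean of the absolute errors over all 48 five-minutely values in the four-hour window. A minimal NumPy sketch (the function name is ours, not the platform's):

import numpy as np

def mean_absolute_error(predictions: np.ndarray, targets: np.ndarray) -> float:
    # Both arrays hold a [48] window of capacity-normalised PV values.
    return float(np.mean(np.abs(predictions - targets)))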
With so many different data sources to choose from, we encourage you to be creative and see how well you can do without necessarily using all of the data sources! Lighter models are ultimately easier to deploy in production.
Your submission will be evaluated on times between sunrise and sunset.
The HRV satellite imagery data will seem very familiar to participants of ClimateHack.AI 2022, which featured a future satellite imagery generation challenge based on the same channel over the same range of years.
For you to be eligible to progress to the finals, your submission should achieve a mean absolute error (4h) of less than 0.15000, which approximately corresponds to the naïve approach of assuming solar power generation remains constant from the last known value over the four-hour forecast window.
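That persistence baseline is easy to reproduce locally, e.g. as a sanity check for your own models (a sketch, assuming pv holds the previous hour's twelve capacity-normalised readings as in the feature list above):

import numpy as np

def persistence_forecast(pv: np.ndarray) -> np.ndarray:
    # Repeat the last known five-minutely PV reading across all 48 future steps.
    return np.full(48, pv[-1])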
To get started, it is worth spending some time getting to know the different data sources available. Once you have decided which data you wish to run your initial experiments with (e.g. non-HRV satellite imagery and the weather features for cloud coverage), you can move on to training some machine learning models!
You can select the following data variables in evaluation:
PV and site metadata: time, latitude, longitude, orientation, tilt, kwp, pv
Here, time is a timestamp corresponding to the very start of the forecast window for which you are making four hours of solar PV predictions. If you want to use the time data variable, you will need to first cast it to datetime64[ns], e.g. in run.py:
for pv, hrv, time in self.batch(
    features, variables=["pv", "hrv", "time"], batch_size=32
):
    datetimes = time.view("datetime64[ns]")
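From there, one optional use of the decoded timestamps is to derive cyclical time-of-day features for your model, e.g. using the pre-installed NumPy and pandas (a sketch; the function name is illustrative and not part of the provided run.py):

import numpy as np
import pandas as pd

def time_of_day_features(time: np.ndarray) -> np.ndarray:
    # Encode the time of day cyclically so that 23:55 and 00:05 end up close.
    datetimes = pd.DatetimeIndex(time.view("datetime64[ns]"))
    minutes = (datetimes.hour * 60 + datetimes.minute).to_numpy()
    angle = 2 * np.pi * minutes / 1440  # minutes in a day
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)  # [batch_size, 2]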
Satellite imagery: hrv, nonhrv

Weather forecasts: alb_rad, aswdifd_s, aswdir_s, cape_con, clch, clcl, clcm, clct, h_snow, omega_1000, omega_700, omega_850, omega_950, pmsl, relhum_2m, runoff_g, runoff_s, t_2m, t_500, t_850, t_950, t_g, td_2m, tot_prec, u_10m, u_50, u_500, u_850, u_950, v_10m, v_50, v_500, v_850, v_950, vmax_10m, w_snow, ww, z0

Air quality forecasts: co_conc, dust, nh3_conc, nmvoc_conc, no2_conc, no_conc, o3_conc, pans_conc, pm10_conc, pm2p5_conc, pmwf_conc, sia_conc, so2_conc
By default, submissions use the CPU evaluation environment, which has 8 GiB of RAM. If you are running low on memory in evaluation, you may wish to try decreasing your batch size. A GPU evaluation environment, which has 25 GiB of RAM and an Nvidia T4 GPU with 16 GB of VRAM, is also available. Reach out to us on the Discord server if you have any questions! 😎
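For example, reusing the self.batch call from the run.py snippet above, lowering the batch size is a one-argument change (how far you need to lower it depends on your model's memory footprint):

for pv, hrv, time in self.batch(
    features, variables=["pv", "hrv", "time"], batch_size=8  # reduced from 32
):
    ...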
We have a number of Python packages pre-installed within the evaluation environment, including the following:
antialiased-cnns==0.3, axial-attention==0.6.1, blis==0.7.11, cartopy==0.22.0, dm-reverb==0.13.0, dm-tree==0.1.8, dulwich==0.21.6, einops==0.7.0, fastai==2.7.13, fsspec==2023.10.0, keras==2.14.0, kornia==0.7.0, numba==0.58.1, numpy==1.26.2, opencv-contrib-python-headless==4.8.1.78, opt-einsum==3.3.0, pandas==2.1.3, perceiver-pytorch==0.8.8, pytorch-lightning==2.1.2, scikit-learn==1.4.0, scikit-video==1.1.11, scipy==1.11.4, tensorflow==2.14.0, tensorflow-addons==0.22.0, tensorflow-estimator==2.14.0, tensorflow-io-gcs-filesystem==0.34.0, tensorflow-probability==0.22.1, tf-agents==0.18.0, timm==0.9.12, torch==2.1.1+cpu, torchaudio==2.1.1+cpu, torchmetrics==1.2.0, torchvision==0.16.1+cpu, transformers==4.35.2, trove-classifiers==2023.11.22, wwf==0.0.16
If you need a package that is not available in the evaluation environment, ping us on Discord, and we will see what we can do for you! If you only need a small package (e.g. one that provides a PyTorch model class definition), you might want to consider bundling it as part of your submission instead.
As part of the getting-started materials, we have included a local validator along with a small 32-sample dataset in the same format as used in the evaluation environment that you can use to test whether your submission is likely to work when uploaded to the platform. You will still need to create your own validation set to properly evaluate your submissions locally.
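One simple way to build that validation set is to hold out a contiguous slice of time rather than sampling randomly, so that adjacent five-minute frames do not leak between training and validation. A sketch with illustrative names (adapt it to however you store your training examples locally):

import pandas as pd

def split_by_time(df: pd.DataFrame, cutoff: str) -> tuple[pd.DataFrame, pd.DataFrame]:
    # Everything before the cutoff is for training; everything from it onwards, validation.
    cutoff_ts = pd.Timestamp(cutoff)
    return df[df.index < cutoff_ts], df[df.index >= cutoff_ts]

# e.g. train_df, valid_df = split_by_time(samples, "2021-07-01")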
The top three eligible participants from each university, along with a club representative, will be invited to present at the in-person finals hosted at University College London and Harvard University in early April 2024. The winning teams, as selected by a panel of judges, will receive prizes to be split among team members.
We could not run ClimateHack.AI 2023 without the generous support of Newcross Healthcare, UCL Grand Challenges (Climate Change), PGIM Real Estate and RealAssetX, Climate X, and the competition's challenge provider, Open Climate Fix, to whose solar photovoltaic nowcasting work ClimateHack.AI 2023 contributes.