ClimateHack.AI 2023: Qualifiers
An international student machine learning competition to develop state-of-the-art solar PV forecasting models 🌍
Join the official ClimateHack.AI Discord server to become part of an international community of AI enthusiasts and receive competition announcements. 🚀
Your challenge—should you choose to accept it—is to develop a cutting-edge machine learning model for predicting near-term site-level solar power production using satellite imagery, weather forecasts and air quality data better than the current state of the art before submissions close on Friday 8th March 2024.
Your contributions could directly help cut carbon emissions in Great Britain by up to 100 kilotonnes per year by advancing Open Climate Fix's solar PV nowcasting research for the National Grid Electricity System Operator.
The top three eligible individual participants from each university at the end of the qualifying round will be invited to present as a team at in-person finals at University College London and Harvard University in April 2024.
While submissions are individual in the qualifying round, you are free to work with others, particularly if you seek to be on your university's team. We encourage collaboration; after all, we are ultimately one large community looking to tackle climate change with AI. 😎
To get started with this challenge, read through the information posted here and then check out the starter resources on GitHub! You will be guided through loading and examining the data, training your first model using PV and HRV data, and submitting your model to the platform for evaluation.
https://github.com/climatehackai/getting-started-2023
The Jupyter notebooks that form part of the starter resources can be run online using Google Colab, which can be a great way to get started with the challenge.
https://github.com/climatehackai/getting-started-2023/blob/main/1_data.ipynb
https://github.com/climatehackai/getting-started-2023/blob/main/2_training.ipynb
To account for the variability of solar photovoltaic (PV) power production, the National Grid Electricity System Operator (ESO) schedules a spinning reserve of natural gas generators, which can take hours to ramp up from a cold start. These generators run below their maximum capacity so that there is headroom on the grid that can be ramped up rapidly to cover any shortfalls.
Not only is this expensive, but it contributes to ~100 kilotonnes in excess carbon emissions each year in Great Britain alone. As such, better solar PV forecasting techniques would allow the National Grid ESO to cut their spinning reserve, thereby reducing emissions and helping to improve the deployability of cheaper, greener solar power.
Cloud coverage (especially in regions with variable meteorological conditions, such as the United Kingdom) can have an outsized impact on solar photovoltaic power yields. By incorporating satellite imagery into near-term forecasting (or "nowcasting") models for solar power generation, it is possible to significantly improve the minute-to-minute accuracy of machine learning-based solar PV models beyond what is achievable using numerical weather predictions alone.
Take a look at this video animation from Open Climate Fix to see this effect in practice:
Another under-researched source of variability could be the presence of aerosols—suspended particulates that can affect the path of sunlight—at different altitudes in the atmosphere. As such, we are making 259 GB of air quality data available (related to dust, NO2, ozone and more), and you will have the option of integrating this data into the models you train. If this approach proves successful, this could be a key research contribution to come out of this competition!
If you are interested in Open Climate Fix's work so far on solar photovoltaic nowcasting, check out their research report.
The aim: develop a model for site-level PV forecasting over the next four hours that is both accurate and performant.
Available features of which you can make use:
- Site-level PV power readings for the previous hour at five-minute intervals ([12]) along with associated metadata (latitude, longitude, orientation, tilt and installed capacity in kilowatts)
- HRV satellite imagery for the previous hour centred on the site ([12, 128, 128])
- Non-HRV satellite imagery for the previous hour centred on the site ([12, 128, 128, 11])
- Hourly weather forecasts for the timesteps [T - 1h, T, T + 1h, T + 2h, T + 3h, T + 4h] centred on the site ([6, 128, 128] per variable)
- Hourly air quality forecasts for the same timesteps centred on the site ([6, 8, 128, 128] per variable)

Note that since the weather and air quality forecasts are hourly, T represents the start of each hour, so as an example, when generating predictions for 11:20–15:15, you will be given weather and air quality forecast data for timesteps 10:00, 11:00, 12:00, 13:00, 14:00 and 15:00 (see the short sketch below).

Target: the expected site-level solar PV power to be generated over the next four hours as a proportion of installed capacity ([48])
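To make that timestep alignment concrete, here is a minimal standard-library sketch (the date and variable names are purely illustrative):

from datetime import datetime, timedelta

# Illustrative four-hour forecast window starting at 11:20.
window_start = datetime(2021, 7, 1, 11, 20)

# T is the start of the hour containing the window start (11:00 here),
# and the six hourly forecast timesteps run from T - 1h to T + 4h.
T = window_start.replace(minute=0, second=0, microsecond=0)
timesteps = [T + timedelta(hours=h) for h in range(-1, 5)]

print([t.strftime("%H:%M") for t in timesteps])
# ['10:00', '11:00', '12:00', '13:00', '14:00', '15:00']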
Evaluation metric: mean absolute error (over all four hours)
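Concretely, the metric is the mean of the absolute errors over all 48 five-minutely values in the four-hour window. A minimal NumPy sketch (the function name is ours, not the platform's):

import numpy as np

def mean_absolute_error(predictions: np.ndarray, targets: np.ndarray) -> float:
    # Both arrays hold a [48] window of capacity-normalised PV values.
    return float(np.mean(np.abs(predictions - targets)))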
With so many different data sources to choose from, we encourage you to be creative and see how well you can do without necessarily using all of the data sources! Lighter models are ultimately easier to deploy in production.
Your submission will be evaluated on times between sunrise and sunset.
The HRV satellite imagery data will seem very familiar to participants of ClimateHack.AI 2022, which featured a future satellite imagery generation challenge based on the same channel over the same range of years.
For you to be eligible to progress to the finals, your submission should achieve a mean absolute error (4h) of less than 0.15000, which approximately corresponds to the naïve approach of assuming solar power generation remains constant from the last known value over the four-hour forecast window.
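That persistence baseline is easy to reproduce locally, e.g. as a sanity check for your own models (a sketch, assuming pv holds the previous hour's twelve capacity-normalised readings as in the feature list above):

import numpy as np

def persistence_forecast(pv: np.ndarray) -> np.ndarray:
    # Repeat the last known five-minutely PV reading across all 48 future steps.
    return np.full(48, pv[-1])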
To get started, it is worth spending some time getting to know the different data sources available. Once you have decided which data you wish to run your initial experiments with (e.g. non-HRV satellite imagery and the weather features for cloud coverage), you can move on to training some machine learning models!
You can select the following data variables in evaluation:
PV and site metadata: time, latitude, longitude, orientation, tilt, kwp, pv
Here, time is a timestamp corresponding to the very start of the forecast window for which you are making four hours of solar PV predictions. If you want to use the time data variable, you will need to first cast it to datetime64[ns], e.g. in run.py:
for pv, hrv, time in self.batch(
    features, variables=["pv", "hrv", "time"], batch_size=32
):
    datetimes = time.view("datetime64[ns]")
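From there, one optional use of the decoded timestamps is to derive cyclical time-of-day features for your model, e.g. using the pre-installed NumPy and pandas (a sketch; the function name is illustrative and not part of the provided run.py):

import numpy as np
import pandas as pd

def time_of_day_features(time: np.ndarray) -> np.ndarray:
    # Encode the time of day cyclically so that 23:55 and 00:05 end up close.
    datetimes = pd.DatetimeIndex(time.view("datetime64[ns]"))
    minutes = (datetimes.hour * 60 + datetimes.minute).to_numpy()
    angle = 2 * np.pi * minutes / 1440  # minutes in a day
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)  # [batch_size, 2]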
Satellite imagery: hrv, nonhrv

Weather forecasts: alb_rad, aswdifd_s, aswdir_s, cape_con, clch, clcl, clcm, clct, h_snow, omega_1000, omega_700, omega_850, omega_950, pmsl, relhum_2m, runoff_g, runoff_s, t_2m, t_500, t_850, t_950, t_g, td_2m, tot_prec, u_10m, u_50, u_500, u_850, u_950, v_10m, v_50, v_500, v_850, v_950, vmax_10m, w_snow, ww, z0

Air quality forecasts: co_conc, dust, nh3_conc, nmvoc_conc, no2_conc, no_conc, o3_conc, pans_conc, pm10_conc, pm2p5_conc, pmwf_conc, sia_conc, so2_conc
By default, submissions use the CPU evaluation environment, which has 8 GiB of RAM. If you are running low on memory in evaluation, you may wish to try decreasing your batch size. A GPU evaluation environment, which has 25 GiB of RAM and an Nvidia T4 GPU with 16 GB of VRAM, is also available. Reach out to us on the Discord server if you have any questions! 😎
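For example, reusing the self.batch call from the run.py snippet above, lowering the batch size is a one-argument change (how far you need to lower it depends on your model's memory footprint):

for pv, hrv, time in self.batch(
    features, variables=["pv", "hrv", "time"], batch_size=8  # reduced from 32
):
    ...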
We have a number of Python packages pre-installed within the evaluation environment, including the following:
antialiased-cnns==0.3, axial-attention==0.6.1, blis==0.7.11, cartopy==0.22.0, dm-reverb==0.13.0, dm-tree==0.1.8, dulwich==0.21.6, einops==0.7.0, fastai==2.7.13, fsspec==2023.10.0, keras==2.14.0, kornia==0.7.0, numba==0.58.1, numpy==1.26.2, opencv-contrib-python-headless==4.8.1.78, opt-einsum==3.3.0, pandas==2.1.3, perceiver-pytorch==0.8.8, pytorch-lightning==2.1.2, scikit-learn==1.4.0, scikit-video==1.1.11, scipy==1.11.4, tensorflow==2.14.0, tensorflow-addons==0.22.0, tensorflow-estimator==2.14.0, tensorflow-io-gcs-filesystem==0.34.0, tensorflow-probability==0.22.1, tf-agents==0.18.0, timm==0.9.12, torch==2.1.1+cpu, torchaudio==2.1.1+cpu, torchmetrics==1.2.0, torchvision==0.16.1+cpu, transformers==4.35.2, trove-classifiers==2023.11.22, wwf==0.0.16
If you need a package that is not available in the evaluation environment, ping us on Discord, and we will see what we can do for you! If you only need a small package (e.g. one that provides a PyTorch model class definition), you might want to consider bundling it as part of your submission instead.
As part of the getting-started materials, we have included a local validator along with a small 32-sample dataset in the same format as used in the evaluation environment that you can use to test whether your submission is likely to work when uploaded to the platform. You will still need to create your own validation set to properly evaluate your submissions locally.
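One simple way to build that validation set is to hold out a contiguous slice of time rather than sampling randomly, so that adjacent five-minute frames do not leak between training and validation. A sketch with illustrative names (adapt it to however you store your training examples locally):

import pandas as pd

def split_by_time(df: pd.DataFrame, cutoff: str) -> tuple[pd.DataFrame, pd.DataFrame]:
    # Everything before the cutoff is for training; everything from it onwards, validation.
    cutoff_ts = pd.Timestamp(cutoff)
    return df[df.index < cutoff_ts], df[df.index >= cutoff_ts]

# e.g. train_df, valid_df = split_by_time(samples, "2021-07-01")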
The top three eligible participants from each university, along with a club representative, will be invited to present at the in-person finals hosted at University College London and Harvard University in early April 2024. The winning teams, as selected by a panel of judges, will receive prizes to be split among team members.
We could not run ClimateHack.AI 2023 without the generous support of Newcross Healthcare, UCL Grand Challenges (Climate Change), PGIM Real Estate and RealAssetX, Climate X, and the competition's challenge provider, Open Climate Fix, to whose solar photovoltaic nowcasting work ClimateHack.AI 2023 contributes.