·Finished·

Question Matching Challenge

A competition to help researchers better harmonise questionnaire items across different studies with NLP 💬

Harmony Matching Algorithm Improvement Challenge

Join the official Harmony community Discord server to become part of the Harmony open-source community, ask questions and receive competition announcements. 🚀

Discord | Twitter | LinkedIn | Website

‼️ While this competition has concluded, submissions for this challenge will temporarily reopen as part of the Harmony workshop at AI UK 2025. You're more than welcome to give submitting a go! ‼️

The Harmony project aims to develop a tool to help researchers make better use of existing data by harmonising questionnaire items and measures across different studies—potentially in different languages—through an approach based on natural language processing (NLP).

Harmony is a collaboration between researchers at Ulster University, University College London, the Universidade Federal de Santa Maria and Fast Data Science. The Harmony project has been funded by Wellcome as part of the Wellcome Data Prize in Mental Health and by the Economic and Social Research Council (ESRC).

Challenge 💡

Your challenge is to develop an improved algorithm for matching psychology survey questions that produces similarity ratings more closely aligned with those given by humans psychologists working in the field and that can be integrated into the Harmony tool. This competition will last approximately two months and finish early January.

To get started, take a look at our tutorial notebook:

https://github.com/DoxaAI/harmony-matching-getting-started/blob/main/getting-started.ipynb

Open in Google Colab 📒

Next, if you want to learn about how to fine-tune a pre-trained model for the competition challenge, take a look at our fine-tuning example notebook:

https://github.com/DoxaAI/harmony-matching-getting-started/blob/main/example-finetuning.ipynb

Open in Google Colab 📒

Dataset 💻

The training dataset consists of pairs of English-language sentences (sentence_1 and sentence_2), as well as a similarity rating between zero and one hundred (human_similarity) that is based on responses to a survey of human psychologists run by the Harmony project.

For reference, the dataset also includes the cosine similarity scores currently being produced by the Harmony tool in its default configuration, which are computed by taking the cosine similarity between the sentence embeddings produced by the popular sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 Huggingface model (~471 MB in size). The Harmony tool currently has a mean absolute error of around 24 against the human-provided similarity scores.

One of the training examples is as follows:

sentence_1                                         Having difficulty concentrating?
sentence_2             I often feel so mixed up that I have difficulty functioning.
human_similarity                                                                 91
cosine_from_harmony                                                        0.443169

The full training dataset is available on GitHub.

Evaluation 📐

When you upload your work to the platform, it will be evaluated against an unseen test set. You will be ranked on the scoreboard based on the mean absolute error between the similarity scores your algorithm produces (in the range [0, 100]) and the corresponding human-provided scores.

The scoreboard is based on your latest submission only. If your submission crashes during evaluation on the test set, you will not appear. The winners of the competition will be determined based on the final scoreboard. ‼️

By default, submissions use the CPU evaluation environment, which has 8 GiB of RAM. If you are running low on memory in evaluation, you may wish to try decreasing your batch size. Reach out to us on the Discord server if you have any questions! 😎

The evaluation environment will come with a number of Python packages pre-installed, including the following:

axial-attention==0.6.1, blis==0.7.11, dm-reverb==0.13.0, dm-tree==0.1.8, einops==0.7.0, fastai==2.7.13, keras==2.14.0, kornia==0.7.0, numba==0.58.1, numpy==1.26.2, opt-einsum==3.3.0, pandas==2.1.3, pytorch-lightning==2.1.2, scikit-learn==1.4.0, scikit-video==1.1.11, scipy==1.11.4, tensorflow==2.14.0, tensorflow-addons==0.22.0,  torch==2.1.1, torchmetrics==1.2.0, transformers==4.35.2, trove-classifiers==2023.11.22, wwf==0.0.16

If you need a package that is not available in the evaluation environment, ping us on the Discord, and we will see what we can do for you! If you only need a small package (e.g. one that provides a PyTorch model class definition), you might want to consider bundling it as part of your submission instead.

Next steps 🚨

Once you have been through the getting-started materials and have made your first submission, there are a number of different approaches that you can take, from adjusting the hyperparameters used in fine-tuning to train the model for longer (for a greater number of epochs) or with a different learning rate, to trying out other pretrained SentenceTransformers models.

Thomas Wood from Fast Data Science has also kindly made an additional tutorial for the challenge.

Choosing a robust model that is usable in a live production application

Harmony runs as a web application that is open for public use, so it is important that the footprint of your model is as small as possible to maximise performance.

Harmony currently makes use of open-source models available on HuggingFace, so you may wish to base your solution on one of those models to make it easier to add your model to the Harmony tool.

We suggest also stress-testing your model by adding it to your own fork of the Harmony Python library at https://github.com/harmonydata/harmony.

Winners 🏆

This competition ended on 10^th January 2025 after having evaluated 276 submissions from 26 participants, and the prizes have been awarded.

Congratulations to the competition's winners:

First place: José Inés Martínez Berard (@oreug) with MAE=20.348, who won a £500 Amazon voucher 🏅
Second place: Rafi Ahmed Riyaz Ahmed Patel (@rafa) with MAE=20.544, who won a £250 Amazon voucher 🥈

The Harmony team will now work with the winners to explore the integration of their solutions into the Harmony tool, which aims to help researchers make better use of existing data by harmonising questionnaire items and measures across different studies using natural language processing.

If you found this challenge interesting, check out the Harmony Questionnaire Parsing Challenge (a token classification competition), which is open until 28th March.