Question Matching Challenge
A competition to help researchers better harmonise questionnaire items across different studies with NLP 💬
Join the official Harmony community Discord server to become part of the Harmony open-source community, ask questions and receive competition announcements. 🚀
The Harmony project aims to develop a tool to help researchers make better use of existing data by harmonising questionnaire items and measures across different studies—potentially in different languages—through an approach based on natural language processing (NLP).
Harmony is a collaboration between researchers at Ulster University, University College London, the Universidade Federal de Santa Maria and Fast Data Science. The Harmony project has been funded by Wellcome as part of the Wellcome Data Prize in Mental Health and by the Economic and Social Research Council (ESRC).
Your challenge is to develop an improved algorithm for matching psychology survey questions that produces similarity ratings more closely aligned with those given by human psychologists working in the field and that can be integrated into the Harmony tool. This competition will last approximately two months and finish in early January.
To get started, take a look at our tutorial notebook:
https://github.com/DoxaAI/harmony-matching-getting-started/blob/main/getting-started.ipynb
Next, if you want to learn about how to fine-tune a pre-trained model for the competition challenge, take a look at our fine-tuning example notebook:
https://github.com/DoxaAI/harmony-matching-getting-started/blob/main/example-finetuning.ipynb
The training dataset consists of pairs of English-language sentences (sentence_1 and sentence_2), as well as a similarity rating between zero and one hundred (human_similarity) that is based on responses to a survey of human psychologists run by the Harmony project.
For reference, the dataset also includes the cosine similarity scores currently produced by the Harmony tool in its default configuration. These are computed by taking the cosine similarity between the sentence embeddings produced by the popular sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 Hugging Face model (~471 MB in size). The Harmony tool currently has a mean absolute error of around 24 against the human-provided similarity scores.
One of the training examples is as follows:
sentence_1: Having difficulty concentrating?
sentence_2: I often feel so mixed up that I have difficulty functioning.
human_similarity: 91
cosine_from_harmony: 0.443169
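For reference, the cosine_from_harmony value above should be closely reproducible with a few lines of the sentence-transformers library (a minimal sketch; the exact score may vary slightly between library versions):

```python
from sentence_transformers import SentenceTransformer, util

# Load the model that Harmony uses in its default configuration
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# The training example shown above
sentence_1 = "Having difficulty concentrating?"
sentence_2 = "I often feel so mixed up that I have difficulty functioning."

# Embed both sentences and take the cosine similarity of the embeddings
embeddings = model.encode([sentence_1, sentence_2])
similarity = float(util.cos_sim(embeddings[0], embeddings[1]))
print(similarity)  # roughly 0.44 for this pair
```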
The full training dataset is available on GitHub.
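As a quick sanity check, you can load the data with pandas, which is pre-installed in the evaluation environment. This is a minimal sketch assuming the dataset is in CSV form; the file name below is a placeholder for wherever you save the downloaded file:

```python
import pandas as pd

# Placeholder path: point this at the training data you download from GitHub
df = pd.read_csv("harmony_train.csv")

print(df[["sentence_1", "sentence_2", "human_similarity", "cosine_from_harmony"]].head())
```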
When you upload your work to the platform, it will be evaluated against an unseen test set. You will be ranked on the scoreboard based on the mean absolute error between the similarity scores your algorithm produces (in the range [0, 100]) and the corresponding human-provided scores.
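In other words, the metric is simply the average absolute difference between your predicted scores and the human ratings. A minimal sketch using scikit-learn, with made-up predicted values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical model outputs, rescaled to the [0, 100] range expected by the scoreboard
predicted = np.array([44.3, 72.0, 15.5])
human = np.array([91.0, 80.0, 10.0])

print(mean_absolute_error(human, predicted))  # mean of |predicted - human|
```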
By default, submissions use the CPU evaluation environment, which has 8 GiB of RAM. If you are running low on memory in evaluation, you may wish to try decreasing your batch size. Reach out to us on the Discord server if you have any questions! 😎
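For example, if you are embedding sentences with sentence-transformers, the batch_size argument to encode() is one place to lower peak memory usage (the value below is only illustrative):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

sentences = [
    "Having difficulty concentrating?",
    "I often feel so mixed up that I have difficulty functioning.",
]

# Smaller batches trade a little speed for a lower peak memory footprint
embeddings = model.encode(sentences, batch_size=16)
```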
The evaluation environment will come with a number of Python packages pre-installed, including the following:
axial-attention==0.6.1, blis==0.7.11, dm-reverb==0.13.0, dm-tree==0.1.8, einops==0.7.0, fastai==2.7.13, keras==2.14.0, kornia==0.7.0, numba==0.58.1, numpy==1.26.2, opt-einsum==3.3.0, pandas==2.1.3, pytorch-lightning==2.1.2, scikit-learn==1.4.0, scikit-video==1.1.11, scipy==1.11.4, tensorflow==2.14.0, tensorflow-addons==0.22.0, torch==2.1.1, torchmetrics==1.2.0, transformers==4.35.2, trove-classifiers==2023.11.22, wwf==0.0.16
If you need a package that is not available in the evaluation environment, ping us on the Discord, and we will see what we can do for you! If you only need a small package (e.g. one that provides a PyTorch model class definition), you might want to consider bundling it as part of your submission instead.
Once you have been through the getting-started materials and have made your first submission, there are a number of different approaches you can take, from adjusting the fine-tuning hyperparameters (e.g. training for more epochs or with a different learning rate) to trying out other pre-trained SentenceTransformers models.
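As an illustration of the knobs involved, here is a minimal fine-tuning sketch using the sentence-transformers fit API; the file name, batch size, epoch count and learning rate are placeholders to experiment with, not recommendations:

```python
import pandas as pd
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Placeholder path to the training data
df = pd.read_csv("harmony_train.csv")

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Rescale the human ratings from [0, 100] to [0, 1] so they can serve as cosine targets
train_examples = [
    InputExample(texts=[row.sentence_1, row.sentence_2], label=row.human_similarity / 100.0)
    for row in df.itertuples()
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

# Epochs, warmup steps and learning rate are the obvious hyperparameters to tune
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=4,
    warmup_steps=100,
    optimizer_params={"lr": 2e-5},
)

model.save("fine-tuned-harmony-model")
```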
Thomas Wood from Fast Data Science has also kindly made an additional tutorial for the challenge.
Harmony runs as a web application that is open for public use, so it is important to keep your model's footprint as small as possible to maximise performance.
Harmony currently makes use of open-source models available on HuggingFace, so you may wish to base your solution on one of those models to make it easier to add your model to the Harmony tool.
We suggest also stress-testing your model by adding it to your own fork of the Harmony Python library at https://github.com/harmonydata/harmony.
This competition will have the following prizes as Amazon vouchers: