Alejandro Mosquera López is an online safety expert and Kaggle Grandmaster working in cybersecurity. His main research interests are Trustworthy AI and NLP. ORCID: https://orcid.org/0000-0002-6020-3569

Saturday, February 17, 2024

Detecting LLM hallucinations and overgeneration mistakes @ SemEval 2024

The modern NLG landscape is plagued by two interlinked problems: on the one hand, our current neural models have a propensity to produce inaccurate but fluent outputs; on the other hand, our metrics are better at capturing fluency than correctness. This leads neural networks to “hallucinate”, i.e., produce fluent but incorrect outputs that we currently struggle to detect automatically. For many NLG applications, however, the correctness of an output is mission-critical. For instance, producing a plausible-sounding translation that is inconsistent with the source text jeopardizes the usefulness of a machine translation pipeline. For this reason, SHROOM, the Shared task on Hallucinations and Related Observable Overgeneration Mistakes, aims to foster the community's growing interest in this topic.

In this competition, participants were asked to perform binary classification to identify cases of fluent overgeneration hallucinations in two setups: a model-aware track and a model-agnostic track. Concretely, they had to detect grammatically sound outputs that contain incorrect or unsupported semantic information inconsistent with the source input, with or without access to the model that produced the output.
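To illustrate what this kind of detection looks like in practice (this is not the submitted system), a common baseline is to score each source–output pair with an off-the-shelf NLI model and flag outputs whose entailment probability falls below a threshold. The model name and the threshold below are assumptions made only for this sketch.

```python
# Minimal sketch: flag a generated output as a likely hallucination when an
# off-the-shelf NLI model finds it is not entailed by the source text.
# The model choice and the 0.5 threshold are illustrative assumptions,
# not the system evaluated in the shared task.
from transformers import pipeline

nli = pipeline("text-classification",
               model="microsoft/deberta-large-mnli",
               top_k=None)

def is_hallucination(source: str, output: str, threshold: float = 0.5) -> bool:
    """Return True if the output looks unsupported by the source."""
    scores = nli({"text": source, "text_pair": output})
    entailment = next(s["score"] for s in scores if s["label"] == "ENTAILMENT")
    return entailment < threshold

# A fluent but unsupported continuation should receive a low entailment score.
print(is_hallucination(
    "The cat sat on the mat.",
    "The dog chased the ball across the park."
))
```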

The evaluated approach, a simple linear combination of reference models, ranked 3rd in the model-agnostic track with an accuracy of 0.826.
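To make the idea of a linear combination concrete, here is a minimal sketch in which per-example scores from several hypothetical reference models are combined through a learned linear weighting (logistic regression). The feature set, toy data, and choice of combiner are illustrative assumptions, not the exact pipeline behind the reported 0.826 accuracy.

```python
# Sketch of combining reference-model scores linearly for the binary
# hallucination label. Features, data, and the logistic-regression combiner
# are illustrative assumptions only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical per-example scores from several reference models
# (e.g., NLI entailment probability, QA-based consistency, LM confidence).
X_train = np.array([[0.91, 0.80, 0.12],
                    [0.15, 0.30, 0.85],
                    [0.88, 0.75, 0.20],
                    [0.10, 0.25, 0.90]])
y_train = np.array([0, 1, 0, 1])  # 1 = hallucination, 0 = not

combiner = LogisticRegression()   # learns a weighted linear combination
combiner.fit(X_train, y_train)

X_val = np.array([[0.85, 0.70, 0.18],
                  [0.20, 0.35, 0.80]])
y_val = np.array([0, 1])
pred = combiner.predict(X_val)
print("learned weights:", combiner.coef_[0])
print("validation accuracy:", accuracy_score(y_val, pred))
```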