Alejandro Mosquera López is an online safety expert and Kaggle Grandmaster working in cybersecurity. His main research interests are Trustworthy AI and NLP. ORCID iD: https://orcid.org/0000-0002-6020-3569

Thursday, February 16, 2023

Pretrained Models with Adversarial Training for Online Sexism Detection @ SemEval 2023

Abstract

Adversarial training can give neural networks significantly better resistance to adversarial attacks, thus improving model robustness. However, a major drawback of many existing adversarial training workflows is the computational cost and extra processing time incurred when using data augmentation techniques. This post explores the application of embedding perturbations via the fast gradient method (FGM) when finetuning large language models (LLMs) for short text classification tasks. This adversarial training approach has been evaluated as part of the first sub-task of SemEval-2023 Task 10, focused on the explainable detection of online sexism (EDOS). Empirical results show that models adversarially finetuned with FGM took on average 25% longer to train and achieved a 0.2% higher F1 than their respective baselines.
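To make the idea of an FGM embedding perturbation concrete, here is a minimal sketch in plain Python. It is not the code used for the SemEval submission: the toy logistic classifier, the function names, and the `epsilon` and `lr` values are all assumptions chosen for illustration. It shows the general recipe, which is to take the gradient of the loss with respect to the embedding, rescale it to L2 norm `epsilon`, add it to the embedding, and train on both the clean and the perturbed input.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Return epsilon * grad / ||grad||_2, or a zero vector if grad is zero."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grads(embedding, weights, label):
    """Binary cross-entropy of a toy logistic classifier on one embedding.

    Hypothetical stand-in for a finetuned language model: returns the loss,
    its gradient w.r.t. the embedding, and its gradient w.r.t. the weights.
    """
    p = sigmoid(sum(w * e for w, e in zip(weights, embedding)))
    loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    grad_embedding = [(p - label) * w for w in weights]
    grad_weights = [(p - label) * e for e in embedding]
    return loss, grad_embedding, grad_weights


def fgm_training_step(embedding, weights, label, epsilon=0.5, lr=0.1):
    """One update that accumulates clean and adversarial weight gradients."""
    # First pass: loss and gradients on the clean embedding.
    clean_loss, grad_emb, grad_w = loss_and_grads(embedding, weights, label)

    # Perturb the embedding along the normalised gradient direction.
    delta = fgm_perturbation(grad_emb, epsilon)
    adv_embedding = [e + d for e, d in zip(embedding, delta)]

    # Second pass: loss and weight gradients on the perturbed embedding.
    adv_loss, _, adv_grad_w = loss_and_grads(adv_embedding, weights, label)

    # Gradient descent on the sum of both weight gradients.
    new_weights = [
        w - lr * (g + ag) for w, g, ag in zip(weights, grad_w, adv_grad_w)
    ]
    return new_weights, clean_loss, adv_loss


if __name__ == "__main__":
    weights, clean, adv = fgm_training_step([1.0, 2.0], [0.5, -0.25], label=1)
    print(f"clean loss={clean:.4f} adversarial loss={adv:.4f}")
```

The second forward and backward pass on the perturbed embedding is the source of the extra training cost in this kind of scheme. No augmented text has to be generated, because the perturbation is applied directly in embedding space.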