Alejandro Mosquera López is an online safety expert and Kaggle Grandmaster working in cybersecurity. His main research interests are Trustworthy AI and NLP. ORCID iD icon

Monday, June 5, 2023

Hackaprompt-2023 @ AICrowd write up

HackAPrompt was a prompt hacking competition aimed at enhancing AI safety and education by challenging participants to outsmart large language models (e.g. ChatGPT, GPT-3). In particular, encouraged participants to attempt to hack through many prompt hacking defenses as possible.

The task organizers provided a set of 10 challenges in which the participants should bypass the expected model behavior and force the LLM to reply "I have been PWNED" instead. In order to do that, users could make use of 3 different LLMs: GPT-3 (text-davinci-003), ChatGPT (gpt-3.5-turbo), or FlanT5 -XXL.

Saturday, May 13, 2023

Living off the land: Solving ML problems without training a single model


The concept of living off the land is related to surviving on what you can forage, hunt, or grow in nature.

Considering the current Machine Learning landscape, we can draw a parallelism between living off the land and "shopping around" for ready-made models for a given task.  While this has been partially true for some time thanks to model repositories such as HuggingFace, it still required some degree of involvement by applying finetuning or retraining for most advanced use cases. 

However, the appearance of large language models (LLMs) with instruction-following capabilities beyond next-word prediction has opened the doors to many applications that require little supervision, and in some cases, true 100% no-code solutions.

In this post I will be describing a recent "living off the land" approach in order to solve an NLP competitive ML challenge: WASSA 2023, An ACL shared Task on Empathy Emotion and Personality Detection in Interactions

Thursday, February 16, 2023

Pretrained Models with Adversarial Training for Online Sexism Detection @ SemEval 2023


Adversarial training can provide neural networks with significantly improved resistance to adversarial attacks, thus improving model robustness. However, a major drawback of many existing adversarial training workflows is the computational cost and extra processing time when using data augmentation techniques. This post explores the application of embedding perturbations via the fast gradient method (FGM) when finetuning large language models (LLMs) to short text classification tasks. This adversarial training approach has been evaluated as part of the first sub-task of SemEval 2023-Task 10, focused on explainable detection of sexism in social networks (EDOS). Empirical results show that adversarially finetuned models with FGM had on average a 25% longer training time and 0.2% higher F1 than their respective baselines. 

Tuesday, January 24, 2023

The string similarity problem

For two strings A and B (in the ASCII [a-z] range), we define the similarity of the strings to be the length of the longest prefix common to both strings. For example, the similarity of strings "abc" and "abd" is 2, while the similarity of strings "aaa" and "aaab" is 3.

The reader is asked to calculate the sum of similarities of a string S with each of its suffixes. Reference (