Alejandro Mosquera López is an online safety expert and Kaggle Grandmaster working in cybersecurity. His main research interests are Trustworthy AI and NLP. ORCID: https://orcid.org/0000-0002-6020-3569

Saturday, May 13, 2023

Living off the land: Solving ML problems without training a single model

Introduction

The concept of living off the land is related to surviving on what you can forage, hunt, or grow in nature.

Considering the current machine learning landscape, we can draw a parallel between living off the land and "shopping around" for ready-made models for a given task. While this has been partially true for some time thanks to model repositories such as HuggingFace, most advanced use cases still required some degree of involvement in the form of fine-tuning or retraining.

However, the emergence of large language models (LLMs) with instruction-following capabilities beyond next-word prediction has opened the door to many applications that require little supervision and, in some cases, allow true 100% no-code solutions.

In this post I will describe a recent "living off the land" approach to solving a competitive NLP challenge: WASSA 2023, an ACL shared task on Empathy, Emotion, and Personality Detection in Interactions.

The task

Emotion is a concept that is challenging to describe. Yet, as human beings, we understand the emotional effect situations have, or could have, on us and other people. How can we transfer this knowledge to machines? Is it possible to learn the link between situations and the emotions they trigger in an automatic way? And what about empathy: how does it link to emotion and personality?

We know that some LLMs have good built-in knowledge of many human-made concepts, such as sentiment or polarity, among others. Therefore, it would be reasonable to attempt to solve an emotion classification task in a zero-shot fashion. On the other hand, NLP research on emotion detection is a well-worn path, and I would expect pre-trained resources to perform reasonably well in this area.

Approach

In order to validate the above hypothesis, I decided to classify the emotion of a given text without training any model. With that self-imposed limitation in mind, the first step was investigating potential approaches:

  • Next word prediction: If we transform emotion classification into a next-word prediction task, we can rewrite the original sentence using prompt templates such as:

    'This article produces ' + mask + '. {}'
    'The following text causes ' + mask + '. {}'
    'The emotion in this article is ' + mask + ': {}'

    where the mask is a special, model-dependent token.

  • Prompt engineering: We can simply "ask" a text completion model for the solution, e.g.

    prompt = "Classify the emotion of a text between the following options: sadness, neutral, anger, disgust, fear, hope, joy and surprise.\n\nText: \"" + text + "\"\nEmotion:"
  • Pre-trained emotion models: Finally, since this is a relatively common NLP task, we can just leverage whatever text classification models are available in the wild.
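The two prompt-based options above boil down to string building. Below is a minimal sketch of how those prompts might be assembled (the helper function names are mine, and the actual model calls are omitted; the templates follow the post):

```python
# Candidate labels used throughout the post.
EMOTIONS = ["sadness", "neutral", "anger", "disgust", "fear", "hope", "joy", "surprise"]

# Templates for the next-word (masked-token) prediction approach.
MLM_TEMPLATES = [
    "This article produces {mask}. {text}",
    "The following text causes {mask}. {text}",
    "The emotion in this article is {mask}: {text}",
]

def mlm_prompts(text, mask_token="[MASK]"):
    """Rewrite a sentence as masked-token prediction inputs.

    The mask token is model-dependent, e.g. "[MASK]" for BERT
    or "<mask>" for RoBERTa.
    """
    return [t.format(mask=mask_token, text=text) for t in MLM_TEMPLATES]

def completion_prompt(text):
    """Ask a text completion model directly for the label."""
    options = ", ".join(EMOTIONS[:-1]) + " and " + EMOTIONS[-1]
    return ("Classify the emotion of a text between the following options: "
            + options + ".\n\nText: \"" + text + "\"\nEmotion:")
```

The masked prompts would then be scored by comparing the model's probability for each candidate emotion at the mask position, while the completion prompt is sent as-is to a model such as text-davinci-003.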

Evaluation

Using 4-fold cross-validation:

| Model | Approach | F1 (macro) |
|---|---|---|
| gpt-3.5-turbo | Prompt engineering | 0.2893 |
| AdapterHub/roberta-base-pf-emotion | Pretrained emotion model | 0.2842 |
| text-davinci-003 | Prompt engineering | 0.2828 |
| j-hartmann/emotion-english-roberta-large | Pretrained emotion model | 0.2674 |
| flan-t5-base | Prompt engineering | 0.2396 |
| bert-base-uncased | Next word prediction | 0.2164 |
| alpaca-lora-7b | Prompt engineering | 0.21 |
| j-hartmann/emotion-english-distilroberta-base | Pretrained emotion model | 0.1912 |
| opt-iml-max-1.3b | Prompt engineering | 0.1837 |
| HuggingChat | Prompt engineering | 0.1634 |
| roberta-base | Next word prediction | 0.1628 |
| text-davinci-002 | Prompt engineering | 0.1426 |
| bart-large-mnli | Next word prediction | 0.1412 |
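For reference, the macro-averaged F1 used above is the unweighted mean of the per-class F1 scores, so rare emotions count as much as frequent ones. A quick sketch with scikit-learn (the label lists are illustrative):

```python
from sklearn.metrics import f1_score

# Toy gold labels and predictions over a few of the emotion classes.
y_true = ["joy", "anger", "sadness", "joy", "fear"]
y_pred = ["joy", "anger", "joy", "joy", "fear"]

# average="macro": compute F1 per class, then take the unweighted mean,
# so a class the model never predicts (here, sadness) drags the score down.
macro = f1_score(y_true, y_pred, average="macro")
```

This is why macro F1 scores on an 8-class emotion task look low in absolute terms: a model can have decent accuracy while still scoring poorly if it neglects minority classes.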

Results

Below are the results during the training phase of the competition. The final submission was based on a weighted average of the outputs of the models above.
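The exact ensemble weights are not given here; the following is a minimal sketch of weighted-averaging per-class probabilities across models (the ensemble helper and the weights are illustrative, not the submitted configuration):

```python
import numpy as np

EMOTIONS = ["sadness", "neutral", "anger", "disgust", "fear", "hope", "joy", "surprise"]

def ensemble(prob_matrices, weights):
    """Weighted average of per-model probability matrices.

    prob_matrices: list of (n_samples, n_classes) arrays, one per model.
    weights: one weight per model; normalized to sum to 1.
    Returns the predicted emotion label for each sample.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                            # normalize the weights
    stacked = np.stack(prob_matrices)          # (n_models, n_samples, n_classes)
    avg = np.tensordot(w, stacked, axes=1)     # (n_samples, n_classes)
    return [EMOTIONS[i] for i in avg.argmax(axis=1)]
```

Models that output only a hard label (e.g. a chat LLM) can still participate by contributing a one-hot probability vector.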

| # | User | Entries | Last Entry | Team Name | Macro F1 | Micro F1 | Micro Jaccard | Macro Prec. | Macro Rec. | Micro Prec. | Micro Rec. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | adityapatkar | 19 | 04/17/23 | | 0.579 (1) | 0.736 (1) | 0.583 (1) | 0.571 (3) | 0.625 (1) | 0.729 (2) | 0.744 (1) |
| 2 | gauravk | 8 | 04/26/23 | Team Converge | 0.544 (2) | 0.703 (2) | 0.542 (2) | 0.604 (2) | 0.537 (2) | 0.690 (5) | 0.715 (2) |
| 3 | amsqr | 3 | 04/30/23 | Alejandro Mosquera | 0.527 (3) | 0.670 (4) | 0.503 (4) | 0.622 (1) | 0.500 (5) | 0.720 (4) | 0.626 (6) |
| 4 | Cordyceps | 18 | 05/05/23 | | 0.504 (4) | 0.647 (6) | 0.478 (6) | 0.567 (4) | 0.501 (4) | 0.628 (7) | 0.667 (3) |
| 5 | anedilko | 1 | 04/30/23 | Bias Busters | 0.462 (5) | 0.572 (8) | 0.400 (8) | 0.502 (7) | 0.523 (3) | 0.542 (8) | 0.606 (8) |
| 6 | surajtc | 2 | 04/22/23 | | 0.425 (6) | 0.661 (5) | 0.493 (5) | 0.525 (6) | 0.370 (8) | 0.721 (3) | 0.610 (7) |
| 7 | lazyboy.blk | 12 | 04/22/23 | Team Name | 0.402 (7) | 0.696 (3) | 0.534 (3) | 0.556 (5) | 0.380 (7) | 0.760 (1) | 0.642 (5) |
| 8 | warrior1127 | 5 | 05/02/23 | SAIL | 0.388 (8) | 0.548 (9) | 0.377 (9) | 0.373 (9) | 0.483 (6) | 0.473 (10) | 0.650 (4) |
| 9 | hammadfahim | 4 | 04/19/23 | | 0.248 (9) | 0.476 (10) | 0.312 (10) | 0.399 (8) | 0.292 (9) | 0.519 (9) | 0.439 (10) |
| 10 | kunwarv4 | 5 | 05/01/23 | VISU_UNiCA | 0.180 (10) | 0.595 (7) | 0.423 (7) | 0.158 (10) | 0.213 (10) | 0.649 (6) | 0.549 (9) |

The dev results were quite good in comparison with other (likely supervised) models. In the test phase the final submission scored an even higher macro F1 (0.533) but ranked lower against other teams.

| # | User | Entries | Last Entry | Team Name | Macro F1 | Micro F1 | Micro Jaccard | Macro Prec. | Macro Rec. | Micro Prec. | Micro Rec. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | adityapatkar | 4 | 05/10/23 | | 0.701 (1) | 0.750 (1) | 0.600 (1) | 0.810 (1) | 0.677 (2) | 0.778 (1) | 0.724 (3) |
| 2 | anedilko | 2 | 05/10/23 | Bias Busters | 0.647 (2) | 0.700 (6) | 0.538 (6) | 0.630 (6) | 0.730 (1) | 0.626 (8) | 0.793 (1) |
| 3 | luxinxyz | 4 | 05/09/23 | tRNA | 0.644 (3) | 0.720 (2) | 0.562 (2) | 0.721 (4) | 0.631 (4) | 0.743 (3) | 0.698 (4) |
| 4 | gauravk | 5 | 05/13/23 | Team Converge | 0.628 (4) | 0.707 (4) | 0.547 (4) | 0.700 (5) | 0.622 (5) | 0.717 (6) | 0.698 (4) |
| 5 | lazyboy.blk | 2 | 05/10/23 | Team Name | 0.612 (5) | 0.713 (3) | 0.554 (3) | 0.776 (2) | 0.600 (6) | 0.770 (2) | 0.664 (6) |
| 6 | amsqr | 4 | 05/09/23 | Alejandro Mosquera | 0.533 (6) | 0.673 (7) | 0.507 (7) | 0.752 (3) | 0.479 (8) | 0.723 (5) | 0.629 (7) |
| 7 | surajtc | 11 | 05/10/23 | | 0.522 (7) | 0.622 (8) | 0.451 (8) | 0.463 (8) | 0.668 (3) | 0.527 (10) | 0.759 (2) |
| 8 | alili_wyk | 4 | 05/11/23 | YNU-HPCC | 0.514 (8) | 0.703 (5) | 0.542 (5) | 0.575 (7) | 0.502 (7) | 0.736 (4) | 0.672 (5) |
| 9 | mimmu3302 | 37 | 05/11/23 | | 0.332 (9) | 0.546 (10) | 0.376 (10) | 0.394 (9) | 0.322 (9) | 0.590 (9) | 0.509 (9) |
| 10 | kunwarv4 | 2 | 05/11/23 | VISU_UNiCA | 0.284 (10) | 0.593 (9) | 0.421 (9) | 0.282 (11) | 0.318 (10) | 0.640 (7) | 0.552 (8) |
| 11 | Sidpan2 | 3 | 05/12/23 | SidShank | 0.263 (11) | 0.400 (11) | 0.250 (11) | 0.299 (10) | 0.249 (11) | 0.404 (11) | 0.397 (10) |


Conclusion

Overall, the results above show that it is possible to generate strong baselines for some NLP tasks with very little code and without training a single model. This approach ranked 3rd during the dev phase and outperformed 50% of the competing teams in the final results.

