Alejandro Mosquera López is an online safety expert and Kaggle Grandmaster working in cybersecurity. His main research interests are Trustworthy AI and NLP.

Wednesday, December 7, 2022

Revisiting the Microsoft Malware Classification Challenge (BIG 2015) in 2022

In 2015, Microsoft provided the data science community with an unprecedented malware dataset, encouraging open-source progress on effective techniques for grouping variants of malware files into their respective families. Formatted as a Kaggle competition, it featured a very large (for that time) dataset comprising almost 40GB of compressed files containing disarmed malware samples and their corresponding disassembled ASM code.

Tuesday, December 6, 2022

On the Intriguing Properties of Backdoored Neural Networks


Malicious actors can alter the expected behavior of a neural network so that it responds to data containing certain triggers known only to the attacker, without disrupting model performance when presented with normal inputs. An adversary will commonly force these misclassifications by performing either trigger injection [19] or dataset poisoning [6]. Less popular techniques that operate at the hardware level, such as manipulating the binary code of a neural network or tainting the physical circuitry [26, 8], can be equally effective.
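To make the dataset poisoning idea concrete, here is a toy sketch in plain Python. It is not the method from the cited works: the function names (`stamp_trigger`, `poison_dataset`), the corner-patch trigger, the poisoning rate and the target label are all assumptions chosen for illustration. A small fraction of training images gets a fixed patch stamped on it and is relabeled to the attacker's target class, while the rest of the data is left untouched.

```python
import random


def stamp_trigger(image, patch_value=1.0, patch_size=2):
    """Return a copy of a 2D image (list of rows) with a square patch
    written into the bottom-right corner. The patch is a hypothetical trigger."""
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            stamped[r][c] = patch_value
    return stamped


def poison_dataset(images, labels, target_label, poison_rate=0.1, seed=0):
    """Stamp the trigger on a random fraction of samples and relabel them
    as target_label. Clean samples are copied through unchanged."""
    rng = random.Random(seed)
    n_poison = int(len(images) * poison_rate)
    poisoned_idx = set(rng.sample(range(len(images)), n_poison))

    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in poisoned_idx:
            new_images.append(stamp_trigger(img))
            new_labels.append(target_label)
        else:
            new_images.append([row[:] for row in img])
            new_labels.append(lab)
    return new_images, new_labels, sorted(poisoned_idx)


# Toy data: twenty blank 4x4 "images" with labels 0, 1, 2.
images = [[[0.0] * 4 for _ in range(4)] for _ in range(20)]
labels = [i % 3 for i in range(20)]

poisoned_images, poisoned_labels, poisoned_idx = poison_dataset(
    images, labels, target_label=7, poison_rate=0.1
)
```

A model trained on `poisoned_images` and `poisoned_labels` could learn to associate the corner patch with the target class while still behaving normally on clean inputs, which is the property described above.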

Tuesday, September 27, 2022

AI Village Capture the Flag @ DEFCON write up

In August 2022 I had the chance to participate in an AI-themed CTF co-located with the DEF CON 30 security (hacking) conference. This was particularly interesting since it was presented in a novel format: a Kaggle competition where the leaderboard was ranked by the points awarded for each discovered flag. Despite entering the competition at a late stage, I managed to solve all but two of the challenges, achieving the second-best score (although my final ranking was lower because submission times were used as tie-breakers). No one was able to find the last flag, corresponding to the Crop-2 challenge, until after the CTF ended.

Thursday, March 17, 2022

Defending and attacking ML Malware Classifiers for Fun and Profit: 2x prize winner at MLSEC-2021

MLSEC (Machine Learning Security Evasion Competition) is an initiative sponsored by Microsoft and partners CUJO AI, NVIDIA, VMRay, and MRG Effitas with the purpose of raising awareness of the expanding attack surface which is now also affecting AI-powered systems. 

In its 3rd edition the competition allowed defenders and attackers to exercise their security and machine learning skills under a plausible threat model: evading antimalware and anti-phishing filters. In the competition, defenders aimed to detect evasive submissions by using machine learning (ML), and attackers attempted to circumvent those detections.

Towards Machines that Capture and Reason with Science Knowledge

In 2015 I took part in a machine learning competition hosted on Kaggle that aimed to solve a multiple-question 8th-grade science test. At that time there were no large pretrained models to leverage and (unsurprisingly) the best-performing models were IR-based and would barely achieve a GPA of 1.0 in the US grading system: