AI Safety
A nonsensical trigger sequence 'WTC theoriesclimate Flat Hubbard Principle' is fed into GPT-2, which then generates Flat Earth conspiracy text

GPT-2 Susceptibility to Universal Adversarial Triggers

We demonstrate that universal adversarial triggers can control both the topic and stance of GPT-2’s generated text. The result reveals security vulnerabilities in deployed language models, and we propose constructive applications of the same triggers for bias auditing.

Generative Modeling
Illustration of the GAN training process, showing the adversarial competition between the generator and the discriminator

Understanding GANs: From Fundamentals to Objective Functions

An in-depth guide to GANs: how two neural networks compete to generate realistic data, the math behind that competition, and the evolution of objective functions that stabilize training.