Understanding Generative Models
Before diving into Generative Adversarial Networks (GANs), let’s establish what we’re trying to accomplish with generative models.
The core goal: Create a system that can generate new, realistic data that’s never been seen before, yet appears to come from the same distribution as our training data.
Imagine having a model that can create images, text, or audio that are difficult to distinguish from human-created content. This is the goal of generative modeling.
The Mathematical Foundation
Generative models aim to estimate the probability distribution of real data. If we have parameters $\theta$, we want to find the optimal $\theta^*$ that maximizes the likelihood of observing our real samples:
$$ \theta^* = \arg\max_\theta \prod_{i=1}^{n} p_\theta(x_i) $$
This is equivalent to minimizing the distance between our estimated distribution $p_\theta$ and the true data distribution $p_{\text{data}}$. A common distance measure is the Kullback–Leibler (KL) divergence: maximizing the log-likelihood is equivalent to minimizing $D_{\text{KL}}(p_{\text{data}} \,\|\, p_\theta)$.
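As a toy illustration of this maximum-likelihood objective (a sketch, not part of the original derivation), we can fit a unit-variance Gaussian $p_\theta(x) = \mathcal{N}(x; \mu, 1)$ to samples by searching for the $\mu$ that maximizes the log-likelihood. For this family, the maximizer is the sample mean:

```python
import numpy as np

# Toy maximum-likelihood fit: p_theta(x) = N(x; mu, 1).
rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=1000)

def log_likelihood(mu, x):
    # sum_i log N(x_i; mu, 1), dropping the additive constant
    return -0.5 * np.sum((x - mu) ** 2)

# Evaluate a grid of candidate parameters and pick the argmax.
grid = np.linspace(0.0, 6.0, 601)
mu_star = grid[np.argmax([log_likelihood(m, data) for m in grid])]

# For a unit-variance Gaussian, the MLE is the sample mean.
print(mu_star, data.mean())
```

Explicit models like this one make the density $p_\theta$ available in closed form; GANs, as we'll see, never write $p_\theta$ down at all.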
Two Approaches to Generative Modeling
Explicit Distribution Models
These models define an explicit probability distribution and refine it through training.
Example: Variational Auto-Encoders (VAEs) require:
- An explicitly assumed prior distribution
- A likelihood distribution
- A “variational approximation” to the intractable posterior
Implicit Distribution Models
These models learn to generate data without explicitly defining a probability distribution. Instead, they sample from their learned distribution indirectly.
This is where GANs shine – they’re implicit generative models that learn through adversarial competition.

Taxonomy of Deep Generative Models: GANs fall into the implicit density category, learning distributions through adversarial training rather than explicit modeling. Source: NeurIPS 2016 tutorial on Generative Adversarial Networks
The GAN Architecture: A Game of Deception
Generative Adversarial Networks get their name from three key components:
- Generative: They create new data
- Adversarial: Two networks compete against each other
- Networks: Built using neural networks
The genius lies in the adversarial setup: two neural networks locked in competition, each pushing the other to improve.

GAN Data Flow: The generator creates fake samples from random noise, while the discriminator tries to distinguish real from fake data. This adversarial competition drives both networks to improve.
The Generator: The Forger
Role: Create convincing fake data from random noise
The generator network $G$ learns a mapping function: $$z \rightarrow G(z) \approx x_{\text{real}}$$
Where:
- $z$ is a random latent vector (the “noise”)
- $G(z)$ is the generated sample
- The goal is making $G(z)$ indistinguishable from real data
Key insight: The latent space $z$ is continuous, meaning small changes in $z$ produce smooth, meaningful changes in the generated output.
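To make the mapping concrete, here is a hypothetical sketch of a "generator" as a fixed linear map followed by a $\tanh$, standing in for the learned deep network of a real GAN. Because $G$ is continuous, interpolating between two latent vectors traces a smooth path between the corresponding generated samples:

```python
import numpy as np

# Illustrative stand-in for a generator: maps a 2-D latent z to a
# 4-D sample. W plays the role of learned weights (here just random).
rng = np.random.default_rng(1)
W = rng.normal(size=(4, 2))

def G(z):
    return np.tanh(W @ z)  # bounded outputs, like a tanh image generator

z0, z1 = rng.normal(size=2), rng.normal(size=2)

# Linear interpolation in latent space yields a smooth path of outputs.
path = [G((1 - t) * z0 + t * z1) for t in np.linspace(0, 1, 5)]
print(np.round(path[0], 3), np.round(path[-1], 3))
```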
The Discriminator: The Detective
Role: Distinguish between real and generated samples
The discriminator network $D$ outputs a probability: $$D(x) = P(x \text{ is real})$$
- $D(x) \approx 1$ for real samples
- $D(x) \approx 0$ for fake samples
Think of it as an “authenticity detector” that gets better over time.
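As a minimal sketch (assuming 1-D inputs and hand-picked weights, not a trained network), the discriminator can be pictured as logistic regression: a linear score squashed through a sigmoid so the output lands in $(0, 1)$:

```python
import numpy as np

# Hypothetical 1-D "discriminator": sigmoid of a linear score.
# Real GAN discriminators are deep networks with learned weights.
def D(x, w=4.0, b=-8.0):
    return 1.0 / (1.0 + np.exp(-(w * x + b)))  # output in (0, 1)

# Suppose real data clusters near x = 3 and fakes near x = 1:
print(D(3.0))  # close to 1: judged real
print(D(1.0))  # close to 0: judged fake
```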
The Adversarial Competition
This is where the magic happens. The generator and discriminator have directly opposing objectives:
Generator Goal | Discriminator Goal
---|---
Fool the discriminator | Correctly classify all samples
Maximize $D(G(z))$ | Maximize $D(x_{\text{real}})$ and minimize $D(G(z))$
“Create convincing fakes” | “Never be fooled”
This creates a dynamic where both networks continuously improve:
- Generator creates better fakes to fool the discriminator
- Discriminator becomes better at detecting fakes
- The cycle continues until equilibrium

The Adversarial Training Process: Through competition, both networks improve. The generator learns to create increasingly realistic samples while the discriminator becomes more discerning.
Learning Through Metaphors
Sometimes the best way to understand complex concepts is through relatable analogies. Here are three metaphors that capture different aspects of how GANs work.
🎨 The Art Forger vs. Critic
Generator = Art Forger
Discriminator = Art Critic
A criminal forger tries to create fake masterpieces, while an art critic must identify authentic works. Each interaction teaches both parties:
- The forger learns what makes art look authentic
- The critic develops a keener eye for detecting fakes
- Eventually, the forger becomes so skilled that even experts can’t tell the difference
This captures the adversarial nature and continuous improvement aspect of GANs.
💰 The Counterfeiter vs. Bank Teller
Generator = Counterfeiter
Discriminator = Bank Teller
Day 1: Criminal brings a crayon drawing of a dollar bill. Even a new teller spots this fake.
Day 100: The counterfeiter has learned better techniques. The teller has developed expertise in security features.
Day 1000: The fake money is so convincing that detecting it requires advanced equipment.
This illustrates the progressive improvement and escalating sophistication in both networks.
🦜 The Parrot vs. Sibling
Generator = Pet Parrot
Discriminator = Younger Brother
You sit behind curtains with your parrot. The parrot tries to mimic your voice to fool your brother about which curtain you’re behind. Successful mimicry gets treats.
Initially, the parrot’s attempts are obvious. But through practice, the parrot develops the ability to closely mirror your voice.
This shows how GANs can learn to replicate complex patterns through reward-based competition.
The Mathematical Foundation
Now let’s examine the mathematical framework that makes GANs work. The core of GAN training is solving a minimax optimization problem.
The Minimax Objective
$$ \min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] $$
Breaking this down:
Component | Meaning | Goal
---|---|---
$\mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)]$ | Expected log-probability of real data being labeled real | Discriminator wants to maximize this
$\mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$ | Expected log-probability of fake data being labeled fake | Discriminator wants to maximize this; generator wants to minimize it
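A quick numerical sanity check of the value function (a sketch, not from the original paper's experiments): if the discriminator outputs $0.5$ on every sample, then $V(D, G) = \log(0.5) + \log(0.5) = -\log 4$, the value the objective takes at the GAN equilibrium:

```python
import numpy as np

# Monte Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))],
# given the discriminator's outputs on real and fake batches.
def V(d_real, d_fake):
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

d_real = np.full(1000, 0.5)  # D's outputs on real samples
d_fake = np.full(1000, 0.5)  # D's outputs on generated samples
print(V(d_real, d_fake), -np.log(4.0))  # both equal -log 4
```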
Why “Minimax”?
- Discriminator ($D$): Tries to maximize the objective → Better at distinguishing real from fake
- Generator ($G$): Tries to minimize the objective → Better at fooling the discriminator
The Training Process
The beauty of GANs lies in their alternating optimization:
- Fix $G$, train $D$: Make the discriminator optimal for the current generator
- Fix $D$, train $G$: Improve the generator against the current discriminator
- Repeat: Continue until reaching Nash equilibrium
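The alternating steps above can be sketched as a complete (if toy) training loop. This is an illustrative 1-D GAN, not code from the original paper: the generator $G(z) = az + b$ tries to match real data $x \sim \mathcal{N}(3, 1)$ from noise $z \sim \mathcal{N}(0, 1)$, the discriminator is $D(x) = \sigma(wx + c)$, and the generator uses the common non-saturating objective (maximize $\log D(G(z))$) rather than the raw minimax form:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 0.1, 0.0   # generator parameters: G(z) = a*z + b
w, c = 0.0, 0.0   # discriminator parameters: D(x) = sigmoid(w*x + c)
lr, n = 0.05, 256

def sig(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

for _ in range(3000):
    real = rng.normal(3.0, 1.0, n)
    fake = a * rng.normal(size=n) + b

    # 1) Fix G, train D: ascend log D(real) + log(1 - D(fake))
    sr, sf = sig(w * real + c), sig(w * fake + c)
    w += lr * np.mean((1 - sr) * real - sf * fake)
    c += lr * np.mean((1 - sr) - sf)

    # 2) Fix D, train G: ascend log D(G(z)) (non-saturating trick)
    z = rng.normal(size=n)
    sg = sig(w * (a * z + b) + c)
    grad = (1 - sg) * w           # d log D(G(z)) / d G(z)
    a += lr * np.mean(grad * z)
    b += lr * np.mean(grad)

print(round(b, 2))  # generator mean drifts toward the real mean (3)
```

Note how each iteration interleaves one discriminator step and one generator step; in practice, practitioners often tune this ratio and the learning rates to keep the two networks balanced.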
Theoretical Goal: Nash Equilibrium
At convergence, the discriminator outputs $D(x) = 0.5$ for all samples, meaning it can’t distinguish between real and fake data. This indicates that $p_{\text{generator}} = p_{\text{data}}$ – our generator has learned the true data distribution.
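For a fixed generator, the optimal discriminator has the known closed form $D^*(x) = p_{\text{data}}(x) / (p_{\text{data}}(x) + p_{\text{generator}}(x))$, so once the two distributions match, $D^*$ is $0.5$ everywhere. A small numerical check of that identity (a sketch, using Gaussians for both densities):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-5, 5, 101)
p_data = gaussian_pdf(x, 0.0, 1.0)
p_gen = gaussian_pdf(x, 0.0, 1.0)   # generator has matched the data

d_star = p_data / (p_data + p_gen)  # optimal discriminator
print(d_star.min(), d_star.max())   # 0.5 everywhere
```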
What’s Next?
Now that you understand the fundamentals of GANs, you’re ready to explore the rich ecosystem of GAN variants and improvements. Each addresses specific challenges like training stability, mode collapse, or convergence issues.
Continue your GAN journey: Check out my follow-up post on GAN Objective Functions, where we explore various GAN architectures including WGAN, LSGAN, and many others that have shaped the field. For the complete learning path, see the Understanding GANs series.
Acknowledgments: This post was inspired by the excellent survey “How Generative Adversarial Networks and Their Variants Work: An Overview of GAN” – an invaluable resource for understanding the GAN landscape.