Molecular Generation
Bar chart showing CogMol CLaSS enrichment factors across three COVID-19 drug targets

CogMol: Controlled Molecule Generation for COVID-19

CogMol uses a SMILES VAE and multi-attribute controlled sampling (CLaSS) to generate novel, target-specific drug molecules for unseen SARS-CoV-2 proteins without model retraining.
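
The controlled-sampling idea can be sketched as rejection sampling in the VAE latent space. Everything below is illustrative: the sigmoid attribute scorers stand in for CLaSS's actual classifiers, which are trained on the real latent space.

```python
import math
import random

def sample_latent(rng, dim=4):
    # Draw from the VAE's standard-normal prior (stand-in for a trained model).
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def attribute_score(z):
    # Hypothetical per-attribute predictors p(attribute | z); the real system
    # would use classifiers for affinity, QED, toxicity, etc.
    affinity = 1.0 / (1.0 + math.exp(-z[0]))
    qed = 1.0 / (1.0 + math.exp(-z[1]))
    return affinity * qed  # joint acceptance probability

def class_sample(n_accept, seed=0):
    """Accept/reject prior samples until n_accept pass the attribute test;
    accepted latents would then be decoded to SMILES."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_accept:
        z = sample_latent(rng)
        if rng.random() < attribute_score(z):
            accepted.append(z)
    return accepted

samples = class_sample(5)
```

Because sampling happens post hoc in latent space, new targets only require new attribute predictors, not VAE retraining, which is the point of the paper.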

Molecular Generation
Line chart showing curriculum learning converges faster than standard RL for molecular generation

Curriculum Learning for De Novo Drug Design (REINVENT)

Introduces curriculum learning to the REINVENT de novo design platform, decomposing complex drug design objectives into simpler sequential tasks that accelerate agent convergence and improve output quality over standard reinforcement learning.
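
The curriculum loop can be sketched as follows: train on each sub-objective until a success threshold is cleared, then advance. This is a simplified view with a toy agent; the names and thresholds are illustrative, not REINVENT's API.

```python
def run_curriculum(agent, tasks, evals=5, max_steps=10_000):
    """Advance through tasks in order, moving on once the agent's mean score
    clears the task's threshold (simplified curriculum loop)."""
    steps = 0
    for name, score_fn, threshold in tasks:
        while steps < max_steps:
            scores = [agent.train_step(score_fn) for _ in range(evals)]
            steps += evals
            if sum(scores) / evals >= threshold:
                break  # curriculum advances to the next, harder objective
    return steps

class ToyAgent:
    """Stand-in agent whose 'skill' improves with every training step."""
    def __init__(self):
        self.skill = 0.0
    def train_step(self, score_fn):
        self.skill += 0.02
        return score_fn(self.skill)

tasks = [
    ("match scaffold", lambda s: min(s, 1.0), 0.3),      # easy sub-objective
    ("full objective", lambda s: min(s / 2, 1.0), 0.6),  # harder composite goal
]
steps_used = run_curriculum(ToyAgent(), tasks)
```

The claimed speedup comes from the easy tasks shaping the policy before the sparse, hard objective is attempted.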

Molecular Generation
Taxonomy diagram showing four generative model families (VAE, GAN, Diffusion, Flow) connecting to small molecule generation and protein generation subtasks

Generative AI Survey for De Novo Molecule and Protein Design

This survey organizes generative AI for de novo drug design into two themes: small molecule generation (target-agnostic, target-aware, conformation) and protein generation (structure prediction, sequence generation, backbone design, antibody, peptide). It covers four generative model families (VAEs, GANs, diffusion, flow-based), catalogs key datasets and benchmarks, and provides 12 comparative benchmark tables across all subtasks.

Molecular Generation
Bar chart showing Lingo3DMol achieves best Vina docking scores on DUD-E compared to five baselines

Lingo3DMol: Language Model for 3D Molecule Design

Lingo3DMol introduces FSMILES, a fragment-based SMILES representation with local and global coordinates, to generate drug-like 3D molecules in protein pockets via a transformer language model.

Molecular Generation
Schematic of Link-INVENT architecture showing encoder-decoder RNN with reinforcement learning scoring loop

Link-INVENT: RL-Driven Molecular Linker Generation

Link-INVENT is an RNN-based generative model for molecular linker design that uses reinforcement learning with a flexible scoring function, demonstrated on fragment linking, scaffold hopping, and PROTAC design.
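
A "flexible scoring function" of this kind can be sketched as a weighted aggregation of per-component scores; a weighted geometric mean is one common aggregation in REINVENT-family tools. The component names and weights below are illustrative.

```python
def make_scorer(components):
    """components: list of (score_fn, weight); each score_fn maps molecule
    properties to [0, 1]. Returns a single reward for the RL loop."""
    total_w = sum(w for _, w in components)
    def score(props):
        prod = 1.0
        for fn, w in components:
            prod *= max(fn(props), 1e-8) ** (w / total_w)  # weighted geometric mean
        return prod
    return score

# Illustrative linker-design components: length preference and rigidity.
scorer = make_scorer([
    (lambda p: 1.0 if 3 <= p["linker_atoms"] <= 8 else 0.2, 2.0),
    (lambda p: 1.0 - p["rotatable_fraction"], 1.0),
])
reward = scorer({"linker_atoms": 5, "rotatable_fraction": 0.25})
```

Swapping components in and out of the list is what lets one generator serve fragment linking, scaffold hopping, and PROTAC linker design.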

Molecular Generation
Bar chart showing PrefixMol Vina scores across different conditioning modes: target, property, combined, and scaffold

PrefixMol: Prefix Embeddings for Drug Molecule Design

PrefixMol prepends learnable condition vectors to a GPT transformer for SMILES generation, enabling joint control over binding pocket targeting and chemical properties like QED, SA, and LogP.
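
The conditioning mechanism amounts to prepending condition embeddings to the token-embedding sequence before the decoder sees it, so every SMILES token can attend back to the conditions through causal self-attention. A minimal sketch with toy vectors (dimensions and values are illustrative):

```python
def with_prefix(condition_vecs, token_embs):
    """Prepend condition embeddings to the token-embedding sequence."""
    return condition_vecs + token_embs  # plain list concatenation, prefix first

pocket_emb = [[0.2, -0.1, 0.4]]          # stand-in for a learned pocket encoding
property_embs = [[0.9, 0.0, 0.1],        # e.g. a target QED embedding
                 [0.0, 0.5, 0.3]]        # e.g. a target LogP embedding
token_embs = [[1.0, 0.0, 0.0],           # embedded SMILES tokens (toy values)
              [0.0, 1.0, 0.0]]

seq = with_prefix(pocket_emb + property_embs, token_embs)
```

Dropping or adding prefix vectors switches between the conditioning modes (target-only, property-only, combined) compared in the figure.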

Molecular Generation
Bar chart comparing docking scores of generated vs known ligands for CDK2 and EGFR targets

Protein-to-Drug Molecule Translation via Transformer

Applies the Transformer architecture to generate drug-like molecules conditioned on protein amino acid sequences, treating target-specific de novo drug design as a sequence-to-sequence translation problem.
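
The translation framing reduces generation to standard greedy (or beam) decoding over SMILES tokens given an encoded protein sequence. In this sketch, `encode` and `decode_step` are stand-ins for a trained Transformer, and the toy model simply emits a fixed SMILES.

```python
def greedy_translate(encode, decode_step, protein, max_len=20):
    """Greedy protein->SMILES decoding loop (framing only)."""
    memory = encode(protein)           # encoder output for the target sequence
    out = ["<sos>"]
    for _ in range(max_len):
        tok = decode_step(memory, out)  # next-token prediction
        if tok == "<eos>":
            break
        out.append(tok)
    return "".join(out[1:])

# Toy stand-ins: "translate" any protein into a fixed SMILES, token by token.
target = list("CCO") + ["<eos>"]
smiles = greedy_translate(lambda p: p,
                          lambda mem, out: target[len(out) - 1],
                          "MKVL")
```

The same loop with a beam instead of argmax yields multiple candidate ligands per target, which is how the docking comparison in the figure is populated.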

Molecular Generation
Bar chart comparing PMO benchmark scores with and without chemical quality filters across five generative methods

Re-evaluating Sample Efficiency in Molecule Generation

A critical reassessment of the PMO benchmark for de novo molecule generation, showing that adding molecular weight, LogP, and diversity filters substantially re-ranks generative models, with Augmented Hill-Climb emerging as the top method.
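
The re-ranking effect comes from a post-hoc filter applied to generated molecules before scoring. A minimal sketch, with illustrative cutoffs (not the paper's exact ranges) and approximate descriptor values that would in practice come from RDKit:

```python
def passes_filters(props, mw_range=(150.0, 500.0), logp_range=(-1.0, 5.0)):
    """Chemical quality filter over precomputed descriptors; ranges are
    illustrative, not the paper's exact cutoffs."""
    return (mw_range[0] <= props["mw"] <= mw_range[1]
            and logp_range[0] <= props["logp"] <= logp_range[1])

# Descriptor values are approximate, for illustration only.
candidates = [
    {"smiles": "CC(=O)Oc1ccccc1C(=O)O", "mw": 180.2, "logp": 1.2, "score": 0.71},  # aspirin-like: kept
    {"smiles": "c1ccccc1", "mw": 78.1, "logp": 2.0, "score": 0.94},                # benzene: too small, dropped
]
kept = sorted((c for c in candidates if passes_filters(c)),
              key=lambda c: c["score"], reverse=True)
```

Note how the highest-scoring candidate is the one removed: optimizers that exploit trivially small or greasy molecules lose their benchmark advantage once the filter is in place.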

Molecular Generation
Bar chart showing deep generative architecture types for molecular design: RNN, VAE, GAN, RL, and hybrid methods

Review: Deep Learning for Molecular Design (2019)

An early and influential review cataloging 45 papers on deep generative modeling for molecules, comparing RNN, VAE, GAN, and reinforcement learning architectures across SMILES and graph-based representations.

Molecular Generation
Diagram showing the dual formulation of S4 models with convolution during training and recurrence during generation for SMILES-based molecular design

S4 Structured State Space Models for De Novo Drug Design

This paper introduces structured state space sequence (S4) models to chemical language modeling, showing they combine the strengths of LSTMs (efficient recurrent generation) and GPTs (holistic sequence learning) for de novo molecular design.
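
The convolution/recurrence duality the diagram refers to can be checked in one dimension: a linear state space x_k = a·x_{k-1} + b·u_k, y_k = c·x_k computes exactly the convolution of the input with the kernel K_k = c·a^k·b. This is a scalar toy, not the structured parameterization S4 actually uses, but the equivalence is the same.

```python
def run_recurrent(a, b, c, u):
    # Step-by-step recurrence: O(1) state per step, as used at generation time.
    x, ys = 0.0, []
    for uk in u:
        x = a * x + b * uk
        ys.append(c * x)
    return ys

def run_convolutional(a, b, c, u):
    # The same map as an explicit convolution, parallelizable at training time.
    K = [c * (a ** k) * b for k in range(len(u))]
    return [sum(K[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]

u = [1.0, 0.5, -0.25, 2.0, 0.0]
rec = run_recurrent(0.9, 1.0, 0.5, u)
conv = run_convolutional(0.9, 1.0, 0.5, u)
```

Training uses the convolutional view over whole SMILES strings; sampling uses the recurrent view, token by token, with constant memory.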

Molecular Representations
Bar chart showing CLM architecture publication trends from 2020 to 2024, with transformers overtaking RNNs

Systematic Review of Deep Learning CLMs (2020-2024)

PRISMA-based systematic review of 72 papers on chemical language models for molecular generation, comparing model architectures and biasing strategies, with evaluations grounded in MOSES metrics.

Molecular Representations
Taxonomy of transformer-based chemical language models organized by architecture type

Transformer CLMs for SMILES: Literature Review 2024

A comprehensive review of transformer-based chemical language models operating on SMILES, categorizing encoder-only (BERT variants), decoder-only (GPT variants), and encoder-decoder models with analysis of tokenization strategies, pre-training approaches, and future directions.