Molecular Representations
Overview of six categories of materials representations for machine learning

Materials Representations for ML Review

A comprehensive review of how solid-state materials can be numerically represented for machine learning, spanning structural features, graph neural networks, compositional descriptors, transfer learning, and generative models for inverse design.
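
As a concrete taste of the compositional-descriptor category the review covers, here is a minimal, hypothetical sketch (names and scope my own, not the review's) that turns a flat formula string into normalized element fractions; real featurizers such as Magpie-style statistics build elemental-property statistics on top of exactly this kind of vector:

```python
# Illustrative compositional descriptor: element fractions from a formula.
# Handles flat formulas only (no parentheses, hydrates, or charges).
import re
from collections import Counter

def element_fractions(formula: str) -> dict[str, float]:
    """Parse a simple formula like 'Fe2O3' into normalized element fractions."""
    counts: Counter = Counter()
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] += float(amount) if amount else 1.0
    total = sum(counts.values())
    return {el: n / total for el, n in counts.items()}

print(element_fractions("Fe2O3"))  # {'Fe': 0.4, 'O': 0.6}
```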

Computational Chemistry
Three-stage progression from task-specific transformers through multimodal models to LLM chemistry agents

Transformers and LLMs for Chemistry Drug Discovery

A review chapter tracing three stages of transformer adoption in chemistry: task-specific single-modality models (reaction prediction, retrosynthesis), multimodal approaches bridging spectra and text, and LLM-powered agents like ChemCrow for general chemical reasoning.

Computational Chemistry
Bar chart comparing small and big foundation models surveyed across property prediction, MLIPs, inverse design, and multi-domain chemistry applications

Foundation Models in Chemistry: A 2025 Perspective

This perspective from Choi et al. reviews foundation models in chemistry, categorizing them as ‘small’ (domain-specific, e.g., property prediction, MLIPs, inverse design) and ‘big’ (multi-domain, e.g., multimodal and LLM-based). It surveys pretraining strategies, key architectures (GNNs and language models), and outlines future directions for scaling, efficiency, and interpretability.

Molecular Generation
Taxonomy diagram showing four generative model families (VAE, GAN, Diffusion, Flow) connecting to small molecule generation and protein generation subtasks

Generative AI Survey for De Novo Molecule and Protein Design

This survey organizes generative AI for de novo drug design into two themes: small molecule generation (target-agnostic, target-aware, conformation) and protein generation (structure prediction, sequence generation, backbone design, antibody, peptide). It covers four generative model families (VAEs, GANs, diffusion, flow-based), catalogs key datasets and benchmarks, and provides 12 comparative benchmark tables across all subtasks.
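
To make one of those four families concrete: the reparameterization trick at the heart of every VAE-based generator the survey covers can be written in a few lines. A NumPy sketch (illustrative only, not code from the survey; in a real model this sits between an encoder and a decoder inside an autodiff framework):

```python
import numpy as np

def reparameterize(mu: np.ndarray, log_var: np.ndarray) -> np.ndarray:
    """Sample z = mu + sigma * eps; isolating randomness in eps lets
    gradients flow through mu and sigma during training."""
    eps = np.random.standard_normal(mu.shape)  # the only stochastic part
    return mu + np.exp(0.5 * log_var) * eps
```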

Computational Chemistry
Conceptual diagram showing natural language prompts flowing into code generation for chemistry tasks

NLP Models That Automate Programming for Chemistry

Hocky and White argue that NLP models capable of generating code from natural language prompts will fundamentally alter how chemists interact with scientific software, reducing barriers to computational research and reshaping programming pedagogy.

Molecular Generation
Bar chart showing deep generative architecture types for molecular design: RNN, VAE, GAN, RL, and hybrid methods

Review: Deep Learning for Molecular Design (2019)

An early and influential review cataloging 45 papers on deep generative modeling for molecules, comparing RNN, VAE, GAN, and reinforcement learning architectures across SMILES and graph-based representations.
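
The RNN approaches cataloged there mostly share one decoding loop: sample the next SMILES token from the model's softmax, feed it back in, stop at an end token. A hedged sketch of that loop, where `rnn_step` is a hypothetical stand-in for any trained recurrent model returning next-token logits and hidden state (not code from the review):

```python
import numpy as np

def sample_smiles(rnn_step, vocab, start="^", end="$", temperature=1.0, max_len=100):
    token, state, out = start, None, []
    for _ in range(max_len):
        logits, state = rnn_step(token, state)            # next-token scores
        probs = np.exp((logits - logits.max()) / temperature)
        probs /= probs.sum()                              # softmax with temperature
        token = np.random.choice(vocab, p=probs)          # stochastic decoding
        if token == end:
            break
        out.append(token)
    return "".join(out)
```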

Molecular Representations
Radial diagram showing 12 transformer architecture families connected to 5 molecular science application domains

Survey of Transformer Architectures in Molecular Science

Jiang et al. survey 12 families of transformer architectures in molecular science, covering GPT, BERT, BART, graph transformers, Transformer-XL, T5, ViT, DETR, Conformer, CLIP, sparse transformers, and mobile/efficient variants, with detailed algorithmic descriptions and molecular applications.

Molecular Representations
Bar chart showing CLM architecture publication trends from 2020 to 2024, with transformers overtaking RNNs

Systematic Review of Deep Learning CLMs (2020-2024)

PRISMA-based systematic review of 72 papers on chemical language models for molecular generation, comparing architectures and biased-generation methods using MOSES metrics.
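
Two of the MOSES metrics used for that comparison, validity and uniqueness, are simple enough to sketch with RDKit (an illustrative miniature, not the benchmark's own code; the full suite also covers novelty, FCD, and distribution statistics):

```python
from rdkit import Chem

def validity(smiles_list):
    """Fraction of generated strings RDKit can parse."""
    return sum(Chem.MolFromSmiles(s) is not None for s in smiles_list) / len(smiles_list)

def uniqueness(smiles_list):
    """Fraction of valid molecules that are distinct after canonicalization."""
    canon = [Chem.MolToSmiles(m)
             for m in map(Chem.MolFromSmiles, smiles_list) if m is not None]
    return len(set(canon)) / max(len(canon), 1)
```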

Molecular Representations
Taxonomy of transformer-based chemical language models organized by architecture type

Transformer CLMs for SMILES: Literature Review 2024

A comprehensive review of transformer-based chemical language models operating on SMILES, categorizing encoder-only (BERT variants), decoder-only (GPT variants), and encoder-decoder models with analysis of tokenization strategies, pre-training approaches, and future directions.
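
The atom-level tokenization strategy most of these models share fits in one regular expression; the sketch below uses a variant of the widely cited Molecular Transformer pattern (my transcription, so treat the exact regex as an approximation rather than any model's canonical vocabulary):

```python
import re

# Atom-level SMILES tokenizer: bracket atoms, two-letter halogens,
# organic-subset atoms, bonds/branches, and %NN ring closures.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\."
    r"|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize(smiles: str) -> list[str]:
    tokens = SMILES_PATTERN.findall(smiles)
    assert "".join(tokens) == smiles, "unrecognized characters in input"
    return tokens

print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```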

Predictive Chemistry
Overview of 16 transformer models for molecular property prediction organized by architecture type

Transformers for Molecular Property Prediction Review

Sultan et al. review 16 sequence-based transformer models for molecular property prediction, systematically analyzing seven design decisions (database selection, chemical language, tokenization, positional encoding, model size, pre-training objectives, and fine-tuning strategy) and identifying a critical need for standardized evaluation practices.
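
One of those seven decisions, positional encoding, is easy to pin down concretely; here is the standard sinusoidal variant from the original transformer paper as a NumPy sketch (illustrative, not code from Sultan et al.; assumes an even d_model):

```python
import numpy as np

def sinusoidal_positions(max_len: int, d_model: int) -> np.ndarray:
    """Fixed positional encodings: sin on even dims, cos on odd dims."""
    pos = np.arange(max_len)[:, None]                # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]             # (1, d_model // 2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe
```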

Molecular Representations
Taxonomy of molecular representation learning foundation models organized by input modality

Review of Molecular Representation Learning Models

A comprehensive survey classifying molecular representation learning foundation models by input modality (sequence, graph, 3D, image, multimodal) and analyzing four pretraining paradigms for drug discovery tasks.
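
The modality boundary the survey draws between sequence and graph inputs amounts, in preprocessing terms, to a conversion like the following RDKit sketch (illustrative only; real pipelines attach much richer atom and bond features to these node and edge lists):

```python
from rdkit import Chem

def smiles_to_graph(smiles: str):
    """Convert a SMILES string into node labels and an edge list."""
    mol = Chem.MolFromSmiles(smiles)
    atoms = [atom.GetSymbol() for atom in mol.GetAtoms()]                  # nodes
    bonds = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()]
    return atoms, bonds

print(smiles_to_graph("c1ccccc1"))  # benzene: 6 carbons, 6 ring bonds
```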

Molecular Generation
Taxonomy diagram showing the three axes of MolGenSurvey: molecular representations (1D string, 2D graph, 3D geometry), generative methods (deep generative models and combinatorial optimization), and eight generation tasks (1D/2D and 3D)

MolGenSurvey: Systematic Survey of ML for Molecule Design

MolGenSurvey systematically reviews ML models for molecule design, organizing the field by molecular representation (1D/2D/3D), generative method (deep generative models vs. combinatorial optimization), and task type (8 distinct generation/optimization tasks). It catalogs over 100 methods, unifies task definitions via an input/output/goal taxonomy, and identifies key challenges including out-of-distribution generation, oracle costs, and the lack of unified benchmarks.