Computational Chemistry
ChemBERTa-2 visualization showing flowing SMILES strings in blue tones representing molecular data streams

ChemBERTa-2

Optimizing transformer pretraining for molecules by comparing masked language modeling (MLM) and multi-task regression (MTR) objectives, scaling to 77M compounds from PubChem for …

Generative Modeling
GP-MoLFormer architecture showing large-scale SMILES input, linear-attention transformer decoder, and property optimization via pair-tuning soft prompts

GP-MoLFormer: Molecular Generation via Transformers

A 46.8M-parameter transformer for molecular generation trained on 1.1B SMILES, introducing pair-tuning for efficient …

Computational Chemistry
ChemBERTa masked language modeling visualization showing SMILES string CC(=O)O with masked tokens

ChemBERTa: Molecular Property Prediction via Transformers

A systematic evaluation of RoBERTa transformers pretrained on up to 10M SMILES from a curated 77M-compound PubChem set for molecular property prediction …

Computational Chemistry
Chemformer pre-training on 100M SMILES strings flowing into BART model, which then enables reaction prediction and property prediction tasks

Chemformer: Pre-trained Transformer for Comp Chem

BART-based Transformer pre-trained on 100M molecules using self-supervision to accelerate convergence on chemical …

Generative Modeling
Visualization of probability density flow from initial distribution ρ₀ to target distribution ρ₁ over time through space

Building Normalizing Flows with Stochastic Interpolants

A continuous-time normalizing flow using stochastic interpolants and a quadratic loss to bypass costly ODE …

Generative Modeling
Visualization comparing Optimal Transport (straight paths) vs Diffusion (curved paths) for Flow Matching

Flow Matching for Generative Modeling

A simulation-free framework for training Continuous Normalizing Flows using Conditional Flow Matching and Optimal …

Machine Learning Fundamentals
Comparison of Residual Network vs ODE Network architectures showing discrete layers versus continuous transformations

Neural Ordinary Differential Equations

Introduces ODE-Nets, a continuous-depth neural network model parameterized by ODEs, enabling constant memory …

Generative Modeling
Denoising Score Matching Intuition - Vectors point from corrupted samples back to clean data, approximating the score

Score Matching and Denoising Autoencoders

Theoretical paper proving the equivalence between training Denoising Autoencoders and performing Score Matching on a …

Computational Biology
DynamicFlow illustration showing the transformation from apo pocket to holo pocket with ligand molecule generation

DynamicFlow: Integrating Protein Dynamics into Drug Design

Flow matching model that co-generates ligands and flexible protein pockets, addressing rigid-receptor limitations in …

Computational Chemistry
InstructMol architecture showing molecular graph and text inputs feeding through two-stage training to produce property predictions, descriptions, and reactions

InstructMol: Multi-Modal Molecular Assistant

A multi-modal LLM aligning 2D molecular graphs with text via two-stage instruction tuning for drug discovery tasks.

Computational Biology
InvMSAFold generates diverse protein sequences from structure using a Potts model

InvMSAFold: Fast & Diverse Inverse Folding

A fast, diverse inverse folding method combining deep learning with Potts models to capture full sequence landscapes.

Computational Chemistry
MERMaid pipeline diagram showing PDF processing through VisualHeist segmentation, DataRaider VLM mining, and KGWizard graph construction to produce chemical knowledge graphs

MERMaid: Multimodal Reaction Mining

Vision-language pipeline extracting chemical reaction data from PDF figures and tables into structured knowledge graphs …