Time Series Forecasting
Forecasting comparison of different neural architectures on the Multiscale Lorenz-96 system

Optimizing Sequence Models for Dynamical Systems

An ablation study deconstructing sequence models; attention-augmented Recurrent Highway Networks outperform Transformers on …

Computational Chemistry
ChemBERTa-3 visualization showing muscular arms lifting a stack of building blocks representing molecular data with SMILES notation, symbolizing the power and scalability of the open-source training framework

ChemBERTa-3: Open Source Training Framework

An open-source framework integrating DeepChem and Ray for training and benchmarking chemical foundation models like …

Computational Chemistry
Chemical structures and molecular representations feeding into a neural network model that processes atomized chemical knowledge

ChemDFM-R: Chemical Reasoner LLM

A 14B-parameter chemical reasoning LLM enhanced with atomized functional group knowledge and mix-sourced distillation …

Computational Chemistry
ChemBERTa-2 visualization showing flowing SMILES strings in blue tones representing molecular data streams

ChemBERTa-2

Optimizing transformer pretraining for molecules using MLM vs MTR objectives, scaling to 77M compounds from PubChem for …

Generative Modeling
GP-MoLFormer architecture showing large-scale SMILES input, linear-attention transformer decoder, and property optimization via pair-tuning soft prompts

GP-MoLFormer: Molecular Generation via Transformers

A 46.8M parameter transformer for molecular generation trained on 1.1B SMILES, introducing pair-tuning for efficient …

Computational Chemistry
ChemBERTa masked language modeling visualization showing SMILES string CC(=O)O with masked tokens

ChemBERTa: Molecular Property Prediction via Transformers

A systematic evaluation of RoBERTa transformers pretrained on 77M PubChem SMILES for molecular property prediction …
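The masked-language-modeling objective referenced above can be sketched as follows. This is a hypothetical, simplified illustration: it tokenizes a SMILES string character by character (ChemBERTa-style models use a learned subword tokenizer), and `mask_smiles` is an invented helper, not part of any library.

```python
import random

# Hypothetical sketch of the MLM objective on a SMILES string: a fraction
# of tokens is replaced by [MASK] and the model must recover the originals.
# Character-level tokenization is a deliberate simplification.

def mask_smiles(smiles, mask_rate=0.15, seed=0):
    """Return masked tokens plus the labels the model must predict."""
    rng = random.Random(seed)
    tokens = list(smiles)            # naive character-level tokens
    labels = [None] * len(tokens)    # None = position not masked
    for i in range(len(tokens)):
        if rng.random() < mask_rate:
            labels[i] = tokens[i]    # prediction target
            tokens[i] = "[MASK]"
    return tokens, labels

tokens, labels = mask_smiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin SMILES
```

During pretraining, a transformer receives `tokens` and is trained with cross-entropy on the positions where `labels` is set; the MTR alternative compared in the article instead regresses precomputed molecular descriptors.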

Computational Chemistry
Chemformer pre-training on 100M SMILES strings flowing into BART model, which then enables reaction prediction and property prediction tasks

Chemformer: Pre-trained Transformer for Computational Chemistry

BART-based Transformer pre-trained on 100M molecules using self-supervision to accelerate convergence on chemical …

Generative Modeling
Visualization of probability density flow from initial distribution ρ₀ to target distribution ρ₁ over time through space

Building Normalizing Flows with Stochastic Interpolants

A continuous-time normalizing flow using stochastic interpolants and quadratic loss to bypass costly ODE …
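The interpolant idea can be sketched in a few lines. This is a minimal illustration under assumed choices (a trigonometric interpolant between base and target samples); `interpolant_pair` is a hypothetical helper, and no network or training loop is shown.

```python
import numpy as np

# A stochastic-interpolant sketch: connect x0 ~ rho_0 and x1 ~ rho_1 via
#   x_t = cos(pi t / 2) * x0 + sin(pi t / 2) * x1,
# with time derivative xdot_t. The flow's velocity field b(t, x) is then fit
# by a quadratic regression E||b(t, x_t) - xdot_t||^2, so training never
# simulates an ODE. Here we only form the (x_t, xdot_t) regression pair.

rng = np.random.default_rng(0)

def interpolant_pair(x0, x1, t):
    """Point on the interpolant and the velocity target at time t."""
    a, b = np.cos(np.pi * t / 2), np.sin(np.pi * t / 2)
    da = -np.pi / 2 * np.sin(np.pi * t / 2)   # d/dt of a
    db = np.pi / 2 * np.cos(np.pi * t / 2)    # d/dt of b
    xt = a * x0 + b * x1
    xdot = da * x0 + db * x1                  # quadratic-loss target
    return xt, xdot

x0 = rng.standard_normal((4, 2))   # base samples (rho_0)
x1 = rng.standard_normal((4, 2))   # target samples (rho_1)
xt, xdot = interpolant_pair(x0, x1, 0.5)
```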

Generative Modeling
Visualization comparing Optimal Transport (straight paths) vs Diffusion (curved paths) for Flow Matching

Flow Matching for Generative Modeling

A simulation-free framework for training Continuous Normalizing Flows using Conditional Flow Matching and Optimal …
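The "simulation-free" training target can be made concrete with a small sketch. Assuming the straight (OT-style) conditional paths from the paper, this hypothetical snippet only builds the regression pair a network would be trained on; no model is defined.

```python
import numpy as np

# Conditional Flow Matching with straight probability paths: given noise
# x0 ~ N(0, I) and a data sample x1, the conditional path is
#   x_t = (1 - t) * x0 + t * x1,
# whose conditional velocity field is u_t(x | x1) = x1 - x0.
# A network v_theta(x_t, t) would be regressed onto u_t; no ODE is ever
# simulated during training.

rng = np.random.default_rng(0)

def cfm_pair(x1, rng):
    """Sample (t, x_t, u_t) — the input/target of one CFM regression step."""
    x0 = rng.standard_normal(x1.shape)       # base noise sample
    t = rng.uniform(size=(x1.shape[0], 1))   # one time per example
    xt = (1 - t) * x0 + t * x1               # point on the straight path
    ut = x1 - x0                             # target velocity (constant in t)
    return t, xt, ut

x1 = rng.standard_normal((4, 2))             # stand-in "data" batch
t, xt, ut = cfm_pair(x1, rng)
```

The straight paths are exactly why this is contrasted with diffusion's curved trajectories in the figure above: the regression target is constant along each conditional path.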

Machine Learning Fundamentals
Comparison of Residual Network vs ODE Network architectures showing discrete layers versus continuous transformations

Neural Ordinary Differential Equations

Introduces ODE-Nets, a continuous-depth neural network model parameterized by ODEs, enabling constant memory …
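The continuous-depth idea can be sketched in a few lines: a residual block computes the discrete update h_{t+1} = h_t + f(h_t), while an ODE-Net treats depth as continuous time and integrates dh/dt = f(h, t) with any solver. The snippet below is a toy illustration with a fixed (untrained) dynamics function and a hand-rolled Euler integrator, not the adaptive solvers the paper uses.

```python
import numpy as np

# Toy ODE-Net forward pass: integrate dh/dt = f(h, t) from t0 to t1.
# f is a single tanh layer with a fixed weight matrix, standing in for a
# learned network; fixed-step Euler stands in for an adaptive ODE solver.

def f(h, t, W):
    """Dynamics function: one tanh layer (hypothetical, untrained)."""
    return np.tanh(W @ h)

def odenet_forward(h0, W, t0=0.0, t1=1.0, steps=100):
    """Euler-integrate the hidden state from depth-time t0 to t1."""
    h, dt = h0.astype(float), (t1 - t0) / steps
    for i in range(steps):
        h = h + dt * f(h, t0 + i * dt, W)   # one Euler (residual-like) step
    return h

W = np.array([[0.0, -1.0], [1.0, 0.0]])     # rotation-like dynamics
h1 = odenet_forward(np.array([1.0, 0.0]), W)
```

Note that each Euler step has exactly the residual-block form h + dt * f(h), which is the observation motivating the constant-memory adjoint training described in the paper.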

Generative Modeling
Denoising Score Matching Intuition - Vectors point from corrupted samples back to clean data, approximating the score

Score Matching and Denoising Autoencoders

Theoretical paper proving the equivalence between training Denoising Autoencoders and performing Score Matching on a …
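The equivalence can be checked numerically on a case where everything is closed-form. In this sketch the data is 1-D Gaussian, so the optimal denoiser is known exactly; the point is that the rescaled denoising direction coincides with the score of the noise-smoothed density.

```python
import numpy as np

# Vincent-style identity on a tractable 1-D Gaussian: if data x ~ N(0, 1)
# is corrupted as x_noisy = x + sigma * eps, the optimal denoiser is the
# posterior mean r(x_noisy) = E[x | x_noisy], and
#   (r(x_noisy) - x_noisy) / sigma**2
# equals the score d/dx log q(x_noisy) of the noisy marginal N(0, 1 + sigma**2).

sigma = 0.5
x_noisy = np.linspace(-3.0, 3.0, 7)

# Optimal denoiser for this Gaussian case (closed-form posterior mean).
r = x_noisy / (1 + sigma**2)

# Denoising direction, rescaled as in denoising score matching.
dsm_score = (r - x_noisy) / sigma**2

# Analytic score of the noisy marginal N(0, 1 + sigma^2).
true_score = -x_noisy / (1 + sigma**2)
```

The arrows in the figure above are exactly `r - x_noisy`: they point from corrupted samples back toward clean data, and dividing by the noise variance recovers the score.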

Generative Modeling
Forward and Reverse SDE trajectories showing the diffusion process from data to noise and back

Score-Based Generative Modeling with SDEs

Unified SDE framework for score-based generative models, introducing Predictor-Corrector samplers and achieving SOTA on …
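The reverse-SDE sampling loop can be sketched end to end when the score is known in closed form. This hypothetical demo uses 1-D Gaussian data under a VP-type forward SDE (assumed choices, not the paper's image setup), and runs only the Euler–Maruyama "predictor" half of a Predictor–Corrector sampler.

```python
import numpy as np

# Minimal score-based sampling via a reverse SDE. Forward (VP) SDE with
# beta = 1:  dx = -x/2 dt + dW, run for time T. For 1-D data N(2, 0.25)
# the perturbed marginals stay Gaussian, so the score is closed-form and
# we can sample by integrating the reverse SDE
#   dx = [-x/2 - score(x, t)] dt + dW_bar
# backward in time with Euler-Maruyama (the "predictor" step).

rng = np.random.default_rng(0)
T, steps, n = 5.0, 500, 20000
dt = T / steps

def score(x, t):
    """Closed-form score of the perturbed data marginal at time t."""
    mean = 2.0 * np.exp(-0.5 * t)          # data mean, shrunk by the SDE
    var = 1.0 - 0.75 * np.exp(-t)          # data var relaxed toward 1
    return -(x - mean) / var

x = rng.standard_normal(n)                 # start from the prior N(0, 1)
for k in range(steps):
    t = T - k * dt                         # integrate from t = T down to 0
    drift = -0.5 * x - score(x, t)         # reverse-time drift
    x = x - drift * dt + np.sqrt(dt) * rng.standard_normal(n)

# x now approximates samples from the data distribution N(2, 0.25)
```

A corrector step (a few iterations of Langevin dynamics using the same score) would be interleaved after each predictor update in the full sampler; learned models replace `score` with a trained network.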