Machine Learning
Comparison of Residual Network vs ODE Network architectures showing discrete layers versus continuous transformations

Neural ODEs: Continuous-Depth Deep Learning Models

This paper replaces discrete network layers with continuous ordinary differential equations (ODEs), allowing for adaptive computation depth and constant memory cost during training via the adjoint sensitivity method. It introduces Continuous Normalizing Flows and latent ODEs for time-series.
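
The link between discrete layers and a continuous ODE can be illustrated with a minimal sketch: a residual update h + f(h) is one Euler step of dh/dt = f(h, t) with step size 1. The dynamics function `f` below is a hypothetical stand-in for a learned layer, and the fixed-step Euler loop is a simplification; it is not the adaptive solver or the adjoint method described in the paper.

```python
import math

def f(h, t):
    # Hypothetical dynamics standing in for a learned layer (assumption for illustration).
    return -0.5 * h

def residual_forward(h, num_layers):
    # Discrete residual network: h_{k+1} = h_k + f(h_k, k).
    for k in range(num_layers):
        h = h + f(h, k)
    return h

def euler_odeint(h, t0, t1, num_steps):
    # Fixed-step Euler solver for dh/dt = f(h, t) on [t0, t1].
    dt = (t1 - t0) / num_steps
    t = t0
    for _ in range(num_steps):
        h = h + dt * f(h, t)
        t += dt
    return h

def exact_solution(h0, t):
    # Closed-form solution of dh/dt = -0.5 * h, used only as a reference.
    return h0 * math.exp(-0.5 * t)
```

With four unit-size steps, `euler_odeint(1.0, 0.0, 4.0, 4)` reproduces `residual_forward(1.0, 4)`. Increasing `num_steps` moves the result toward `exact_solution(1.0, 4.0)`, which is the sense in which depth becomes a continuous quantity.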

Generative Modeling
Denoising Score Matching Intuition - Vectors point from corrupted samples back to clean data, approximating the score

Score Matching and Denoising Autoencoders: A Connection

This paper provides a rigorous probabilistic foundation for Denoising Autoencoders by proving they are mathematically equivalent to Score Matching on a kernel-smoothed data distribution. It derives a specific energy function for DAEs and justifies the use of tied weights.
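
As a rough illustration of the "vectors point back to clean data" intuition, the sketch below assumes a 1-D Gaussian corruption kernel with standard deviation `sigma`. Under that assumption the denoising score matching regression target is (x - x_tilde) / sigma^2. The function names are hypothetical, and the finite-difference check is only a sanity test, not part of the method.

```python
import math

def log_gaussian_kernel(x_tilde, x, sigma):
    # log q(x_tilde | x) for a 1-D Gaussian corruption kernel (assumed noise model).
    return -0.5 * ((x_tilde - x) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def dsm_target(x_tilde, x, sigma):
    # Gradient of log q(x_tilde | x) with respect to x_tilde.
    # It points from the corrupted sample back toward the clean sample x.
    return (x - x_tilde) / sigma ** 2

def finite_difference_score(x_tilde, x, sigma, eps=1e-5):
    # Central-difference estimate of the same gradient, used as a sanity check.
    upper = log_gaussian_kernel(x_tilde + eps, x, sigma)
    lower = log_gaussian_kernel(x_tilde - eps, x, sigma)
    return (upper - lower) / (2 * eps)
```

A score model trained with this objective would regress its output onto `dsm_target` for each (clean, corrupted) pair.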

Generative Modeling
Forward and Reverse SDE trajectories showing the diffusion process from data to noise and back

Score-Based Generative Modeling with SDEs (Song 2021)

This paper unifies previous score-based methods (SMLD and DDPM) under a continuous-time SDE framework. It introduces Predictor-Corrector samplers for improved generation and Probability Flow ODEs for near-exact likelihood computation, setting new records on CIFAR-10.
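
For orientation, the three equations behind this framework can be written as follows, using the usual notation as an assumption: drift f, diffusion coefficient g, Wiener process w, reverse-time Wiener process w-bar, and marginal density p_t.

```latex
\begin{aligned}
\text{forward SDE:}\quad & \mathrm{d}\mathbf{x} = \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w} \\
\text{reverse-time SDE:}\quad & \mathrm{d}\mathbf{x} = \left[\mathbf{f}(\mathbf{x}, t) - g(t)^2\, \nabla_{\mathbf{x}} \log p_t(\mathbf{x})\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}} \\
\text{probability flow ODE:}\quad & \mathrm{d}\mathbf{x} = \left[\mathbf{f}(\mathbf{x}, t) - \tfrac{1}{2}\, g(t)^2\, \nabla_{\mathbf{x}} \log p_t(\mathbf{x})\right]\mathrm{d}t
\end{aligned}
```

In practice the score term is replaced by a learned time-dependent score model. The probability flow ODE shares the same marginals as the SDE, which is what makes likelihood computation possible.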

Computational Chemistry
ChemDFM-X architecture showing five modalities (2D graphs, 3D conformations, images, MS2 spectra, IR spectra) feeding through separate encoders into unified LLM decoder

ChemDFM-X: Multimodal Foundation Model for Chemistry

ChemDFM-X is a multimodal chemical foundation model that integrates five non-text modalities (2D graphs, 3D conformations, images, MS2 spectra, IR spectra) into a single LLM decoder. It overcomes data scarcity by generating a 7.6M instruction-tuning dataset through approximate calculations and model predictions, establishing strong baseline performance across multiple modalities.

Computational Biology
DynamicFlow illustration showing the transformation from apo pocket to holo pocket with ligand molecule generation

DynamicFlow: Integrating Protein Dynamics into Drug Design

This paper introduces DynamicFlow, a full-atom stochastic flow matching model that simultaneously generates ligand molecules and transforms protein pockets from apo to holo states. It also contributes a new dataset of MD-simulated apo-holo pairs derived from MISATO.

Optical Chemical Structure Recognition
Comparative analysis of image-to-sequence OCSR methods

Image-to-Sequence OCSR: A Comparative Analysis

A deep dive into 24 image-to-sequence OCSR methods (2019-2025) that compares encoder-decoder architectures, molecular string representations, training scale, and hardware requirements.

Computational Chemistry
InstructMol architecture showing molecular graph and text inputs feeding through two-stage training to produce property predictions, descriptions, and reactions

InstructMol: Multi-Modal Molecular LLM for Drug Discovery

InstructMol integrates a pre-trained molecular graph encoder (MoleculeSTM) with a Vicuna-7B LLM using a linear projector. It employs a two-stage training process (alignment pre-training followed by task-specific instruction tuning with LoRA) to excel at property prediction, description generation, and reaction analysis.

Computational Biology
InvMSAFold generates diverse protein sequences from structure using a Potts model

InvMSAFold: Generative Inverse Folding with Potts Models

InvMSAFold replaces autoregressive decoding with a Potts model parameter generator, enabling diverse protein sequence sampling orders of magnitude faster than ESM-IF1.
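
To make the Potts model idea concrete, here is a minimal, generic sketch of a Potts energy and one Gibbs sampling sweep. The toy parameters `FIELDS` and `COUPLINGS`, the function names, and the choice of Gibbs sampling are all assumptions for illustration; the parameterization and sampler used by InvMSAFold may differ.

```python
import math
import random

def zero_couplings(length, q):
    # couplings[i][j][a][b] is used only for i < j; all entries start at zero.
    return [[[[0.0] * q for _ in range(q)] for _ in range(length)] for _ in range(length)]

def potts_energy(seq, fields, couplings):
    # E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j)
    length = len(seq)
    energy = -sum(fields[i][seq[i]] for i in range(length))
    for i in range(length):
        for j in range(i + 1, length):
            energy -= couplings[i][j][seq[i]][seq[j]]
    return energy

def gibbs_sweep(seq, fields, couplings, rng):
    # Resample each position from its conditional distribution given the rest.
    length, q = len(seq), len(fields[0])
    seq = list(seq)
    for i in range(length):
        logits = []
        for a in range(q):
            s = fields[i][a]
            for j in range(length):
                if j < i:
                    s += couplings[j][i][seq[j]][a]
                elif j > i:
                    s += couplings[i][j][a][seq[j]]
            logits.append(s)
        m = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(x - m) for x in logits]
        seq[i] = rng.choices(range(q), weights=weights)[0]
    return seq

# Toy parameters: 3 positions, 2 states per position (hypothetical values).
FIELDS = [[0.0, 1.0], [0.0, 0.0], [2.0, 0.0]]
COUPLINGS = zero_couplings(3, 2)
COUPLINGS[0][1][1][1] = 3.0
```

Once the fields and couplings are produced for a given structure, many sequences can be drawn by repeated sweeps without rerunning a neural decoder per residue, which is one plausible reading of the speed advantage described above.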

Molecular Simulation
MOFFlow assembles metal nodes and organic linkers into Metal-Organic Framework structures

MOFFlow: Flow Matching for MOF Structure Prediction

MOFFlow is the first deep generative model tailored for Metal-Organic Framework (MOF) structure prediction. It utilizes Riemannian flow matching on SE(3) to assemble rigid building blocks (metal nodes and organic linkers), achieving higher accuracy and scalability than atom-based methods on large systems.

Computational Chemistry
Diagram showing text, molecular structures, and reactions feeding into a multimodal index and search system that outputs passages with context

Multimodal Search in Chemical Documents and Reactions

This paper presents a multimodal search system that facilitates passage-level retrieval of chemical reactions and molecular structures by linking diagrams, text, and reaction records extracted from scientific PDFs.

Molecular Representations
Diagram showing molecular structure passing through a neural network to produce IUPAC chemical nomenclature document

STOUT V2.0: Transformer-Based SMILES to IUPAC Translation

STOUT V2.0 uses Transformers trained on ~1 billion SMILES-IUPAC pairs to accurately translate chemical structures into systematic names (and vice versa), outperforming its RNN predecessor.

Molecular Representations
Vintage wooden device labeled 'The Molecular Interpreter - Model 1974' with vacuum tubes, showing SMILES to IUPAC name translation

STOUT: SMILES to IUPAC Names via Neural Machine Translation

STOUT (SMILES-TO-IUPAC-name translator) uses neural machine translation to convert chemical line notations to IUPAC names and vice versa, achieving a BLEU score of ~90%. It addresses the lack of open-source tools for algorithmic IUPAC naming.