Computational Chemistry
AdaptMol domain adaptation pipeline showing encoder-decoder with MMD alignment between labeled source and unlabeled target domain images

AdaptMol: Domain Adaptation for Molecular OCSR (2026)

AdaptMol combines an end-to-end graph reconstruction model with unsupervised domain adaptation via class-conditional MMD on bond features and SMILES-validated self-training. It achieves 82.6% accuracy on hand-drawn molecules (10.7 points above the prior best) while maintaining state-of-the-art results on four literature benchmarks, using only 4,080 real hand-drawn images for adaptation.
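As a rough illustration of the class-conditional MMD idea, the sketch below compares source and target bond features separately for each bond type, using a Gaussian RBF kernel and the biased (V-statistic) estimate of squared MMD. The feature vectors, the kernel bandwidth `sigma`, the use of pseudo-labels on the target side, and the helper names (`rbf`, `mmd2`, `class_conditional_mmd`) are all assumptions for illustration, not AdaptMol's actual implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def rbf(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    """Gaussian RBF kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def mmd2(xs: List[Vector], ys: List[Vector], sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    def mean_k(p, q):
        return sum(rbf(a, b, sigma) for a in p for b in q) / (len(p) * len(q))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def class_conditional_mmd(src_feats, src_labels, tgt_feats, tgt_labels, sigma=1.0):
    """Average squared MMD over bond classes present in both domains.

    Target labels are assumed to be pseudo-labels (the target domain is unlabeled).
    """
    src: Dict[str, List[Vector]] = defaultdict(list)
    tgt: Dict[str, List[Vector]] = defaultdict(list)
    for f, c in zip(src_feats, src_labels):
        src[c].append(f)
    for f, c in zip(tgt_feats, tgt_labels):
        tgt[c].append(f)
    shared = sorted(set(src) & set(tgt))
    if not shared:
        return 0.0
    return sum(mmd2(src[c], tgt[c], sigma) for c in shared) / len(shared)
```

Averaging per-class terms, rather than pooling all bonds together, is meant to keep a feature distribution dominated by common bond types from masking a mismatch in rarer ones.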

Generative Modeling
Diagram showing consistency models mapping points on a PF ODE trajectory to the same origin

Consistency Models: Fast One-Step Diffusion Generation

This paper introduces consistency models, a new family of generative models that map any point on a Probability Flow ODE trajectory to its origin. They support fast one-step generation by design, while still permitting multi-step sampling to trade compute for sample quality, as well as zero-shot data editing tasks such as inpainting and colorization.
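The boundary condition behind "map any point to its origin" can be built into the architecture with skip connections, f(x, t) = c_skip(t)·x + c_out(t)·F(x, t), where c_skip(ε) = 1 and c_out(ε) = 0 so that f(x, ε) = x exactly. The sketch below uses coefficient formulas and constants (σ_data = 0.5, ε = 0.002, T = 80) that follow the EDM-style choice as we understand it from the paper; `toy_F` is a hypothetical placeholder for a trained network, and the optional multistep loop alternates re-noising with one-step denoising.

```python
import math
import random
from typing import Callable, List, Sequence

SIGMA_DATA = 0.5   # assumed data standard deviation
EPS = 0.002        # smallest time; the boundary condition holds here
T_MAX = 80.0       # largest time

Vec = List[float]


def c_skip(t: float) -> float:
    return SIGMA_DATA ** 2 / ((t - EPS) ** 2 + SIGMA_DATA ** 2)


def c_out(t: float) -> float:
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA ** 2 + t ** 2)


def consistency_fn(F: Callable[[Vec, float], Vec], x: Vec, t: float) -> Vec:
    """f(x, t) = c_skip(t) * x + c_out(t) * F(x, t); equals x exactly at t = EPS."""
    fx = F(x, t)
    return [c_skip(t) * xi + c_out(t) * fi for xi, fi in zip(x, fx)]


def sample(F, dim: int, taus: Sequence[float] = (), rng=None) -> Vec:
    """One-step generation from x_T ~ N(0, T^2 I), then optional refinement steps."""
    rng = rng or random.Random(0)
    x_T = [T_MAX * rng.gauss(0.0, 1.0) for _ in range(dim)]
    x = consistency_fn(F, x_T, T_MAX)             # a single network evaluation
    for tau in taus:                               # decreasing times in (EPS, T_MAX)
        noise_scale = math.sqrt(tau ** 2 - EPS ** 2)
        x_tau = [xi + noise_scale * rng.gauss(0.0, 1.0) for xi in x]
        x = consistency_fn(F, x_tau, tau)         # re-noise, then map back to the origin
    return x


def toy_F(x: Vec, t: float) -> Vec:
    """Hypothetical stand-in for a trained network F_theta."""
    return [-xi / (1.0 + t) for xi in x]
```

Calling `sample(toy_F, 3)` performs one-step generation; passing `taus=(10.0, 1.0)` adds two refinement steps at the cost of two more network evaluations.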

Generative Modeling
D3PM forward and reverse processes on a quantized swiss roll with uniform, Gaussian, and absorbing transition matrices

D3PM: Discrete Denoising Diffusion Probabilistic Models

This paper introduces Discrete Denoising Diffusion Probabilistic Models (D3PMs), which generalize diffusion to discrete state-spaces using structured Markov transition matrices. D3PMs include uniform, absorbing-state, and discretized Gaussian corruption processes, drawing a connection between diffusion and masked language models.
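The structured transition matrices can be written out directly. In the sketch below, row i of Q_t is q(x_t | x_{t-1} = i), so the t-step marginal q(x_t | x_0) is row x_0 of the product Q̄_t = Q_1 Q_2 ... Q_t. Only the uniform and absorbing-state forms are shown; the discretized Gaussian variant is omitted, and the helper names and noise schedule are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def identity(k: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def uniform_Q(k: int, beta: float) -> Matrix:
    """(1 - beta) I + (beta / K) 1 1^T: with prob. beta, resample uniformly."""
    return [[(1.0 - beta) * float(i == j) + beta / k for j in range(k)] for i in range(k)]


def absorbing_Q(k: int, beta: float, mask: int) -> Matrix:
    """(1 - beta) I + beta 1 e_mask^T: with prob. beta, jump to the [MASK] state."""
    return [[(1.0 - beta) * float(i == j) + beta * float(j == mask) for j in range(k)]
            for i in range(k)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][r] * b[r][j] for r in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def cumulative_Q(qs: List[Matrix]) -> Matrix:
    """Q_bar_t = Q_1 Q_2 ... Q_t, so q(x_t | x_0) = Cat(x_0 Q_bar_t)."""
    out = identity(len(qs[0]))
    for q in qs:
        out = matmul(out, q)
    return out


def marginal(x0: int, q_bar: Matrix) -> List[float]:
    """Row x0 of Q_bar_t: the categorical distribution of x_t given x_0."""
    return q_bar[x0]
```

In the absorbing case the [MASK] state never leaves, and an unmasked token survives t steps with probability ∏(1 − β_s); predicting the original token from a partially masked sequence is what ties this corruption process to masked language modeling.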

Computational Chemistry
GraSP feed-forward architecture showing GNN, FiLM-conditioned CNN, and MLP classification head

GraSP: Graph Recognition via Subgraph Prediction (2026)

GraSP introduces a general framework for recognizing graphs in images by framing the task as sequential subgraph prediction with a binary classifier. A GNN conditions a CNN via FiLM layers to predict whether a candidate graph is a subgraph of the target. Applied to OCSR on QM9, GraSP achieves 67.5% accuracy with no domain-specific modifications.
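FiLM conditioning amounts to a per-channel affine transform whose scale γ and shift β are predicted from a conditioning vector. In the sketch below, `cond` stands in for the GNN embedding of the candidate graph, and the single linear maps producing γ and β are hypothetical; GraSP's actual layer sizes and the placement of FiLM blocks inside its CNN are not specified here.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [channel][row][col]


def linear(w: List[List[float]], b: List[float], x: List[float]) -> List[float]:
    """Dense layer y = W x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def film(features: FeatureMap, cond: List[float],
         w_gamma, b_gamma, w_beta, b_beta) -> FeatureMap:
    """Feature-wise linear modulation: out[k] = gamma[k] * features[k] + beta[k].

    gamma and beta (one scalar per channel) are predicted from `cond`, a stand-in
    for the graph embedding; the same scale and shift apply at every pixel.
    """
    gamma = linear(w_gamma, b_gamma, cond)
    beta = linear(w_beta, b_beta, cond)
    return [[[g * v + bt for v in row] for row in channel]
            for channel, g, bt in zip(features, gamma, beta)]
```

Because γ and β depend on the candidate graph, the same image features are re-weighted differently for each candidate, which is what lets one CNN answer "is this graph a subgraph?" for many candidates.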

Generative Modeling
LDM architecture diagram showing conditioning via concatenation and cross-attention

Latent Diffusion Models for High-Res Image Synthesis

This paper introduces Latent Diffusion Models (LDMs), which apply denoising diffusion in the latent space of pretrained autoencoders. By separating perceptual compression from generative learning and adding cross-attention conditioning, LDMs achieve FID 1.50 on Places inpainting and FID 3.60 on ImageNet class-conditional synthesis, with competitive text-to-image generation, at a fraction of the compute cost of pixel-space diffusion.
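LDMs inject conditioning through cross-attention, softmax(QKᵀ/√d)·V, with queries computed from intermediate UNet features of the latent and keys/values from a domain-specific encoder of the conditioning input (for example, text tokens). The single-head sketch below omits the learned projections W_Q, W_K, W_V and treats its inputs as already projected, which is a simplifying assumption.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d)) V for one head.

    queries: one row per latent position (image side);
    keys/values: one row per conditioning token (e.g. text encoder outputs).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

The output has one row per latent position, so conditioning information is mixed into the latent without changing its spatial resolution.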

Machine Learning Fundamentals
Three-panel diagram showing an original sequence, its time-warped version, and the gate values derived from requiring time warping invariance

Can Recurrent Neural Networks Warp Time? (ICLR 2018)

Tallec and Ollivier show that requiring invariance to time transformations in recurrent models leads to gating mechanisms, recovering key LSTM components from first principles. They propose the chrono initialization for gate biases that improves learning of long-term dependencies.
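The chrono initialization sets forget-gate biases b_f ~ log(U[1, T_max − 1]) and input-gate biases b_i = −b_f, where T_max is the expected range of temporal dependencies. A minimal pure-Python sketch follows; the function name and seeding are illustrative, and wiring the values into a specific LSTM implementation's bias layout is left out.

```python
import math
import random
from typing import List, Tuple


def chrono_init(hidden_size: int, t_max: float, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Chrono initialization for LSTM gate biases.

    Forget-gate biases b_f ~ log(U[1, t_max - 1]); input-gate biases b_i = -b_f.
    """
    rng = random.Random(seed)
    b_f = [math.log(rng.uniform(1.0, t_max - 1.0)) for _ in range(hidden_size)]
    b_i = [-b for b in b_f]
    return b_f, b_i
```

Since σ(log T) = T / (1 + T) = 1 − 1/(1 + T), a forget gate initialized at b_f = log T retains information over roughly T steps, so sampling T log-uniformly spreads the units across characteristic timescales up to T_max.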

Machine Learning Fundamentals
Graph network block diagram showing input graph transformed through edge, node, and global update steps to produce an updated graph

Relational Inductive Biases in Deep Learning (2018)

Battaglia et al. argue that combinatorial generalization requires structured representations, systematically analyze the relational inductive biases in standard deep learning architectures (MLPs, CNNs, RNNs), and present the graph network as a unifying framework that generalizes and extends prior graph neural network approaches.
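A full graph network block applies three update functions (edge, node, global) interleaved with aggregations. The sketch below uses scalar features, sum aggregation, and caller-supplied update functions; these choices and the function signatures are assumptions made to keep the example small, whereas in practice each update function is usually a learned network such as an MLP.

```python
from typing import Callable, List, Tuple

Edge = Tuple[float, int, int]  # (edge feature, sender index, receiver index)


def gn_block(nodes: List[float], edges: List[Edge], u: float,
             phi_e: Callable, phi_v: Callable, phi_u: Callable):
    """One full graph-network block with sum aggregation (scalar features)."""
    # 1. Edge update: each edge sees its feature, receiver, sender, and the global.
    new_edges = [(phi_e(e, nodes[r], nodes[s], u), s, r) for e, s, r in edges]
    # 2. Aggregate updated edges per receiving node (sum).
    agg = [0.0] * len(nodes)
    for e, _, r in new_edges:
        agg[r] += e
    # 3. Node update from aggregated edges, the node's own feature, and the global.
    new_nodes = [phi_v(agg[i], v, u) for i, v in enumerate(nodes)]
    # 4. Global update from all updated edges and nodes (sums).
    e_bar = sum(e for e, _, _ in new_edges)
    v_bar = sum(new_nodes)
    new_u = phi_u(e_bar, v_bar, u)
    return new_nodes, new_edges, new_u
```

Dropping or simplifying steps recovers familiar special cases; for example, ignoring the global and the edge update leaves a message-passing layer over node features.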

Machine Learning Fundamentals
Log-log plot comparing scaling laws across six architectures showing the vanilla Transformer has the steepest slope

Scaling Laws vs Model Architectures: Inductive Bias

Tay et al. systematically compare scaling laws across ten diverse architectures (Transformers, Switch Transformers, Performers, MLP-Mixers, and others), finding that the vanilla Transformer has the best scaling coefficient and that the best-performing architecture changes across compute regions.
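A scaling coefficient of this kind is typically the slope of a straight-line fit in log-log space, i.e. loss ≈ a·C^b. The sketch below performs an ordinary least-squares fit of log loss against log compute; the synthetic data in the check and the fitting procedure itself are assumptions, and Tay et al. may fit their curves differently.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(compute: Sequence[float], loss: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of loss = a * compute**b in log-log space; returns (a, b)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(y) for y in loss]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                  # slope = scaling exponent (negative if loss falls)
    a = math.exp(my - b * mx)
    return a, b
```

A more negative fitted b means loss falls faster as compute grows, which is what "steepest slope" on the log-log plot refers to; comparing b alone ignores the intercept a, which is one reason the best architecture can differ across compute regions.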

Machine Learning Fundamentals
SE(3)-Transformer architecture showing invariant attention weights modulating equivariant value messages on a 3D point cloud

SE(3)-Transformers: Equivariant Attention for 3D Data

Fuchs et al. introduce the SE(3)-Transformer, which combines self-attention with SE(3)-equivariance for 3D point clouds and graphs. Invariant attention weights modulate equivariant value messages from tensor field networks, resolving angular filter constraints while enabling data-adaptive, anisotropic processing.
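The core principle, invariant attention weights multiplying equivariant values, can be seen in a toy version where the weights depend only on pairwise distances and the values are relative position vectors. This is deliberately simplified: it uses no tensor field network kernels, spherical harmonics, or higher-degree features, and the distance-based scoring and function names are hypothetical. It does, however, produce output vectors that rotate with the input and ignore translations.

```python
import math
from typing import List

Vec3 = List[float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return [x - y for x, y in zip(a, b)]


def norm(a: Vec3) -> float:
    return math.sqrt(sum(x * x for x in a))


def equivariant_attention(points: List[Vec3], scale: float = 1.0) -> List[Vec3]:
    """For each point i: sum_j alpha_ij * (x_j - x_i), with alpha from distances.

    Weights depend only on distances (rotation/translation invariant); values are
    relative vectors, which rotate with the input and ignore translations.
    """
    out = []
    for i, xi in enumerate(points):
        rel = [sub(xj, xi) for j, xj in enumerate(points) if j != i]
        logits = [-scale * norm(r) for r in rel]      # invariant attention scores
        m = max(logits)
        w = [math.exp(s - m) for s in logits]
        z = sum(w)
        out.append([sum(wk * r[d] for wk, r in zip(w, rel)) / z for d in range(3)])
    return out


def rotate_z(p: Vec3, theta: float) -> Vec3:
    """Rotate a 3D point about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * p[0] - s * p[1], s * p[0] + c * p[1], p[2]]
```

Rotating every input point and then running the layer gives the same result as running the layer and rotating its outputs, which is the equivariance property the full model guarantees for richer feature types.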

Machine Learning Fundamentals
Comparison of planar CNN (translation only) versus spherical CNN (SO(3)-equivariant) showing how filters rotate on the sphere

Spherical CNNs: Rotation-Equivariant Networks on the Sphere

Cohen et al. introduce Spherical CNNs that achieve SO(3)-equivariance by defining cross-correlation on the sphere and rotation group, computed efficiently via generalized FFT algorithms from non-commutative harmonic analysis.
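The SO(3) cross-correlation of a filter ψ with a signal f on the sphere is [ψ ⋆ f](R) = ∫_{S²} ψ(R⁻¹x) f(x) dx, a function of the rotation R. The sketch below evaluates this definition by brute force for a single rotation, using a Fibonacci lattice as an equal-weight quadrature rule. It illustrates the definition only under those assumptions; it does not implement the generalized FFT that makes Spherical CNNs efficient.

```python
import math
from typing import Callable, List

Vec3 = List[float]
Mat3 = List[List[float]]


def fibonacci_sphere(n: int) -> List[Vec3]:
    """Roughly uniform points on the unit sphere (equal-weight quadrature nodes)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        phi = i * golden
        pts.append([r * math.cos(phi), r * math.sin(phi), z])
    return pts


def matvec_T(R: Mat3, x: Vec3) -> Vec3:
    """Computes R^{-1} x = R^T x for a rotation matrix R."""
    return [sum(R[k][i] * x[k] for k in range(3)) for i in range(3)]


def sphere_correlation(psi: Callable[[Vec3], float], f: Callable[[Vec3], float],
                       R: Mat3, nodes: List[Vec3]) -> float:
    """[psi * f](R) = integral over S^2 of psi(R^{-1} x) f(x) dx, by quadrature."""
    area = 4.0 * math.pi
    return area * sum(psi(matvec_T(R, x)) * f(x) for x in nodes) / len(nodes)


def rot_z(theta: float) -> Mat3:
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

Rotating the signal by Q and evaluating the correlation at QR gives the value the original signal has at R, which is the SO(3)-equivariance property; with quadrature the match is approximate rather than exact.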

Machine Learning Fundamentals
The three quarks of attention: multiplexing (additive), output gating (multiplicative output), and synaptic gating (multiplicative weight)

The Quarks of Attention: Building Blocks of Attention

Baldi and Vershynin systematically classify the fundamental building blocks of attention (activation attention, output gating, synaptic gating) by source, target, and mechanism, then prove capacity bounds showing that gating introduces quadratic terms sparsely, gaining expressiveness without the full cost of polynomial activations.
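The three building blocks named in the figure caption can be written as small variations on a single neuron f(Σ_j w_j x_j): an attention signal added to the input sum (multiplexing), multiplied onto the output (output gating), or multiplied onto individual weights (synaptic gating). The forms below are a sketch assuming scalar signals and a generic activation f; the function names are hypothetical, and the paper's formal definitions may be stated more generally.

```python
from typing import Callable, List

Activation = Callable[[float], float]


def neuron(w: List[float], x: List[float], f: Activation) -> float:
    """Standard neuron: f(sum_j w_j x_j)."""
    return f(sum(wj * xj for wj, xj in zip(w, x)))


def multiplexing(w: List[float], x: List[float], a: float, f: Activation) -> float:
    """Additive activation attention: an attention signal a is added to the input sum."""
    return f(sum(wj * xj for wj, xj in zip(w, x)) + a)


def output_gating(w: List[float], x: List[float], g: float, f: Activation) -> float:
    """Output gating: the neuron's output is multiplied by a gating signal g."""
    return g * neuron(w, x, f)


def synaptic_gating(w: List[float], x: List[float], gates: List[float], f: Activation) -> float:
    """Synaptic gating: each weight w_j is multiplied by its own gating signal g_j."""
    return f(sum(gj * wj * xj for gj, wj, xj in zip(gates, w, x)))
```

With an identity activation, output gating returns g·Σ_j w_j x_j, a product of two signals, which illustrates how gating introduces a quadratic term through a single multiplication rather than through a polynomial activation.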

Computational Chemistry
ChemBERTa-3 visualization showing muscular arms lifting a stack of building blocks representing molecular data with SMILES notation, symbolizing the power and scalability of the open-source training framework

ChemBERTa-3: Open Source Chemical Foundation Models

ChemBERTa-3 provides a unified, scalable infrastructure for pretraining and benchmarking chemical foundation models. It addresses reproducibility gaps in previous studies like MoLFormer through standardized scaffold splitting and open-source tooling.
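Scaffold splitting groups molecules by a shared core scaffold and assigns whole groups to train, validation, or test, so that test molecules have scaffolds unseen during training. The sketch below assumes scaffold keys are already computed (for example, Bemis-Murcko scaffolds from a cheminformatics toolkit such as RDKit) and uses a greedy largest-group-first assignment; the fractions, the tie-breaking rule, and the function name are assumptions, and ChemBERTa-3's exact procedure may differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def scaffold_split(scaffolds: Sequence[str], frac_train: float = 0.8,
                   frac_valid: float = 0.1) -> Tuple[List[int], List[int], List[int]]:
    """Split molecule indices so that no scaffold appears in more than one split.

    `scaffolds[i]` is a precomputed scaffold key for molecule i.
    Largest scaffold groups are assigned first; ties break by first index.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, s in enumerate(scaffolds):
        groups[s].append(i)
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    n = len(scaffolds)
    train: List[int] = []
    valid: List[int] = []
    test: List[int] = []
    for g in ordered:
        if len(train) + len(g) <= frac_train * n:
            train.extend(g)
        elif len(valid) + len(g) <= frac_valid * n:
            valid.extend(g)
        else:
            test.extend(g)
    return train, valid, test
```

Assigning large groups to training first leaves the rarer scaffolds for evaluation, which makes the benchmark a harder and more realistic test of generalization than a random split.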