Method Papers: New Algorithms, Architectures, and Mechanisms

Graph-grammar expansion of a carbon fixation reaction network with ILP flow queries selecting short autocatalytic cycles

Graph Grammar and ILP for Carbon Fixation Pathways

A graph-grammar cheminformatics workflow expands the carbon fixation reaction network, then uses integer linear programming flow queries to surface short autocatalytic pathways that produce acetyl-CoA and malate with efficiencies approaching that of the CETCH cycle.
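
To make "autocatalytic flow" concrete, here is a minimal sketch that only *checks* a given integer flow; it does not solve the ILP. It assumes a simplified criterion (a species is consumed by some active reaction yet has positive net production), and the two-reaction network with species X, F, Y is a hypothetical toy, not part of the carbon fixation network.

```python
from collections import Counter
from typing import Dict, List

# species -> stoichiometric coefficient (negative = consumed, positive = produced)
Reaction = Dict[str, int]

def net_flux(reactions: List[Reaction], flow: List[int]) -> Dict[str, int]:
    """Net production of each species when reaction r fires flow[r] times."""
    net: Counter = Counter()
    for rxn, v in zip(reactions, flow):
        for species, coeff in rxn.items():
            net[species] += coeff * v
    return dict(net)

def is_autocatalytic(reactions: List[Reaction], flow: List[int], species: str) -> bool:
    """Simplified test: the species is an input of an active reaction and is net-produced."""
    consumed = any(rxn.get(species, 0) < 0 and v > 0 for rxn, v in zip(reactions, flow))
    return consumed and net_flux(reactions, flow).get(species, 0) > 0

# Hypothetical toy cycle: X + F -> Y, then Y + F -> 2X (food F is drawn in twice).
toy = [{"X": -1, "F": -1, "Y": 1}, {"Y": -1, "F": -1, "X": 2}]
flow = [1, 1]
```

In the full workflow an ILP solver searches over integer flows subject to constraints like these (plus the query's length and efficiency objectives); the sketch only evaluates one candidate.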

Molecular Simulation

Schematic of polyalanine 1-mer functional groups interacting with water through CCSD(T)-fit 2-body PIPs.

MB-nrg in Solution: Polyalanine in Water with CCSD(T) PEFs

Building on the gas-phase MB-nrg PEF for polyalanine, Ruihan Zhou and Francesco Paesani add machine-learned 2-body terms for each backbone functional group interacting with water, fit to BSSE-corrected DLPNO-CCSD(T)/aug-cc-pVTZ data, then validate the resulting potential against alanine dipeptide-water dimer scans, free-energy surfaces in explicit MB-pol water, and hydration radial distribution functions.

Molecular Simulation

Schematic of polyalanine decomposed into overlapping n-mer building blocks fit to CCSD(T) energies.

MB-nrg: CCSD(T)-Accurate Potentials for Polyalanine

Ruihan Zhou and co-authors extend the MB-nrg many-body formalism to covalently bonded biomolecules by fragmenting polyalanine into functional-group n-mers, fitting permutationally invariant polynomials to DLPNO-CCSD(T)/aug-cc-pVTZ reference energies, and reproducing alanine dipeptide Ramachandran surfaces, harmonic frequencies, and AceAla9Nme secondary-structure dynamics more faithfully than Amber ff14SB and ff19SB.
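
For reference, the generic many-body expansion that MB-nrg-style potentials build on is shown below. This is the textbook form for N fragments; how the paper defines and caps overlapping covalent n-mers, and which terms are represented by PIPs versus physics-based terms, is not reproduced here.

```latex
E_N(1,\dots,N) = \sum_{i} \varepsilon^{1B}(i)
  + \sum_{i<j} \varepsilon^{2B}(i,j)
  + \sum_{i<j<k} \varepsilon^{3B}(i,j,k) + \cdots

\varepsilon^{1B}(i) = E(i), \qquad
\varepsilon^{2B}(i,j) = E(i,j) - \varepsilon^{1B}(i) - \varepsilon^{1B}(j)

\varepsilon^{3B}(i,j,k) = E(i,j,k)
  - \left[\varepsilon^{2B}(i,j) + \varepsilon^{2B}(i,k) + \varepsilon^{2B}(j,k)\right]
  - \left[\varepsilon^{1B}(i) + \varepsilon^{1B}(j) + \varepsilon^{1B}(k)\right]
```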

Predictive Chemistry

Three panels comparing sampling strategies in a multi-modal fitness landscape: exhaustive enumeration, genetic algorithm clustering around few peaks, and ACSESS covering all peaks with fewer evaluations

ACSESS: Diverse Optimal Molecules in the SMU

Property-optimizing ACSESS combines diversity-biased sampling with iterative fitness thresholding to discover diverse sets of molecules with favorable properties. Tested on GDB-9 (dipole moment optimization) and NKp fitness landscapes, it outperforms standard genetic algorithms in diversity while matching or exceeding their fitness, using only ~30,000 evaluations to navigate a 300,000-molecule space.
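
The selection step can be sketched as "filter by a fitness threshold, then pick a diverse subset". This is an illustration under stated assumptions, not the authors' code: diversity is chosen with a greedy MaxMin picker, the helper names are hypothetical, and the mutation/crossover moves and the threshold-raising schedule are omitted.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def maxmin_pick(pool: Sequence[T], k: int, dist: Callable[[T, T], float]) -> List[T]:
    """Greedy MaxMin: start from pool[0], then repeatedly add the candidate whose
    distance to its nearest already-chosen item is largest."""
    chosen = [pool[0]]
    remaining = list(pool[1:])
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda c: min(dist(c, s) for s in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen

def select_generation(pool: Sequence[T], fitness: Callable[[T], float], threshold: float,
                      k: int, dist: Callable[[T, T], float]) -> List[T]:
    """Keep candidates at or above the current fitness threshold, then diversify."""
    fit = [c for c in pool if fitness(c) >= threshold]
    return maxmin_pick(fit, k, dist) if fit else []

# Toy 1-D "molecules": fitness is the value itself, distance is absolute difference.
picked = select_generation(list(range(11)), fitness=lambda x: x, threshold=5,
                           k=3, dist=lambda a, b: abs(a - b))
```

Raising `threshold` from one generation to the next is what turns this from pure diversity sampling into optimization.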

Predictive Chemistry

Diagram showing AllChem's combinatorial synthon assembly pipeline: 7,000 building blocks transformed by 100 reactions into 5 million synthons, which combine in A-B-C topology to represent 10^20 structures

AllChem: Generating and Searching 10^20 Structures

AllChem generates ~5 million synthons by recursively applying ~100 reactions to ~7,000 building blocks, combinatorially representing up to 10^20 complete structures with an A-B-C topology. Topomer shape similarity enables efficient searching of this space, and every hit comes with a proposed synthetic route.
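
A quick back-of-envelope check shows where the 10^20 figure comes from. It assumes every synthon could occupy any of the A, B, and C positions; real attachment-point compatibility makes the true count smaller, so treat this as an upper bound.

```python
N_SYNTHONS = 5_000_000  # ~5 million synthons, as stated above

def abc_upper_bound(n: int) -> int:
    """Products of an A-B-C assembly if each of the three slots draws from n synthons."""
    return n ** 3

bound = abc_upper_bound(N_SYNTHONS)
print(f"{bound:.2e}")  # 1.25e+20
```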

Molecular Simulation

Diagram showing conformation autoencoder architecture with internal coordinate encoding and decoding

Conformation Autoencoder for 3D Molecules

A conformation autoencoder converts molecular 3D arrangements into fixed-size latent representations using internal coordinates and graph neural networks, enabling conformer generation and spatial property optimization.
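
Internal coordinates (bond lengths, bond angles, dihedrals) are what make the representation invariant to rotation and translation. A minimal sketch of computing them from Cartesian coordinates follows; the helper names are illustrative, and the dihedral uses the standard atan2 formulation rather than anything specific to the autoencoder.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))

def bond_length(p0: Vec, p1: Vec) -> float:
    return norm(sub(p1, p0))

def bond_angle(p0: Vec, p1: Vec, p2: Vec) -> float:
    """Angle p0-p1-p2 in degrees (cosine clamped against rounding error)."""
    u, v = sub(p0, p1), sub(p2, p1)
    cos = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion p0-p1-p2-p3 in degrees, in (-180, 180]."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    y = norm(b2) * dot(b1, cross(b2, b3))
    x = dot(cross(b1, b2), cross(b2, b3))
    return math.degrees(math.atan2(y, x))
```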

Machine Learning

Three-panel diagram showing DGCNN point cloud processing: input space k-NN graph, EdgeConv operation, and semantic feature space clustering

DGCNN: Dynamic Graph CNN for Point Cloud Learning

DGCNN introduces the EdgeConv operator, which constructs k-nearest neighbor graphs dynamically in feature space at each network layer. This enables the model to capture both local geometry and long-range semantic relationships for point cloud classification and segmentation.
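
The EdgeConv idea fits in a few lines of pure Python. In this sketch a fixed linear map plus ReLU stands in for the learned MLP, and the weights are hypothetical; the point is that stacking two layers rebuilds the k-NN graph in the feature space produced by the previous layer, which is what makes the graph "dynamic".

```python
import math
from typing import List

Vector = List[float]

def knn(points: List[Vector], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (excluding i itself)."""
    ranked = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in ranked[:k]]

def edge_conv(points: List[Vector], k: int, weights: List[List[float]]) -> List[Vector]:
    """One EdgeConv-style layer (illustrative sketch).

    For each x_i: find its k nearest neighbours x_j in the current feature space,
    form the edge feature [x_i, x_j - x_i], apply a shared linear map + ReLU,
    then max-pool over neighbours channel by channel.
    Each row of `weights` must have length 2 * len(x_i).
    """
    out = []
    for i, x_i in enumerate(points):
        messages = []
        for j in knn(points, i, k):
            edge = x_i + [a - b for a, b in zip(points[j], x_i)]  # concat(x_i, x_j - x_i)
            messages.append([max(0.0, sum(w * e for w, e in zip(row, edge))) for row in weights])
        out.append([max(channel) for channel in zip(*messages)])
    return out

pts = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
W1 = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5], [0.0, 0.0, 1.0, 1.0]]
W2 = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]
h1 = edge_conv(pts, k=2, weights=W1)  # graph built from input coordinates
h2 = edge_conv(h1, k=2, weights=W2)   # graph rebuilt from layer-1 features
```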

Time Series Forecasting

LSTNet architecture diagram showing convolutional, recurrent, recurrent-skip, and autoregressive components

LSTNet: Long- and Short-Term Time Series Network

LSTNet is a deep learning framework for multivariate time series forecasting that uses convolutional layers for local dependencies, a recurrent-skip component for periodic long-term patterns, and an autoregressive component for scale robustness.
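
Two of these components are easy to isolate. The sketch below is heavily simplified and assumes scalar series and hand-set weights: a linear autoregressive term, and a tanh recurrence that links h_t to h_{t-skip} (one period back) instead of h_{t-1}. LSTNet's convolutional layer, GRU gating, and dense output layer are omitted.

```python
import math
from typing import Sequence

def ar_forecast(series: Sequence[float], coeffs: Sequence[float], bias: float = 0.0) -> float:
    """Linear AR term over the last len(coeffs) values; coeffs[-1] weights the newest."""
    window = series[-len(coeffs):]
    return bias + sum(c * x for c, x in zip(coeffs, window))

def skip_rnn_state(series: Sequence[float], skip: int, w_x: float, w_h: float) -> float:
    """Final state of a scalar tanh RNN whose recurrence reaches back `skip` steps."""
    h = [0.0] * len(series)
    for t, x in enumerate(series):
        prev = h[t - skip] if t >= skip else 0.0
        h[t] = math.tanh(w_x * x + w_h * prev)
    return h[-1]

def forecast(series: Sequence[float], skip: int, w_x: float, w_h: float,
             w_out: float, ar_coeffs: Sequence[float]) -> float:
    """Neural part plus linear AR part, mirroring LSTNet's additive output."""
    return w_out * skip_rnn_state(series, skip, w_x, w_h) + ar_forecast(series, ar_coeffs)
```

Because the AR term is linear in the raw inputs, it tracks changes in input scale that the saturating neural part can miss, which is the "scale robustness" mentioned above.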

Predictive Chemistry

Six molecules with atoms colored by divalent (blue, simple) vs non-divalent (red, complex) nodes, showing increasing MC1 complexity from hexane to pivaloyl methylamine

Molecular Complexity from the GDB Chemical Space

Buehler and Reymond introduce two molecular complexity measures, MC1 (fraction of non-divalent nodes) and MC2 (count of non-divalent nodes excluding carboxyl groups), derived from analyzing synthesizability patterns in GDB-enumerated molecules. They compare these measures against existing complexity scores across GDB-13s, ZINC, ChEMBL, and COCONUT.
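
MC1 is simple enough to compute directly from an adjacency list. This sketch assumes a hydrogen-depleted graph in which "divalent" means exactly two heavy-atom neighbours, with bond orders ignored; MC2 is left out because its carboxyl exclusion rule is not spelled out here.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # heavy atom -> neighbouring heavy atoms

def mc1(graph: Graph) -> float:
    """Fraction of nodes whose degree is not 2."""
    non_divalent = sum(1 for nbrs in graph.values() if len(nbrs) != 2)
    return non_divalent / len(graph)

def path_graph(n: int) -> Graph:
    """Unbranched chain, e.g. the carbon skeleton of hexane for n = 6."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def ring_graph(n: int) -> Graph:
    """Simple ring, e.g. cyclohexane for n = 6."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

neopentane: Graph = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
```

Under these assumptions hexane scores 2/6 (only its two chain ends are non-divalent), cyclohexane scores 0, and neopentane scores 1.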

Scientific Computing

Side-by-side search tree diagrams comparing nauty depth-first and Traces breadth-first traversal strategies for graph isomorphism

nauty and Traces: Graph Isomorphism Algorithms

An updated description of nauty and an introduction to Traces, two programs for graph isomorphism testing and canonical labeling built on the individualization-refinement paradigm.
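
The two primitives of individualization-refinement can be sketched with plain color refinement. This is a simplified illustration, not nauty's or Traces' code: the real programs use far more efficient refinement, a search tree over individualization choices, and automorphism pruning. On the 4-cycle, refinement alone cannot split the vertices, but individualizing one vertex does.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]
Coloring = Dict[int, int]

def refine(graph: Graph, colors: Coloring) -> Coloring:
    """Repeat color refinement until the partition stops splitting (equitable)."""
    while True:
        # signature = own color + sorted multiset of neighbour colors
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in graph[v]))) for v in graph}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new = {v: palette[sigs[v]] for v in graph}
        if len(set(new.values())) == len(set(colors.values())):
            return new
        colors = new

def individualize(colors: Coloring, v: int) -> Coloring:
    """Give vertex v a fresh color of its own."""
    out = dict(colors)
    out[v] = max(colors.values()) + 1
    return out

c4: Graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
start = refine(c4, {v: 0 for v in c4})        # regular graph: no split
after = refine(c4, individualize(start, 0))   # cells {0}, {1, 3}, {2}
```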

Natural Language Processing

SpeechT5 architecture diagram showing shared encoder-decoder with speech and text pre/post-nets

SpeechT5: Unified Speech-Text Pre-Training Framework

SpeechT5 proposes a unified encoder-decoder pre-training framework that jointly learns from unlabeled speech and text data, achieving strong results on ASR, TTS, speech translation, voice conversion, speech enhancement, and speaker identification.

Predictive Chemistry

Three-stage canonical generation pipeline (geng, vcolg, multig) alongside a log-scale speed comparison showing Surge outperforming MOLGEN 5.0 by 42-161x across natural product molecular formulas

Surge: Fastest Open-Source Chemical Graph Generator

Surge is a constitutional isomer generator based on the canonical generation path method, using nauty for graph automorphism computation. Its three-stage pipeline (simple graph generation, vertex coloring for atom assignment, edge multiplicity for bond orders) generates 7-22 million molecules per second, outperforming MOLGEN 5.0 by 42-161x on natural product molecular formulas.
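
The third stage (edge multiplicities) can be illustrated by brute force on one labeled simple graph. This sketch is an assumption-laden simplification: hydrogens are implicit, bond orders range over 1 to 3, and there is no canonical-path pruning, so it emits duplicates that Surge would reject with nauty. For the C3 chain with six hydrogens it returns two assignments that are the same molecule (propene) up to symmetry.

```python
from itertools import product
from typing import List, Sequence, Tuple

def multiplicity_assignments(edges: Sequence[Tuple[int, int]], valences: Sequence[int],
                             n_hydrogens: int, max_order: int = 3) -> List[Tuple[int, ...]]:
    """All bond-order vectors for `edges` that keep every heavy atom within its valence
    and leave exactly n_hydrogens free valences for implicit hydrogens."""
    results = []
    for orders in product(range(1, max_order + 1), repeat=len(edges)):
        used = [0] * len(valences)
        for (a, b), k in zip(edges, orders):
            used[a] += k
            used[b] += k
        free = [v - u for v, u in zip(valences, used)]
        if min(free) >= 0 and sum(free) == n_hydrogens:
            results.append(orders)
    return results

c3_chain = [(0, 1), (1, 2)]
propene = multiplicity_assignments(c3_chain, [4, 4, 4], n_hydrogens=6)
c3h4 = multiplicity_assignments(c3_chain, [4, 4, 4], n_hydrogens=4)  # propyne / allene
```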