Molecular Simulation
Schematic of polyalanine 1-mer functional groups interacting with water through CCSD(T)-fit 2-body PIPs.

MB-nrg in Solution: Polyalanine in Water with CCSD(T) PEFs

Building on the gas-phase MB-nrg PEF for polyalanine, Ruihan Zhou and Francesco Paesani add machine-learned 2-body terms for each backbone functional group interacting with water, fit to BSSE-corrected DLPNO-CCSD(T)/aug-cc-pVTZ data. They validate the resulting potential against alanine dipeptide-water dimer scans, free-energy surfaces in explicit MB-pol water, and hydration radial distribution functions.
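
As a rough illustration of the quantity being fit, the supermolecular 2-body interaction energy is just the dimer energy minus the two monomer energies; in MB-nrg each term would be a BSSE-corrected DLPNO-CCSD(T) energy, while the numbers below are made up:

```python
def two_body_energy(e_dimer, e_mono_a, e_mono_b):
    """Supermolecular 2-body interaction energy (all terms in the same
    units, e.g. kcal/mol): E_2B = E_AB - E_A - E_B.
    In an MB-nrg fit these inputs would be BSSE-corrected
    DLPNO-CCSD(T) energies; here they are plain numbers."""
    return e_dimer - e_mono_a - e_mono_b

# Toy example: if the dimer sits 5.0 below the sum of its monomers,
# the 2-body term the PIPs are trained to reproduce is -5.0.
e_a, e_b = -76.34, -248.11     # hypothetical monomer energies
e_ab = e_a + e_b - 5.0         # hypothetical dimer energy
print(two_body_energy(e_ab, e_a, e_b))   # ≈ -5.0
```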

Molecular Simulation
Schematic of polyalanine decomposed into overlapping n-mer building blocks fit to CCSD(T) energies.

MB-nrg: CCSD(T)-Accurate Potentials for Polyalanine

Ruihan Zhou and co-authors extend the MB-nrg many-body formalism to covalently bonded biomolecules by fragmenting polyalanine into functional-group n-mers and fitting permutationally invariant polynomials to DLPNO-CCSD(T)/aug-cc-pVTZ reference energies. The resulting potential reproduces alanine dipeptide Ramachandran surfaces, harmonic frequencies, and AceAla9Nme secondary-structure dynamics more faithfully than Amber ff14SB and ff19SB.

Molecular Simulation
Pipeline showing atoms converted to smooth density, symmetrized via Haar integration, and projected to invariant features.

Atom-Density Representations for Machine Learning

Introduces a Dirac notation formalism for atomic environments that unifies SOAP power spectra, Behler-Parrinello symmetry functions, and other density-based structural representations under a single theoretical framework.

Time Series Forecasting
LSTNet architecture diagram showing convolutional, recurrent, recurrent-skip, and autoregressive components.

LSTNet: Long- and Short-Term Time Series Network

LSTNet is a deep learning framework for multivariate time series forecasting that uses convolutional layers for local dependencies, a recurrent-skip component for periodic long-term patterns, and an autoregressive component for scale robustness.
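
A minimal sketch of the autoregressive "highway" idea: a linear per-variable prediction over the recent window is added to the neural output so the forecast tracks the input scale. Shapes and names here are illustrative, not the paper's code:

```python
import numpy as np

def ar_component(X, w, b):
    """Linear autoregressive component applied per variable: each
    output is a weighted sum of that variable's last q values.
    X: (q, n_vars) recent window; w: (q,) shared AR weights; b: bias.
    LSTNet adds this linear path to the nonlinear prediction so the
    output scale follows the input scale."""
    return X.T @ w + b   # (n_vars,)

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 3))    # window of 7 steps, 3 series
w = np.full(7, 1 / 7)          # uniform weights = a simple moving average
pred = ar_component(X, w, 0.0)
print(pred.shape)              # (3,)
```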

Natural Language Processing
Diagram showing the three-step nested pipeline from small-scale training to large-model loss prediction across data mixtures.

Data Mixing Laws for LM Pretraining Optimization

Ye et al. find that language model loss on each domain follows an exponential function of training mixture proportions. By nesting data mixing laws with scaling laws for steps and model size, small-scale experiments can predict and optimize mixtures for large models, achieving 48% training efficiency gains.
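
The functional form can be sketched as follows; `domain_loss` and all fitted constants here are hypothetical stand-ins for values the paper fits from small-scale runs:

```python
import math

def domain_loss(r, c, k, t):
    """Sketch of a data mixing law: validation loss on one domain as
    an exponential of a linear combination of the mixture proportions
    r_j. The constants c, k, t would be fitted from small-scale
    experiments; the values below are made up for illustration."""
    return c + k * math.exp(sum(t_j * r_j for t_j, r_j in zip(t, r)))

# Hypothetical fitted constants for one target domain, two-domain mixture.
c, k, t = 1.8, 0.9, [-2.0, 0.5]

# Scan mixtures r = (p, 1-p) and pick the one minimizing predicted loss.
best = min(((p / 100, 1 - p / 100) for p in range(101)),
           key=lambda r: domain_loss(r, c, k, t))
print(best)   # (1.0, 0.0): all weight on domain 0 minimizes this toy loss
```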

Natural Language Processing
Bar chart comparing baseline and DoReMi domain weights across 12 Pile domains, showing Pile-CC upweighted 5.4x.

DoReMi: Optimizing Data Mixtures for LM Pretraining

Xie et al. propose DoReMi, which trains a 280M proxy model using Group DRO to find optimal domain mixture weights, then uses those weights to train an 8B model 2.6x faster with 6.5% better downstream accuracy.
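
The Group DRO weight update can be sketched as a multiplicative-weights step on per-domain excess loss; this is an illustrative simplification, not the paper's exact training loop:

```python
import numpy as np

def doremi_update(alpha, excess_loss, eta=1.0, smoothing=1e-3):
    """One multiplicative-weights step in the spirit of DoReMi's Group
    DRO loop (a sketch): upweight domains where the proxy model's loss
    exceeds the reference model's, renormalize, then mix with the
    uniform distribution for stability."""
    excess = np.clip(excess_loss, 0.0, None)   # only positive excess counts
    alpha = alpha * np.exp(eta * excess)
    alpha = alpha / alpha.sum()
    u = np.full_like(alpha, 1.0 / len(alpha))
    return (1 - smoothing) * alpha + smoothing * u

alpha = np.full(4, 0.25)                  # start uniform over 4 domains
excess = np.array([0.4, 0.0, 0.1, -0.2])  # hypothetical excess losses
alpha = doremi_update(alpha, excess)
print(alpha.round(3))                     # domain 0 gets upweighted
```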

Natural Language Processing
Chart showing effective data as a function of epochs with exponential decay, with the 4-epoch safe zone and 16-epoch half-life marked.

Scaling Data-Constrained Language Models

Muennighoff et al. train 400+ models to study how data repetition affects scaling. They propose a data-constrained scaling law with exponential decay for repeated tokens, finding that up to 4 epochs have negligible impact on loss, returns diminish around 16 epochs, and code augmentation provides a 2x effective data boost.
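
The effective-data idea can be sketched with an exponential-decay formula of the kind the paper fits; the decay constant below is illustrative, not the fitted value:

```python
import math

def effective_data(unique_tokens, epochs, r_star=15.0):
    """Sketch of a data-constrained scaling idea: repeated tokens decay
    exponentially in value. With R = epochs - 1 repetitions,
    D_eff = U * (1 + r_star * (1 - exp(-R / r_star))).
    r_star is a fitted decay constant; 15.0 here is illustrative."""
    R = epochs - 1
    return unique_tokens * (1 + r_star * (1 - math.exp(-R / r_star)))

U = 100e9   # 100B unique tokens
for ep in (1, 4, 16, 100):
    # Value per token actually seen: near 1 at few epochs, decaying after.
    print(ep, round(effective_data(U, ep) / (ep * U), 3))
```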

Natural Language Processing
Bar chart comparing average benchmark accuracy across seven domain combination configurations showing diversity improves performance.

SlimPajama-DC: Data Combinations for LLM Training

Shen et al. empirically analyze how different domain combinations and deduplication strategies in the SlimPajama dataset affect 1.3B model performance. Global deduplication across sources outperforms local deduplication, and increasing domain diversity consistently improves average accuracy, with findings transferring to 7B scale.

Natural Language Processing
Table comparing multi-task mixing strategies showing examples-proportional and temperature-scaled mixing results.

T5: Exploring Transfer Learning Limits

Raffel et al. introduce T5, a unified text-to-text framework for NLP transfer learning. Through systematic ablation of architectures, pre-training objectives, datasets, and multi-task mixing strategies, they identify best practices and scale to 11B parameters, achieving state-of-the-art results across multiple benchmarks.
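
The two mixing strategies compared in the figure can be sketched as follows; the task sizes are made up, while `K = 2**21` matches the artificial dataset-size limit reported in the paper:

```python
def mixing_rates(sizes, K=2**21, T=1.0):
    """T5-style mixing: examples-proportional with an artificial size
    limit K, optionally temperature-scaled by exponent 1/T (T > 1
    flattens the distribution toward uniform)."""
    clipped = [min(n, K) ** (1.0 / T) for n in sizes]
    total = sum(clipped)
    return [c / total for c in clipped]

sizes = [10_000, 1_000_000, 500_000_000]   # hypothetical task sizes
r1 = mixing_rates(sizes, T=1.0)            # examples-proportional (capped)
r2 = mixing_rates(sizes, T=2.0)            # temperature-scaled, flatter
print([round(r, 4) for r in r1])
print([round(r, 4) for r in r2])
```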

Molecular Simulation
Diagram showing the Ewald decomposition of long-range interactions into short-range and Fourier-space components for molecular graph neural networks.

Ewald Message Passing for Molecular Graphs

Proposes Ewald message passing, a Fourier-space scheme inspired by Ewald summation that captures long-range interactions in molecular graphs. The method is architecture-agnostic and improves energy MAEs by 10% on OC20 and 16% on OE62 across four baseline GNN models.
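
A toy version of a Fourier-space message pass: project features onto plane waves (structure factors), damp high frequencies with a fixed Gaussian filter where the real method learns one per frequency, and read per-atom messages back. All names and shapes are illustrative:

```python
import numpy as np

def ewald_messages(pos, feats, kvecs, sigma=1.0):
    """Sketch of Fourier-space message passing: every atom talks to
    every other through a small set of plane waves instead of a local
    neighbor graph. pos: (N, 3), feats: (N, F), kvecs: (K, 3)."""
    phases = np.exp(-1j * pos @ kvecs.T)           # (N, K) plane waves
    s_k = phases.T @ feats                         # (K, F) structure factors
    filt = np.exp(-sigma * (kvecs ** 2).sum(1))    # (K,) smooth k-space filter
    msgs = (phases.conj() * filt) @ s_k            # (N, F) read-back
    return msgs.real / len(kvecs)                  # real part of the kernel

rng = np.random.default_rng(1)
pos = rng.uniform(0, 5, size=(8, 3))
feats = rng.normal(size=(8, 4))
kvecs = rng.normal(size=(16, 3))
m = ewald_messages(pos, feats, kvecs)
print(m.shape)   # (8, 4): one aggregated long-range message per atom
```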

Machine Learning
Diagram showing the Lagrangian Neural Network pipeline from coordinates through a learned Lagrangian to energy-conserving dynamics.

Lagrangian Neural Networks for Physics

Lagrangian Neural Networks (LNNs) use neural networks to parameterize arbitrary Lagrangians, enabling energy-conserving learned dynamics without canonical coordinates. Unlike Hamiltonian approaches, LNNs handle relativistic systems and extend to graphs via Lagrangian Graph Networks.
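
The core trick, solving the Euler-Lagrange equation for the acceleration given a black-box Lagrangian, can be sketched in 1-D with finite differences standing in for the autograd calls an LNN would use:

```python
def accel_from_lagrangian(L, q, v, h=1e-4):
    """1-D Euler-Lagrange equation solved for the acceleration,
    a = (dL/dq - v * d2L/(dq dv)) / d2L/dv2,
    with derivatives approximated by central finite differences.
    In an LNN, L is a neural network and these are autograd calls."""
    dL_dq = (L(q + h, v) - L(q - h, v)) / (2 * h)
    d2L_dv2 = (L(q, v + h) - 2 * L(q, v) + L(q, v - h)) / h**2
    d2L_dqdv = (L(q + h, v + h) - L(q + h, v - h)
                - L(q - h, v + h) + L(q - h, v - h)) / (4 * h**2)
    return (dL_dq - v * d2L_dqdv) / d2L_dv2

# Unit-mass harmonic oscillator, L = T - V = v^2/2 - q^2/2, so a = -q.
L = lambda q, v: 0.5 * v**2 - 0.5 * q**2
a = accel_from_lagrangian(L, 1.0, 0.0)
print(a)   # ≈ -1.0
```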

Machine Learning
Visualization of Liquid-S4 kernel decomposition showing input signal, S4 kernel, liquid kernel, and combined output.

Liquid-S4: Input-Dependent State-Space Models

Liquid-S4 extends the S4 framework by incorporating a linearized liquid time-constant formulation that introduces input-dependent state transitions. This yields an additional convolutional kernel capturing input correlations, improving generalization across long-range sequence tasks.