Learning Smooth Interatomic Potentials with eSEN

Paper Overview

This is a method paper. It identifies a disconnect in how Machine Learning Interatomic Potentials (MLIPs) are evaluated, introduces a new architecture, eSEN, whose design follows from that analysis, and proposes energy conservation as an evaluation criterion that goes beyond simple test-set error.

The Energy Conservation Gap in MLIP Evaluation

The paper targets a well-known but under-examined problem in the field: improvements in standard MLIP metrics (lower energy/force MAE on static test sets) do not reliably translate to better performance on complex downstream tasks such as molecular dynamics (MD) simulations, materials stability prediction, or phonon calculations. The authors seek to understand why this gap exists and how to design models that are both accurate on test sets and physically reliable in practical scientific workflows.

The eSEN Architecture and Continuous Representation

The novelty is twofold, spanning both a conceptual framework for evaluation and a new model architecture:

Energy Conservation as a Diagnostic Test: The core conceptual contribution is using an MLIP’s ability to conserve energy in out-of-distribution MD simulations as a crucial diagnostic test. The authors demonstrate that for models passing this test, a strong correlation between test-set error and downstream task performance is restored.
The eSEN Architecture: The paper introduces the equivariant Smooth Energy Network (eSEN), designed with specific choices to ensure a smooth and well-behaved Potential Energy Surface (PES):
- Strictly Conservative Forces: Forces are computed exclusively as the negative gradient of energy ($F = -\nabla E$), using conservative force prediction instead of faster direct-force prediction heads.
- Continuous Representations: Maintains strict equivariance and smoothness by using equivariant gated non-linearities instead of discretizing spherical harmonic representations during nodewise processing.
- Smooth PES Construction: Critical design choices include distance cutoffs, polynomial envelope functions that drive derivatives to zero at the cutoff, and a limited number of radial basis functions to avoid an overly sensitive PES.
Efficient Training Strategy: A two-stage training regimen with fast pre-training using a non-conservative direct-force model, followed by fine-tuning to enforce energy conservation. This captures the efficiency of direct-force training while ensuring physical robustness.
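Conservative force prediction means the network outputs only a scalar energy and the forces follow from differentiating it. The sketch below illustrates that relationship with a toy Morse pair energy in NumPy, a stand-in for the learned energy and not eSEN itself: forces are written analytically as $-\nabla E$ and checked against central finite differences of the same energy. In an actual MLIP the gradient would come from automatic differentiation; the Morse parameters here are arbitrary.

```python
import numpy as np

def morse_energy(pos, d_e=1.0, a=1.5, r0=1.2):
    """Toy total energy: sum of Morse terms over all atom pairs (stand-in for a learned E)."""
    energy = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r = np.linalg.norm(pos[i] - pos[j])
            energy += d_e * (1.0 - np.exp(-a * (r - r0))) ** 2
    return energy

def morse_forces(pos, d_e=1.0, a=1.5, r0=1.2):
    """Conservative forces F = -dE/dpos from the analytic derivative of morse_energy."""
    forces = np.zeros_like(pos)
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            rij = pos[i] - pos[j]
            r = np.linalg.norm(rij)
            x = np.exp(-a * (r - r0))
            de_dr = 2.0 * d_e * a * (1.0 - x) * x  # dE/dr for this pair
            grad_i = de_dr * rij / r                # dE/dpos_i (and -dE/dpos_j)
            forces[i] -= grad_i
            forces[j] += grad_i
    return forces

def finite_difference_forces(energy_fn, pos, h=1e-5):
    """Central-difference estimate of -grad E, used only as a consistency check."""
    forces = np.zeros_like(pos)
    for idx in np.ndindex(pos.shape):
        plus, minus = pos.copy(), pos.copy()
        plus[idx] += h
        minus[idx] -= h
        forces[idx] = -(energy_fn(plus) - energy_fn(minus)) / (2.0 * h)
    return forces

pos = np.array([[0.0, 0.0, 0.0], [1.1, 0.2, 0.0], [0.3, 1.3, 0.4]])
f_analytic = morse_forces(pos)
f_numeric = finite_difference_forces(morse_energy, pos)
print(np.max(np.abs(f_analytic - f_numeric)))  # finite-difference-level agreement
print(f_analytic.sum(axis=0))                  # net force vanishes (translation invariance)
```

Because every force is derived from one scalar $E$, the field is curl-free by construction; a direct-force head offers no such guarantee.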

Evaluating OOD Energy Conservation and Physical Properties

The paper presents a comprehensive experimental validation:

Ablation Studies on Energy Conservation: MD simulations on out-of-distribution systems (TM23 and MD22 datasets) systematically tested key design choices (direct-force vs. conservative, representation discretization, neighbor limits, envelope functions). This empirically demonstrated which choices lead to energy drift despite negligible impact on test-set MAE.
Physical Property Prediction Benchmarks: The eSEN model was evaluated on challenging downstream tasks:
- Matbench-Discovery: Materials stability and thermal conductivity prediction, where eSEN achieved the highest F1 score among compliant models and excelled at both metrics simultaneously.
- MDR Phonon Benchmark: Predicting phonon properties, which tests the accuracy of second- and third-order derivatives of the PES. eSEN achieved state-of-the-art results, particularly outperforming direct-force models.
- SPICE-MACE-OFF: Standard energy and force prediction on organic molecules, demonstrating that the design choices made for physical plausibility also improve raw accuracy.
Correlation Analysis: Explicit plots of test-set energy MAE versus performance on downstream benchmarks showed weak overall correlation that becomes strong and predictive when restricted to models passing the energy conservation test.

Outcomes and Conclusions

Primary Conclusion: Energy conservation is a critical, practical property for MLIPs. Using it as a filter re-establishes test-set error as a reliable proxy for model development, dramatically accelerating the innovation cycle. Models that are not conservative, even with low test error, are unreliable for many critical scientific applications.
Model Performance: The eSEN architecture outperforms baseline models across diverse tasks, from energy/force prediction to geometry optimization, phonon calculations, and thermal conductivity prediction.
Actionable Design Principles: The paper provides experimentally validated architectural choices that promote physical plausibility. Seemingly minor details, like how atomic neighbors are selected, can have profound impacts on a model’s utility in simulations.
Efficient Path to Robust Models: The direct-force pre-training plus conservative fine-tuning strategy offers a practical method for developing physically robust models without incurring the full computational cost of conservative training from scratch.

Reproducibility Details

Models

The eSEN architecture builds on components from eSCN (Equivariant Spherical Channel Network) and Equiformer, combining them with design choices that prioritize smoothness and energy conservation. The implementation integrates into the standard fairchem Open Catalyst experimental framework.

Layer Structure

Edgewise Convolution: Uses SO(2) convolution layers (from eSCN) with an envelope function applied. Source and target embeddings are concatenated before convolution.
Nodewise Feed-Forward: Two equivariant linear layers with an intermediate SiLU-based gated non-linearity (from Equiformer).
Normalization: Equivariant Layer Normalization (from Equiformer).

Smoothness Design Choices

Several architectural decisions distinguish eSEN from prior work:

No Grid Projection: Instead of projecting spherical harmonic representations onto a spatial grid to apply non-linearities, eSEN operates directly in the spherical harmonic basis, which preserves strict equivariance and energy conservation.
Distance Cutoff for Graph Construction: Uses a strict distance cutoff (6 Å for MPTrj models, 5 Å for SPICE models) with no limit on the number of neighbors, because neighbor limits introduce discontinuities that break energy conservation.
Polynomial Envelope Functions: Ensure that derivatives go to zero smoothly at the cutoff radius.
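Both choices can be checked numerically. The sketch below assumes the polynomial envelope popularized by DimeNet, $u(d) = 1 - \frac{(p+1)(p+2)}{2} d^p + p(p+2) d^{p+1} - \frac{p(p+1)}{2} d^{p+2}$ with $d = r / r_{\text{cut}}$; the source does not give eSEN's exact envelope, so this form and $p = 5$ are illustrative. It also uses a hypothetical species-dependent pair energy to show why a top-$K$ neighbor rule makes the energy jump while a pure distance cutoff does not.

```python
import numpy as np

def poly_envelope(r, r_cut=6.0, p=5):
    """Polynomial envelope: 1 at r = 0; value, first and second derivative are 0 at r_cut."""
    d = np.asarray(r, dtype=float) / r_cut
    u = (1.0
         - (p + 1) * (p + 2) / 2.0 * d**p
         + p * (p + 2) * d**(p + 1)
         - p * (p + 1) / 2.0 * d**(p + 2))
    return np.where(d < 1.0, u, 0.0)

# Hypothetical per-species pair strengths (not from the paper).
PAIR_STRENGTH = {"A": 1.0, "B": 3.0}

def pair_energy(neighbors, r_cut=6.0, k=None):
    """Energy of a central atom from (species, distance) neighbors.

    k=None: every neighbor inside r_cut contributes (distance-cutoff graph).
    k=int:  only the k nearest neighbors contribute (max-neighbor limit).
    """
    inside = sorted((r, s) for s, r in neighbors if r < r_cut)
    if k is not None:
        inside = inside[:k]
    return sum(PAIR_STRENGTH[s] * float(poly_envelope(r, r_cut)) for r, s in inside)

# Neighbor A stays at 3.0 Å while neighbor B slides past it.
for r_b in (3.01, 2.99):
    nbrs = [("A", 3.0), ("B", r_b)]
    print(r_b, pair_energy(nbrs), pair_energy(nbrs, k=1))
```

With the cutoff graph the energy changes only slightly as B moves by 0.02 Å; with `k=1` the contributing neighbor switches from A to B and the energy jumps by roughly $2\,u(0.5)$, which is the kind of discontinuity that breaks NVE energy conservation.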

Algorithms

Two-Stage Training (eSEN-30M-MP)

Direct-Force Pre-training (60 epochs): Uses DeNS (Denoising Non-equilibrium Structures) to reduce overfitting. This stage is fast because forces come from a direct output head, so training does not need to backpropagate through the energy gradient.
Conservative Fine-tuning (40 epochs): The direct-force head is removed, and forces are calculated via gradients ($F = -\nabla E$). This enforces energy conservation.

Important: DeNS is used exclusively during the direct-force pre-training stage, with a noising probability of 0.5, a standard deviation of 0.1 Å for the added Gaussian noise, and a DeNS loss coefficient of 10. The fine-tuning strategy reduces the wall-clock time for model training by 40% compared to training a conservative model from scratch for the same number of total epochs.
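The DeNS settings above can be illustrated with a minimal sketch of the structure-corruption step alone. It assumes one noising decision per structure and isotropic Gaussian noise on every atom, and it uses a mean squared error for the denoising term as an assumption; the helper names `dens_corrupt` and `denoising_loss` are hypothetical, and the rest of the DeNS objective is not reproduced here.

```python
import numpy as np

NOISE_PROB = 0.5       # probability that a structure is noised (from the text)
NOISE_STD = 0.1        # Gaussian noise standard deviation in Å (from the text)
DENS_LOSS_COEF = 10.0  # weight on the denoising loss term (from the text)

def dens_corrupt(positions, rng):
    """Hypothetical helper: possibly add Gaussian noise to one structure's positions.

    Returns (noisy_positions, noise, is_noised). When is_noised is False the
    structure is used unchanged for ordinary energy/force training.
    """
    if rng.random() < NOISE_PROB:
        noise = rng.normal(0.0, NOISE_STD, size=positions.shape)
        return positions + noise, noise, True
    return positions.copy(), np.zeros_like(positions), False

def denoising_loss(predicted_noise, noise):
    """Assumed MSE between predicted and true noise, scaled by DENS_LOSS_COEF."""
    return DENS_LOSS_COEF * float(np.mean((predicted_noise - noise) ** 2))

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 10.0, size=(64, 3))
noisy, noise, flag = dens_corrupt(positions, rng)
print(flag, noise.std())
```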

Optimization

Optimizer: AdamW with cosine learning rate scheduler
Max Learning Rate: $4 \times 10^{-4}$
Batch Size: 512 (for MPTrj models)
Weight Decay: $1 \times 10^{-3}$
Gradient Clipping: Norm of 100
Warmup: 0.1 epochs with a factor of 0.2
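These settings can be turned into a per-step learning-rate function. The plain-Python sketch below rests on two assumptions not stated in the source: "warmup factor 0.2" is read as a linear ramp from $0.2 \times$ the maximum learning rate up to the maximum, and the cosine phase decays to zero over the remaining steps. `steps_per_epoch` and the total step count are placeholders.

```python
import math

def learning_rate(step, total_steps, steps_per_epoch,
                  max_lr=4e-4, warmup_epochs=0.1, warmup_factor=0.2):
    """Linear warmup followed by cosine decay (an assumed reading of the settings above)."""
    warmup_steps = max(1, int(warmup_epochs * steps_per_epoch))
    if step < warmup_steps:
        # Ramp linearly from warmup_factor * max_lr toward max_lr.
        alpha = step / warmup_steps
        return max_lr * (warmup_factor + (1.0 - warmup_factor) * alpha)
    # Cosine decay from max_lr to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * min(progress, 1.0)))

steps_per_epoch = 3000             # placeholder, not from the paper
total_steps = 60 * steps_per_epoch # placeholder run length
for s in (0, 150, 300, total_steps // 2, total_steps):
    print(s, learning_rate(s, total_steps, steps_per_epoch))
```

AdamW itself, weight decay, and gradient-norm clipping would be configured in the training framework; only the schedule shape is sketched here.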

Loss Function

A composite loss combining per-atom energy MAE, force $L_2$ loss, and stress MAE:

$$ \begin{aligned} \mathcal{L} = \lambda_{\text{e}} \frac{1}{N} \sum_{i=1}^N \lvert E_{i} - \hat{E}_{i} \rvert + \lambda_{\text{f}} \frac{1}{3N} \sum_{i=1}^N \lVert \mathbf{F}_{i} - \hat{\mathbf{F}}_{i} \rVert_2^2 + \lambda_{\text{s}} \lVert \mathbf{S} - \hat{\mathbf{S}} \rVert_1 \end{aligned} $$

For the 30M-parameter MPTrj model (eSEN-30M-MP), the weighting coefficients are set to $\lambda_{\text{e}} = 20$, $\lambda_{\text{f}} = 20$, and $\lambda_{\text{s}} = 5$.
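A NumPy sketch of the composite loss for a single structure, using the weights given above. Two readings are assumptions: the energy term is taken as the absolute error of the total energy divided by the atom count $N$ (a per-atom energy error), and $\lVert \cdot \rVert_1$ on the stress is the sum of absolute values over the nine components of the $3 \times 3$ tensor. Averaging over a batch is omitted.

```python
import numpy as np

LAMBDA_E, LAMBDA_F, LAMBDA_S = 20.0, 20.0, 5.0  # loss weights from the text

def composite_loss(e_true, e_pred, f_true, f_pred, s_true, s_pred):
    """Single-structure loss: per-atom energy MAE + force squared error + stress L1.

    e_*: total energies (scalars); f_*: (N, 3) forces; s_*: (3, 3) stress tensors.
    """
    n_atoms = f_true.shape[0]
    energy_term = abs(e_true - e_pred) / n_atoms
    force_term = np.sum((f_true - f_pred) ** 2) / (3 * n_atoms)
    stress_term = np.sum(np.abs(s_true - s_pred))
    return LAMBDA_E * energy_term + LAMBDA_F * force_term + LAMBDA_S * stress_term

f_true = np.zeros((2, 3))
f_pred = np.zeros((2, 3))
f_pred[0, 0] = 1.0
s_true = np.zeros((3, 3))
s_pred = np.zeros((3, 3))
s_pred[1, 1] = 0.5
loss = composite_loss(-10.0, -9.0, f_true, f_pred, s_true, s_pred)
# energy 20 * 0.5 + force 20 * (1 / 6) + stress 5 * 0.5 = 15.8333...
print(loss)
```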

Data

Training Data

Inorganic: MPTrj (Materials Project Trajectory) dataset
Organic: SPICE-MACE-OFF dataset

Test Data Construction

MPTrj Testing: Since MPTrj lacks an official test split, the authors created a test set using 5,000 random samples from the subsampled Alexandria (sAlex) dataset to ensure fair comparison.
Out-of-Distribution Conservation Testing:
- Inorganic: TM23 dataset (transition metal defects). Simulation: 100 ps, 5 fs timestep.
- Organic: MD22 dataset (large molecules). Simulation: 100 ps, 1 fs timestep.

Hardware

Training was run predominantly on 80 GB NVIDIA A100 GPUs.

Inference Efficiency

For a periodic system of 216 atoms on a single A100 (PyTorch 2.4.0, CUDA 12.1, without torch.compile or TorchScript), the 2-layer eSEN models achieve approximately 0.8 million steps per day (3.2M parameters) and 0.4 million steps per day (6.5M parameters); the smaller model is comparable to MACE-OFF-L at 0.7 million steps per day.
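To put these throughput figures in wall-clock terms, the helper below converts steps per day into milliseconds per step and simulated nanoseconds per day. The 1 fs timestep is an assumption for illustration; the source does not state the timestep used in this benchmark.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s

def ms_per_step(steps_per_day):
    """Average wall-clock milliseconds per MD step."""
    return 1000.0 * SECONDS_PER_DAY / steps_per_day

def ns_per_day(steps_per_day, timestep_fs=1.0):
    """Simulated nanoseconds per day for a timestep given in femtoseconds."""
    return steps_per_day * timestep_fs * 1e-6

for rate in (0.4e6, 0.7e6, 0.8e6):
    print(f"{rate:.1e} steps/day: {ms_per_step(rate):.0f} ms/step, "
          f"{ns_per_day(rate):.1f} ns/day at an assumed 1 fs")
```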

Evaluation

The paper evaluated eSEN across three major benchmark tasks. Key evaluation metrics included energy MAE (meV/atom), force MAE (meV/Å), stress MAE (meV/Å/atom), F1 score for stability prediction, $\kappa_{\text{SRME}}$ for thermal conductivity, and phonon frequency accuracy.

Ablation Test-Set MAE (Table 1)

Design choices that dramatically affect energy conservation have negligible impact on static test-set MAE, which is precisely why test-set error alone is misleading. All models are 2-layer with 3.2M parameters, $L_{\text{max}} = 2$, $M_{\text{max}} = 2$:

Model	Energy MAE	Force MAE	Stress MAE
eSEN (default)	17.02	43.96	0.14
eSEN, direct-force	18.66	43.62	0.16
eSEN, neighbor limit	17.30	44.11	0.14
eSEN, no envelope	17.60	44.69	0.14
eSEN, $N_{\text{basis}} = 512$	19.87	48.29	0.15
eSEN, Bessel	17.65	44.83	0.15
eSEN, discrete, res=6	17.05	43.10	0.14
eSEN, discrete, res=10	17.11	43.13	0.14
eSEN, discrete, res=14	17.12	43.09	0.14

Energy MAE in meV/atom. Force MAE in meV/Å. Stress MAE in meV/Å/atom.

Matbench-Discovery (Tables 2 and 3)

Compliant models (trained only on MPTrj or its subset), unique prototype split:

Model	F1	DAF	$\kappa_{\text{SRME}}$	RMSD
eSEN-30M-MP	0.831	5.260	0.340	0.0752
eqV2-S-DeNS	0.815	5.042	1.676	0.0757
MatRIS-MP	0.809	5.049	0.861	0.0773
AlphaNet-MP	0.799	4.863	1.31	0.1067
DPA3-v2-MP	0.786	4.822	0.959	0.0823
ORB v2 MPtrj	0.765	4.702	1.725	0.1007
SevenNet-13i5	0.760	4.629	0.550	0.0847
GRACE-2L-MPtrj	0.691	4.163	0.525	0.0897
MACE-MP-0	0.669	3.777	0.647	0.0915
CHGNet	0.613	3.361	1.717	0.0949
M3GNet	0.569	2.882	1.412	0.1117

eSEN-30M-MP excels at both F1 and $\kappa_{\text{SRME}}$ simultaneously, while all previous models only achieve SOTA on one or the other.

Non-compliant models (trained on additional datasets):

Model	F1	$\kappa_{\text{SRME}}$	RMSD
eSEN-30M-OAM	0.925	0.170	0.0608
eqV2-M-OAM	0.917	1.771	0.0691
ORB v3	0.905	0.210	0.0750
SevenNet-MF-ompa	0.901	0.317	0.0639
DPA3-v2-OpenLAM	0.890	0.687	0.0679
GRACE-2L-OAM	0.880	0.294	0.0666
MatterSim-v1-5M	0.862	0.574	0.0733
MACE-MPA-0	0.852	0.412	0.0731

The eSEN-30M-OAM model starts from eSEN-30M-OMat (trained on OMat24), then is fine-tuned for 1 epoch on a dataset combining sAlex and 8 copies of MPTrj.

MDR Phonon Benchmark (Table 4)

Metrics: maximum phonon frequency MAE($\omega_{\text{max}}$) in K, vibrational entropy MAE($S$) in J/K/mol, Helmholtz free energy MAE($F$) in kJ/mol, heat capacity MAE($C_V$) in J/K/mol.

Model	MAE($\omega_{\text{max}}$)	MAE($S$)	MAE($F$)	MAE($C_V$)
eSEN-30M-MP	21	13	5	4
SevenNet-13i5	26	28	10	5
GRACE-2L (r6)	40	25	9	5
SevenNet-0	40	48	19	9
MACE	61	60	24	13
CHGNet	89	114	45	21
M3GNet	98	150	56	22

Direct-force models perform dramatically worse at the standard 0.01 Å displacement (e.g., eqV2-S-DeNS scores 280/224/54/94 on the four metrics above, in the same order) but improve at larger displacements (58/26/8/8 at 0.2 Å), revealing that their PES is rough near energy minima.

SPICE-MACE-OFF (Table 5)

Test set MAE for organic molecule energy/force prediction. Energy MAE in meV/atom, force MAE in meV/Å:

Dataset	MACE-4.7M (E/F)	EscAIP-45M* (E/F)	eSEN-3.2M (E/F)	eSEN-6.5M (E/F)
PubChem	0.88 / 14.75	0.53 / 5.86	0.22 / 6.10	0.15 / 4.21
DES370K M.	0.59 / 6.58	0.41 / 3.48	0.17 / 1.85	0.13 / 1.24
DES370K D.	0.54 / 6.62	0.38 / 2.18	0.20 / 2.77	0.15 / 2.12
Dipeptides	0.42 / 10.19	0.31 / 5.21	0.10 / 3.04	0.07 / 2.00
Sol. AA	0.98 / 19.43	0.61 / 11.52	0.30 / 5.76	0.25 / 3.68
Water	0.83 / 13.57	0.72 / 10.31	0.24 / 3.88	0.15 / 2.50
QMugs	0.45 / 16.93	0.41 / 8.74	0.16 / 5.70	0.12 / 3.78

*EscAIP-45M is a direct-force model. eSEN-6.5M outperforms MACE-OFF-L and EscAIP on all test splits. The smaller eSEN-3.2M has inference efficiency comparable to MACE-4.7M while achieving lower MAE.

Why These Design Choices Matter

Bounded Energy Derivatives and the Verlet Integrator

The theoretical foundation for why smoothness matters comes from Theorem 5.1 of Hairer et al. (2003). For the Verlet integrator (the standard NVE integrator), the total energy drift satisfies:

$$ |E(\mathbf{r}_T, \mathbf{a}) - E(\mathbf{r}_0, \mathbf{a})| \leq C \Delta t^2 + C_N \Delta t^N T $$

where $T$ is the total simulation time ($T \leq \Delta t^{-N}$), $N$ is the highest order for which the $N$th derivative of $E$ is continuously differentiable with bounded derivative, and $C$, $C_N$ are constants independent of $T$ and $\Delta t$. The first term is a time-independent fluctuation of $O(\Delta t^2)$; the second term governs long-term conservation. This means the PES must be continuously differentiable to high order, with bounded derivatives, for energy conservation in long-time simulations.
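The first term of the bound is easy to see on the simplest smooth system. The sketch below runs velocity Verlet on a 1D harmonic oscillator (unit mass and stiffness, a toy stand-in for an MLIP energy) and records the largest deviation of the total energy from its initial value. For this infinitely smooth potential there is no secular drift, and halving $\Delta t$ should shrink the fluctuation by roughly a factor of four, consistent with the $C \Delta t^2$ term.

```python
def force(x):
    """F = -dE/dx for the potential E(x) = x**2 / 2."""
    return -x

def total_energy(x, v):
    return 0.5 * v * v + 0.5 * x * x

def max_energy_error(dt, t_total=200.0, x0=1.0, v0=0.0):
    """Velocity Verlet (NVE); returns max |E(t) - E(0)| over the run."""
    x, v = x0, v0
    f = force(x)
    e0 = total_energy(x, v)
    worst = 0.0
    for _ in range(int(round(t_total / dt))):
        v_half = v + 0.5 * dt * f  # half kick
        x = x + dt * v_half        # drift
        f = force(x)
        v = v_half + 0.5 * dt * f  # half kick
        worst = max(worst, abs(total_energy(x, v) - e0))
    return worst

err_coarse = max_energy_error(0.1)
err_fine = max_energy_error(0.05)
print(err_coarse, err_fine, err_coarse / err_fine)
```

Running ten times longer does not increase the maximum error here; a potential with a kink or jump (for example, one with a neighbor limit) loses that property.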

Architectural Choices That Break Conservation

The authors provide theoretical justification for why specific architectural choices break energy conservation:

Max Neighbor Limit (KNN): Introduces discontinuity in the PES. If a neighbor at distance $r$ moves to $r + \epsilon$ and drops out of the top-$K$, the energy changes discontinuously.
Grid Discretization: Projecting spherical harmonics to a spatial grid introduces discretization errors in energy gradients that break conservation. This can be mitigated with higher-resolution grids but not eliminated.
Direct-Force Prediction: Imposes no mathematical constraint that forces must be the gradient of an energy scalar field. In other words, $\nabla \times \mathbf{F} \neq 0$ is permitted, violating the requirement for a conservative force field.
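The difference between a conservative and an unconstrained force field can be made concrete with a 2D toy: a harmonic well whose force is $-\nabla E$, versus the same force plus a small rotational term $\varepsilon(-y, x)$ of the kind an unconstrained force head could in principle produce. The sketch estimates the curl $\partial F_y/\partial x - \partial F_x/\partial y$ by finite differences (zero versus $2\varepsilon$) and then runs velocity Verlet: the non-conservative field steadily pumps energy into the system instead of fluctuating around a constant. The rotational term is a hypothetical illustration, not a model of any specific MLIP.

```python
import numpy as np

EPS = 0.05  # strength of the hypothetical non-conservative (rotational) term

def conservative_force(p):
    """F = -grad E for E = (x**2 + y**2) / 2."""
    return -p

def direct_force(p):
    """Same force plus a rotational component with nonzero curl."""
    return -p + EPS * np.array([-p[1], p[0]])

def curl_z(force_fn, p, h=1e-5):
    """Finite-difference estimate of dFy/dx - dFx/dy at point p."""
    dx, dy = np.array([h, 0.0]), np.array([0.0, h])
    dfy_dx = (force_fn(p + dx)[1] - force_fn(p - dx)[1]) / (2 * h)
    dfx_dy = (force_fn(p + dy)[0] - force_fn(p - dy)[0]) / (2 * h)
    return dfy_dx - dfx_dy

def energy_after_nve(force_fn, steps=5000, dt=0.01):
    """Velocity Verlet with unit mass; returns (initial, final) of E = KE + (x^2 + y^2)/2."""
    p, v = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    energy = lambda p, v: 0.5 * v @ v + 0.5 * p @ p
    e0 = energy(p, v)
    f = force_fn(p)
    for _ in range(steps):
        v = v + 0.5 * dt * f
        p = p + dt * v
        f = force_fn(p)
        v = v + 0.5 * dt * f
    return e0, energy(p, v)

point = np.array([0.3, -0.7])
print(curl_z(conservative_force, point), curl_z(direct_force, point))
print(energy_after_nve(conservative_force))
print(energy_after_nve(direct_force))
```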

Displacement Sensitivity in Phonon Calculations

An important empirical finding concerns how displacement values affect phonon predictions. Conservative models (eSEN, MACE) show convergent phonon band structures as displacement decreases toward zero. In contrast, direct-force models (eqV2-S-DeNS) fail to converge, exhibiting missing acoustic branches and spurious imaginary frequencies at small displacements. While direct-force models achieve competitive thermodynamic property accuracy at large displacements (0.2 Å), this is deceptive: the underlying phonon band structures remain inaccurate, and the apparent accuracy comes from Boltzmann-weighted integrals smoothing over errors.
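The displacement effect can be reproduced with a 1D toy. The sketch estimates a harmonic force constant by central finite differences of forces, $k \approx -[F(+h) - F(-h)] / (2h)$, the basic operation behind finite-displacement phonon calculations. A smooth model with $F = -kx$ recovers $k$ at any displacement. A hypothetical "rough" model that adds a small, rapidly oscillating ripple to the force (a stand-in for a rough PES, not a model of eqV2) is badly wrong at 0.01 Å and much closer at 0.2 Å, because the ripple's contribution to the estimate is divided by $h$.

```python
import math

K_TRUE = 1.0           # true force constant (arbitrary units)
ROUGH_AMP = 0.01       # hypothetical ripple amplitude in the force
ROUGH_WAVELEN = 0.005  # hypothetical ripple length scale, Å

def smooth_force(x):
    return -K_TRUE * x

def rough_force(x):
    """Smooth harmonic force plus a small, high-frequency ripple."""
    return -K_TRUE * x + ROUGH_AMP * math.sin(x / ROUGH_WAVELEN)

def force_constant(force_fn, h):
    """Central finite-difference estimate of k = -dF/dx at x = 0."""
    return -(force_fn(h) - force_fn(-h)) / (2.0 * h)

for h in (0.01, 0.2):
    print(h, force_constant(smooth_force, h), force_constant(rough_force, h))
```

The ripple corresponds to an energy corrugation of only $A\lambda = 5 \times 10^{-5}$ in these units, which would barely register in an energy MAE yet dominates the small-displacement curvature.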

Paper Information

Citation: Fu, X., Wood, B. M., Barroso-Luque, L., Levine, D. S., Gao, M., Dzamba, M., & Zitnick, C. L. (2025). Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction. Proceedings of the 42nd International Conference on Machine Learning (ICML).

Publication: ICML 2025

@inproceedings{fu2025learning,
  title={Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction},
  author={Fu, Xiang and Wood, Brandon M. and Barroso-Luque, Luis and Levine, Daniel S. and Gao, Meng and Dzamba, Misko and Zitnick, C. Lawrence},
  booktitle={Proceedings of the 42nd International Conference on Machine Learning},
  year={2025}
}
