Paper Information
Citation: Fu, X., Wood, B. M., Barroso-Luque, L., Levine, D. S., Gao, M., Dzamba, M., & Zitnick, C. L. (2025). Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction. Proceedings of the 42nd International Conference on Machine Learning (ICML).
Publication: ICML 2025
Additional Resources:
- ICML 2025 poster page
- OpenReview forum
- PDF on OpenReview
- OMAT24 model on Hugging Face
- Code on GitHub (fairchem)
What kind of paper is this?
This is a method paper. It addresses a critical disconnect in the evaluation of Machine Learning Interatomic Potentials (MLIPs) and introduces a novel architecture, eSEN, designed based on insights from this analysis. The paper proposes a new standard for evaluating MLIPs beyond simple test-set errors.
What is the motivation?
The paper targets a well-known but under-studied problem in the field: improvements in standard MLIP metrics (lower energy/force MAE on static test sets) do not reliably translate into better performance on complex downstream tasks such as molecular dynamics (MD) simulations, materials stability prediction, or phonon calculations. The authors seek to understand why this gap exists and how to design models that are both accurate on test sets and physically reliable in practical scientific workflows.
What is the novelty here?
The novelty is twofold, spanning both a conceptual framework for evaluation and a new model architecture:
Energy Conservation as a Diagnostic Test: The core conceptual contribution is using an MLIP’s ability to conserve energy in out-of-distribution MD simulations as a crucial diagnostic test. The authors demonstrate that for models passing this test, a strong correlation between test-set error and downstream task performance is restored.
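A toy version of this diagnostic shows what "conserving energy" means in practice. The sketch below is a minimal illustration, not the paper's protocol. It runs constant-energy (NVE) velocity Verlet on one particle in a 2D harmonic well, first with the exact force $F = -\nabla E$ and then with that force plus a small rotational term $\epsilon(-y, x)$ that has nonzero curl. The well, the rotational term, and the drift metric (final minus initial total energy) are all assumptions chosen for clarity.

```python
def total_energy(x, y, vx, vy, k=1.0, mass=1.0):
    """Kinetic + potential energy of a particle in the 2D well E = k/2 * (x^2 + y^2)."""
    return 0.5 * k * (x * x + y * y) + 0.5 * mass * (vx * vx + vy * vy)


def conservative_force(x, y, k=1.0):
    """F = -grad E for the harmonic well, so the field is curl-free."""
    return -k * x, -k * y


def nonconservative_force(x, y, k=1.0, eps=0.05):
    """Same force plus a rotational term eps * (-y, x) with nonzero curl, standing
    in for a direct-force prediction that is not the gradient of any energy."""
    fx, fy = conservative_force(x, y, k)
    return fx - eps * y, fy + eps * x


def energy_drift(force_fn, steps=5000, dt=0.01, mass=1.0):
    """Run NVE velocity Verlet and return E_total(final) - E_total(initial)."""
    x, y, vx, vy = 1.0, 0.0, 0.0, 1.0  # start on a circular orbit
    e0 = total_energy(x, y, vx, vy, mass=mass)
    fx, fy = force_fn(x, y)
    for _ in range(steps):
        vx += 0.5 * dt * fx / mass  # half kick with current forces
        vy += 0.5 * dt * fy / mass
        x += dt * vx                # full position update
        y += dt * vy
        fx, fy = force_fn(x, y)     # forces at the new positions
        vx += 0.5 * dt * fx / mass  # second half kick
        vy += 0.5 * dt * fy / mass
    return total_energy(x, y, vx, vy, mass=mass) - e0
```

With these defaults the conservative run's total energy stays within a tiny, bounded integration error. In the second run the curl term keeps doing positive work along the orbit, so the energy grows steadily. This kind of systematic drift is what flags a model as non-conservative.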
The eSEN Architecture: The paper introduces the equivariant Smooth Energy Network (eSEN), designed with specific choices to ensure a smooth and well-behaved Potential Energy Surface (PES):
- Strictly Conservative Forces: Forces are computed exclusively as the negative gradient of energy ($F = -\nabla E$), avoiding faster but non-conservative direct-force prediction heads.
- Continuous Representations: Avoids discretizing spherical harmonic representations during nodewise processing, using equivariant gated non-linearities to maintain strict equivariance and smoothness.
- Smooth PES Construction: Key design choices include distance cutoffs instead of fixed neighbor counts, polynomial envelope functions that drive derivatives to zero at the cutoff, and a limited number of radial basis functions to avoid an overly sensitive PES.
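The envelope requirement can be made concrete. The sketch below uses the polynomial envelope popularized by DimeNet, with exponent $p = 5$ and the 6 Å cutoff mentioned later in this summary. Whether eSEN uses exactly this form and exponent is an assumption. A central finite difference checks that the slope vanishes at the cutoff.

```python
def polynomial_envelope(r, cutoff=6.0, p=5):
    """DimeNet-style polynomial envelope u(d) with d = r / cutoff.

    u(0) = 1, and u, du/dd and d2u/dd2 all vanish at d = 1, so an edge's
    contribution fades out smoothly instead of jumping at the cutoff.
    """
    d = r / cutoff
    if d >= 1.0:
        return 0.0
    a = -(p + 1) * (p + 2) / 2.0
    b = p * (p + 2)
    c = -p * (p + 1) / 2.0
    return 1.0 + a * d**p + b * d**(p + 1) + c * d**(p + 2)


def derivative(f, r, h=1e-5):
    """Central finite-difference estimate of df/dr."""
    return (f(r + h) - f(r - h)) / (2.0 * h)
```

For example, `polynomial_envelope(3.0)` returns $99/128 \approx 0.773$, and `derivative(polynomial_envelope, 6.0)` is numerically zero. Edges that enter or leave the cutoff sphere therefore introduce no force discontinuity.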
Efficient Training Strategy: A two-stage training regimen with fast pre-training using a non-conservative direct-force model, followed by fine-tuning to enforce energy conservation. This captures the efficiency of direct-force training while ensuring physical robustness.
What experiments were performed?
The paper presents a comprehensive experimental validation:
Ablation Studies on Energy Conservation: MD simulations on out-of-distribution systems (TM23 and MD22 datasets) systematically tested key design choices (direct-force vs. conservative forces, representation discretization, neighbor limits, envelope functions). These runs showed which choices cause energy drift even when their effect on test-set MAE is negligible.
Physical Property Prediction Benchmarks: The eSEN model was evaluated on challenging downstream tasks:
- Matbench-Discovery: Materials stability and thermal conductivity prediction, where eSEN achieved the highest F1 score among compliant models and excelled at both metrics simultaneously.
- MDR Phonon Benchmark: Predicting phonon properties, which tests the accuracy of second- and third-order derivatives of the PES. eSEN achieved state-of-the-art results, particularly outperforming direct-force models.
- SPICE-MACE-OFF: Standard energy and force prediction on organic molecules, demonstrating that design choices made for physical plausibility enhanced rather than compromised raw accuracy.
Correlation Analysis: Explicit plots of test-set energy MAE versus performance on downstream benchmarks showed weak overall correlation that becomes strong and predictive when restricted to models passing the energy conservation test.
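The filtering step can be sketched as follows. Each model is assumed to be summarized by a hypothetical record holding its test-set MAE, a downstream score, and a measured energy drift. The field names and the drift threshold are illustrative, since this summary does not give the paper's exact pass/fail criterion.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def conservative_subset_correlation(models, drift_threshold):
    """Correlate test-set MAE with a downstream score, keeping only models
    whose absolute energy drift is below drift_threshold (hypothetical rule)."""
    kept = [m for m in models if abs(m["drift"]) < drift_threshold]
    return pearson_r([m["mae"] for m in kept], [m["downstream"] for m in kept])
```

Computing `pearson_r` over all models and `conservative_subset_correlation` over the filtered set side by side mirrors the paper's comparison. A single low-MAE model with large drift is enough to break the overall correlation.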
What outcomes/conclusions?
Primary Conclusion: Energy conservation is a critical, practical property for MLIPs. Using it as a filter re-establishes test-set error as a reliable proxy for downstream performance, so model development can iterate on cheap test-set metrics instead of expensive downstream benchmarks. Models that are not conservative, even with low test error, are unreliable for many critical scientific applications.
Model Performance: The eSEN architecture achieves state-of-the-art performance across diverse tasks, from energy/force prediction to geometry optimization, phonon calculations, and thermal conductivity prediction.
Actionable Design Principles: The paper provides experimentally validated architectural choices that promote physical plausibility. Seemingly minor details, such as how atomic neighbors are selected, can have profound effects on a model’s utility in simulations.
Efficient Path to Robust Models: The direct-force pre-training plus conservative fine-tuning strategy offers a practical method for developing physically robust models without incurring the full computational cost of conservative training from scratch.
Reproducibility Details
Models
The eSEN architecture builds on components from eSCN (Equivariant Spherical Channel Network) and Equiformer, combining them with design choices that prioritize smoothness and energy conservation.
Layer Structure
- Edgewise Convolution: Uses SO(2) convolution layers (from eSCN) with an envelope function applied. Source and target embeddings are concatenated before convolution.
- Nodewise Feed-Forward: Two equivariant linear layers with an intermediate SiLU-based gated non-linearity (from Equiformer).
- Normalization: Equivariant Layer Normalization (from Equiformer).
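The sketch below illustrates why a gated non-linearity preserves equivariance. It keeps only degree-0 (scalar) and degree-1 (vector) channels, stores vectors as Cartesian 3-vectors rather than spherical-harmonic coefficients, and omits the learned linear layers. It is a simplification of the Equiformer-style gate, not eSEN's code. Scalars pass through SiLU, and each vector is scaled by the sigmoid of an invariant gate scalar. Rotating the input and then gating therefore gives the same result as gating and then rotating.

```python
import math


def silu(x):
    """SiLU activation x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_nonlinearity(scalars, gates, vectors):
    """SiLU on invariant (l=0) channels; each l=1 vector channel is scaled by
    sigmoid(gate). Gates are invariant scalars, so the scaling commutes with rotations."""
    out_scalars = [silu(s) for s in scalars]
    out_vectors = [[sigmoid(g) * c for c in v] for g, v in zip(gates, vectors)]
    return out_scalars, out_vectors


def rotate_z(v, theta):
    """Rotate a 3-vector about the z axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1], v[2]]
```

Applying SiLU component by component to the vectors would break this property, because an elementwise non-linearity does not commute with rotation.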
Smoothness Design Choices
Several architectural decisions distinguish eSEN from prior work:
- No Grid Projection: Unlike eSCN and EquiformerV2, eSEN avoids projecting spherical harmonics to spatial grids for non-linearity. Instead, it performs operations directly in the spherical harmonic space to maintain equivariance and energy conservation.
- Distance Cutoff for Graph Construction: Uses a strict distance cutoff (6 Å) rather than a maximum neighbor limit (KNN). Neighbor limits introduce discontinuities that break energy conservation.
- Polynomial Envelope Functions: Edge contributions are multiplied by a polynomial envelope so that they, and their derivatives, go smoothly to zero at the cutoff radius.
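Below is a minimal, non-periodic sketch of cutoff-based graph construction. The function name is hypothetical. Every pair closer than 6 Å becomes an edge, with no cap on neighbor count. Production code would also handle periodic images and use cell lists or a KD-tree instead of the $O(N^2)$ loop.

```python
import math


def radius_graph(positions, cutoff=6.0):
    """Return directed edges (i, j, r_ij) for every pair closer than `cutoff` (Å).

    There is no cap on neighbor count, so an edge appears or disappears only when
    r_ij crosses the cutoff, where an envelope can drive its weight to zero.
    Non-periodic O(N^2) loop for illustration only.
    """
    edges = []
    for i, pi in enumerate(positions):
        for j, pj in enumerate(positions):
            if i == j:
                continue
            r = math.dist(pi, pj)
            if r < cutoff:
                edges.append((i, j, r))
    return edges
```

For three atoms at (0, 0, 0), (3, 0, 0) and (0, 7, 0), only the 3 Å pair survives the 6 Å cutoff, giving the two directed edges (0, 1) and (1, 0).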
Algorithms
Two-Stage Training (eSEN-30M-MP)
- Direct-Force Pre-training (60 epochs): Uses DeNS (Denoising Non-equilibrium Structures) to reduce overfitting. This stage is fast because it does not require backpropagation through energy gradients.
- Conservative Fine-tuning (40 epochs): The direct-force head is removed, and forces are calculated via gradients ($F = -\nabla E$). This enforces energy conservation.
Important: DeNS is only used during the direct-force pre-training stage. It is not applied during conservative fine-tuning. An ablation study showed that pre-training without DeNS results in worse final performance after fine-tuning (validation energy MAE drops from 19.3 to 17.6 meV/atom when DeNS is used).
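One way to make this recipe checkable is to encode it as data. The `Stage` dataclass, its field names, and the `check_schedule` helper below are hypothetical illustrations, not fairchem's configuration format. They record the epochs, force mode, and DeNS usage stated above, and they reject any schedule that uses DeNS in a gradient-force stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    epochs: int
    force_mode: str  # "direct": separate force head; "gradient": F = -dE/dx
    use_dens: bool   # DeNS denoising auxiliary objective enabled


# Hypothetical encoding of the eSEN-30M-MP recipe described above.
SCHEDULE = (
    Stage("direct_force_pretrain", epochs=60, force_mode="direct", use_dens=True),
    Stage("conservative_finetune", epochs=40, force_mode="gradient", use_dens=False),
)


def check_schedule(stages):
    """Validate the recipe's constraints and return the total number of epochs."""
    if stages[-1].force_mode != "gradient":
        raise ValueError("final stage must compute forces as F = -dE/dx")
    for s in stages:
        if s.force_mode == "gradient" and s.use_dens:
            raise ValueError(f"{s.name}: DeNS is only used in direct-force pre-training")
    return sum(s.epochs for s in stages)
```

`check_schedule(SCHEDULE)` returns 100, the combined pre-training and fine-tuning epochs.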
Optimization
- Optimizer: AdamW with cosine learning rate scheduler
- Max Learning Rate: $4 \times 10^{-4}$
- Batch Size: 512 (for MPTrj models)
- Weight Decay: $1 \times 10^{-3}$
- Gradient Clipping: Norm of 100
- Warmup: 0.1 epochs with a factor of 0.2
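The schedule above can be sketched as a function of the optimizer step. This sketch assumes that "a factor of 0.2" means the warmup starts at $0.2 \times$ the max learning rate and rises linearly. It also assumes that the cosine decays to a minimum learning rate of 0 at the final step. Both are interpretations. `steps_per_epoch` depends on the dataset size and the batch size of 512.

```python
import math


def learning_rate(step, steps_per_epoch, total_epochs, max_lr=4e-4,
                  warmup_epochs=0.1, warmup_factor=0.2, min_lr=0.0):
    """Linear warmup from warmup_factor * max_lr to max_lr, then cosine decay.

    Assumed interpretation: the warmup factor is the starting multiplier, and the
    cosine reaches min_lr exactly at the last training step.
    """
    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = total_epochs * steps_per_epoch
    if step < warmup_steps:
        alpha = step / warmup_steps
        return max_lr * (warmup_factor + (1.0 - warmup_factor) * alpha)
    progress = (step - warmup_steps) / max(1.0, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With 1,000 steps per epoch, the rate starts at $8 \times 10^{-5}$, reaches $4 \times 10^{-4}$ after 100 steps, and then decays along the cosine.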
Loss Function
A composite loss combining:
- Per-atom energy MAE
- Force $L_2$ loss
- Stress MAE
For the 30M-parameter MPTrj model (eSEN-30M-MP), the loss coefficients are: Energy (20), Force (20), Stress (5).
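A per-structure sketch of the weighted loss follows. It interprets the force $L_2$ loss as the Euclidean norm of each atom's force error, averaged over atoms. This is an assumption, since a squared-error variant is also common. The stress MAE is taken over all listed stress components, and batch-level averaging is ignored. The dictionary layout is hypothetical.

```python
import math


def composite_loss(pred, target, n_atoms, w_energy=20.0, w_force=20.0, w_stress=5.0):
    """Weighted sum of per-atom energy MAE, force L2 loss and stress MAE for one
    structure. pred/target: dicts with 'energy' (float), 'forces' (list of
    3-vectors) and 'stress' (flat list of stress components)."""
    # Energy error normalized by the number of atoms.
    energy_term = abs(pred["energy"] - target["energy"]) / n_atoms
    # Force term: Euclidean norm of each atom's force error, averaged over atoms.
    force_term = sum(math.dist(fp, ft) for fp, ft in zip(pred["forces"], target["forces"])) / n_atoms
    # Stress term: mean absolute error over components.
    stress_term = sum(abs(a - b) for a, b in zip(pred["stress"], target["stress"])) / len(target["stress"])
    return w_energy * energy_term + w_force * force_term + w_stress * stress_term
```

As a toy example, an energy error of 1 on a 2-atom structure, per-atom force-error norms of 5 and 0, and an error of 0.1 on each of 9 stress components give $20 \cdot 0.5 + 20 \cdot 2.5 + 5 \cdot 0.1 = 60.5$.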
Data
Training Data
- Inorganic: MPTrj (Materials Project Trajectory) dataset
- Organic: SPICE-MACE-OFF dataset
Test Data Construction
- MPTrj Testing: Since MPTrj lacks an official test split, the authors created a test set using 5,000 random samples from the subsampled Alexandria (sAlex) dataset to ensure fair comparison.
- Out-of-Distribution Conservation Testing:
- Inorganic: TM23 dataset (transition metal defects). Simulation: 100 ps, 5 fs timestep.
- Organic: MD22 dataset (large molecules). Simulation: 100 ps, 1 fs timestep.
Hardware
Inference benchmarking was performed on a single 80GB NVIDIA A100 GPU.
Inference Efficiency
A 2-layer eSEN model (3.2M parameters) can simulate approximately 0.8 million steps per day for a periodic system of 216 atoms on a single A100. This indicates that eSEN is about as efficient as existing equivariant models while being more accurate.
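The throughput figure converts into per-step latency and simulated time with simple arithmetic. The helper names are illustrative. The timestep is a parameter because the summary does not say which one this benchmark used; 1 fs and 5 fs are the MD22 and TM23 timesteps listed above.

```python
SECONDS_PER_DAY = 86_400


def step_time_ms(steps_per_day):
    """Average wall-clock time per MD step, in milliseconds."""
    return SECONDS_PER_DAY * 1000.0 / steps_per_day


def simulated_ns_per_day(steps_per_day, timestep_fs):
    """Simulated time per day of wall clock, in nanoseconds (1 ns = 1e6 fs)."""
    return steps_per_day * timestep_fs / 1e6
```

At 0.8 million steps per day, each step takes about 108 ms. At a 1 fs timestep this amounts to 0.8 ns of simulated time per day. A 100 ps run at 5 fs (20,000 steps) would take roughly 36 minutes for a 216-atom system.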
Training Stability
Error bars across 3 random seeds for MPTrj training show low seed-to-seed variance, indicating stable training:
- Energy MAE: $19.67 \pm 0.23$ meV/atom
- Forces MAE: $43.85 \pm 0.058$ meV/Å
- Stress MAE: $0.16 \pm 0.00038$ meV/ų
Evaluation
The paper evaluated eSEN across three major benchmark tasks:
- Matbench-Discovery: Materials stability and thermal conductivity prediction, where eSEN achieved the highest F1 score among compliant models.
- MDR Phonon Benchmark: Predicting phonon properties, which tests the accuracy of second- and third-order derivatives of the PES. eSEN achieved state-of-the-art results.
- SPICE-MACE-OFF: Standard energy and force prediction on organic molecules.
Key evaluation metrics included energy MAE, force MAE, stress MAE, F1 score for stability prediction, and phonon frequency accuracy.
Why These Design Choices Matter
The authors provide theoretical justification for why specific architectural choices break energy conservation:
- Max Neighbor Limit (KNN): Introduces a discontinuity in the PES. If a neighbor at distance $r$ moves to $r + \epsilon$ and drops out of the top-$K$, the energy changes discontinuously.
- Grid Discretization: Projecting spherical harmonics to a spatial grid introduces discretization errors in energy gradients that break conservation.
- Direct-Force Prediction: Imposes no mathematical constraint that forces be the gradient of a scalar energy field. Nothing rules out $\nabla \times F \neq 0$. In that case the work done around a closed path is nonzero and the total energy in MD can drift, violating the requirement for a conservative force field.
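A toy model can reproduce the max-neighbor discontinuity. In the sketch below, the species-dependent pair energy, the geometry, and the cosine cutoff are all hypothetical; the cosine cutoff is a simpler stand-in for the polynomial envelope. A type-B atom moves past a type-A atom sitting at 3.0 Å. With a cap of $K = 2$ neighbors, the B atom drops out of the neighbor set and the energy jumps. Summing smoothly cut-off terms over every neighbor inside the cutoff instead gives a change proportional to the displacement.

```python
import math


def cosine_cutoff(r, cutoff=6.0):
    """Smooth cutoff: 1 at r = 0, value and slope 0 at r = cutoff."""
    return 0.5 * (math.cos(math.pi * r / cutoff) + 1.0) if r < cutoff else 0.0


def pair_energy(r, species):
    """Toy species-dependent pair term (hypothetical, not a learned model)."""
    strength = {"A": 1.0, "B": 3.0}[species]
    return strength * math.exp(-r)


def knn_energy(neighbors, k=2):
    """Sum pair terms over the k nearest neighbors only (max-neighbor cap)."""
    nearest = sorted(neighbors, key=lambda n: n[0])[:k]
    return sum(pair_energy(r, s) for r, s in nearest)


def cutoff_energy(neighbors, cutoff=6.0):
    """Sum smoothly cut-off pair terms over every neighbor (zero beyond the cutoff)."""
    return sum(pair_energy(r, s) * cosine_cutoff(r, cutoff) for r, s in neighbors)


def energy_jump(energy_fn, eps=1e-6):
    """Energy change as a B atom moves from 3 - eps to 3 + eps, past an A atom
    fixed at 3.0 (a second A atom sits at 2.0)."""
    def config(r_b):
        return [(2.0, "A"), (3.0, "A"), (r_b, "B")]
    return energy_fn(config(3.0 + eps)) - energy_fn(config(3.0 - eps))
```

Here the capped energy jumps by roughly $-2e^{-3} \approx -0.1$ (toy units) over a $2 \times 10^{-6}$ displacement, which implies an unbounded force at the swap point. The cutoff-based energy changes only in proportion to that displacement.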
