Paper Information
Citation: Bigi, F., Langer, M. F., & Ceriotti, M. (2025). The dark side of the forces: assessing non-conservative force models for atomistic machine learning. Proceedings of the 42nd International Conference on Machine Learning (ICML).
Publication: ICML 2025
Additional Resources:
- ICML 2025 poster page
- PDF on OpenReview
- Zenodo repository
- MTS Inference Tutorial
- Conservative Fine-Tuning Tutorial
What kind of paper is this?
This is a Method paper that rigorously evaluates existing non-conservative force prediction approaches and proposes hybrid solutions combining the speed benefits of direct force prediction with the physical correctness of conservative models.
What is the motivation?
Many recent machine learning interatomic potential (MLIP) architectures predict forces directly ($F_\theta(r)$) rather than computing them as derivatives of a predicted energy ($F = -\nabla E_\theta(r)$). This “non-conservative” approach avoids the overhead of automatic differentiation and yields faster inference (typically ~2x). However, directly predicted forces are not guaranteed to be the gradient of any energy, so energy conservation is lost, and some of these architectures also drop rotational constraints. Both can destabilize molecular dynamics simulations. The field lacks a rigorous quantification of when this trade-off breaks down and how to mitigate the failures.
What is the novelty here?
Three key contributions:
Jacobian Asymmetry Metric ($\lambda$): A quantitative diagnostic for non-conservation. Since conservative forces derive from a scalar field, their Jacobian (the Hessian of energy) must be symmetric. The normalized norm of the antisymmetric part quantifies the degree of violation.
Systematic Failure Mode Catalog: First comprehensive demonstration that non-conservative models cause runaway heating in NVE ensembles (temperature drifts of roughly 7,000 to 70,000 K/ns) and equipartition violations in NVT ensembles, where different atom types settle at different temperatures, which is physically impossible at equilibrium.
Hybrid Training and Inference Protocol: A practical workflow that combines fast direct-force prediction with conservative corrections:
- Training: Pre-train on direct forces, then fine-tune on energy gradients (3-4x faster than training conservative models from scratch)
- Inference: Multiple Time-Stepping (MTS) where fast non-conservative forces are periodically corrected by slower conservative forces
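The Jacobian asymmetry diagnostic from the first contribution can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper: the Jacobian is estimated by central finite differences instead of automatic differentiation, $\lambda$ is normalized as the Frobenius norm of the antisymmetric part divided by the Frobenius norm of the full Jacobian (the paper's exact normalization may differ), and the two toy force fields are hypothetical.

```python
import numpy as np

def force_jacobian(force_fn, positions, eps=1e-5):
    """Central finite-difference Jacobian J[i, j] = dF_i / dr_j (flattened coordinates)."""
    r0 = np.asarray(positions, dtype=float).ravel()
    n = r0.size
    jac = np.empty((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        f_plus = np.asarray(force_fn(r0 + step), dtype=float).ravel()
        f_minus = np.asarray(force_fn(r0 - step), dtype=float).ravel()
        jac[:, j] = (f_plus - f_minus) / (2.0 * eps)
    return jac

def jacobian_asymmetry(force_fn, positions, eps=1e-5):
    """Assumed normalization: ||(J - J^T) / 2||_F / ||J||_F."""
    jac = force_jacobian(force_fn, positions, eps)
    antisymmetric = 0.5 * (jac - jac.T)
    return np.linalg.norm(antisymmetric) / np.linalg.norm(jac)

# Hypothetical toy force fields in two dimensions, for illustration only.
def conservative_force(r):
    # F = -grad E with E = 0.5 * (x^2 + y^2) + 0.5 * x^2 * y^2
    x, y = r
    return np.array([-(x + x * y**2), -(y + x**2 * y)])

def rotational_force(r):
    # Pure "curl" field F = (-y, x); it is not the gradient of any energy.
    x, y = r
    return np.array([-y, x])

point = np.array([0.3, -0.7])
lam_conservative = jacobian_asymmetry(conservative_force, point)
lam_rotational = jacobian_asymmetry(rotational_force, point)
```

Under this normalization a gradient-derived force gives a value near zero (up to finite-difference noise), while a purely rotational field gives a value near one. For a real MLIP the Jacobian would more likely be obtained with the framework's automatic differentiation.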
What experiments were performed?
The evaluation systematically tests the models listed below across multiple simulation scenarios:
Models tested:
- PET-C/PET-NC (Point Edge Transformer, conservative and non-conservative variants)
- ORB-v2 (non-conservative, trained on Alexandria/MPtrj)
- EquiformerV2 (non-conservative Transformer)
- MACE-MP-0 (conservative message-passing)
- SevenNet (conservative message-passing)
- SOAP-BPNN (legacy descriptor-based baseline)
Test scenarios:
- NVE stability tests on bulk liquid water, graphene, amorphous carbon, FCC aluminum, benzene, alanine dipeptide, and benzene on graphene
- Thermostat artifact analysis with Langevin and GLE-RESPA thermostats
- Geometry optimization on QM9 molecules using FIRE and L-BFGS
- MTS validation on OC20 catalysis dataset
- Species-resolved temperature measurements for equipartition testing
Key metrics:
- Jacobian asymmetry ($\lambda$)
- Kinetic temperature drift in NVE
- Velocity-velocity correlations
- Radial distribution functions
- Species-resolved temperatures
- Inference speed benchmarks
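As a rough illustration of the kinetic temperature drift metric, the sketch below computes an instantaneous kinetic temperature and fits a straight line to a temperature time series. The function names are hypothetical, the temperature formula assumes 3N unconstrained degrees of freedom with consistent units, and the trajectory is synthetic, not data from the paper.

```python
import numpy as np

def kinetic_temperature(masses, velocities, k_b):
    """Instantaneous kinetic temperature T = 2 * KE / (3 * N * k_B).

    Assumes consistent units and 3N degrees of freedom
    (no constraints, no centre-of-mass correction).
    """
    masses = np.asarray(masses, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    kinetic_energy = 0.5 * np.sum(masses[:, None] * velocities**2)
    return 2.0 * kinetic_energy / (3.0 * masses.size * k_b)

def temperature_drift(times, temperatures):
    """Least-squares slope of T(t), in temperature units per time unit."""
    slope, _intercept = np.polyfit(times, temperatures, 1)
    return slope

# Synthetic series heating at 7 K/ps, which corresponds to 7,000 K/ns.
times_ps = np.linspace(0.0, 10.0, 101)
temps_k = 300.0 + 7.0 * times_ps
drift_k_per_ps = temperature_drift(times_ps, temps_k)
drift_k_per_ns = 1000.0 * drift_k_per_ps
```

A linear fit is only one possible estimator; for noisy NVE trajectories a longer window or block averaging may be needed before the slope is meaningful.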
What outcomes/conclusions?
Purely non-conservative models are unsuitable for production simulations: they introduce unphysical artifacts that thermostats either fail to control or suppress only at the cost of distorting the dynamics. Key findings:
Performance failures:
- Non-conservative models (PET-NC, ORB, EquiformerV2) exhibited catastrophic temperature drift in NVE simulations (more than 7,000 K/ns, i.e. 7,000 billion K/s)
- Strong thermostats damped diffusion by ~5x, negating speed benefits
- Advanced GLE-RESPA thermostats failed to control drift (ORB reached 1181 K against a 300 K target)
- Equipartition violations: O and H atoms equilibrated at different temperatures (up to 10% deviation)
- Geometry optimization algorithms (FIRE, L-BFGS) failed to converge
Hybrid solution success:
- MTS with non-conservative forces corrected every 8 steps achieved conservative stability with only ~20% overhead
- Conservative fine-tuning reduced training time by >3x compared to training from scratch
- Validated on OC20 catalysis benchmark
Recommendation: The optimal production path is hybrid architectures using direct forces for acceleration (via MTS and pre-training) while anchoring models in conservative energy surfaces. This captures computational benefits without sacrificing physical reliability.
Reproducibility Details
Data
Primary training/evaluation:
- Bulk Liquid Water (Cheng et al., 2019): ~100k structures, revPBE0-D3 calculations, chosen for rigorous thermodynamic testing
Generalization tests:
- Graphene, amorphous carbon, FCC aluminum, benzene, alanine dipeptide in water, benzene adsorbed on graphene
Benchmarks:
- QM9: Geometry optimization tests
- OC20 (Open Catalyst): Oxygen on alloy surfaces for MTS validation
All datasets publicly available through cited sources.
Models
Point Edge Transformer (PET) variants:
- PET-C (Conservative): Forces via energy backpropagation
- PET-NC (Non-Conservative): Direct force prediction head, slightly higher parameter count
Baseline comparisons:
| Model | Type | Training Data | Notes |
|---|---|---|---|
| ORB-v2 | Non-conservative | Alexandria/MPtrj | Rotationally unconstrained |
| EquiformerV2 | Non-conservative | - | Equivariant Transformer |
| MACE-MP-0 | Conservative | MPtrj | Equivariant message-passing |
| SevenNet | Conservative | - | Equivariant message-passing |
| SOAP-BPNN | Conservative | - | Legacy descriptor baseline |
Training details:
- Loss functions: PET-C uses joint Energy + Force $L^2$ loss; PET-NC uses Force-only $L^2$ loss
- Fine-tuning protocol: A pre-trained PET-NC model is converted to a conservative one by fine-tuning its energy head, so that forces are obtained as energy gradients
- MTS configuration: Non-conservative forces with conservative corrections every 8 steps ($M=8$)
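The MTS configuration above can be sketched as a RESPA-style splitting in which the cheap direct force drives the inner steps and the difference between conservative and direct forces is applied as a kick once per outer step. This is a minimal sketch, not the paper's implementation: the harmonic toy system, the 5% rotational error added to the direct force, the timestep, and all function names are hypothetical choices for illustration.

```python
import numpy as np

def mts_step(x, v, mass, dt, n_inner, f_fast, f_slow):
    """One outer step: fast force inside, (slow - fast) correction outside."""
    outer_dt = n_inner * dt
    # Half kick with the correction force over the outer timestep.
    v = v + 0.5 * outer_dt * (f_slow(x) - f_fast(x)) / mass
    # Inner velocity-Verlet steps using only the fast (direct) force.
    for _ in range(n_inner):
        v = v + 0.5 * dt * f_fast(x) / mass
        x = x + dt * v
        v = v + 0.5 * dt * f_fast(x) / mass
    # Closing half kick with the correction force.
    v = v + 0.5 * outer_dt * (f_slow(x) - f_fast(x)) / mass
    return x, v

def f_conservative(x):
    # Toy "conservative model": gradient of E = 0.5 * |x|^2.
    return -x

def f_direct(x):
    # Toy "direct-force model": same force plus a small rotational error.
    return -x + 0.05 * np.array([-x[1], x[0]])

def total_energy(x, v, mass=1.0):
    return 0.5 * mass * np.dot(v, v) + 0.5 * np.dot(x, x)

def energy_ratio(f_slow, n_outer=100, dt=0.05, n_inner=8):
    """Final energy divided by initial energy after n_outer outer steps."""
    x = np.array([1.0, 0.0])
    v = np.array([0.0, 1.0])
    e0 = total_energy(x, v)
    for _ in range(n_outer):
        x, v = mts_step(x, v, 1.0, dt, n_inner, f_direct, f_slow)
    return total_energy(x, v) / e0

# Passing f_direct as the slow force makes the correction vanish,
# which reduces the scheme to plain velocity Verlet on the direct force.
ratio_direct_only = energy_ratio(f_direct)
ratio_mts = energy_ratio(f_conservative)
```

In this toy setup the direct-only run should gain energy steadily, while the corrected run with `n_inner=8` should stay close to its initial energy. How large the outer step can be before the correction stops working depends on the system and on the size of the non-conservative error.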
Evaluation
Metrics:
- Jacobian asymmetry ($\lambda$): Quantifies non-conservation via antisymmetric component
- Temperature drift: NVE ensemble stability
- Velocity-velocity correlation ($\hat{c}_{vv}(\omega)$): Thermostat artifact detection
- Radial distribution functions ($g(r)$): Structural accuracy
- Species-resolved temperature: Equipartition testing
- Inference speed: Wall-clock time per MD step
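For the velocity-velocity correlation metric, one common route is the power spectrum of the velocities, which by the Wiener-Khinchin relation is the Fourier transform of the (circular) velocity autocorrelation. The sketch below takes that route; the normalization and the function name are assumptions, and the paper may define $\hat{c}_{vv}(\omega)$ with different prefactors or windowing.

```python
import numpy as np

def velocity_power_spectrum(velocities, dt):
    """Power spectrum summed over atoms and Cartesian components.

    velocities: array of shape (n_steps, n_atoms, 3)
    dt: time between stored frames
    Returns (omega, spectrum) with omega as angular frequency.
    """
    v = np.asarray(velocities, dtype=float)
    n_steps = v.shape[0]
    v_hat = np.fft.rfft(v, axis=0)
    spectrum = np.sum(np.abs(v_hat) ** 2, axis=(1, 2)) * dt / n_steps
    omega = 2.0 * np.pi * np.fft.rfftfreq(n_steps, d=dt)
    return omega, spectrum

# Synthetic single-atom trajectory oscillating at 2 cycles per time unit.
dt = 0.01
t = np.arange(1000) * dt
toy_velocities = np.cos(2.0 * np.pi * 2.0 * t)[:, None, None] * np.ones((1, 1, 3))
omega, spectrum = velocity_power_spectrum(toy_velocities, dt)
peak_omega = omega[np.argmax(spectrum)]
```

Comparing such spectra between thermostatted and reference runs is one way to see whether a thermostat has distorted the dynamics, for example by broadening or shifting peaks.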
Key results:
| Model | Speed (ms/step) | NVE Stability | Notes |
|---|---|---|---|
| PET-NC | 8.58 | Failed | ~7,000 K/ns drift (7,000 billion K/s) |
| PET-C | 19.4 | Stable | ~2.3x slower than PET-NC (19.4 / 8.58) |
| ORB | 11.9 | Failed | Reached 1181 K (target 300 K) |
| SevenNet | 52.8 | Stable | Conservative baseline |
| PET Hybrid (MTS) | ~10.3 | Stable | Only 20% overhead vs. pure NC |
Thermostat artifacts:
- Langevin ($\tau=10$ fs) damped diffusion by ~5x
- GLE-RESPA failed to control non-conservative drift
- Equipartition violations: O/H temperature difference up to 10%
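The equipartition check can be sketched by computing a kinetic temperature per element and comparing the values. The example below uses synthetic velocities in reduced units ($k_B = 1$) with a deliberately built-in 10% gap between O and H; the numbers, the function name, and the assumption of three unconstrained degrees of freedom per atom are illustrative and not taken from the paper.

```python
import numpy as np

def species_temperatures(symbols, masses, velocities, k_b):
    """Kinetic temperature per chemical species, assuming 3 DOF per atom."""
    symbols = np.asarray(symbols)
    masses = np.asarray(masses, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    temps = {}
    for species in np.unique(symbols):
        mask = symbols == species
        kinetic = 0.5 * np.sum(masses[mask][:, None] * velocities[mask] ** 2)
        temps[str(species)] = 2.0 * kinetic / (3.0 * mask.sum() * k_b)
    return temps

# Synthetic water-like sample: O drawn at T = 300, H at T = 330 (reduced units).
rng = np.random.default_rng(0)
k_b = 1.0
symbols = np.array(["O"] * 5000 + ["H"] * 10000)
masses = np.where(symbols == "O", 16.0, 1.0)
target_temps = np.where(symbols == "O", 300.0, 330.0)
sigma = np.sqrt(k_b * target_temps / masses)  # Maxwell-Boltzmann width per component
velocities = rng.normal(size=(symbols.size, 3)) * sigma[:, None]

temps = species_temperatures(symbols, masses, velocities, k_b)
relative_gap = (temps["H"] - temps["O"]) / temps["O"]
```

In an equilibrated simulation with a conservative model the per-species temperatures should agree within statistical noise, so in practice the gap is best judged from time averages over a trajectory, not from a single frame.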
Optimization failures:
- FIRE and L-BFGS algorithms failed to converge with non-conservative forces on QM9 benchmarks
Hardware
Compute resources:
- NVIDIA H100 GPUs for training and benchmarking
- Conservative fine-tuning achieved >3x training speedup vs. training conservative models from scratch
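Reproduction resources: see the Zenodo repository and the MTS inference and conservative fine-tuning tutorials listed under Additional Resources.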
