Paper Information

Citation: Bigi, F., Langer, M. F., & Ceriotti, M. (2025). The dark side of the forces: assessing non-conservative force models for atomistic machine learning. Proceedings of the 42nd International Conference on Machine Learning (ICML).

Publication: ICML 2025

What kind of paper is this?

This is a Method paper that rigorously evaluates existing non-conservative force prediction approaches and proposes hybrid solutions combining the speed benefits of direct force prediction with the physical correctness of conservative models.

What is the motivation?

Many recent machine learning interatomic potential (MLIP) architectures predict forces directly ($F_\theta(r)$) rather than computing them as derivatives of energy ($F = -\nabla E_\theta(r)$). This “non-conservative” approach avoids the computational overhead of automatic differentiation, yielding faster inference (typically ~2x speedup). However, it sacrifices energy conservation and rotational constraints, potentially destabilizing molecular dynamics simulations. The field lacks rigorous quantification of when this trade-off breaks down and how to mitigate the failures.
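The conservative route can be sketched in a few lines. This is a minimal illustration, not PET or any model from the paper. A hypothetical toy energy (harmonic springs between all atom pairs) stands in for $E_\theta$. Its forces are written out as the analytic $-\nabla E$ that automatic differentiation would return, and a central finite-difference check confirms they are the gradient of that energy. In a real MLIP, this gradient is the extra backward pass that direct-force models skip.

```python
import numpy as np


def toy_energy(positions, k=1.0, r0=1.0):
    """Hypothetical stand-in for E_theta: harmonic springs between all atom pairs."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    i, j = np.triu_indices(len(positions), k=1)
    return 0.5 * k * np.sum((dist[i, j] - r0) ** 2)


def conservative_forces(positions, k=1.0, r0=1.0):
    """F = -dE/dr for toy_energy, i.e. what a backward pass through E_theta would give."""
    diff = positions[:, None, :] - positions[None, :, :]  # r_i - r_j
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder to avoid 0/0; masked out below
    coeff = -k * (dist - r0) / dist
    np.fill_diagonal(coeff, 0.0)
    return np.sum(coeff[:, :, None] * diff, axis=1)


def finite_difference_forces(energy_fn, positions, h=1e-5):
    """Reference -grad E from central finite differences (slow, for checking only)."""
    forces = np.zeros_like(positions)
    for idx in np.ndindex(positions.shape):
        plus, minus = positions.copy(), positions.copy()
        plus[idx] += h
        minus[idx] -= h
        forces[idx] = -(energy_fn(plus) - energy_fn(minus)) / (2 * h)
    return forces


rng = np.random.default_rng(0)
pos = rng.normal(size=(5, 3))
f_grad = conservative_forces(pos)
f_fd = finite_difference_forces(toy_energy, pos)
print(np.max(np.abs(f_grad - f_fd)))     # small: the forces are the exact gradient
print(np.abs(f_grad.sum(axis=0)).max())  # ~0: no net force on the system
```

A direct-force model would instead return `forces` from its own output head, with no guarantee that any `toy_energy`-like scalar exists whose gradient reproduces them.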

What is the novelty here?

Three key contributions:

  1. Jacobian Asymmetry Metric ($\lambda$): A quantitative diagnostic for non-conservation. Since conservative forces derive from a scalar field, their Jacobian (the Hessian of energy) must be symmetric. The normalized norm of the antisymmetric part quantifies the degree of violation.

  2. Systematic Failure Mode Catalog: First comprehensive demonstration that non-conservative models cause runaway heating in NVE ensembles (temperature drifts of ~7,000-70,000 K/ns) and equipartition violations in NVT ensembles, where different atom types equilibrate to different temperatures, which cannot happen in a system at thermal equilibrium.

  3. Hybrid Training and Inference Protocol: A practical workflow that combines fast direct-force prediction with conservative corrections:

    • Training: Pre-train on direct forces, then fine-tune on energy gradients (3-4x faster than training conservative models from scratch)
    • Inference: Multiple Time-Stepping (MTS) where fast non-conservative forces are periodically corrected by slower conservative forces
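The asymmetry metric can be sketched numerically. This sketch assumes that $\lambda$ is the Frobenius norm of the antisymmetric part $J_{\text{anti}} = (J - J^\top)/2$ divided by the Frobenius norm of the full force Jacobian $J$, which is one natural reading of "normalized norm". The snippet estimates $J$ by central finite differences and evaluates $\lambda$ for two hypothetical toy force fields. The first is a harmonic well, which is conservative. The second is the same well plus a rotational (curl) term about the $z$ axis, which no scalar energy can produce.

```python
import numpy as np


def force_jacobian(force_fn, positions, h=1e-5):
    """J[a, b] = dF_a/dx_b over flattened coordinates, by central finite differences."""
    x0 = positions.ravel()
    n = x0.size
    jac = np.zeros((n, n))
    for b in range(n):
        step = np.zeros(n)
        step[b] = h
        f_plus = force_fn((x0 + step).reshape(positions.shape)).ravel()
        f_minus = force_fn((x0 - step).reshape(positions.shape)).ravel()
        jac[:, b] = (f_plus - f_minus) / (2 * h)
    return jac


def jacobian_asymmetry(jac):
    """lambda = ||(J - J^T) / 2||_F / ||J||_F; zero for conservative forces."""
    antisym = 0.5 * (jac - jac.T)
    return np.linalg.norm(antisym) / np.linalg.norm(jac)


def harmonic_forces(positions, k=1.0):
    """Conservative toy model: F = -grad(k |r|^2 / 2)."""
    return -k * positions


def curl_forces(positions, k=1.0, alpha=0.1):
    """Non-conservative toy model: harmonic well plus a rotation about z."""
    forces = -k * positions
    forces[:, 0] -= alpha * positions[:, 1]
    forces[:, 1] += alpha * positions[:, 0]
    return forces


rng = np.random.default_rng(0)
pos = rng.normal(size=(4, 3))
lam_c = jacobian_asymmetry(force_jacobian(harmonic_forces, pos))
lam_nc = jacobian_asymmetry(force_jacobian(curl_forces, pos))
print(lam_c, lam_nc)  # ~0 and ~0.081, i.e. sqrt(2)*alpha / sqrt(3k^2 + 2*alpha^2)
```

For a real model, $J$ would come from automatic differentiation of the predicted forces rather than finite differences. The ratio itself is cheap once $J$ is available.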

What experiments were performed?

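The evaluation systematically tests six architectures (seven models, since PET appears in both conservative and non-conservative variants), including a legacy descriptor-based baseline, across multiple simulation scenarios: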

Models tested:

  • PET-C/PET-NC (Point Edge Transformer, conservative and non-conservative variants)
  • ORB-v2 (non-conservative, trained on Alexandria/MPtrj)
  • EquiformerV2 (non-conservative Transformer)
  • MACE-MP-0 (conservative message-passing)
  • SevenNet (conservative message-passing)
  • SOAP-BPNN (legacy descriptor-based baseline)

Test scenarios:

  1. NVE stability tests on bulk liquid water, graphene, amorphous carbon, FCC aluminum, benzene, alanine dipeptide, and benzene on graphene
  2. Thermostat artifact analysis with Langevin and GLE-RESPA thermostats
  3. Geometry optimization on QM9 molecules using FIRE and L-BFGS
  4. MTS validation on OC20 catalysis dataset
  5. Species-resolved temperature measurements for equipartition testing

Key metrics:

  • Jacobian asymmetry ($\lambda$)
  • Kinetic temperature drift in NVE
  • Velocity-velocity correlations
  • Radial distribution functions
  • Species-resolved temperatures
  • Inference speed benchmarks
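Species-resolved temperatures follow from equipartition. At equilibrium, every atom should satisfy $\langle \tfrac{1}{2} m_i |v_i|^2 \rangle = \tfrac{3}{2} k_B T$, which reflects three Cartesian degrees of freedom. The minimal sketch below computes $T_s = \sum_{i \in s} m_i |v_i|^2 / (3 N_s k_B)$ for each species $s$. It assumes SI units and ignores constraints and centre-of-mass corrections. The water-like velocities are synthetic Maxwell-Boltzmann samples, not data from the paper.

```python
import numpy as np

KB = 1.380649e-23  # Boltzmann constant, J/K
AMU = 1.66053906660e-27  # atomic mass unit, kg


def species_temperatures(velocities, masses, species):
    """Kinetic temperature per species, T_s = sum_{i in s} m_i |v_i|^2 / (3 N_s k_B).

    velocities: (N, 3) in m/s; masses: (N,) in kg; species: (N,) labels.
    """
    temps = {}
    for s in np.unique(species):
        mask = species == s
        twice_kinetic = np.sum(masses[mask][:, None] * velocities[mask] ** 2)
        temps[str(s)] = twice_kinetic / (3 * mask.sum() * KB)
    return temps


# Synthetic water-like system: 1 O per 2 H, velocities drawn at 300 K.
n_o = 10_000
species = np.array(["O"] * n_o + ["H"] * (2 * n_o))
masses = np.where(species == "O", 15.999, 1.008) * AMU
rng = np.random.default_rng(0)
velocities = rng.normal(size=(3 * n_o, 3)) * np.sqrt(KB * 300.0 / masses)[:, None]
print(species_temperatures(velocities, masses, species))  # both close to 300 K

# An equipartition violation of the kind reported for non-conservative models:
# heating only the H atoms by 10% makes the species temperatures disagree.
velocities[species == "H"] *= np.sqrt(1.1)
print(species_temperatures(velocities, masses, species))  # H ~330 K, O ~300 K
```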

What outcomes/conclusions?

Purely non-conservative models are unsuitable for production simulations due to uncontrollable unphysical artifacts that no thermostat can correct. Key findings:

Performance failures:

  • Non-conservative models (PET-NC, ORB, EquiformerV2) exhibited catastrophic temperature drift (over 7,000 K/ns, i.e. more than 7×10¹² K/s) in NVE simulations
  • Strong thermostats damped diffusion by ~5x, negating speed benefits
  • Advanced GLE-RESPA thermostats failed to control drift (ORB reached 1181 K vs. a 300 K target)
  • Equipartition violations: O and H atoms equilibrated at different temperatures (10% deviation)
  • Geometry optimization algorithms (FIRE, L-BFGS) failed to converge
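A toy model shows why NVE runs heat up. In the sketch below, atoms sit in a harmonic well, and all parameters are hypothetical, with unit masses and reduced units. Setting `alpha > 0` adds a rotational force that is not the gradient of any energy, so it can do net positive work around closed orbits. Velocity Verlet keeps the energy bounded in the conservative case but lets it grow steadily once the curl term is switched on. The sketch only illustrates the mechanism and does not reproduce the drift rates reported for PET-NC, ORB or EquiformerV2.

```python
import numpy as np


def toy_forces(x, k=1.0, alpha=0.0):
    """Harmonic well; alpha > 0 adds a curl term that no potential energy can produce."""
    f = -k * x
    f[:, 0] -= alpha * x[:, 1]
    f[:, 1] += alpha * x[:, 0]
    return f


def run_nve(alpha, n_steps=5000, dt=0.01, k=1.0, seed=0):
    """Velocity Verlet (unit masses); returns the harmonic 'energy' after every step."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(64, 3))
    v = rng.normal(size=(64, 3))
    f = toy_forces(x, k, alpha)
    energies = [0.5 * np.sum(v**2) + 0.5 * k * np.sum(x**2)]
    for _ in range(n_steps):
        v += 0.5 * dt * f
        x += dt * v
        f = toy_forces(x, k, alpha)
        v += 0.5 * dt * f
        energies.append(0.5 * np.sum(v**2) + 0.5 * k * np.sum(x**2))
    return np.array(energies)


e_cons = run_nve(alpha=0.0)
e_nc = run_nve(alpha=0.1)
print(e_cons[-1] / e_cons[0])  # stays ~1: bounded energy error
print(e_nc[-1] / e_nc[0])      # grows steadily: runaway heating
```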

Hybrid solution success:

  • MTS with non-conservative forces corrected every 8 steps achieved conservative stability with only ~20% overhead
  • Conservative fine-tuning reduced training time by >3x compared to training from scratch
  • Validated on OC20 catalysis benchmark
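A multiple-time-stepping scheme of this kind can be sketched as a RESPA-style splitting. The sketch assumes that the "slow" force is the correction $F_C - F_{NC}$; this is an illustrative choice, not a transcription of the paper's integrator. Each outer step applies half kicks of that correction over $M\,\Delta t$, with $M$ cheap velocity-Verlet steps driven by $F_{NC}$ in between, so on average the trajectory is driven by $F_C$. Both models below are hypothetical toys: $F_C$ is harmonic, and $F_{NC}$ equals $F_C$ plus a curl term. Masses are unity, and $M = 8$ as in the summary.

```python
import numpy as np

K, ALPHA = 1.0, 0.1  # hypothetical toy parameters (reduced units)


def conservative_forces(x):
    """Slow, accurate model: F_C = -grad(K |x|^2 / 2)."""
    return -K * x


def direct_forces(x):
    """Fast direct-force model: F_C plus a spurious, non-conservative curl term."""
    f = conservative_forces(x)
    f[:, 0] -= ALPHA * x[:, 1]
    f[:, 1] += ALPHA * x[:, 0]
    return f


def verlet(x, v, dt, n_steps):
    """Plain velocity Verlet with the direct forces only (unit masses)."""
    f = direct_forces(x)
    for _ in range(n_steps):
        v = v + 0.5 * dt * f
        x = x + dt * v
        f = direct_forces(x)
        v = v + 0.5 * dt * f
    return x, v


def mts_step(x, v, dt, m_inner):
    """One outer MTS step: slow half kick, m_inner fast steps, slow half kick."""
    h = m_inner * dt
    v = v + 0.5 * h * (conservative_forces(x) - direct_forces(x))
    x, v = verlet(x, v, dt, m_inner)
    v = v + 0.5 * h * (conservative_forces(x) - direct_forces(x))
    return x, v


def energy(x, v):
    return 0.5 * np.sum(v**2) + 0.5 * K * np.sum(x**2)


rng = np.random.default_rng(0)
x0, v0 = rng.normal(size=(64, 3)), rng.normal(size=(64, 3))

x_nc, v_nc = verlet(x0, v0, dt=0.01, n_steps=5000)  # t = 50, direct forces only
x, v, e_mts = x0, v0, [energy(x0, v0)]
for _ in range(625):  # 625 outer steps x 8 inner steps x dt = 0.01 -> t = 50
    x, v = mts_step(x, v, dt=0.01, m_inner=8)
    e_mts.append(energy(x, v))

print(energy(x_nc, v_nc) / energy(x0, v0))  # large: direct forces alone heat up
print(max(e_mts) / min(e_mts))              # close to 1: MTS stays bounded
```

For clarity, the sketch recomputes forces at step boundaries. A practical implementation would reuse them, so each outer step costs one conservative evaluation and `m_inner` direct ones.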

Recommendation: The optimal production path is hybrid architectures using direct forces for acceleration (via MTS and pre-training) while anchoring models in conservative energy surfaces. This captures computational benefits without sacrificing physical reliability.

Reproducibility Details

Data

Primary training/evaluation:

  • Bulk Liquid Water (Cheng et al., 2019): ~100k structures, revPBE0-D3 calculations, chosen for rigorous thermodynamic testing

Generalization tests:

  • Graphene, amorphous carbon, FCC aluminum, benzene, alanine dipeptide in water, benzene adsorbed on graphene

Benchmarks:

  • QM9: Geometry optimization tests
  • OC20 (Open Catalyst): Oxygen on alloy surfaces for MTS validation

All datasets publicly available through cited sources.

Models

Point Edge Transformer (PET) variants:

  • PET-C (Conservative): Forces via energy backpropagation
  • PET-NC (Non-Conservative): Direct force prediction head, slightly higher parameter count

Baseline comparisons:

| Model | Type | Training Data | Notes |
|---|---|---|---|
| ORB-v2 | Non-conservative | Alexandria/MPtrj | Rotationally unconstrained |
| EquiformerV2 | Non-conservative | - | Equivariant Transformer |
| MACE-MP-0 | Conservative | MPtrj | Equivariant message-passing |
| SevenNet | Conservative | - | Equivariant message-passing |
| SOAP-BPNN | Conservative | - | Legacy descriptor baseline |

Training details:

  • Loss functions: PET-C uses a joint energy + force $L^2$ loss; PET-NC uses a force-only $L^2$ loss
  • Fine-tuning protocol: PET-NC is converted to a conservative model by fine-tuning its energy head so that forces are obtained as $F = -\nabla E_\theta(r)$
  • MTS configuration: Non-conservative forces with conservative corrections every 8 steps ($M=8$)
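The two loss functions can be written out as a minimal sketch. The relative weights `w_energy` and `w_force` and the per-atom averaging are hypothetical choices made for illustration. The summary only states that PET-C uses a joint energy + force $L^2$ loss and PET-NC a force-only one. In the fine-tuning stage, the same joint loss would be evaluated with `f_pred` obtained as $-\nabla E_\theta$ rather than from the direct head.

```python
import numpy as np


def force_l2(f_pred, f_ref):
    """Mean over atoms of the squared force error |F_pred - F_ref|^2."""
    return np.mean(np.sum((f_pred - f_ref) ** 2, axis=-1))


def energy_force_loss(e_pred, e_ref, f_pred, f_ref, w_energy=1.0, w_force=1.0):
    """Joint L2 loss of a conservative model (PET-C style); weights are assumptions."""
    energy_term = np.mean((e_pred - e_ref) ** 2)
    return w_energy * energy_term + w_force * force_l2(f_pred, f_ref)


def force_only_loss(f_pred, f_ref):
    """Force-only L2 loss of a direct-force model (PET-NC style)."""
    return force_l2(f_pred, f_ref)


e_pred, e_ref = np.array([1.0, 2.0]), np.array([1.5, 2.0])
f_pred, f_ref = np.zeros((4, 3)), np.ones((4, 3))
print(energy_force_loss(e_pred, e_ref, f_pred, f_ref))  # 0.125 + 3.0 = 3.125
print(force_only_loss(f_pred, f_ref))                    # 3.0
```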

Evaluation

Metrics:

  1. Jacobian asymmetry ($\lambda$): Quantifies non-conservation via antisymmetric component
  2. Temperature drift: NVE ensemble stability
  3. Velocity-velocity correlation ($\hat{c}_{vv}(\omega)$): Thermostat artifact detection
  4. Radial distribution functions ($g(r)$): Structural accuracy
  5. Species-resolved temperature: Equipartition testing
  6. Inference speed: Wall-clock time per MD step
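The velocity-velocity correlation spectrum can be estimated from a trajectory with the Wiener-Khinchin theorem: the Fourier transform of the velocity autocorrelation equals the power spectrum of the velocities. The minimal sketch below assumes velocities sampled every `dt` and uses a plain periodogram, with no windowing or zero padding and an arbitrary normalization. Thermostat artifacts would appear as distorted peaks or a suppressed low-frequency value, since $\hat{c}_{vv}(0)$ is proportional to the diffusion coefficient.

```python
import numpy as np


def velocity_spectrum(velocities, dt):
    """Periodogram estimate of c_vv(omega) from velocities of shape (n_frames, n_atoms, 3).

    |FFT(v)|^2 per atom and Cartesian component, averaged; by Wiener-Khinchin this is the
    Fourier transform of the (circular) velocity autocorrelation function.
    """
    n_frames = velocities.shape[0]
    vw = np.fft.rfft(velocities, axis=0)
    spectrum = np.mean(np.abs(vw) ** 2, axis=(1, 2)) * dt / n_frames
    freqs = np.fft.rfftfreq(n_frames, d=dt)  # cycles per unit of dt
    return freqs, spectrum


# Synthetic check: every velocity component oscillates at 0.0625 cycles/fs.
dt, n_frames = 1.0, 1024  # fs
t = np.arange(n_frames) * dt
rng = np.random.default_rng(0)
phases = rng.uniform(0, 2 * np.pi, size=(1, 8, 3))
v = np.cos(2 * np.pi * 0.0625 * t[:, None, None] + phases)
freqs, spec = velocity_spectrum(v, dt)
print(freqs[np.argmax(spec)])  # 0.0625
```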

Key results:

| Model | Speed (ms/step) | NVE Stability | Notes |
|---|---|---|---|
| PET-NC | 8.58 | Failed | ~7,000 K/ns (~7×10¹² K/s) drift |
| PET-C | 19.4 | Stable | 2.2x slower than PET-NC |
| ORB | 11.9 | Failed | Reached 1181 K (target 300 K) |
| SevenNet | 52.8 | Stable | Conservative baseline |
| PET Hybrid (MTS) | ~10.3 | Stable | Only 20% overhead vs. pure NC |

Thermostat artifacts:

  • Langevin ($\tau=10$ fs) damped diffusion by ~5x
  • GLE-RESPA failed to control non-conservative drift
  • Equipartition violations: O/H temperature difference up to 10%
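The Langevin thermostat's effect on dynamics comes from its velocity update. Below is a minimal sketch of the exact Ornstein-Uhlenbeck ("O") step used in BAOAB-type splittings, with friction $\gamma = 1/\tau$. It assumes that velocities, masses and $k_B T$ are given in consistent reduced units. In the free-particle limit, this thermostat alone gives a diffusion coefficient $D = k_B T \tau / m$. This is one way to see why a short $\tau$ such as 10 fs slows diffusion even while it keeps the temperature on target.

```python
import numpy as np


def langevin_o_step(v, masses, dt, tau, kT, rng):
    """Exact Ornstein-Uhlenbeck velocity update with friction gamma = 1 / tau.

    v: (N, 3) velocities; masses: (N,); kT in the same reduced units as m v^2.
    """
    c1 = np.exp(-dt / tau)
    sigma = np.sqrt((1.0 - c1**2) * kT / masses)[:, None]
    return c1 * v + sigma * rng.normal(size=v.shape)


rng = np.random.default_rng(0)
masses = np.full(10_000, 2.0)
v = np.zeros((10_000, 3))  # start "cold"
for _ in range(200):  # 200 steps of dt = 0.5 with tau = 10 -> ~10 relaxation times
    v = langevin_o_step(v, masses, dt=0.5, tau=10.0, kT=1.0, rng=rng)
print(v.var())  # close to kT / m = 0.5
```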

Optimization failures:

  • FIRE and L-BFGS algorithms failed to converge with non-conservative forces on QM9 benchmarks

Hardware

Compute resources:

  • NVIDIA H100 GPUs for training and benchmarking
  • Conservative fine-tuning achieved >3x training speedup vs. training conservative models from scratch

Reproduction resources: