Introduction

Understanding how individual atoms move on crystal surfaces is fundamental to materials science, catalysis, and nanotechnology. This atomic-scale motion, called adatom diffusion, drives processes like thin film growth and surface chemical reactions.

While learning molecular dynamics simulations for my graduate work, I discovered these simulations generate valuable training data for machine learning models. This tutorial walks through simulating copper adatom diffusion on a Cu(100) surface using LAMMPS, building on Eric N. Hahn’s excellent adatom tutorial.

What you’ll learn:

  • Setting up LAMMPS for surface diffusion simulations
  • Understanding simulation parameters and their impact
  • Visualizing results with Ovito
  • Analyzing trajectory data for ML applications
  • Connecting simulation data to machine learning workflows

In follow-up posts, I’ll explore different elements (like Pt) and show how atomic properties affect diffusion behavior—valuable insights for training element-aware ML models.

Update: I’ve now written about Pt adatom diffusion, which shows how heavier atoms behave quite differently on surfaces—great for understanding element-specific effects in ML models.

Understanding Adatoms and Surface Diffusion

What is an Adatom?

An adatom (adsorbed atom) sits on a crystal surface but isn’t incorporated into the bulk structure. Unlike bulk atoms coordinated by multiple neighbors, adatoms have fewer bonds, making them highly mobile and reactive.

Ball model representation of a real (atomically rough) crystal surface with steps, kinks, adatoms, and vacancies in a closely-packed crystalline material. Adsorbed molecules, substitutional and interstitial atoms are also illustrated.
Ball model representation of a real (atomically rough) crystal surface with steps, kinks, adatoms, and vacancies in a closely-packed crystalline material. Adsorbed molecules, substitutional and interstitial atoms are also illustrated. (CC-BY-SA-4.0: ShutterWaves)

Why Study Adatom Diffusion?

Adatom diffusion is important for several technological processes:

  • Thin film growth: Adatoms are the building blocks of deposited films
  • Catalysis: Many reactions happen at these mobile surface atoms
  • Corrosion: How surface atoms move affects material degradation
  • Self-assembly: Adatom movement enables formation of ordered structures

From a machine learning perspective, adatom diffusion is an ideal test case because:

  • Well-understood physics provides ground truth for validation
  • Small system size enables extensive simulation
  • Behavior varies significantly with temperature and atomic species
  • Systematic data generation across different conditions

Why Cu(100)?

Cu(100) surfaces are well-studied in literature, making them excellent benchmarks. The face-centered cubic (fcc) structure creates clear diffusion pathways, and copper’s moderate binding energy lets us observe diffusion at reasonable temperatures without extreme computational demands.

Simulation Overview

Before diving into the code details, let’s understand the simulation design:

Key Simulation Parameters

ParameterValueWhy this choice
System size8×8×6 unit cellsLarge enough to avoid edge effects while keeping simulation time reasonable
EnsembleNVT (constant volume, temperature)Appropriate for surface studies where pressure isn’t the focus
PotentialEAM (Embedded Atom Method)Captures metallic bonding better than simple pair potentials
Time step5 fsSmall enough for numerical stability while allowing reasonable run times
Duration500 psLong enough to see multiple diffusion events
Temperature600 K (bulk), 850 K (surface)Creates thermal gradient that drives surface motion

Simulation Strategy

The approach uses a thermal gradient setup:

  • Bottom layers: Fixed to represent bulk crystal
  • Middle layers: Heated to 850 K for thermal energy
  • Top layers and adatom: Equilibrate to ~600 K for diffusion
  • This mimics real heat flow from substrate to surface

The complete LAMMPS script implementing this approach:

### Original Created by Eric N. Hahn  ###
### [email protected] ###

### Modifications by Hunter Heidenreich, CSE lab (Harvard, 2023)
### [email protected]
### 2023-09-01

### Simulating adatoms ###
### Version 0.2 ###


units metal
dimension 3
boundary p p s
atom_style atomic

lattice fcc 3.614
variable cubel equal 4
variable fixer1 equal "v_cubel+2"
variable fixer2 equal "v_cubel+1.49"
region  box block -${cubel} ${cubel} -${cubel} ${cubel} -${fixer1} 1 units lattice
region cbox block -${cubel} ${cubel} -${cubel} ${cubel} -${fixer1} 0 units lattice
create_box 1 box
create_atoms 1 region cbox
create_atoms 1 single -0.5 0 0.5 units lattice
region hold block INF INF INF INF -${fixer1} -${fixer2} units lattice
region temp block INF INF INF INF -${fixer2} -${cubel} units lattice
group hold region hold
group temp region temp

pair_style eam/alloy
pair_coeff * * Cu01.eam.alloy Cu

timestep        0.005
compute         new all temp
velocity        temp create 600 12345
fix heater temp temp/rescale 1 850 850 5 1
fix nve all nve
fix freeze hold setforce 0 0 0

variable e     equal pe
variable k     equal ke
variable t     equal etotal
variable T     equal temp
fix energy all ave/time 1 50 50 v_k v_e v_t v_T file energy_avg.txt

minimize 1.0e-4 1.0e-6 1000 10000

dump eve all custom 5 dump.lammpstrj id type xu yu zu   # fx fy fz  # uncomment for forces
dump_modify eve sort id

thermo 50
run 100000  # 100_000 * 5 fs = 500 ps

Line-by-Line Breakdown

Let’s examine each part of the LAMMPS script:

Simulation Setup

Units

units metal

Sets simulation units to “metal” units—a standard choice for metallic systems. Key conversions: length in Å, energy in eV, time in ps. Full details in the LAMMPS documentation.

dimension 3

Sets 3D simulation.

boundary p p s

Boundary conditions: periodic in x,y (infinite surface) and shrink-wrapped in z (finite surface height). This allows the adatom to potentially leave the surface if needed.

atom_style atomic

Uses “atomic” style—atoms as point masses without internal structure. Standard for metallic systems.

Lattice

lattice fcc 3.614

Defines face-centered cubic lattice with experimental Cu lattice constant (3.614 Å).

variable cubel equal 4
variable fixer1 equal "v_cubel+2"
variable fixer2 equal "v_cubel+1.49"

Define variables for simulation box dimensions. cubel=4 sets system size, while fixer1 and fixer2 define the frozen and heated regions.

region  box block -${cubel} ${cubel} -${cubel} ${cubel} -${fixer1} 1 units lattice
region cbox block -${cubel} ${cubel} -${cubel} ${cubel} -${fixer1} 0 units lattice

Define regions: box for the entire simulation volume and cbox for crystal creation (excludes the surface layer where we’ll place the adatom).

create_box 1 box
create_atoms 1 region cbox
create_atoms 1 single -0.5 0 0.5 units lattice

Create simulation box, populate with Cu atoms, then add single adatom at specified position.

region hold block INF INF INF INF -${fixer1} -${fixer2} units lattice
region temp block INF INF INF INF -${fixer2} -${cubel} units lattice
group hold region hold
group temp region temp

Define atom groups: hold (frozen bottom layers) and temp (heated middle layers for thermal energy).

pair_style eam/alloy
pair_coeff * * Cu01.eam.alloy Cu

Use Embedded Atom Method (EAM) potential for metallic bonding. The Cu01.eam.alloy potential from Mishin et al. is available from the NIST repository.

timestep        0.005

5 femtosecond timestep—small enough for numerical stability.

Initial Conditions

velocity        temp create 600 12345

Initialize velocities for 600 K temperature using random seed 12345.

fix heater temp temp/rescale 1 850 850 5 1
fix nve all nve
fix freeze hold setforce 0 0 0

Three fixes control dynamics:

  • heater: Maintains 850 K in middle layers
  • nve: Velocity Verlet integration for all atoms
  • freeze: Sets forces to zero for bottom atoms
variable e     equal pe
variable k     equal ke
variable t     equal etotal
variable T     equal temp
fix energy all ave/time 1 50 50 v_k v_e v_t v_T file energy_avg.txt

Track energies and temperature, averaging every 50 timesteps and writing to file.

Execution

Minimization

minimize 1.0e-4 1.0e-6 1000 10000

Relax initial structure. Should converge quickly, indicating the system is already well-optimized.

Output Setup

dump eve all custom 5 dump.lammpstrj id type xu yu zu   # fx fy fz  # uncomment for forces
dump_modify eve sort id

Write atomic positions every 5 timesteps, sorted by atom ID. Uncomment force components if needed for analysis.

Production Run

thermo 50
run 100000  # 100_000 * 5 fs = 500 ps

Run simulation for 500 ps with thermo output every 50 steps.

Visualization and Analysis

Visualize results using Ovito, a free atomistic visualization tool:

  1. Open the trajectory file in Ovito
  2. Color atoms by z-coordinate
  3. Restrict height range to 0-2 Å for surface focus
  4. Animate to observe diffusion events

Analysis Results

The simulation generates rich data for machine learning applications:

Energy Analysis

Energy fluctuations reveal thermal motion patterns:

Average kinetic energy, potential energy, total energy, and temperature over time.
Energy and temperature evolution over 500 ps simulation.

Skipping initial equilibration (first 30 steps), these fluctuations enable:

  • Anomaly detection: Identifying unusual diffusion events
  • Temperature prediction: Estimating local temperature from atomic motion
  • Stability analysis: Detecting equilibrium states

Trajectory Analysis

Adatom motion reveals diffusion mechanisms:

x and y coordinates of the adatom over time.
Adatom surface trajectory showing random walk behavior.

This data enables:

  • Path prediction: Training models for future position forecasting
  • Diffusion coefficient estimation: Learning temperature-mobility relationships
  • Transition state identification: Detecting hops between stable sites
z coordinate of the adatom over time.
Height fluctuations revealing exchange events with surface atoms.

Z-coordinate data shows exchange events where the adatom swaps with surface atoms—crucial for surface chemistry understanding. This enables:

  • Event classification: Distinguishing diffusion vs. exchange mechanisms
  • Activation barrier estimation: Learning energy landscapes from fluctuations
  • Surface coordination analysis: Correlating height with local environment

Machine Learning Applications

This simulation produces multiple data types for ML training:

  1. Coordinate trajectories: Neural network potential inputs or graph neural network features
  2. Energy time series: Regression model features for system property prediction
  3. Event annotations: Supervised learning labels for diffusion mechanism classification
  4. Environmental descriptors: Local atomic arrangement features

Systematic MD simulations generate large, labeled datasets across varied conditions—exactly what modern ML models require.

Code and Data

The complete simulation scripts and analysis tools are available for reproducibility:

Energy Analysis Script

# Hunter Heidenreich, 2023
# Plots the energy of a simulation over time.

import matplotlib.pyplot as plt
from argparse import ArgumentParser

if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--input', type=str, required=True)
    parser.add_argument('--output', type=str, required=True)
    parser.add_argument('--skip', type=int, default=1)
    args = parser.parse_args()

    # Parse energy data
    data = {'ts': [], 'kes': [], 'pes': [], 'tes': [], 'Ts': []}
    with open(args.input, 'r') as f:
        for line in f:
            if line.startswith('#') or not line.strip():
                continue
            t, v_k, v_e, v_t, v_T = map(float, line.split())
            data['ts'].append(t)
            data['kes'].append(v_k)
            data['pes'].append(v_e)
            data['tes'].append(v_t)
            data['Ts'].append(v_T)

    # Skip initial equilibration
    for key in data:
        data[key] = data[key][args.skip:]

    # Create subplots
    fig, axs = plt.subplots(2, 2, figsize=(16, 12))

    plots = [('Kinetic Energy', 'kes'), ('Potential Energy', 'pes'),
             ('Total Energy', 'tes'), ('Temperature', 'Ts')]

    for ax, (title, key) in zip(axs.flat, plots):
        ax.plot(data['ts'], data[key])
        ax.set_xlabel('TimeStep')
        ax.set_ylabel(title)
        ax.set_title(title)

    plt.tight_layout()
    plt.savefig(args.output, dpi=300, bbox_inches='tight')

Trajectory Analysis Script

# Hunter Heidenreich, 2023
# Plots the coordinates of the adatom.

import matplotlib.pyplot as plt
from argparse import ArgumentParser

if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--input', type=str, required=True)
    parser.add_argument('--output', type=str, required=True)
    parser.add_argument('--id', type=int, default=1665,
                       help='Atom ID to track')
    parser.add_argument('--do_z', action='store_true',
                       help='Plot z-coordinate instead of xy scatter')
    args = parser.parse_args()

    coords = {'x': [], 'y': [], 'z': []}
    with open(args.input, 'r') as f:
        for line in f:
            if line.startswith(f'{args.id} '):
                x, y, z = map(float, line.split()[2:5])
                coords['x'].append(x)
                coords['y'].append(y)
                coords['z'].append(z)

    plt.figure(figsize=(10, 8))
    if args.do_z:
        plt.plot(range(len(coords['z'])), coords['z'], 'b-', linewidth=1)
        plt.xlabel('Simulation Step')
        plt.ylabel('Z Coordinate (Å)')
        plt.title(f'Height vs. Time for Adatom {args.id}')
        plt.grid(True, alpha=0.3)
    else:
        plt.scatter(coords['x'], coords['y'], s=1, alpha=0.7, c='red')
        plt.xlabel('X Coordinate (Å)')
        plt.ylabel('Y Coordinate (Å)')
        plt.title(f'XY Trajectory for Adatom {args.id}')
        plt.axis('equal')
        plt.grid(True, alpha=0.3)

    plt.savefig(args.output, dpi=300, bbox_inches='tight')

Summary and Next Steps

This tutorial demonstrates how molecular dynamics generates valuable ML training data for materials science. Cu adatom diffusion provides an ideal starting point because it:

  • Has interpretable physics: Well-understood mechanisms enable ML validation
  • Shows diverse behaviors: Temperature-dependent dynamics create rich datasets
  • Scales efficiently: Small systems allow extensive parameter exploration
  • Connects to applications: Direct relevance to catalysis and surface engineering

What’s Next

Future posts will extend this framework:

  1. Pt adatom diffusion: How heavier atoms behave differently ✅
  2. Mixed-metal surfaces: Alloy effects on diffusion pathways
  3. Stepped surfaces: How defects alter atomic mobility
  4. ML implementation: Training neural networks on simulation data

Broader Applications

These simulation techniques enable various ML applications:

  • Neural network potentials: Replacing expensive quantum calculations with trained models
  • Rare event sampling: ML-enhanced diffusion pathway identification
  • Catalyst design: Predicting surface modification effects on reactivity
  • Materials discovery: Screening alloy compositions for desired properties

Getting Started

To reproduce these simulations:

  1. Install LAMMPS with EAM potential support
  2. Download Cu01.eam.alloy from the NIST repository
  3. Run the provided script and analyze with the Python tools
  4. Experiment with different temperatures, orientations, or elements

Questions about the simulation setup or interested in applying these techniques to your research? Feel free to reach out—I’m always happy to discuss molecular dynamics and machine learning applications.

References