Introduction

Understanding how individual atoms move on crystal surfaces is fundamental to materials science, catalysis, and nanotechnology. This atomic-scale motion, called adatom diffusion, plays a key role in processes like thin film growth and chemical reactions on surfaces.

As I’ve been learning molecular dynamics simulations for my graduate work, I’ve found that these simulations can generate valuable data for training machine learning models. This post walks through simulating copper adatom diffusion on a Cu(100) surface using LAMMPS, building on the excellent tutorial by Eric N. Hahn.

What you’ll learn:

  • How to set up LAMMPS for surface diffusion simulations
  • What each simulation parameter does and why it matters
  • Basic visualization techniques using Ovito
  • Methods for analyzing trajectory data
  • How this data could be used in ML applications

In follow-up posts, I’ll show how changing to different elements (like Pt) leads to different diffusion behaviors—which could be valuable for training models to understand how atomic properties affect surface dynamics.

Update: I’ve now written about Pt adatom diffusion, which shows how heavier atoms behave quite differently on surfaces—great for understanding element-specific effects in ML models. See the complete LAMMPS Adatom Diffusion series for the full learning path.

Understanding Adatoms and Surface Diffusion

What is an Adatom?

An adatom (adsorbed atom) is an atom that sits on a crystal surface but isn’t incorporated into the bulk crystal structure. Unlike bulk atoms that are coordinated by multiple neighbors, adatoms typically have fewer bonds, making them highly mobile and reactive.

Ball model representation of a real (atomically rough) crystal surface with steps, kinks, adatoms, and vacancies in a closely-packed crystalline material. Adsorbed molecules, substitutional and interstitial atoms are also illustrated.

Ball model representation of a real (atomically rough) crystal surface with steps, kinks, adatoms, and vacancies in a closely-packed crystalline material. Adsorbed molecules, substitutional and interstitial atoms are also illustrated. (CC-BY-SA-4.0: ShutterWaves)

Why Study Adatom Diffusion?

Adatom diffusion is important for several technological processes:

  • Thin film growth: Adatoms are the building blocks of deposited films
  • Catalysis: Many reactions happen at these mobile surface atoms
  • Corrosion: How surface atoms move affects material degradation
  • Self-assembly: Adatom movement enables formation of ordered structures

From a machine learning perspective, adatom diffusion is a good test case because:

  • The underlying physics is well-understood, giving us ground truth for validation
  • Systems are small enough to simulate extensively
  • Behavior changes significantly with temperature and atomic species
  • We can generate data systematically across different conditions

Why Cu(100)?

Cu(100) surfaces are well-studied in the literature, making them good benchmarks. The face-centered cubic (fcc) structure creates clear diffusion pathways, and copper’s moderate binding energy means we can observe diffusion at reasonable temperatures without requiring extreme computational resources.

Simulation Overview

Before diving into the code details, let’s understand the simulation design:

Key Simulation Parameters

ParameterValueWhy this choice
System size8×8×6 unit cellsLarge enough to avoid edge effects while keeping simulation time reasonable
EnsembleNVT (constant volume, temperature)Appropriate for surface studies where pressure isn’t the focus
PotentialEAM (Embedded Atom Method)Captures metallic bonding better than simple pair potentials
Time step5 fsSmall enough for numerical stability while allowing reasonable run times
Duration500 psLong enough to see multiple diffusion events
Temperature600 K (bulk), 850 K (surface)Creates thermal gradient that drives surface motion

Simulation Strategy

The approach uses a thermal gradient setup:

  • Bottom layers are held fixed to represent the bulk crystal
  • Middle layers are heated to 850 K to provide thermal energy
  • Top layers and adatom equilibrate to ~600 K, enabling diffusion
  • This mimics real conditions where heat flows from substrate to surface

The complete LAMMPS script implementing this approach:

### Original Created by Eric N. Hahn  ###
### [email protected] ###

### Modifications by Hunter Heidenreich, CSE lab (Harvard, 2023)
### [email protected]
### 2023-09-01

### Simulating adatoms ###
### Version 0.2 ###


units metal
dimension 3
boundary p p s
atom_style atomic

lattice fcc 3.614
variable cubel equal 4
variable fixer1 equal "v_cubel+2"
variable fixer2 equal "v_cubel+1.49"
region  box block -${cubel} ${cubel} -${cubel} ${cubel} -${fixer1} 1 units lattice
region cbox block -${cubel} ${cubel} -${cubel} ${cubel} -${fixer1} 0 units lattice
create_box 1 box
create_atoms 1 region cbox
create_atoms 1 single -0.5 0 0.5 units lattice
region hold block INF INF INF INF -${fixer1} -${fixer2} units lattice
region temp block INF INF INF INF -${fixer2} -${cubel} units lattice
group hold region hold
group temp region temp

pair_style eam/alloy
pair_coeff * * Cu01.eam.alloy Cu

timestep        0.005
compute         new all temp
velocity        temp create 600 12345
fix heater temp temp/rescale 1 850 850 5 1
fix nve all nve
fix freeze hold setforce 0 0 0

variable e     equal pe
variable k     equal ke
variable t     equal etotal
variable T     equal temp
fix energy all ave/time 1 50 50 v_k v_e v_t v_T file energy_avg.txt

minimize 1.0e-4 1.0e-6 1000 10000

dump eve all custom 5 dump.lammpstrj id type xu yu zu   # fx fy fz  # uncomment for forces
dump_modify eve sort id

thermo 50
run 100000  # 100_000 * 5 fs = 500 ps

Line-by-Line Breakdown

Now let’s walk through what each part of the script does:

Simulation Setup

Units

units metal

This line sets the units for the simulation to metal units. This is a common choice for metal simulations. The full list of units can be found in the LAMMPS documentation. For our purposes: length is in Å, energy in eV, and time in ps.

Dimension and Boundary Conditions

dimension 3

This line sets the dimensionality of the simulation to 3D.

boundary p p s

This line sets the boundary conditions for the simulation. The first two letters indicate periodic boundary conditions in the x and y directions. This means we’re simulating an infinite surface in x and y directions.

The third letter indicates a “shrink-wrapped” boundary condition in the z direction. This means the surface is finite in z, and atoms can move out of the simulation box. This is important because we want the adatom to be able to diffuse off the surface if needed.

Atom Style

atom_style atomic

This line sets the atom style to atomic. This means atoms are represented as points in space without internal structure. This is the most common atom style for metal simulations.

Lattice

lattice fcc 3.614

This line sets the lattice type and lattice constant for the simulation. The lattice type is face-centered cubic (fcc), and the lattice constant is 3.614 Å. This is the experimental lattice constant for Cu.

Variables

variable cubel equal 4
variable fixer1 equal "v_cubel+2"
variable fixer2 equal "v_cubel+1.49"

These lines define variables that we will use later in the script. The first line defines a variable called cubel that is equal to 4. The second line defines a variable called fixer1 that is equal to cubel + 2. The third line defines a variable called fixer2 that is equal to cubel + 1.49. These variables are used to define the size of the simulation box, and they are used to define the regions that we will use to hold the adatom in place.

Regions

region  box block -${cubel} ${cubel} -${cubel} ${cubel} -${fixer1} 1 units lattice
region cbox block -${cubel} ${cubel} -${cubel} ${cubel} -${fixer1} 0 units lattice

These lines define regions that we will use to hold the adatom in place.

The first line defines a region called box that is a block of atoms. The block is defined by the minimum and maximum x, y, and z coordinates. The minimum x and y coordinates are -cubel, and the maximum x and y coordinates are cubel; hence, the block is a square with side length 2 * cubel, centered at the origin. The minimum z coordinate is -fixer1, and the maximum z coordinate is 1. Note, the second line defines a region called cbox that is a block of atoms.

Create Atoms

create_box 1 box
create_atoms 1 region cbox
create_atoms 1 single -0.5 0 0.5 units lattice

These lines create the simulation box and the atoms in the simulation box. The first line creates the simulation box. The second line creates atoms in the simulation box, specifically the bulk atoms. The third line creates an atom at a specific location in the simulation box, specifically the adatom.

Groups of Atoms

region hold block INF INF INF INF -${fixer1} -${fixer2} units lattice
region temp block INF INF INF INF -${fixer2} -${cubel} units lattice
group hold region hold
group temp region temp

Here, we define two critical regions of the simulation box. The first region is called hold, and it defines the bottom of the simulation box where atoms are held in place. The second region is called temp, and it defines the lowest layer of the simulation box where atoms are allowed to move and are velocity rescaled to maintain a constant temperature. The last two lines define groups of atoms based on the regions that we just defined.

Pair Style and Coefficients

pair_style eam/alloy
pair_coeff * * Cu01.eam.alloy Cu

These lines set the pair style and pair coefficients for the simulation. The first line sets the pair style to embedded atom method (EAM) with alloy support.

The second line sets the pair coefficients for all atom pairs to use the Cu EAM potential. This force-field comes from Mishin et al., and can be downloaded from the NIST Interatomic Potentials Repository.

Time Step

timestep        0.005

This line sets the time step for the simulation to 0.005 ps. This is a common choice for metal simulations—small enough to ensure stability.

Velocity

velocity        temp create 600 12345

This line sets the initial velocities for atoms in the simulation. We’re initializing velocities to 600 K using random seed 12345.

Fixes

fix heater temp temp/rescale 1 850 850 5 1
fix nve all nve
fix freeze hold setforce 0 0 0

These lines define the fixes that control the simulation dynamics:

  • The first line creates a “heater” that maintains atoms in the temp group at 850 K
  • The second line integrates the equations of motion using the velocity Verlet algorithm
  • The third line freezes atoms in the hold group by setting their forces to zero
variable e     equal pe
variable k     equal ke
variable t     equal etotal
variable T     equal temp
fix energy all ave/time 1 50 50 v_k v_e v_t v_T file energy_avg.txt

These lines define a fix called energy that calculates the average kinetic energy, potential energy, total energy, and temperature of the simulation every 50 time steps. The average values are written to the file energy_avg.txt, where we average every 50 values over 50 time steps.

Run

Relaxation

minimize 1.0e-4 1.0e-6 1000 10000

While we could run the simulation without relaxing the system, it is generally a good idea to relax a system before running a full simulation. You should see this minimize within a step or two, indicating that the system is already fairly relaxed.

Dumping Atoms

dump eve all custom 5 dump.lammpstrj id type xu yu zu   # fx fy fz  # uncomment for forces
dump_modify eve sort id

Before we run the simulation, we define a dump file that will be used to write the positions of the atoms in the simulation. The first line defines a dump file called eve that will be written every 5 time steps. The second line sorts the atoms in the dump file by their ID, which is useful for visualization/post-processing.

If you wish to dump the forces on the atoms in the simulation, you can uncomment the # fx fy fz in the first line.

Run

thermo 50
run 100000  # 100_000 * 5 fs = 500 ps

These lines run the simulation for 100,000 time steps, which corresponds to 500 ps. The first line prints the thermo output every 50 time steps. This prints the temperature, pressure, and other useful information about the simulation.

Visualization and Analysis

After running the simulation, we can visualize the results using Ovito, a free visualization tool for atomistic simulations. The basic steps are:

  • Open the dump file in Ovito
  • Color atoms by their z-coordinate
  • Restrict the height range to 0-2 Å to focus on surface atoms
  • Animate the simulation

What the Data Shows

The trajectory data from this simulation provides useful information for machine learning applications:

Energy Analysis

We can plot the average kinetic energy, potential energy, total energy, and temperature over time.

Average kinetic energy, potential energy, total energy, and temperature over time.

Average kinetic energy, potential energy, total energy, and temperature over time.

We skip the first 30 time steps to remove the initial relaxation period. The energy fluctuations represent thermal motion and could be used for:

  • Anomaly detection: Finding unusual diffusion events
  • Temperature prediction: Training models to estimate local temperature from atomic motions
  • Stability analysis: Detecting when surfaces reach equilibrium

Trajectory Analysis

We can plot the x and y coordinates of the adatom over time.

x and y coordinates of the adatom over time.

x and y coordinates of the adatom over time.

This trajectory data could be valuable for:

  • Path prediction: Training models to predict future atomic positions
  • Diffusion coefficient estimation: Learning relationships between temperature and mobility
  • Transition state identification: Detecting hopping events between stable sites

We can also plot the z coordinate of the adatom over time.

z coordinate of the adatom over time.

z coordinate of the adatom over time.

This plot shows exchange events where our initial adatom swaps positions with a surface atom—an important process in surface chemistry. The z-coordinate data enables:

  • Event classification: Distinguishing between simple diffusion and exchange mechanisms
  • Activation barrier estimation: Learning energy landscapes from thermal fluctuations
  • Surface coordination analysis: Understanding how atom height correlates with local environment

Potential ML Applications

This simulation produces several types of data useful for machine learning:

  1. Coordinate trajectories: Direct input for neural network potentials or graph neural networks
  2. Energy time series: Features for regression models predicting system properties
  3. Event annotations: Labels for supervised learning of diffusion mechanisms
  4. Environmental descriptors: Local atomic arrangements as input features

The systematic nature of MD simulations allows generation of large, labeled datasets across different conditions—which is exactly what modern ML models need.

Code and Data

The complete simulation scripts and analysis tools are available for reproducibility:

Energy Analysis Script

# Hunter Heidenreich, 2023
# Plots the energy of a simulation over time.

import matplotlib.pyplot as plt
from argparse import ArgumentParser

if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--input', type=str, required=True)
    parser.add_argument('--output', type=str, required=True)
    parser.add_argument('--skip', type=int, default=1)
    args = parser.parse_args()

    # Parse energy data
    data = {'ts': [], 'kes': [], 'pes': [], 'tes': [], 'Ts': []}
    with open(args.input, 'r') as f:
        for line in f:
            if line.startswith('#') or not line.strip():
                continue
            t, v_k, v_e, v_t, v_T = map(float, line.split())
            data['ts'].append(t)
            data['kes'].append(v_k)
            data['pes'].append(v_e)
            data['tes'].append(v_t)
            data['Ts'].append(v_T)

    # Skip initial equilibration
    for key in data:
        data[key] = data[key][args.skip:]

    # Create subplots
    fig, axs = plt.subplots(2, 2, figsize=(16, 12))

    plots = [('Kinetic Energy', 'kes'), ('Potential Energy', 'pes'),
             ('Total Energy', 'tes'), ('Temperature', 'Ts')]

    for ax, (title, key) in zip(axs.flat, plots):
        ax.plot(data['ts'], data[key])
        ax.set_xlabel('TimeStep')
        ax.set_ylabel(title)
        ax.set_title(title)

    plt.tight_layout()
    plt.savefig(args.output, dpi=300, bbox_inches='tight')

Trajectory Analysis Script

# Hunter Heidenreich, 2023
# Plots the coordinates of the adatom.

import matplotlib.pyplot as plt
from argparse import ArgumentParser

if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--input', type=str, required=True)
    parser.add_argument('--output', type=str, required=True)
    parser.add_argument('--id', type=int, default=1665,
                       help='Atom ID to track')
    parser.add_argument('--do_z', action='store_true',
                       help='Plot z-coordinate instead of xy scatter')
    args = parser.parse_args()

    coords = {'x': [], 'y': [], 'z': []}
    with open(args.input, 'r') as f:
        for line in f:
            if line.startswith(f'{args.id} '):
                x, y, z = map(float, line.split()[2:5])
                coords['x'].append(x)
                coords['y'].append(y)
                coords['z'].append(z)

    plt.figure(figsize=(10, 8))
    if args.do_z:
        plt.plot(range(len(coords['z'])), coords['z'], 'b-', linewidth=1)
        plt.xlabel('Simulation Step')
        plt.ylabel('Z Coordinate (Å)')
        plt.title(f'Height vs. Time for Adatom {args.id}')
        plt.grid(True, alpha=0.3)
    else:
        plt.scatter(coords['x'], coords['y'], s=1, alpha=0.7, c='red')
        plt.xlabel('X Coordinate (Å)')
        plt.ylabel('Y Coordinate (Å)')
        plt.title(f'XY Trajectory for Adatom {args.id}')
        plt.axis('equal')
        plt.grid(True, alpha=0.3)

    plt.savefig(args.output, dpi=300, bbox_inches='tight')

Summary and Next Steps

This tutorial shows how molecular dynamics simulations can generate useful training data for machine learning applications in materials science. The Cu adatom diffusion system is a good starting point because it:

  • Has interpretable physics: Well-understood mechanisms make it easier to validate ML predictions
  • Shows diverse behaviors: Temperature-dependent dynamics create rich datasets
  • Scales efficiently: Small system size allows extensive parameter exploration
  • Connects to real applications: Direct relevance to catalysis and surface engineering

What’s Next

In upcoming posts, I’ll explore how this framework extends to other systems:

  1. Pt adatom diffusion: How heavier atoms behave differently ✅
  2. Mixed-metal surfaces: How alloy effects change diffusion pathways
  3. Stepped surfaces: How surface defects alter atomic mobility
  4. Machine learning applications: Actually training neural networks on the generated data

Broader Applications

The simulation techniques shown here could be useful for various ML applications:

  • Neural network potentials: Training on trajectory data to replace expensive quantum calculations
  • Rare event sampling: Using ML to identify and enhance diffusion pathways
  • Catalyst design: Predicting how surface modifications affect reactivity
  • Materials discovery: Screening new alloy compositions for desired surface properties

Getting Started

To try these simulations yourself:

  1. Install LAMMPS with EAM potential support
  2. Download the Cu01.eam.alloy potential file from NIST
  3. Run the provided script and analyze using the Python tools above
  4. Experiment with different temperatures, surface orientations, or atomic species

Questions about the simulation setup or interested in applying these techniques to your research? Feel free to reach out—I’m always happy to discuss molecular dynamics and machine learning applications.

References