Introduction
Understanding how individual atoms move on crystal surfaces is fundamental to materials science, catalysis, and nanotechnology. This atomic-scale motion, called adatom diffusion, plays a key role in processes like thin film growth and chemical reactions on surfaces.
As I’ve been learning molecular dynamics simulations for my graduate work, I’ve found that these simulations can generate valuable data for training machine learning models. This post walks through simulating copper adatom diffusion on a Cu(100) surface using LAMMPS, building on the excellent tutorial by Eric N. Hahn.
What you’ll learn:
- How to set up LAMMPS for surface diffusion simulations
- What each simulation parameter does and why it matters
- Basic visualization techniques using Ovito
- Methods for analyzing trajectory data
- How this data could be used in ML applications
In follow-up posts, I’ll show how changing to different elements (like Pt) leads to different diffusion behaviors—which could be valuable for training models to understand how atomic properties affect surface dynamics.
Update: I’ve now written about Pt adatom diffusion, which shows how heavier atoms behave quite differently on surfaces—great for understanding element-specific effects in ML models. See the complete LAMMPS Adatom Diffusion series for the full learning path.
Understanding Adatoms and Surface Diffusion
What is an Adatom?
An adatom (adsorbed atom) is an atom that sits on a crystal surface but isn’t incorporated into the bulk crystal structure. Unlike bulk atoms that are coordinated by multiple neighbors, adatoms typically have fewer bonds, making them highly mobile and reactive.

Ball model representation of a real (atomically rough) crystal surface with steps, kinks, adatoms, and vacancies in a closely-packed crystalline material. Adsorbed molecules, substitutional and interstitial atoms are also illustrated. (CC-BY-SA-4.0: ShutterWaves)
Why Study Adatom Diffusion?
Adatom diffusion is important for several technological processes:
- Thin film growth: Adatoms are the building blocks of deposited films
- Catalysis: Many reactions happen at these mobile surface atoms
- Corrosion: How surface atoms move affects material degradation
- Self-assembly: Adatom movement enables formation of ordered structures
From a machine learning perspective, adatom diffusion is a good test case because:
- The underlying physics is well-understood, giving us ground truth for validation
- Systems are small enough to simulate extensively
- Behavior changes significantly with temperature and atomic species
- We can generate data systematically across different conditions
Why Cu(100)?
Cu(100) surfaces are well-studied in the literature, making them good benchmarks. The face-centered cubic (fcc) structure creates clear diffusion pathways, and copper’s moderate binding energy means we can observe diffusion at reasonable temperatures without requiring extreme computational resources.
Simulation Overview
Before diving into the code details, let’s understand the simulation design:
Key Simulation Parameters
Parameter | Value | Why this choice |
---|---|---|
System size | 8×8×6 unit cells | Large enough to avoid edge effects while keeping simulation time reasonable |
Ensemble | NVT (constant volume, temperature) | Appropriate for surface studies where pressure isn’t the focus |
Potential | EAM (Embedded Atom Method) | Captures metallic bonding better than simple pair potentials |
Time step | 5 fs | Small enough for numerical stability while allowing reasonable run times |
Duration | 500 ps | Long enough to see multiple diffusion events |
Temperature | 600 K (bulk), 850 K (surface) | Creates thermal gradient that drives surface motion |
Simulation Strategy
The approach uses a thermal gradient setup:
- Bottom layers are held fixed to represent the bulk crystal
- Middle layers are heated to 850 K to provide thermal energy
- Top layers and adatom equilibrate to ~600 K, enabling diffusion
- This mimics real conditions where heat flows from substrate to surface
The complete LAMMPS script implementing this approach:
### Original Created by Eric N. Hahn ###
### [email protected] ###
### Modifications by Hunter Heidenreich, CSE lab (Harvard, 2023)
### [email protected]
### 2023-09-01
### Simulating adatoms ###
### Version 0.2 ###
units metal
dimension 3
boundary p p s
atom_style atomic
lattice fcc 3.614
variable cubel equal 4
variable fixer1 equal "v_cubel+2"
variable fixer2 equal "v_cubel+1.49"
region box block -${cubel} ${cubel} -${cubel} ${cubel} -${fixer1} 1 units lattice
region cbox block -${cubel} ${cubel} -${cubel} ${cubel} -${fixer1} 0 units lattice
create_box 1 box
create_atoms 1 region cbox
create_atoms 1 single -0.5 0 0.5 units lattice
region hold block INF INF INF INF -${fixer1} -${fixer2} units lattice
region temp block INF INF INF INF -${fixer2} -${cubel} units lattice
group hold region hold
group temp region temp
pair_style eam/alloy
pair_coeff * * Cu01.eam.alloy Cu
timestep 0.005
compute new all temp
velocity temp create 600 12345
fix heater temp temp/rescale 1 850 850 5 1
fix nve all nve
fix freeze hold setforce 0 0 0
variable e equal pe
variable k equal ke
variable t equal etotal
variable T equal temp
fix energy all ave/time 1 50 50 v_k v_e v_t v_T file energy_avg.txt
minimize 1.0e-4 1.0e-6 1000 10000
dump eve all custom 5 dump.lammpstrj id type xu yu zu # fx fy fz # uncomment for forces
dump_modify eve sort id
thermo 50
run 100000 # 100_000 * 5 fs = 500 ps
Line-by-Line Breakdown
Now let’s walk through what each part of the script does:
Simulation Setup
Units
units metal
This line sets the units for the simulation to metal units. This is a common choice for metal simulations. The full list of units can be found in the LAMMPS documentation. For our purposes: length is in Å, energy in eV, and time in ps.
Dimension and Boundary Conditions
dimension 3
This line sets the dimensionality of the simulation to 3D.
boundary p p s
This line sets the boundary conditions for the simulation. The first two letters indicate periodic boundary conditions in the x and y directions. This means we’re simulating an infinite surface in x and y directions.
The third letter indicates a “shrink-wrapped” boundary condition in the z direction. This means the surface is finite in z, and atoms can move out of the simulation box. This is important because we want the adatom to be able to diffuse off the surface if needed.
Atom Style
atom_style atomic
This line sets the atom style to atomic. This means atoms are represented as points in space without internal structure. This is the most common atom style for metal simulations.
Lattice
lattice fcc 3.614
This line sets the lattice type and lattice constant for the simulation. The lattice type is face-centered cubic (fcc), and the lattice constant is 3.614 Å. This is the experimental lattice constant for Cu.
Variables
variable cubel equal 4
variable fixer1 equal "v_cubel+2"
variable fixer2 equal "v_cubel+1.49"
These lines define variables that we will use later in the script.
The first line defines a variable called cubel
that is equal to 4.
The second line defines a variable called fixer1
that is equal to cubel + 2
.
The third line defines a variable called fixer2
that is equal to cubel + 1.49
.
These variables are used to define the size of the simulation box,
and they are used to define the regions that we will use to hold the adatom in place.
Regions
region box block -${cubel} ${cubel} -${cubel} ${cubel} -${fixer1} 1 units lattice
region cbox block -${cubel} ${cubel} -${cubel} ${cubel} -${fixer1} 0 units lattice
These lines define regions that we will use to hold the adatom in place.
The first line defines a region called box
that is a block of atoms.
The block is defined by the minimum and maximum x, y, and z coordinates.
The minimum x and y coordinates are -cubel
,
and the maximum x and y coordinates are cubel
;
hence, the block is a square with side length 2 * cubel
,
centered at the origin.
The minimum z coordinate is -fixer1
,
and the maximum z coordinate is 1
.
Note, the second line defines a region called cbox
that is a block of atoms.
Create Atoms
create_box 1 box
create_atoms 1 region cbox
create_atoms 1 single -0.5 0 0.5 units lattice
These lines create the simulation box and the atoms in the simulation box. The first line creates the simulation box. The second line creates atoms in the simulation box, specifically the bulk atoms. The third line creates an atom at a specific location in the simulation box, specifically the adatom.
Groups of Atoms
region hold block INF INF INF INF -${fixer1} -${fixer2} units lattice
region temp block INF INF INF INF -${fixer2} -${cubel} units lattice
group hold region hold
group temp region temp
Here, we define two critical regions of the simulation box.
The first region is called hold
, and it defines the bottom of the simulation box where atoms are held in place.
The second region is called temp
, and it defines the lowest layer of the simulation box where atoms are allowed to move
and are velocity rescaled to maintain a constant temperature.
The last two lines define groups of atoms based on the regions that we just defined.
Pair Style and Coefficients
pair_style eam/alloy
pair_coeff * * Cu01.eam.alloy Cu
These lines set the pair style and pair coefficients for the simulation. The first line sets the pair style to embedded atom method (EAM) with alloy support.
The second line sets the pair coefficients for all atom pairs to use the Cu EAM potential. This force-field comes from Mishin et al., and can be downloaded from the NIST Interatomic Potentials Repository.
Time Step
timestep 0.005
This line sets the time step for the simulation to 0.005 ps. This is a common choice for metal simulations—small enough to ensure stability.
Velocity
velocity temp create 600 12345
This line sets the initial velocities for atoms in the simulation. We’re initializing velocities to 600 K using random seed 12345.
Fixes
fix heater temp temp/rescale 1 850 850 5 1
fix nve all nve
fix freeze hold setforce 0 0 0
These lines define the fixes that control the simulation dynamics:
- The first line creates a “heater” that maintains atoms in the
temp
group at 850 K - The second line integrates the equations of motion using the velocity Verlet algorithm
- The third line freezes atoms in the
hold
group by setting their forces to zero
variable e equal pe
variable k equal ke
variable t equal etotal
variable T equal temp
fix energy all ave/time 1 50 50 v_k v_e v_t v_T file energy_avg.txt
These lines define a fix called energy
that calculates the average kinetic energy, potential energy, total energy, and temperature of the simulation every 50 time steps.
The average values are written to the file energy_avg.txt
,
where we average every 50 values over 50 time steps.
Run
Relaxation
minimize 1.0e-4 1.0e-6 1000 10000
While we could run the simulation without relaxing the system, it is generally a good idea to relax a system before running a full simulation. You should see this minimize within a step or two, indicating that the system is already fairly relaxed.
Dumping Atoms
dump eve all custom 5 dump.lammpstrj id type xu yu zu # fx fy fz # uncomment for forces
dump_modify eve sort id
Before we run the simulation, we define a dump file that will be used to write the positions of the atoms in the simulation.
The first line defines a dump file called eve
that will be written every 5 time steps.
The second line sorts the atoms in the dump file by their ID, which is useful for visualization/post-processing.
If you wish to dump the forces on the atoms in the simulation,
you can uncomment the # fx fy fz
in the first line.
Run
thermo 50
run 100000 # 100_000 * 5 fs = 500 ps
These lines run the simulation for 100,000 time steps, which corresponds to 500 ps. The first line prints the thermo output every 50 time steps. This prints the temperature, pressure, and other useful information about the simulation.
Visualization and Analysis
After running the simulation, we can visualize the results using Ovito, a free visualization tool for atomistic simulations. The basic steps are:
- Open the dump file in Ovito
- Color atoms by their z-coordinate
- Restrict the height range to 0-2 Å to focus on surface atoms
- Animate the simulation
What the Data Shows
The trajectory data from this simulation provides useful information for machine learning applications:
Energy Analysis
We can plot the average kinetic energy, potential energy, total energy, and temperature over time.

Average kinetic energy, potential energy, total energy, and temperature over time.
We skip the first 30 time steps to remove the initial relaxation period. The energy fluctuations represent thermal motion and could be used for:
- Anomaly detection: Finding unusual diffusion events
- Temperature prediction: Training models to estimate local temperature from atomic motions
- Stability analysis: Detecting when surfaces reach equilibrium
Trajectory Analysis
We can plot the x and y coordinates of the adatom over time.

x and y coordinates of the adatom over time.
This trajectory data could be valuable for:
- Path prediction: Training models to predict future atomic positions
- Diffusion coefficient estimation: Learning relationships between temperature and mobility
- Transition state identification: Detecting hopping events between stable sites
We can also plot the z coordinate of the adatom over time.

z coordinate of the adatom over time.
This plot shows exchange events where our initial adatom swaps positions with a surface atom—an important process in surface chemistry. The z-coordinate data enables:
- Event classification: Distinguishing between simple diffusion and exchange mechanisms
- Activation barrier estimation: Learning energy landscapes from thermal fluctuations
- Surface coordination analysis: Understanding how atom height correlates with local environment
Potential ML Applications
This simulation produces several types of data useful for machine learning:
- Coordinate trajectories: Direct input for neural network potentials or graph neural networks
- Energy time series: Features for regression models predicting system properties
- Event annotations: Labels for supervised learning of diffusion mechanisms
- Environmental descriptors: Local atomic arrangements as input features
The systematic nature of MD simulations allows generation of large, labeled datasets across different conditions—which is exactly what modern ML models need.
Code and Data
The complete simulation scripts and analysis tools are available for reproducibility:
Energy Analysis Script
# Hunter Heidenreich, 2023
# Plots the energy of a simulation over time.
import matplotlib.pyplot as plt
from argparse import ArgumentParser
if __name__ == '__main__':
parser = ArgumentParser()
parser.add_argument('--input', type=str, required=True)
parser.add_argument('--output', type=str, required=True)
parser.add_argument('--skip', type=int, default=1)
args = parser.parse_args()
# Parse energy data
data = {'ts': [], 'kes': [], 'pes': [], 'tes': [], 'Ts': []}
with open(args.input, 'r') as f:
for line in f:
if line.startswith('#') or not line.strip():
continue
t, v_k, v_e, v_t, v_T = map(float, line.split())
data['ts'].append(t)
data['kes'].append(v_k)
data['pes'].append(v_e)
data['tes'].append(v_t)
data['Ts'].append(v_T)
# Skip initial equilibration
for key in data:
data[key] = data[key][args.skip:]
# Create subplots
fig, axs = plt.subplots(2, 2, figsize=(16, 12))
plots = [('Kinetic Energy', 'kes'), ('Potential Energy', 'pes'),
('Total Energy', 'tes'), ('Temperature', 'Ts')]
for ax, (title, key) in zip(axs.flat, plots):
ax.plot(data['ts'], data[key])
ax.set_xlabel('TimeStep')
ax.set_ylabel(title)
ax.set_title(title)
plt.tight_layout()
plt.savefig(args.output, dpi=300, bbox_inches='tight')
Trajectory Analysis Script
# Hunter Heidenreich, 2023
# Plots the coordinates of the adatom.
import matplotlib.pyplot as plt
from argparse import ArgumentParser
if __name__ == '__main__':
parser = ArgumentParser()
parser.add_argument('--input', type=str, required=True)
parser.add_argument('--output', type=str, required=True)
parser.add_argument('--id', type=int, default=1665,
help='Atom ID to track')
parser.add_argument('--do_z', action='store_true',
help='Plot z-coordinate instead of xy scatter')
args = parser.parse_args()
coords = {'x': [], 'y': [], 'z': []}
with open(args.input, 'r') as f:
for line in f:
if line.startswith(f'{args.id} '):
x, y, z = map(float, line.split()[2:5])
coords['x'].append(x)
coords['y'].append(y)
coords['z'].append(z)
plt.figure(figsize=(10, 8))
if args.do_z:
plt.plot(range(len(coords['z'])), coords['z'], 'b-', linewidth=1)
plt.xlabel('Simulation Step')
plt.ylabel('Z Coordinate (Å)')
plt.title(f'Height vs. Time for Adatom {args.id}')
plt.grid(True, alpha=0.3)
else:
plt.scatter(coords['x'], coords['y'], s=1, alpha=0.7, c='red')
plt.xlabel('X Coordinate (Å)')
plt.ylabel('Y Coordinate (Å)')
plt.title(f'XY Trajectory for Adatom {args.id}')
plt.axis('equal')
plt.grid(True, alpha=0.3)
plt.savefig(args.output, dpi=300, bbox_inches='tight')
Summary and Next Steps
This tutorial shows how molecular dynamics simulations can generate useful training data for machine learning applications in materials science. The Cu adatom diffusion system is a good starting point because it:
- Has interpretable physics: Well-understood mechanisms make it easier to validate ML predictions
- Shows diverse behaviors: Temperature-dependent dynamics create rich datasets
- Scales efficiently: Small system size allows extensive parameter exploration
- Connects to real applications: Direct relevance to catalysis and surface engineering
What’s Next
In upcoming posts, I’ll explore how this framework extends to other systems:
- Pt adatom diffusion: How heavier atoms behave differently ✅
- Mixed-metal surfaces: How alloy effects change diffusion pathways
- Stepped surfaces: How surface defects alter atomic mobility
- Machine learning applications: Actually training neural networks on the generated data
Broader Applications
The simulation techniques shown here could be useful for various ML applications:
- Neural network potentials: Training on trajectory data to replace expensive quantum calculations
- Rare event sampling: Using ML to identify and enhance diffusion pathways
- Catalyst design: Predicting how surface modifications affect reactivity
- Materials discovery: Screening new alloy compositions for desired surface properties
Getting Started
To try these simulations yourself:
- Install LAMMPS with EAM potential support
- Download the Cu01.eam.alloy potential file from NIST
- Run the provided script and analyze using the Python tools above
- Experiment with different temperatures, surface orientations, or atomic species
Questions about the simulation setup or interested in applying these techniques to your research? Feel free to reach out—I’m always happy to discuss molecular dynamics and machine learning applications.