Abstract

This project provides GROMACS simulation workflows that generate amino acid dipeptide trajectories for nine residue types. It moves beyond the typical alanine-only approach to produce diverse molecular dynamics datasets for training machine learning models of protein dynamics.

What I Built

Simulation Workflows

  • Nine amino acid types: Aromatic (Phe, Trp), branched (Ile, Val, Leu), flexible (Gly, Ala), constrained (Pro), and special chemistry (Met)
  • Automated GROMACS scripts: Complete workflows from energy minimization through production runs
  • High-resolution output: Trajectory frames written every 1 ps, a spacing chosen for ML applications
  • Consistent protocols: Same simulation conditions across all amino acid types
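
To make the workflow idea concrete, here is a minimal Python sketch of how one protocol could be applied to all nine residues. It is an illustration under stated assumptions, not the project's actual scripts: the stage names (em, nvt, npt, md), the file names (solvated.gro, topol.top), the mdp directory, and the 2 fs integration step are all hypothetical. Only the gmx grompp flags (-f, -c, -p, -o) and gmx mdrun -deffnm are standard GROMACS usage.

```python
import subprocess
from pathlib import Path

# The nine residue types named in this document.
RESIDUES = ["ALA", "GLY", "ILE", "LEU", "MET", "PHE", "PRO", "TRP", "VAL"]

# Assumption: one minimization, two equilibration stages, then production.
STAGES = ["em", "nvt", "npt", "md"]


def output_stride(save_interval_ps, dt_ps):
    """Number of integration steps between saved frames.

    For example, this is the value one would use for the mdp option
    nstxout-compressed to get a given frame spacing.
    """
    steps = save_interval_ps / dt_ps
    if abs(steps - round(steps)) > 1e-9:
        raise ValueError("save interval must be a multiple of the timestep")
    return int(round(steps))


def stage_commands(mdp_dir=Path("mdp")):
    """Build the grompp/mdrun command pairs for one residue.

    File names are relative to the residue's working directory, so the
    same command list can be reused for every residue. mdp_dir should be
    absolute or resolvable from that working directory.
    """
    commands = []
    previous = "solvated.gro"  # hypothetical name of the starting structure
    for stage in STAGES:
        commands.append([
            "gmx", "grompp",
            "-f", str(mdp_dir / f"{stage}.mdp"),  # stage parameters
            "-c", previous,                       # input coordinates
            "-p", "topol.top",                    # topology
            "-o", f"{stage}.tpr",                 # run input file
        ])
        commands.append(["gmx", "mdrun", "-deffnm", stage])
        previous = f"{stage}.gro"  # mdrun writes final coordinates here
    return commands


def run_all(root=Path("runs")):
    """Run the identical protocol in one directory per residue."""
    for residue in RESIDUES:
        workdir = root / residue.lower()
        for cmd in stage_commands():
            subprocess.run(cmd, cwd=workdir, check=True)
```

With an assumed 2 fs timestep, output_stride(1.0, 0.002) gives the 500-step interval that corresponds to one frame per picosecond. Building the command lists separately from running them keeps the protocol identical across residues and easy to inspect before any simulation starts.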

Analysis Tools

  • Trajectory processing: Tools for handling and analyzing molecular dynamics output
  • Chemical diversity: Systematic comparison of how different side chains affect dynamics
  • ML-ready datasets: Formatted for training neural networks on protein behavior
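
As an example of the kind of trajectory processing involved, the sketch below computes a dihedral angle per frame from four atom positions and converts angles into sine/cosine features. Backbone dihedrals are a common descriptor for dipeptides, but choosing them here, and the sin/cos encoding, are assumptions for illustration rather than the project's documented pipeline. The function names are hypothetical, and the code uses only NumPy.

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points.

    Each argument is a single (3,) position or an (n_frames, 3) array,
    so a whole trajectory can be processed in one call.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    # Unit vector along the central bond.
    b1 = b1 / np.linalg.norm(b1, axis=-1, keepdims=True)
    # Components of the outer bonds perpendicular to the central bond.
    v = b0 - np.sum(b0 * b1, axis=-1, keepdims=True) * b1
    w = b2 - np.sum(b2 * b1, axis=-1, keepdims=True) * b1
    x = np.sum(v * w, axis=-1)
    y = np.sum(np.cross(b1, v) * w, axis=-1)
    return np.degrees(np.arctan2(y, x))


def angles_to_features(angles_deg):
    """Encode angles as (sin, cos) pairs to avoid the jump at +/-180 degrees."""
    radians = np.radians(np.asarray(angles_deg, dtype=float))
    return np.stack([np.sin(radians), np.cos(radians)], axis=-1)
```

For a trajectory, passing the (n_frames, 3) coordinates of the four atoms that define an angle returns one value per frame. The sin/cos encoding keeps angles near +180 and -180 degrees close together in feature space, which is usually preferable for a neural network input.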

Key Results

  • Chemical diversity: Different amino acid types showed distinct dynamics patterns
  • Systematic differences: Observable variations between flexible (Gly), constrained (Pro), aromatic (Phe, Trp), and branched (Val, Ile, Leu) residues
  • Training data quality: High-resolution trajectories suitable for ML models learning structure-dynamics relationships
  • Reproducible workflows: Automated scripts enable consistent dataset generation

Impact

  • ML training data: Diverse trajectory datasets for neural networks studying protein dynamics
  • Method development: Foundation for generating training data for larger protein systems
  • Educational value: Simple systems for learning molecular dynamics principles
  • Reproducible science: Automated workflows others can extend to additional amino acids