Abstract
This project provides GROMACS simulation workflows that generate dipeptide trajectories for nine amino acid residue types. It moves beyond the typical alanine-only approach to create chemically diverse molecular dynamics datasets for training machine learning models of protein dynamics.
What I Built
Simulation Workflows
- Nine amino acid types: aromatic (Phe, Trp), branched (Ile, Val, Leu), flexible (Gly, Ala), constrained (Pro), and sulfur-containing (Met)
- Automated GROMACS scripts: Complete workflows from energy minimization through production runs
- High-resolution output: Trajectory frames saved every 1 ps (the output interval, not the integration timestep), chosen with ML applications in mind
- Consistent protocols: Same simulation conditions across all amino acid types
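The workflow described above could be driven by a script along the following lines. This is a minimal Python sketch under stated assumptions, not the project's actual scripts: the stage names (em, nvt, npt, md), the runs/ directory layout, the solvated.gro starting file, and the 2 fs integration step are all hypothetical. Only the `gmx grompp` and `gmx mdrun` flags and the `nstxout-compressed` mdp option are standard GROMACS. The sketch also shows the arithmetic that links a 1 ps output interval to the number of integration steps between saved frames.

```python
# Sketch: build identical GROMACS command sequences for each dipeptide.
# Stage names, paths, and dt are illustrative assumptions.

RESIDUES = ["phe", "trp", "ile", "val", "leu", "gly", "ala", "pro", "met"]
STAGES = ["em", "nvt", "npt", "md"]  # assumed: minimization, two equilibrations, production


def output_stride(dt_ps, interval_ps):
    """Number of integration steps between saved frames.

    The result would be used as the value of nstxout-compressed in an mdp file.
    """
    stride = round(interval_ps / dt_ps)
    if stride < 1 or abs(stride * dt_ps - interval_ps) > 1e-9:
        raise ValueError("output interval must be a positive multiple of dt")
    return stride


def build_commands(residue, workdir="runs"):
    """Return the grompp/mdrun command pairs for one residue, in run order."""
    base = f"{workdir}/{residue}"
    commands = []
    previous = "solvated"  # hypothetical name of the prepared starting structure
    for stage in STAGES:
        commands.append([
            "gmx", "grompp",
            "-f", f"{stage}.mdp",          # same parameter file for every residue
            "-c", f"{base}/{previous}.gro",
            "-p", f"{base}/topol.top",
            "-o", f"{base}/{stage}.tpr",
        ])
        commands.append(["gmx", "mdrun", "-deffnm", f"{base}/{stage}"])
        previous = stage  # mdrun -deffnm writes <stage>.gro, the next stage's input
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    print("nstxout-compressed =", output_stride(dt_ps=0.002, interval_ps=1.0))
    for residue in RESIDUES:
        for command in build_commands(residue):
            print(" ".join(command))
```

Sharing one set of mdp files across all residues is what keeps the protocol consistent: only the per-residue directory changes between runs. To execute rather than print, each command list could be passed to `subprocess.run(command, check=True)`.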
Analysis Tools
- Trajectory processing: Tools for handling and analyzing molecular dynamics output
- Side-chain comparison: Systematic comparison of how different side chains affect dynamics
- ML-ready datasets: Formatted for training neural networks on protein behavior
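As an illustration of the kind of trajectory processing listed above, the sketch below computes a dihedral angle from four atomic positions and turns it into a periodic feature pair. Backbone dihedrals are a common descriptor for dipeptide dynamics, but their use here is an assumption, not something the project states. The function names are hypothetical, and the code uses only the Python standard library on a single frame; a real pipeline would likely read frames with a trajectory library and vectorize the calculation.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)  # central bond, the rotation axis
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def periodic_features(angle_deg):
    """Encode an angle as (sin, cos) so that -180 and 180 map to the same point."""
    r = math.radians(angle_deg)
    return (math.sin(r), math.cos(r))


if __name__ == "__main__":
    # Four coplanar points in a trans arrangement.
    angle = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
    print(angle, periodic_features(angle))
```

The sine/cosine encoding avoids the artificial discontinuity at ±180° that a raw angle would present to a neural network.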
Key Results
- Chemical diversity: Different amino acid types showed distinct dynamics patterns
- Systematic differences: Observable variations between flexible (Gly), constrained (Pro), aromatic (Phe, Trp), and branched (Val, Ile, Leu) residues
- Training data quality: High-resolution trajectories suitable for ML models learning structure-dynamics relationships
- Reproducible workflows: Automated scripts enable consistent dataset generation
Impact
- ML training data: Diverse trajectory datasets for neural networks studying protein dynamics
- Method development: Foundation for generating training data for larger protein systems
- Educational value: Simple systems for learning molecular dynamics principles
- Reproducible science: Automated workflows others can extend to additional amino acids