Biomolecular Simulation (Lab 1)

Name:

Department:

Michigan State University

Outline of the class:

  1. Run a simulation (name: sim1) with a small polypeptide angiotensin (1N9U PDB). It is a 10 amino acid long polypeptide. The provided input pdb has water and 0.15 mM of KCL.

    • You will add the necessary reporters for storing (i) temperature, (ii) Potential energy and (iii) Total energy in a file.

      • Hint: You can check the code block from Tuesday’s lecture.

    • Visualize.


  1. A new simulation (name: pullforce) with the same polypeptide.

    • Here we will add a ‘pulling force’ to the sytem. The first residue will be pulled to positive x,y,z axes. The last residue will be pulled to the negative x,y,z axes. The pulling force will be a equal contribution of both the forces. And hence we would expect with time, the protein to strech an uncoil. The code block is provided.

    • Visualize and see the difference.

    • You can play with the parameters of the pulling force and see how they are impacting the ‘pulling’. {Do this after you are done with the next part}


  1. Using mdtraj as a tool for analysis:

    • Loading MD simulation data by mdtraj, PBC removal and recenter.

    • How we can make atom selection with mdtraj.

    • Plot radius of gyration for both sim1 and pullforce and compare.

    • Plot RMSD for both sim1 and pullforce and compare.

    • Plot atom-atom distance and compare.


Softwares needed: OpenMM and MDTRAJ

Keep track of the filenames that you are generating

# Import the required python modules

import matplotlib.pyplot as plt
import mdtraj
import nglview
import numpy as np
from openmm.app import *
from openmm import *
from openmm.unit import *
from sys import stdout
import time
# Total steps of the simulations
N_STEPS_SIM=20000

# reporting interval
N_STEPS_REPORT=100
### SIM_1

# loading the input pdb.
pdb = PDBFile('angiotensin.pdb')

# defining the forcefield 
forcefield = ForceField('amber14-all.xml', 'amber14/tip3pfb.xml')

# building the system using the forcefield and pdb topology
system = forcefield.createSystem(pdb.topology, nonbondedMethod=PME,
        nonbondedCutoff=1*nanometer, constraints=HBonds)

# defining the integrator
integrator = LangevinMiddleIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)

# defining the platform
platform = Platform.getPlatformByName('CPU')

# creating the simulation object
simulation = Simulation(pdb.topology, system, integrator,platform)

# setting the initial positions to pdb positions
simulation.context.setPositions(pdb.positions)

# energy before minimization
state0 = simulation.context.getState(getEnergy=True)
print(state0.getPotentialEnergy())

# minimize the energy
simulation.minimizeEnergy(tolerance= 5*kilojoule/mole, maxIterations=500)

# print the energy afterwards
state1 = simulation.context.getState(getEnergy=True)
print(state1.getPotentialEnergy())

# defining a DCD reporter which will save the trajectory
simulation.reporters.append(DCDReporter(f'angio_{int(N_STEPS_SIM/N_STEPS_REPORT)}frames.dcd', N_STEPS_REPORT))

# printing the PE and Temp to screen
simulation.reporters.append(StateDataReporter(stdout, 500, step=True,
        potentialEnergy=True, totalEnergy=True, temperature=True, speed=True))

################
# Here you will add the necessary reporters for storing
# (i) temperature, (ii) Potential energy and (iii) Total energy in a file.


################


# Lets go!!!
t_start = time.time()
simulation.step(N_STEPS_SIM)
t_end = time.time()
print('time taken in seconds')
print(t_end - t_start)

# Dont forget to run after adding the lines!! :)
# Change the filename  
view = nglview.show_mdtraj(mdtraj.load("filename", top = "angiotensin.pdb"))

#Comment the next line out to see only the angiotensin molecule and ions (no water).
view.add_line("water")

view

Note: You can build the system with AMBER, CHARMM or GROMACS generated input files as well, instead of starting with a PDB.

The example codes and instructions are provided in the following links:

# PULLFORCE simulation: 
#All important objects are named by '_1'. 
# Don't overwrite the objects of the previous simulation.

N_STEPS_SIM=20000

N_STEPS_REPORT=100

pdb_1 = PDBFile('angiotensin.pdb')

# defining the forcefield 
forcefield_1 = ForceField('amber14-all.xml', 'amber14/tip3pfb.xml')

# building the system using the forcefield and pdb topology
system_1 = forcefield_1.createSystem(pdb_1.topology, nonbondedMethod=PME,
        nonbondedCutoff=1*nanometer, constraints=HBonds)

########### Lets mess up with the system ###################################
## Add a pulling force that will pull the two end of the protein in two different directions:
# Force constant of the force (# You can play with this parameter)

pull_force_constant = 1000.0 * kilojoules_per_mole / 1.0 * nanometers 

#This is a list of the indices of first residue of the protein
# not an efficient way, right?
# We will learn a more efficient way later

group1 = [0,1,2,3,4,5,6,7,8,9,10,11,12,13] 

# This is a list of the indices of the last residue of the protein

group2 = [162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181]

# Pulling in negative x direction not in y or z

fx = -1.0               # pull in x
fy = fz = -1.0          # pull in y and z, you can switch this on/off later simulations

# defining a custom centroid bond force from openmm (we have to provide the energy function)
# x1,y1,z1 is for group1
# x2,y2,z2 is for group2 (openmm can figure out these notations)
# Note that we are adding fx to one group and (-fx) to another

pullforce = CustomCentroidBondForce(2, "pull_force_constant * (x1*fx + y1*fy + z1*fz) + pull_force_constant * (x2*(-fx) + y1*(-fy) + z1*(-fz))" )

# global parameters for calculation of the energy, openmm needs these few lines 

pullforce.addGlobalParameter('pull_force_constant', pull_force_constant)
pullforce.addGlobalParameter('fx', fx)
pullforce.addGlobalParameter('fy', fy)
pullforce.addGlobalParameter('fz', fz)

# defining the group indices and adding them to the pullforce

group1_idx = pullforce.addGroup(group1)
group2_idx = pullforce.addGroup(group2)
                                     
# defining the hypothetical bond that will pull the groups

pullforce.addBond([group1_idx, group2_idx])

# finally adding them to the system

system_1.addForce(pullforce)
######################################## pullforce adding section ends here ###################

# defining the integrator
integrator_1 = LangevinMiddleIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)

# defining the platform
platform_1 = Platform.getPlatformByName('CPU')

# creating the simulation object
simulation_1 = Simulation(pdb_1.topology, system_1, integrator_1,platform_1)

# setting the initial positions to pdb positions
simulation_1.context.setPositions(pdb_1.positions)

simulation_1.reporters.append(DCDReporter(f'angio_{int(N_STEPS_SIM/N_STEPS_REPORT)}frames_pullforce.dcd', N_STEPS_REPORT))

# printing the PE and Temp to screen
simulation_1.reporters.append(StateDataReporter(stdout, 500, step=True,
        potentialEnergy=True, totalEnergy=True, temperature=True, speed=True))

t_start = time.time()
simulation_1.step(N_STEPS_SIM)
t_end = time.time()
print('time taken in seconds')
print(t_end - t_start)
# add the filenames here
# you can play with nglview with a bit of guidance from this webpage:
# https://projects.volkamerlab.org/teachopencadd/talktorials/T017_advanced_nglview_usage.html#Basic-API-usage
view = nglview.show_mdtraj(mdtraj.load("filename", top = "pdb filename"))
#view.add_ball_and_stick()
#view.add_line("water")
view

MDTRAJ is nice tool that can do various analysis for us.

The API and the utility is here: https://mdtraj.org/1.9.4/index.html

We can have a better way to grab the indices for forces in the previous code.

#loading a trajectory (change the filenmaes if you need to):
trj_sim1 = mdtraj.load("angio_200frames.dcd", top = "angiotensin.pdb").remove_solvent()
trj_pullforce = mdtraj.load("angio_200frames_pullforce.dcd", top = "angiotensin.pdb").remove_solvent()
#viewing in nglview
nglview.show_mdtraj(trj_sim1)
#viewing the pullforce simulation
nglview.show_mdtraj(trj_pullforce)
## Recenter and apply periodic boundary conditions to the molecules in each frame of the trajectory.
trj_sim1 = trj_sim1.image_molecules(anchor_molecules=[set(trj_sim1.top.residue(0).atoms)])
trj_pullforce = trj_pullforce.image_molecules(anchor_molecules=[set(trj_sim1.top.residue(0).atoms)])
# what does printing trj give us?
trj_sim1
# How does the topology look like
trj_sim1.top
#list of residues
list(trj_sim1.top.residues)
##### Check youself the list of residues in pullforce simulation.
# list of atoms in the first residue
list(trj_sim1.top.residue(0).atoms)
# Find out the list of atoms in the last residue of pullforce simulation here
print(trj_sim1.top.residue(0).atom(0))
print(trj_sim1.top.residue(0).atom(1))
# Index of any atom in the topology can be found out like this
trj_sim1.top.residue(9).atom(0).index
## Now lets see a better way to print group1 in the pullforce example

# creating an empty list
group1_list=[]

# loop over all the atoms of the residue0 
for i in range(len(list(trj_sim1.top.residue(0).atoms))):
    
    #put the indices in a list
    group1_list.append(trj_sim1.top.residue(0).atom(i).index)

print(group1_list)
######### Write the similar code block for group2 in the pullforce example here ############
### There are other ways to select atoms/groups/residues as well

trj_sim1.top.select('resname HIS')

# See this directly gives the indices as array as well
# We can add or/and etc in the select arg string
trj_sim1.top.select('resname HIS or resname ASP')
# print all the backbone atom indices
trj.top.select('name N or name H or name C or name O')
####### Write a similar code block to print all the atom indices for group1 and group2 using this tool ########
##### Computing some properties ####

#Compute the radius of gyration for every frame.
rg_sim1 = mdtraj.compute_rg(trj_sim1)
rg_pull = mdtraj.compute_rg(trj_pullforce)

#print(rg_pull)
#print(rg_sim1)
plt.plot(rg_sim1, label='Normal simulation')
plt.plot(rg_pull, label='Pullforce simulation')
plt.legend()
plt.xlabel('Number of snapshots')
plt.ylabel('R_g in nanometer')
plt.show()
### Next we will calculate the RMSD of the polypeptide

## The reference struture will be first frame of the trajectory.

# Hence, slice a trajectory to extract the first frame.
trj_sim1_frame1 =  trj_sim1.slice([0])
print(trj_sim1_frame1)
# Do the same for pullforce: Store the first frame of trj_pullforce in trj_pullforce_frame1
#Calculating RMSD:
sim1_rmsd = mdtraj.rmsd(trj_sim1, trj_sim1_frame1)
####### calculate the rmsd of trj_pullforce ############

####### plot the two rmsd in a single matplotlib plot ############
# Compute the distances between pairs of atoms in each frame through the trajectory
# This should give us an idea how far the amino acids are stretched by the pull force. 

sim1_dist = mdtraj.compute_distances(trj_sim1, [(trj_sim1.top.residue(0).atom(2).index, 
                                             trj_sim1.top.residue(9).atom(2).index)],periodic=False )

pullforce_dist = mdtraj.compute_distances(trj_pullforce, [(trj_pullforce.top.residue(0).atom(2).index, 
                                             trj_pullforce.top.residue(9).atom(2).index)],periodic=False)

plt.plot(sim1_dist,label='Normal simulation')
plt.plot(pullforce_dist,label='Pullforce simulation')
plt.legend()
plt.xlabel('Number of snapshots')
plt.ylabel('Distance in nanometer')
plt.show()
#pullforce_dist