Rebinding Simulation

In the previous step we mentioned about how to prepare your data for running a Wepy simulation. In this part, we will cover the topic of how to setup the simulation with this data.

Which files you need depends on whether you are doing a rebinding or rebinding simulation. Since we are focusing on rebinding simulation for this tutorial we need the following files:

  • system_bonded.pkl : Generated in the Building System and Topology Pickle Files section.

  • topology_bonded.pkl : Generated in the Building System and Topology Pickle Files section.

  • system_unbonded.pkl : Generated in the Building System and Topology Pickle Files section.

  • topology_unbonded.pkl : Generated in the Building System and Topology Pickle Files section.

  • *.rst : A restart file from the simulation you want to continue from. This is need for getting the positions and boundary values of the system. E.g. step5_10.rst in this case.

  • *.pdb : A PDB file of the system. E.g. step3_input.pdb in this case.

If you do not have these files, please visit the How to Prepare Your Data section of this tutorial.

After confirming that you have the necessary files, you can proceed to the next step. Let’s start by importing necessary modules.

import os
import os.path as osp
import pickle as pkl

import simtk.openmm.app as omma
import simtk.openmm as omm
import simtk.unit as unit

import mdtraj as mdj

from wepy.sim_manager import Manager
from wepy.resampling.resamplers.revo import REVOResampler
from wepy.resampling.distances.receptor import RebindingDistance
from wepy.runners.openmm import OpenMMGPUWalkerTaskProcess, OpenMMRunner, OpenMMWalker, OpenMMState, gen_sim_state
from wepy.boundary_conditions.receptor import RebindingBC
from wepy.reporter.hdf5 import WepyHDF5Reporter
from wepy.work_mapper.task_mapper import TaskMapper
from wepy.util.mdtraj import mdtraj_to_json_topology

from walker_pkl_reporter import WalkersPickleReporter

from wepy.reporter.dashboard import DashboardReporter
from wepy.reporter.openmm import OpenMMRunnerDashboardSection

Great! Now that we have imported the necessary modules, we can proceed to setup simulation parameters file directories. In this tutorial, we will utilize 8 walkers and run 2 cycles of 5 steps. These parameters can be adjusted to suit your needs since these parameters are for a short demonstration.

# Typical sim details
num_walkers = 8
run = 1
n_steps = 5
n_cycles = 2

# Native state
native_rst_path = f'{inp_path}/native.rst'
native_system_path = f'{inp_path}/system_native.pkl'
native_topology_path = f'{inp_path}/topology_native.pkl'

#### Paths: Set it to your preference
inp_path = './'
pdb_path = f'{inp_path}/step3_input.pdb'
rst_path = f'{inp_path}/step5_10.rst'
system_path = f'{inp_path}/system.pkl'
topology_path = f'{inp_path}/topology.pkl'
outputs_dir = f'simdata_run{run}_steps{n_steps}_cycs{n_cycles}'
os.makedirs(outputs_dir, exist_ok=True)
####

The next step is to use the files we have prepared to setup the simulation. We will start by loading the system and topology files.

with open(system_path,'rb') as f:
system = pkl.load(f)

with open(topology_path,'rb') as f:
    omm_top = pkl.load(f)

# Load native state system
with open(native_system_path,'rb') as f:
    native_system = pkl.load(f)

with open(native_topology_path,'rb') as f:
    native_omm_top = pkl.load(f)

# Get positions and box vectors from an rst file
# This is typically a restart file post nvt/npt from openmm
# However, you can build it from scratch using omma.PDBFile() too
with open(rst_path, 'r') as f:
    simtk_state = omm.XmlSerializer.deserialize(f.read())
    bv = simtk_state.getPeriodicBoxVectors()
    pos = simtk_state.getPositions()

with open(native_rst_path, 'r') as f:
    native_simtk_state = omm.XmlSerializer.deserialize(f.read())
    native_bv = native_simtk_state.getPeriodicBoxVectors()
    native_pos = native_simtk_state.getPositions()

system.setDefaultPeriodicBoxVectors(bv[0],bv[1],bv[2])

Now we need an integrator and a runner for our simulation. an integrator is a computational method that updates the positions and velocities of particles over time. It calculates the motion of particles based on the forces acting on them, following Newton’s laws of motion. In this tutorial, we will use the Langevin integrator. This simulation will be running on a GPU, so we will use the platform=’CUDA’ option while setting up the OpenMMRunner.

integrator = omm.LangevinIntegrator(300*unit.kelvin,
                                1/unit.picosecond,
                                0.002*unit.picoseconds)

runner = OpenMMRunner(system, omm_top, integrator, platform='CUDA')

Next, we will build the walkers for the simulation. For this, we need to generate the state from the system and topology files. All of the initial walkers will have the same state and weight at the start of the simulation.

# Generate a new simtk "state"
new_simtk_state = gen_sim_state(pos,
                                system,
                                integrator)

# Generate native simtk state
native_simtk_state = gen_sim_state(native_pos,
                                native_system,
                                integrator)

# Set up parameters for running the simulation
init_weight = 1.0 / num_walkers

# Generate the walker state in wepy format
walker_state = OpenMMState(new_simtk_state)

# Make a list of the initial walkers
init_walkers = [OpenMMWalker(walker_state, init_weight) for i in range(num_walkers)]

native_state = OpenMMState(native_simtk_state)

Now that we have the walkers, we need a distance metric for measuring differences between walker states. In this case, we will use the RebindingDistance class from the wepy library, however you can define your own distance metrics if needed. For this tutorial, we will use the the indices of the ligand, binding site and the walker state.

pdb = mdj.load_pdb(pdb_path)
json_top = mdtraj_to_json_topology(pdb.top)

# Save some relevant indices
lig_idxs = pdb.top.select('<selection of your ligand>')
binding_selection_idxs =  pdb.top.select('<selection of your binding site residues>')

# Distance metric to be used in REVO
rb_distance = RebindingDistance(lig_idxs,
                                binding_selection_idxs,
                                native_state)

Next, we will setup the boundary conditions for the simulation. In this case, we will use the RebindingBC class from the wepy library. This class will be used to check if the walker has crossed the boundary and should be resampled.

rbc = RebindingBC(cutoff_distance=0.2,  # nm
                native_state=native_state,
                initial_state=walker_state,
                initial_weights=init_weight,
                topology=json_top,
                ligand_idxs=lig_idxs,
                binding_site_idxs=binding_selection_idxs)

Next, we will use the REVOResampler class from the wepy library to setup the resampler. This class will be used to resample the walkers based on the distance metric and boundary conditions we have defined.

# Set up the REVO Resampler with the parameters
resampler = REVOResampler(distance=rb_distance,
                        init_state=walker_state,
                        weights=True,
                        pmax=0.1,
                        dist_exponent=4,
                        merge_dist=0.25,
                        char_dist=0.1)

And finally, we need to setup the reporters for recording the simulation data, define the task mapper and the simulation manager. An important point in this step is to define the number of workers and the device ids for the GPU. In this case, we will use 2 workers and the GPU device ids 0 and 1. If you are using more GPUs, you can increase the number of workers and device ids accordingly.

# Set up the HDF5 reporter
hdf5_reporter = WepyHDF5Reporter(save_fields=('positions','box_vectors'),
                        file_path=osp.join(outputs_dir,f'wepy.results.h5') ,
                        resampler=resampler,
                        boundary_conditions=rbc,
                        topology=json_top)

# Set up the pickle reporter (Essential for restarts)
out_folder_pkl = osp.join(outputs_dir,f'pkls')
pkl_reporter = WalkersPickleReporter(save_dir = out_folder_pkl,
                            freq = 1,
                            num_backups = 2)

# Set up the dashboard reporter
dashboard_path = osp.join(outputs_dir,f'wepy.dash.org')
openmm_dashboard_sec = OpenMMRunnerDashboardSection(runner)
dashboard_reporter = DashboardReporter(file_path = dashboard_path,
                                runner_dash = openmm_dashboard_sec)


# Create a work mapper for NVIDIA GPUs for a GPU cluster
mapper = TaskMapper(walker_task_type=OpenMMGPUWalkerTaskProcess,
                num_workers=2,
                platform='CUDA',
                device_ids=[0,1])

# Build the simulation manager
sim_manager = Manager(init_walkers,
                runner=runner,
                resampler=resampler,
                boundary_conditions=rbc,
                work_mapper=mapper,
                reporters=[hdf5_reporter, pkl_reporter, dashboard_reporter])

print('Running the simulation...')
# run a simulation with the manager for 'n_cycles' with 'n_steps' of integrator steps in each
steps_list = [n_steps for i in range(n_cycles)]

# and..... go!
sim_manager.run_simulation(n_cycles,
                    steps_list)

And that is it! Now you can start the simulation by running python wepy_run.py. The simulation will run for the specified number of cycles and steps. The simulation data will be saved in the outputs_dir directory. You can use the saved data to analyze the simulation results.