Introducing Observables

Observables are quantities calculated from simulation data that provide insights into the system’s behavior. A common observable is RMSD (Root Mean Square Deviation). In this guide, we’ll walk through the process of defining and using observables within a simulation.

import mdtraj as mdj

# Load the PDB file
pdb_path = 'path/to/your/file.pdb'
pdb = mdj.load_pdb(pdb_path)

To calculate observables, we often need to select specific residues or atoms. Here’s how to select residues using MDTraj.

# Select specific residues based on segment names and residue numbers
active_1_res = pdb.top.select('(segname PROA and (residue 399)) or (segname PROB and (residue 366))')
active_2_res = pdb.top.select('(segname PROB and (residue 399)) or (segname PROC and (residue 366))')
active_3_res = pdb.top.select('(segname PROC and (residue 399)) or (segname PROA and (residue 366))')

CARA_active_res = pdb.top.select('segname CARA and (residue 24 25)')
CARB_active_res = pdb.top.select('segname CARB and (residue 24 25)')
CARC_active_res = pdb.top.select('segname CARC and (residue 24 25)')

An observable function takes simulation data as input and computes a specific property. You can define your own functions based on your research needs.

Here’s an example of how you might define an observable function to compute centroid distances:

import numpy as np

def centroid_distance(fields_d, *args, **kwargs):
    centroid_distances = []
    for i in range(len(fields_d['positions'])):
        active_1_centroid = np.mean(fields_d['positions'][i][args[0]['active_1_res']], axis=0)
        active_2_centroid = np.mean(fields_d['positions'][i][args[0]['active_2_res']], axis=0)
        active_3_centroid = np.mean(fields_d['positions'][i][args[0]['active_3_res']], axis=0)

        CARA_active_centroid = np.mean(fields_d['positions'][i][args[0]['CARA_active_res']], axis=0)
        CARB_active_centroid = np.mean(fields_d['positions'][i][args[0]['CARB_active_res']], axis=0)
        CARC_active_centroid = np.mean(fields_d['positions'][i][args[0]['CARC_active_res']], axis=0)

        centroid_distances.append(np.array([
            np.linalg.norm(active_1_centroid - CARA_active_centroid),
            np.linalg.norm(active_3_centroid - CARB_active_centroid),
            np.linalg.norm(active_2_centroid - CARC_active_centroid)
        ]))

    return np.array(centroid_distances)

Once we have the observable function, we can use it to compute properties during the simulation. Wepy provides tools to manage simulation data and compute observables.

from wepy.hdf5 import WepyHDF5

# Path to the WEPY results files
outputs_dir = ['path/to/output1', 'path/to/output2']
wepy_results = [WepyHDF5(output_dir + '/wepy.results.h5', mode='r+') for output_dir in outputs_dir]

for wepy_result in wepy_results:
    with wepy_result:
        args = [{
            'active_1_res': active_1_res,
            'active_2_res': active_2_res,
            'active_3_res': active_3_res,
            'CARA_active_res': CARA_active_res,
            'CARB_active_res': CARB_active_res,
            'CARC_active_res': CARC_active_res
        }]

        obs = wepy_result.compute_observable(
            centroid_distance,
            ['positions'],  # Specify the required fields
            args=(args),    # Pass custom arguments
            save_to_hdf5='centroid',  # Save results to the HDF5 file
            return_results=True
        )

Now you have centroid distance data that you can use to analyze the system’s behavior. You can also use the data to visualize the system’s behavior over time.