wepy.hdf5 module¶
Primary wepy simulation database driver and access API using the HDF5 format.
The HDF5 Format Specification¶
As part of the wepy framework this module provides a fully-featured API for creating and accessing data generated in weighted ensemble simulations run with wepy.
The need for a special-purpose format is many-fold, but primarily it is the nonlinear, branching structure of walker trajectories coupled with weights.
That is, in standard simulations data is organized as independent linear trajectories of frames, each frame related linearly to the one before and after it.
In weighted ensemble due to the resampling (i.e. cloning and merging) of walkers, a single frame may have multiple ‘child’ frames.
This is the primary motivation for this format.
However, in practice it solves several other issues and is itself a more general and flexible format than one for weighted ensemble simulations alone.
Concretely the WepyHDF5 format is simply an informally described schema that is commensurable with the HDF5 constructs of hierarchical groups (similar to unix filesystem directories) arranged as a tree with datasets as the leaves.
The hierarchy is fairly deep and so we will progress downwards from the top and describe each broad section in turn breaking it down when necessary.
Header¶
The items right under the root of the tree are:
runs
topology
_settings
The first item, ‘runs’, is itself a group that contains all of the primary data from simulations. In WepyHDF5 the run is the unit dataset: all data internal to a run is self-contained. That is, multiple interdependent trajectories (e.g. from cloning and merging) all exist within a single run.
This excludes metadata-like things that may be needed for interpreting this data, such as the molecular topology that imposes structure over a frame of atom positions. This information is placed in the ‘topology’ item.
The topology field has no specified internal structure at this time. However, with the current implementation of the WepyHDF5Reporter (which is the principal way of generating a WepyHDF5 object/file from simulations) it is simply a string dataset. This string dataset should be a JSON-compliant string whose format is specified elsewhere and was borrowed from the mdtraj library.
Warning: this format and specification for the topology is subject to change in the future and will likely be kept unspecified indefinitely.
For most intents and purposes (which we assume to be for molecular or molecular-like simulations) the ‘topology’ item (and perhaps any other item at the top level other than those preceded by an underscore, such as the ‘_settings’ item) is merely useful metadata that applies to ALL runs and is not dynamical.
In the language of the orchestration module all data in ‘runs’ uses the same ‘apparatus’ which is the function that takes in the initial conditions for walkers and produces new walkers. The apparatus may differ in the specific values of parameters but not in kind. This is to facilitate runs that are continuations of other runs. For these kinds of simulations the state of the resampler, boundary conditions, etc. will not be as they were initially but are the same in kind or type.
All of the necessary type information of data in runs is kept in the ‘_settings’ group. This is used to serialize information about the data types, shapes, run to run continuations etc. This allows for the initialization of an empty (no runs) WepyHDF5 database at one time and filling of data at another time. Otherwise types of datasets would have to be inferred from the data itself, which may not exist yet.
As a convention, items which are preceded by an underscore (following the Python convention) are to be considered hidden and mechanical to the proper functioning of various WepyHDF5 API features, such as sparse trajectory fields.
The ‘_settings’ group is specified as a simple key-value structure; however, values may be arbitrarily complex.
Runs¶
The meat of the format is contained within the runs group:
runs
0
1
2
…
Under the runs group are a series of groups for each run. Runs are named according to the order in which they were added to the database.
Within a run (say ‘0’ from above) we have a number of items:
0
init_walkers
trajectories
decision
resampling
resampler
warping
progress
boundary_conditions
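The layout above can be sketched with plain dicts standing in for h5py groups. This is a mock of the hierarchy for illustration, not the actual implementation; in the real file these nodes are HDF5 groups and datasets, but the path structure is the same:

```python
# A plain-dict stand-in for the WepyHDF5 hierarchy described above.
mock_wepy_h5 = {
    "runs": {
        "0": {
            "init_walkers": {},
            "trajectories": {},
            "decision": {},
            "resampling": {},
            "resampler": {},
            "warping": {},
            "progress": {},
            "boundary_conditions": {},
        },
    },
    "topology": "<JSON topology string>",  # metadata shared by all runs
    "_settings": {},                       # hidden bookkeeping group
}

def resolve(tree, path):
    """Resolve an HDF5-style '/'-separated path against the mock tree."""
    node = tree
    for part in path.split("/"):
        node = node[part]
    return node

run_grp = resolve(mock_wepy_h5, "runs/0")
```

With h5py the equivalent access would be subscripting the file object with the same path string.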
Trajectories¶
The ‘trajectories’ group is where the data for the frames of the walker trajectories is stored.
Even though the tree-like trajectories of weighted ensemble data might be well suited to a tree-like storage topology, we have opted for something more familiar to the field: a collection of linear “trajectories”.
This way of breaking up the trajectory data coupled with proper records of resampling (see below) allows for the imposition of a tree structure without committing to that as the data storage topology.
This allows the WepyHDF5 format to be easily used as a container format for collections of linear trajectories. While this is not yet supported in any real capacity, it is one small step toward convergence. We feel that a format that contains multiple trajectories is important for situations like weighted ensemble, where trajectories are interdependent. The transition to a storage format like HDF5 also opens up many possibilities for new trajectory features, which have not caught on despite several attempts to forge new formats based on HDF5 (TODO: get references right; see work in mdtraj and MDHDF5).
Perhaps these formats have not caught on because the existing formats (e.g. XTC, DCD) for simple linear trajectories are good enough and there is little motivation to migrate.
However, the WepyHDF5 format (and the related sub-formats to be described, e.g. record groups and the trajectory format) covers both a new use case which can’t be achieved with the old formats and the old use cases with ease.
Once users see the power of using a format like HDF5 through wepy, they may continue to use it for simpler simulations.
In any case the ‘trajectories’ in the group for weighted ensemble simulations should be thought of only as containers and not literally as trajectories. That is, frame 4 does not necessarily follow from frame 3. One may think of them more as “lanes” or “slots” for trajectory data that need to be stitched together with the appropriate resampling records.
The routines and methods for generating contiguous trajectories from the data in WepyHDF5 are given through the ‘analysis’ module, which generates “traces” through the dataset.
With this in mind we will describe the sub-format of a trajectory now.
The ‘trajectories’ group is similar to the ‘runs’ group in that it has sub-groups whose names are numbers. These numbers, however, are not the order in which they were created but indices of the trajectories, which are typically laid out all at once.
For a wepy simulation with a constant number of walkers you will only ever need as many trajectories/slots as there are walkers. So if you have 8 walkers then you will have trajectories 0 through 7. Concretely:
runs
0
trajectories
0
1
2
3
4
5
6
7
If we look at trajectory 0 we might see the following groups within:
positions
box_vectors
velocities
weights
This is what you would expect for a constant-pressure molecular dynamics simulation, where you have the positions of the atoms, the box size, and the velocities of the atoms.
The particular “fields” a trajectory has in general are not fixed, but this important use case is directly supported by the WepyHDF5 format.
In any such simulation, however, the ‘weights’ field will always appear, since this is the weight of the walker for this frame: a value important to weighted ensemble and not to the underlying dynamics.
The naive approach to these fields is that each is a dataset of dimension (n_frames, feature_vector_shape[0], …) where the first dimension is the cycle_idx and the rest of the dimensions are determined by the atomic feature vector for each field for a single frame.
For example, the positions for a molecular simulation with 100 atoms with x, y, and z coordinates that ran for 1000 cycles would be a dataset of the shape (1000, 100, 3). Similarly the box vectors would be (1000, 3, 3) and the weights would be (1000, 1).
This uniformity vastly simplifies accessing and adding new variables, but requires that individual state values in walkers always be arrays with shapes, even when they are single values (e.g. energy). The exception is the weight, which is handled separately.
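The shape rule can be checked concretely. This sketch just repeats the positions example above (the particular numbers are from that example, not fixed by the format):

```python
n_frames = 1000  # one saved frame per cycle
n_atoms = 100

# Per-frame feature vector shapes for each trajectory field.
feature_shapes = {
    "positions": (n_atoms, 3),
    "box_vectors": (3, 3),
    "weights": (1,),
}

# Each field's dataset has shape (n_frames, *feature_vector_shape).
dataset_shapes = {
    field: (n_frames,) + feature_shape
    for field, feature_shape in feature_shapes.items()
}
# dataset_shapes["positions"] == (1000, 100, 3)
```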
However, this situation is actually more complex to allow for special features.
First of all is the presence of compound fields which allow nesting of multiple groups.
The above “trajectory fields” would have identifiers such as the literal strings ‘positions’ and ‘box_vectors’, while a compound field would have an identifier ‘observables/rmsd’ or ‘alt_reps/binding_site’.
Trajectory field names that use the ‘/’ path separator will automatically make the field a group, with the last element of the field name becoming the dataset. So for the observables example we might have:
0
observables
rmsd
sasa
Where the rmsd would be accessed as a trajectory field of trajectory 0 as ‘observables/rmsd’ and the solvent accessible surface area as ‘observables/sasa’.
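The ‘/’ convention amounts to splitting on the last separator to get the containing group and the dataset name. This sketch mirrors the behavior described above; it is an illustration, not the library’s implementation:

```python
def split_field_path(field_path):
    """Split a trajectory field path into (group_path, dataset_name).

    Simple fields ('positions') live directly in the trajectory group,
    so the group path is empty; compound fields ('observables/rmsd')
    put the dataset inside a subgroup.
    """
    if "/" in field_path:
        return tuple(field_path.rsplit("/", 1))
    return ("", field_path)

# split_field_path("observables/rmsd") == ("observables", "rmsd")
```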
This example introduces how the WepyHDF5 format is not only useful for storing data produced by simulation but also in the analysis of that data and computation of by-frame quantities.
The ‘observables’ compound group key prefix is special and will be used in the ‘compute_observables’ method.
The other special compound group key prefix is ‘alt_reps’ which is used for particle simulations to store “alternate representation” of the positions. This is useful in cooperation with the next feature of wepy trajectory fields to allow for more economical storage of data.
The next feature (and complication of the format) is the allowance for sparse fields. As the fields were introduced, we said that they should have as many feature vectors as there are frames in the simulation. In the example, however, you will notice that storing both the full atomic positions and the velocities for a long simulation imposes a heavy storage burden.
So perhaps you only want to store the velocities (or forces) every 100 frames, so that you can restart a simulation from midway through. This is achieved through sparse fields.
A sparse field is no longer a dataset but a group with two items:
_sparse_idxs
data
The ‘_sparse_idxs’ are simply a dataset of integers that assign each element of the ‘data’ dataset to a frame index. Using the above example: if we run a simulation for 1000 frames with 100 atoms and save the velocities every 100 frames, we would have a ‘velocities/data’ dataset of shape (10, 100, 3), which is 100 times less data than if it were saved every frame.
While this complicates the storage format, use of the proper API methods is transparent as to whether you are reading a sparse field or not.
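The mapping from sparse storage back to frame-aligned data can be sketched as follows. This is an illustration of the ‘_sparse_idxs’/‘data’ relationship, not the library’s masked-array implementation:

```python
def expand_sparse_field(sparse_idxs, data, n_frames):
    """Expand a sparse field into a frame-aligned list, None where absent.

    '_sparse_idxs' assigns each element of the 'data' dataset to a
    frame (cycle) index; every other frame has no value.
    """
    frames = [None] * n_frames
    for idx, value in zip(sparse_idxs, data):
        frames[idx] = value
    return frames

# Velocities saved every 100th frame of a 1000-frame run:
sparse_idxs = list(range(0, 1000, 100))  # 10 saved frames
data = ["velocity frame %d" % i for i in sparse_idxs]
velocities = expand_sparse_field(sparse_idxs, data, 1000)
```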
As alluded to above, sparse fields can be used for more than just accessory fields. In many simulations, such as full atomistic simulations of proteins in solvent, we often don’t care about the dynamics of most of the atoms and so would like not to have to save them.
The ‘alt_reps’ compound field is meant to solve this. For example, the WepyHDF5Reporter supports a special option to save only a subset of the atoms in the main ‘positions’ field while also saving the full atomic system as an alternate representation under the field name ‘alt_reps/all_atoms’. That way you can still save the full system every once in a while but be economical about which positions you save every single frame.
Note that there really isn’t a way to achieve this with other formats. You either make a completely new trajectory with only the atoms of interest, and now you are duplicating those coordinates in two places, or you duplicate and then filter your full-system trajectory file and rely on some sort of index always living alongside it in the filesystem, which is a very precarious scenario. The situation is particularly hopeless for weighted ensemble trajectories.
Init Walkers¶
The data stored in the ‘trajectories’ section is the data that is returned after running dynamics in a cycle. Since we view the WepyHDF5 as a completely self-contained format for simulations it seems negligent to rely on outside sources (such as the filesystem) for the initial structures that seeded the simulations. These states (and weights) can be stored in this group.
The format of this group is identical to the one for trajectories except that there is only one frame for each slot and so the shape of the datasets for each field is just the shape of the feature vector.
Record Groups¶
TODO: add reference to reference groups
The last five items are what are called ‘record groups’ and all follow the same format.
Each record group itself contains a number of datasets, where the names of the datasets correspond to the ‘field names’ from the record group specification. So each record group is simply a key-value store where the values must be datasets.
For instance the fields in the ‘resampling’ (which is particularly important as it encodes the branching structure) record group for a WExplore resampler simulation are:
step_idx
walker_idx
decision_id
target_idxs
region_assignment
Here ‘step_idx’ is an integer specifying during which resampling step within the cycle the action took place (the cycle index is metadata for the group). ‘walker_idx’ is the index of the walker that this action was assigned to. ‘decision_id’ is an integer tied to an enumeration of decision types that encodes which discrete action is to be taken for this resampling event (the enumeration is in the ‘decision’ item of the run group). ‘target_idxs’ is a variable-length 1-D array of integers which assigns the results of the action to specific target ‘slots’ (discussed above for the ‘trajectories’ run group). Finally, ‘region_assignment’ is specific to WExplore and reports which region the walker was in at that time; it is also a variable-length 1-D array of integers.
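The way ‘target_idxs’ assigns resampling results to trajectory slots can be sketched as follows. The two-element action tuples here are a simplification (real records also carry ‘step_idx’, ‘decision_id’, etc.), and the clone/squash scenario is an assumed example for illustration only:

```python
def assign_slots(actions, n_slots):
    """Map each next-cycle slot to the walker whose state fills it.

    Each action is (walker_idx, target_idxs): the walker's state is
    copied into every listed slot. A cloned walker fills several
    slots; a squashed (merged-away) walker fills none.
    """
    parents = [None] * n_slots
    for walker_idx, target_idxs in actions:
        for target in target_idxs:
            parents[target] = walker_idx
    return parents

# Walker 0 cloned into slots 0 and 1; walker 1 squashed; walkers 2 and 3
# kept in place.
parents = assign_slots([(0, [0, 1]), (1, []), (2, [2]), (3, [3])], 4)
# parents == [0, 0, 2, 3]
```

This parent mapping is exactly the information needed to stitch the linear trajectory “slots” back into the branching tree of walker histories.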
Additionally, record groups are broken into two types:
continual
sporadic
Continual records occur once per cycle and so there is no extra indexing necessary.
Sporadic records can happen multiple or zero times per cycle and so require a special index for them which is contained in the extra dataset ‘_cycle_idxs’.
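The role of ‘_cycle_idxs’ can be sketched as a parallel index that groups records by cycle. This is an illustration of the indexing scheme, not the library’s record-reading code:

```python
def group_sporadic_records(cycle_idxs, records):
    """Group sporadic records by the cycle in which they occurred.

    '_cycle_idxs' runs parallel to the record datasets: records[i]
    happened during cycle cycle_idxs[i]; a cycle may appear several
    times (e.g. multiple resampling steps) or not at all.
    """
    by_cycle = {}
    for cycle_idx, record in zip(cycle_idxs, records):
        by_cycle.setdefault(cycle_idx, []).append(record)
    return by_cycle

# Two records in cycle 0, none in cycles 1-4, one in cycle 5:
grouped = group_sporadic_records([0, 0, 5], ["rec_a", "rec_b", "rec_c"])
# grouped == {0: ["rec_a", "rec_b"], 5: ["rec_c"]}
```

Continual records need no such index because record i simply belongs to cycle i.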
It is worth noting that the underlying methods for each record group are general. So while these are the official wepy record groups that are supported, if a use case demands a new record group it is a fairly straightforward task from a developer’s perspective.
- wepy.hdf5.TOPOLOGY = 'topology'¶
Default header apparatus dataset. The molecular topology dataset.
- wepy.hdf5.SETTINGS = '_settings'¶
Name of the settings group in the header group.
- wepy.hdf5.RUNS = 'runs'¶
The group name for runs.
- wepy.hdf5.RUN_IDX = 'run_idx'¶
Metadata field for run groups for the run index within this file.
- wepy.hdf5.RUN_START_SNAPSHOT_HASH = 'start_snapshot_hash'¶
Metadata field for a run that corresponds to the hash of the starting simulation snapshot in orchestration.
- wepy.hdf5.RUN_END_SNAPSHOT_HASH = 'end_snapshot_hash'¶
Metadata field for a run that corresponds to the hash of the ending simulation snapshot in orchestration.
- wepy.hdf5.TRAJ_IDX = 'traj_idx'¶
Metadata field for trajectory groups for the trajectory index in that run.
- wepy.hdf5.CYCLE_IDX = 'cycle_idx'¶
String for setting the names of cycle indices in records and miscellaneous situations.
- wepy.hdf5.SPARSE_FIELDS = 'sparse_fields'¶
Settings field name for sparse field trajectory field flags.
- wepy.hdf5.N_ATOMS = 'n_atoms'¶
Settings field name for the number of atoms in the default positions field.
- wepy.hdf5.N_DIMS_STR = 'n_dims'¶
Settings field name for positions field spatial dimensions.
- wepy.hdf5.MAIN_REP_IDXS = 'main_rep_idxs'¶
Settings field name for the indices of the full apparatus topology in the default positions trajectory field.
- wepy.hdf5.ALT_REPS_IDXS = 'alt_reps_idxs'¶
Settings field name for the different ‘alt_reps’. The indices of the atoms from the full apparatus topology for each.
- wepy.hdf5.FIELD_FEATURE_SHAPES_STR = 'field_feature_shapes'¶
Settings field name for the trajectory field shapes.
- wepy.hdf5.FIELD_FEATURE_DTYPES_STR = 'field_feature_dtypes'¶
Settings field name for the trajectory field data types.
- wepy.hdf5.UNITS = 'units'¶
Settings field name for the units of the trajectory fields.
- wepy.hdf5.RECORD_FIELDS = 'record_fields'¶
Settings field name for the record fields that are to be included in the truncated listing of record group fields.
- wepy.hdf5.CONTINUATIONS = 'continuations'¶
Settings field name for the continuations relationships between runs.
- wepy.hdf5.TRAJECTORIES = 'trajectories'¶
Run field name for the trajectories group.
- wepy.hdf5.INIT_WALKERS = 'init_walkers'¶
Run field name for the initial walkers group.
- wepy.hdf5.DECISION = 'decision'¶
Run field name for the decision enumeration group.
- wepy.hdf5.RESAMPLING = 'resampling'¶
Record group run field name for the resampling records.
- wepy.hdf5.RESAMPLER = 'resampler'¶
Record group run field name for the resampler records.
- wepy.hdf5.WARPING = 'warping'¶
Record group run field name for the warping records.
- wepy.hdf5.PROGRESS = 'progress'¶
Record group run field name for the progress records.
- wepy.hdf5.BC = 'boundary_conditions'¶
Record group run field name for the boundary conditions records.
- wepy.hdf5.NONE_STR = 'None'¶
String signifying a field of unspecified shape. Used for serializing the None python object.
- wepy.hdf5.CYCLE_IDXS = '_cycle_idxs'¶
Group name for the cycle indices of sporadic records.
- wepy.hdf5.SPORADIC_RECORDS = ('resampler', 'warping', 'resampling', 'boundary_conditions')¶
Enumeration of the record groups that are sporadic.
- wepy.hdf5.N_DIMS = 3¶
Number of dimensions for the default positions.
- wepy.hdf5.WEIGHTS = 'weights'¶
The field name for the frame weights.
- wepy.hdf5.POSITIONS = 'positions'¶
The field name for the default positions.
- wepy.hdf5.BOX_VECTORS = 'box_vectors'¶
The field name for the default box vectors.
- wepy.hdf5.VELOCITIES = 'velocities'¶
The field name for the default velocities.
- wepy.hdf5.FORCES = 'forces'¶
The field name for the default forces.
- wepy.hdf5.TIME = 'time'¶
The field name for the default time.
- wepy.hdf5.KINETIC_ENERGY = 'kinetic_energy'¶
The field name for the default kinetic energy.
- wepy.hdf5.POTENTIAL_ENERGY = 'potential_energy'¶
The field name for the default potential energy.
- wepy.hdf5.BOX_VOLUME = 'box_volume'¶
The field name for the default box volume.
- wepy.hdf5.PARAMETERS = 'parameters'¶
The field name for the default parameters.
- wepy.hdf5.PARAMETER_DERIVATIVES = 'parameter_derivatives'¶
The field name for the default parameter derivatives.
- wepy.hdf5.ALT_REPS = 'alt_reps'¶
The field name for the default compound field for alternate representations (‘alt_reps’).
- wepy.hdf5.OBSERVABLES = 'observables'¶
The field name for the default compound field observables.
- wepy.hdf5.WEIGHT_SHAPE = (1,)¶
Weights feature vector shape.
- wepy.hdf5.WEIGHT_DTYPE¶
Weights feature vector data type.
- wepy.hdf5.FIELD_FEATURE_SHAPES = (('time', (1,)), ('box_vectors', (3, 3)), ('box_volume', (1,)), ('kinetic_energy', (1,)), ('potential_energy', (1,)))¶
Default shapes for the default fields.
- wepy.hdf5.FIELD_FEATURE_DTYPES = (('positions', <class 'float'>), ('velocities', <class 'float'>), ('forces', <class 'float'>), ('time', <class 'float'>), ('box_vectors', <class 'float'>), ('box_volume', <class 'float'>), ('kinetic_energy', <class 'float'>), ('potential_energy', <class 'float'>))¶
Default data types for the default fields.
- wepy.hdf5.POSITIONS_LIKE_FIELDS = ('velocities', 'forces')¶
Default trajectory fields which are the same shape as the main positions field.
- wepy.hdf5.DATA = 'data'¶
Name of the dataset in sparse trajectory fields.
- wepy.hdf5.SPARSE_IDXS = '_sparse_idxs'¶
Name of the dataset that indexes sparse trajectory fields.
- wepy.hdf5._iter_field_paths(grp)[source]¶
Return all subgroup field name paths from a group.
Useful for compound fields. For example if you have the group observables with multiple subfields:
observables
rmsd
sasa
Passing the h5py group ‘observables’ will return the full field names for each subfield:
‘observables/rmsd’
‘observables/sasa’
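The traversal can be sketched over a plain nested dict standing in for h5py groups. This mock illustrates the described behavior; it is not the actual implementation:

```python
def iter_field_paths(grp, prefix=""):
    """Yield the full field-name path of every dataset under a group.

    In this mock, dict values stand in for subgroups and anything
    else is a dataset (leaf).
    """
    for name, item in grp.items():
        path = name if not prefix else prefix + "/" + name
        if isinstance(item, dict):
            yield from iter_field_paths(item, path)
        else:
            yield path

observables = {"rmsd": [0.1, 0.2], "sasa": [14.0, 13.8]}
paths = sorted(iter_field_paths(observables, prefix="observables"))
# paths == ["observables/rmsd", "observables/sasa"]
```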
- class wepy.hdf5.WepyHDF5(filename, mode='x', topology=None, units=None, sparse_fields=None, feature_shapes=None, feature_dtypes=None, n_dims=None, alt_reps=None, main_rep_idxs=None, swmr_mode=False, expert_mode=False)[source]¶
Bases:
object
Wrapper for h5py interface to an HDF5 file object for creation and access of WepyHDF5 data.
This is the primary implementation of the API for creating, accessing, and modifying data in an HDF5 file that conforms to the WepyHDF5 specification.
Constructor for the WepyHDF5 class.
Initialize a new Wepy HDF5 file. This will create an h5py.File object.
The File will be closed after construction by default.
mode:
r – Readonly, file must exist
r+ – Read/write, file must exist
w – Create file, truncate if exists
x or w- – Create file, fail if exists
a – Read/write if exists, create otherwise
- Parameters:
filename (str) – File path
mode (str) – Mode specification for opening the HDF5 file.
topology (str) – JSON string representing topology of system being simulated.
units (dict of str : str, optional) – Mapping of trajectory field names to string specs for units.
sparse_fields (list of str, optional) – List of trajectory fields that should be initialized as sparse.
feature_shapes (dict of str : shape_spec, optional) – Mapping of trajectory fields to their shape spec for initialization.
feature_dtypes (dict of str : dtype_spec, optional) – Mapping of trajectory fields to their dtype spec for initialization.
n_dims (int, default: 3) – Set the number of spatial dimensions for the default positions trajectory field.
alt_reps (dict of str : list of int, optional) – Specifies that there will be ‘alt_reps’ of positions each named by the keys of this mapping and containing the indices in each value list.
main_rep_idxs (list of int, optional) – The indices of atom positions to save as the main ‘positions’ trajectory field. Defaults to all atoms.
expert_mode (bool) – If True no initialization is performed other than the setting of the filename. Useful mainly for debugging.
- Raises:
AssertionError – If the mode is not one of the supported mode specs.
AssertionError – If a topology is not given for a creation mode.
- Warns:
If initialization data was given but the file was opened in a read mode.
- MODES = ('r', 'r+', 'w', 'w-', 'x', 'a')¶
The recognized modes for opening the WepyHDF5 file.
- WRITE_MODES = ('r+', 'w', 'w-', 'x', 'a')¶
- property swmr_mode¶
- _create_init()[source]¶
Creation mode constructor.
Completely overwrite the data in the file. Reinitialize the values and set with the new ones if given.
- _add_init()[source]¶
The addition mode constructor.
Create the dataset if it doesn’t exist and open it in r+ mode; otherwise, just open it in r+ mode.
- _set_default_init_field_attributes(n_dims=None)[source]¶
Sets the feature_shapes and feature_dtypes to be the defaults for this module. These will be used to initialize field datasets when none are given during construction (i.e. for sparse values).
- Parameters:
n_dims (int)
- _get_field_path_grp(run_idx, traj_idx, field_path)[source]¶
Given a field path for the trajectory returns the group the field’s dataset goes in and the key for the field name in that group.
The field path for a simple field is just the name of the field and for a compound field it is the compound field group name with the subfield separated by a ‘/’ like ‘observables/observable1’ where ‘observables’ is the compound field group and ‘observable1’ is the subfield name.
- _init_continuations()[source]¶
This will either create a dataset in the settings for the continuations or, if continuations already exist, reinitialize them and delete the data that exists there.
- Returns:
continuation_dset
- Return type:
h5py.Dataset
- _add_run_init(run_idx, continue_run=None)[source]¶
Routines for creating a run; includes updating and setting object global variables and increasing the counter for the number of runs.
- _add_init_walkers(init_walkers_grp, init_walkers)[source]¶
Adds the run field group for the initial walkers.
- Parameters:
init_walkers_grp (h5py.Group) – The group to add the walker data to.
init_walkers (list of objects implementing the Walker interface) – The walkers to save in the group
- _init_run_sporadic_record_grp(run_idx, run_record_key, fields)[source]¶
Initialize a sporadic record group for a run.
- _init_run_continual_record_grp(run_idx, run_record_key, fields)[source]¶
Initialize a continual record group for a run.
- _init_run_records_field(run_idx, run_record_key, field_name, field_shape, field_dtype)[source]¶
Initialize a single field for a run record group.
- Parameters:
- Returns:
dataset
- Return type:
h5py.Dataset
- _init_traj_field(run_idx, traj_idx, field_path, feature_shape, dtype)[source]¶
Initialize a trajectory field.
Initialize a data field in the trajectory to be empty but resizeable.
- _init_contiguous_traj_field(run_idx, traj_idx, field_path, shape, dtype)[source]¶
Initialize a contiguous (non-sparse) trajectory field.
- _init_traj_fields(run_idx, traj_idx, field_paths, field_feature_shapes, field_feature_dtypes)[source]¶
Initialize a number of fields for a trajectory.
- _add_traj_field_data(run_idx, traj_idx, field_path, field_data, sparse_idxs=None)[source]¶
Add a trajectory field to a trajectory.
If the sparse indices are given the field will be created as a sparse field otherwise a normal one.
- _extend_contiguous_traj_field(run_idx, traj_idx, field_path, field_data)[source]¶
Add multiple new frames worth of data to the end of an existing contiguous (non-sparse) trajectory field.
- _extend_sparse_traj_field(run_idx, traj_idx, field_path, values, sparse_idxs)[source]¶
Add multiple new frames worth of data to the end of an existing sparse trajectory field.
- _add_sparse_field_flag(field_path)[source]¶
Register a trajectory field as sparse in the header settings.
- Parameters:
field_path (str) – Name of the trajectory field you want to flag as sparse
- _add_field_feature_shape(field_path, field_feature_shape)[source]¶
Add the shape to the header settings for a trajectory field.
- Parameters:
field_path (str) – The name of the trajectory field you want to set for.
field_feature_shape (shape_spec) – The shape spec to serialize as a dataset.
- _add_field_feature_dtype(field_path, field_feature_dtype)[source]¶
Add the data type to the header settings for a trajectory field.
- Parameters:
field_path (str) – The name of the trajectory field you want to set for.
field_feature_dtype (dtype_spec) – The dtype spec to serialize as a dataset.
- _set_field_feature_shape(field_path, field_feature_shape)[source]¶
Add the trajectory field shape to header settings or set the value.
- Parameters:
field_path (str) – The name of the trajectory field you want to set for.
field_feature_shape (shape_spec) – The shape spec to serialize as a dataset.
- _set_field_feature_dtype(field_path, field_feature_dtype)[source]¶
Add the trajectory field dtype to header settings or set the value.
- Parameters:
field_path (str) – The name of the trajectory field you want to set for.
field_feature_dtype (dtype_spec) – The dtype spec to serialize as a dataset.
- _extend_run_record_data_field(run_idx, run_record_key, field_name, field_data)[source]¶
Primitive record append method.
Adds data for a single field dataset in a run records group. This is done without regard to whether the group is sporadic or continual; it is intended to be the sole low-level data-write method.
- _run_record_namedtuple(run_record_key)[source]¶
Generate a namedtuple record type for a record group.
The class name will be formatted like ‘{}_Record’ where the {} will be replaced with the name of the record group.
- Parameters:
run_record_key (str) – Name of the record group
- Returns:
RecordType – The record type to generate records for this record group.
- Return type:
namedtuple
- _convert_record_field_to_table_column(run_idx, run_record_key, record_field)[source]¶
Converts a dataset of feature vectors to more palatable values for use in external datasets.
For single value feature vectors it unwraps them into single values.
For 1-D feature vectors it casts them as tuples.
Anything of higher rank will raise an error.
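The unwrapping rule described here can be sketched as follows; an illustration of the stated behavior, not the library’s code:

```python
def field_to_table_column(feature_vectors):
    """Convert a dataset of feature vectors into a flat table column.

    Length-1 vectors unwrap to scalars, 1-D vectors become tuples,
    and anything of higher rank is rejected.
    """
    column = []
    for vec in feature_vectors:
        if any(isinstance(x, (list, tuple)) for x in vec):
            raise TypeError("feature vectors of rank > 1 cannot be a table column")
        column.append(vec[0] if len(vec) == 1 else tuple(vec))
    return column

# field_to_table_column([[3], [4]]) == [3, 4]
# field_to_table_column([[1, 2], [3, 4]]) == [(1, 2), (3, 4)]
```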
- _convert_record_fields_to_table_columns(run_idx, run_record_key)[source]¶
Convert record group data to truncated namedtuple records.
This uses the specified record fields from the header settings to choose which record group fields to apply this to.
Does no checking to make sure the fields are “table-ifiable”. If a field is not it will raise a TypeError.
- _make_records(run_record_key, cycle_idxs, fields)[source]¶
Generate a list of proper (namedtuple) records for a record group.
- _run_records_sporadic(run_idxs, run_record_key)[source]¶
Generate records for a sporadic record group for a multi-run contig.
If multiple run indices are given assumes that these are a contig (e.g. the second run index is a continuation of the first and so on). This method is considered low-level and does no checking to make sure this is true.
The cycle indices of records from “continuation” runs will be modified so that the records are indexed as if they were a single run.
Uses the record fields settings to decide which fields to use.
- _run_records_continual(run_idxs, run_record_key)[source]¶
Generate records for a continual record group for a multi-run contig.
If multiple run indices are given assumes that these are a contig (e.g. the second run index is a continuation of the first and so on). This method is considered low-level and does no checking to make sure this is true.
The cycle indices of records from “continuation” runs will be modified so that the records are indexed as if they were a single run.
Uses the record fields settings to decide which fields to use.
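The cycle-index shifting for continuation runs can be sketched as follows; an illustration of the re-indexing idea, not the library’s implementation:

```python
def contig_cycle_idxs(run_n_cycles, run_cycle_idxs):
    """Re-index per-run cycle indices as one contiguous run.

    run_n_cycles[i] is the number of cycles in run i; each
    continuation run's cycle indices are shifted by the total
    number of cycles in the runs that precede it.
    """
    contig = []
    offset = 0
    for n_cycles, cycle_idxs in zip(run_n_cycles, run_cycle_idxs):
        contig.extend(idx + offset for idx in cycle_idxs)
        offset += n_cycles
    return contig

# Two 100-cycle runs forming a contig: cycle 0 of the second run
# becomes cycle 100 of the combined indexing.
combined = contig_cycle_idxs([100, 100], [[0, 99], [0, 50]])
# combined == [0, 99, 100, 150]
```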
- _get_contiguous_traj_field(run_idx, traj_idx, field_path, frames=None)[source]¶
Access actual data for a trajectory field.
- _get_sparse_traj_field(run_idx, traj_idx, field_path, frames=None, masked=True)[source]¶
Access actual data for a trajectory field.
- Parameters:
- Returns:
field_data – The data requested for the field.
- Return type:
arraylike
- _add_run_field(run_idx, field_path, data, sparse_idxs=None, force=False)[source]¶
Add a trajectory field to all trajectories in a run.
By enforcing adding it to all trajectories at one time we promote in-run consistency.
- Parameters:
run_idx (int)
field_path (str) – Name to set the trajectory field as. Can be compound.
data (arraylike of shape (n_trajectories, n_cycles, feature_vector_shape[0],...)) – The data for all trajectories to be added.
sparse_idxs (list of int) – If the data you are adding is sparse specify which cycles to apply them to.
force (bool) – If True, no checking for constraints will be done.
- _add_field(field_path, data, sparse_idxs=None, force=False)[source]¶
Add a trajectory field to all runs in a file.
- Parameters:
field_path (str) – Name of trajectory field
data (list of arraylike) – Each element of this list corresponds to a single run. The elements of which are arraylikes of shape (n_trajectories, n_cycles, feature_vector_shape[0],…) for each run.
sparse_idxs (list of list of int) – The list of cycle indices to set for the sparse fields. If None, no trajectories are set as sparse.
- property filename¶
The path to the underlying HDF5 file.
- open(mode=None)[source]¶
Open the underlying HDF5 file for access.
- Parameters:
mode (str) – Valid mode spec. Opens the HDF5 file in this mode if given otherwise uses the existing mode.
- property mode¶
The WepyHDF5 mode this object was created with.
- property h5_mode¶
The h5py.File mode the HDF5 file currently has.
- _set_h5_mode(h5_mode)[source]¶
Set the mode to open the HDF5 file with.
This really shouldn’t be set without using the main wepy mode as they need to be aligned.
- property h5¶
The underlying h5py.File object.
- run(run_idx)[source]¶
Get the h5py.Group for a run.
- Parameters:
run_idx (int)
- Returns:
run_group
- Return type:
h5py.Group
- run_trajs(run_idx)[source]¶
Get the trajectories group for a run.
- Parameters:
run_idx (int)
- Returns:
trajectories_grp
- Return type:
h5py.Group
- property runs¶
The runs group.
- run_start_snapshot_hash(run_idx)[source]¶
Hash identifier for the starting snapshot of a run from orchestration.
- run_end_snapshot_hash(run_idx)[source]¶
Hash identifier for the ending snapshot of a run from orchestration.
- set_run_start_snapshot_hash(run_idx, snaphash)[source]¶
Set the starting snapshot hash identifier for a run from orchestration.
- set_run_end_snapshot_hash(run_idx, snaphash)[source]¶
Set the ending snapshot hash identifier for a run from orchestration.
- property settings_grp¶
The header settings group.
- decision_grp(run_idx)[source]¶
Get the decision enumeration group for a run.
- Parameters:
run_idx (int)
- Returns:
decision_grp
- Return type:
h5py.Group
- init_walkers_grp(run_idx)[source]¶
Get the group for the initial walkers for a run.
- Parameters:
run_idx (int)
- Returns:
init_walkers_grp
- Return type:
h5py.Group
- resampling_grp(run_idx)[source]¶
Get the resampling record group for a run.
- Parameters:
run_idx (int)
- Returns:
run_record_group
- Return type:
h5py.Group
- resampler_grp(run_idx)[source]¶
Get the resampler record group for a run.
- Parameters:
run_idx (int)
- Returns:
run_record_group
- Return type:
h5py.Group
- warping_grp(run_idx)[source]¶
Get the warping record group for a run.
- Parameters:
run_idx (int)
- Returns:
run_record_group
- Return type:
h5py.Group
- bc_grp(run_idx)[source]¶
Get the boundary conditions (bc) record group for a run.
- Parameters:
run_idx (int)
- Returns:
run_record_group
- Return type:
h5py.Group
- progress_grp(run_idx)[source]¶
Get the progress record group for a run.
- Parameters:
run_idx (int)
- Returns:
run_record_group
- Return type:
h5py.Group
- iter_trajs(idxs=False, traj_sel=None)[source]¶
Generator for iterating over trajectories in a file.
- Parameters:
- Yields:
traj_id (tuple of int, if idxs is True) – A tuple of (run_idx, traj_idx) for the group
trajectory (h5py.Group)
- property defined_traj_field_names¶
A list of the settings defined field names all trajectories have in the file.
- property observable_field_names¶
Returns a list of the names of the observables that all trajectories have.
If observable fields are found that do not occur in all trajectories (an inconsistency), an inconsistency error is raised.
- _check_traj_field_consistency(field_names)[source]¶
Checks that every trajectory has the given fields across the entire dataset.
- property record_fields¶
The record fields for each record group which are selected for inclusion in the truncated records.
These are the fields which are considered to be table-ified.
- property sparse_fields¶
The trajectory fields that are sparse.
- property main_rep_idxs¶
The indices of the atoms included from the full topology in the default ‘positions’ trajectory field.
- property alt_reps_idxs¶
Mapping of the names of the alt reps to the indices of the atoms from the topology that they include in their datasets.
- property alt_reps¶
Names of the alt reps.
- property field_feature_shapes¶
Mapping of the names of the trajectory fields to their feature vector shapes.
- property field_feature_dtypes¶
Mapping of the names of the trajectory fields to their feature vector numpy dtypes.
- property continuations¶
The continuation relationships in this file.
- property metadata¶
File metadata (h5py.attrs).
- decision_enum(run_idx)[source]¶
Mapping of decision enumerated names to their integer representations.
- Parameters:
run_idx (int)
- Returns:
decision_enum – Mapping of the decision ID string to the integer representation.
- Return type:
dict of str : int
See also
WepyHDF5.decision_value_names
for the reverse mapping
- decision_value_names(run_idx)[source]¶
Mapping of the integer values for decisions to the decision ID strings.
- Parameters:
run_idx (int)
- Returns:
decision_enum – Mapping of the decision integer to the decision ID string representation.
- Return type:
dict of int : str
See also
WepyHDF5.decision_enum
for the reverse mapping
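The relationship between decision_enum and decision_value_names is a simple dict inversion, sketched here (the decision names shown are illustrative, not necessarily those of any particular resampler):

```python
def invert_enum(decision_enum):
    """Map the integer values back to their decision ID strings."""
    return {value: name for name, value in decision_enum.items()}

# hypothetical decision enumeration for a cloning-and-merging resampler
decision_enum = {'NOTHING': 0, 'CLONE': 1, 'SQUASH': 2, 'KEEP_MERGE': 3}

# the reverse mapping, as decision_value_names would give it
value_names = invert_enum(decision_enum)
```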
- get_topology(alt_rep='positions')[source]¶
Get the JSON topology for a particular representation of the positions.
By default gives the topology for the main ‘positions’ field (when alt_rep is ‘positions’). To get the full topology the file was initialized with, set alt_rep to None. Topologies for alternative representations (subfields of ‘alt_reps’) can be obtained by passing in the key for that alt_rep. For example, ‘all_atoms’ for the field in alt_reps called ‘all_atoms’.
- property topology¶
The topology for the full simulated system.
May not be the main representation in the POSITIONS field; for that use the get_topology method.
- Returns:
topology – The JSON topology string for the full representation.
- Return type:
str
- get_mdtraj_topology(alt_rep='positions')[source]¶
Get an mdtraj.Topology object for a system representation.
By default gives the topology for the main ‘positions’ field (when alt_rep is ‘positions’). To get the full topology the file was initialized with, set alt_rep to None. Topologies for alternative representations (subfields of ‘alt_reps’) can be obtained by passing in the key for that alt_rep. For example, ‘all_atoms’ for the field in alt_reps called ‘all_atoms’.
- initial_walker_fields(run_idx, fields, walker_idxs=None)[source]¶
Get fields from the initial walkers of the simulation.
- Parameters:
- Returns:
walker_fields – Dictionary mapping fields to the values for all walkers. Frames will be either in counting order if no indices were requested or the order of the walker indices as given.
- Return type:
dict of str : array of shape
- initial_walkers_to_mdtraj(run_idx, walker_idxs=None, alt_rep='positions')[source]¶
Generate an mdtraj Trajectory from the initial walkers of a run.
Uses the default fields for positions (unless an alternate representation is specified) and box vectors which are assumed to be present in the trajectory fields.
The time value for the mdtraj trajectory is set to the cycle indices for each trace frame.
This is useful for converting WepyHDF5 data to common molecular dynamics data formats accessible through the mdtraj library.
- Parameters:
run_idx (int) – Run to get initial walkers for.
walker_idxs (None or list of int) – If None returns all of the walkers fields, otherwise a list of ints that are a selection from those walkers.
alt_rep (None or str) – If None uses default ‘positions’ representation otherwise chooses the representation from the ‘alt_reps’ compound field.
- Returns:
traj
- Return type:
mdtraj.Trajectory
- property num_atoms¶
The number of atoms in the full topology representation.
- property num_dims¶
The number of spatial dimensions in the positions and alt_reps trajectory fields.
- property num_runs¶
The number of runs in the file.
- property num_trajs¶
The total number of trajectories in the entire file.
- property run_idxs¶
The indices of the runs in the file.
- run_traj_idx_tuples(runs=None)[source]¶
Get identifier tuples (run_idx, traj_idx) for all trajectories in all runs.
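The tuples this method returns are the pairing of each run index with that run's trajectory indices, which can be sketched as (trajectory counts here are made up):

```python
def run_traj_idx_tuples(run_num_trajs):
    """Build (run_idx, traj_idx) identifier tuples for every trajectory.

    run_num_trajs: mapping of run_idx -> number of trajectories in that run.
    """
    return [(run_idx, traj_idx)
            for run_idx, n_trajs in run_num_trajs.items()
            for traj_idx in range(n_trajs)]

# a file with two runs: run 0 has 2 trajectories, run 1 has 1
id_tuples = run_traj_idx_tuples({0: 2, 1: 1})
```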
- get_traj_field_cycle_idxs(run_idx, traj_idx, field_path)[source]¶
Returns the cycle indices for a sparse trajectory field.
- next_run_idx()[source]¶
The index of the next run if it were to be added.
Because runs are named as the integer value of the order they were added, this gives the index of the next run that would be added.
- Returns:
next_run_idx
- Return type:
int
- is_run_contig(run_idxs)[source]¶
Checks whether a given list of run indices forms a valid contig.
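A hedged sketch of the contig check (the real method consults the file's continuations; here they are passed in as (next_run, previous_run) pairs, which is an assumption about their orientation): a list of run indices is a contig when each run is recorded as a continuation of the one before it.

```python
def is_run_contig(continuations, run_idxs):
    """Return True if consecutive run_idxs are linked by continuations."""
    continuation_pairs = set(map(tuple, continuations))
    return all(
        (next_run, prev_run) in continuation_pairs
        for prev_run, next_run in zip(run_idxs[:-1], run_idxs[1:])
    )

# run 1 continues run 0, and run 2 continues run 1
continuations = [(1, 0), (2, 1)]
```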
- clone(path, mode='x')[source]¶
Clone the header information of this file into another file.
Clones this WepyHDF5 file without any of the actual runs and run data. This includes the topology, units, sparse_fields, feature shapes and dtypes, alt_reps, and main representation information.
This method will flush the buffers for this file.
Does not preserve metadata pertaining to inter-run relationships like continuations.
- link_run(filepath, run_idx, continue_run=None, **kwargs)[source]¶
Add a run from another file to this one as an HDF5 external link.
- Parameters:
filepath (str) – File path to the HDF5 file that the run is on.
run_idx (int) – The run index from the target file you want to link.
continue_run (int, optional) – The run from the linking WepyHDF5 file you want the target linked run to continue.
kwargs (dict) – Adds metadata (h5py.attrs) to the linked run.
- Returns:
linked_run_idx – The index of the linked run in the linking file.
- Return type:
int
- link_file_runs(wepy_h5_path)[source]¶
Link all runs from another WepyHDF5 file.
This preserves continuations within that file. This will open the file if not already opened.
- extract_run(filepath, run_idx, continue_run=None, run_slice=None, **kwargs)[source]¶
Add a run from another file to this one by copying it and truncating it if necessary.
- Parameters:
filepath (str) – File path to the HDF5 file that the run is on.
run_idx (int) – The run index from the target file you want to link.
continue_run (int, optional) – The run from the linking WepyHDF5 file you want the target linked run to continue.
run_slice
kwargs (dict) – Adds metadata (h5py.attrs) to the linked run.
- Returns:
linked_run_idx – The index of the linked run in the linking file.
- Return type:
int
- extract_file_runs(wepy_h5_path, run_slices=None)[source]¶
Extract (copying and truncating appropriately) all runs from another WepyHDF5 file.
This preserves continuations within that file. This will open the file if not already opened.
- join(other_h5)[source]¶
Given another WepyHDF5 file object does a left join on this file, renumbering the runs starting from this file.
This function uses the H5O function for copying. Data will be copied not linked.
- Parameters:
other_h5 (h5py.File) – File handle to the file you want to join to this one.
- add_metadata(key, value)[source]¶
Add metadata for the whole file.
- Parameters:
key (str)
value (h5py value) – h5py valid metadata value.
- init_record_fields(run_record_key, record_fields)[source]¶
Initialize the settings record fields for a record group in the settings group.
Save which fields from a run record group’s datasets are to be included in the table-like representation. This exists to allow large and small datasets for records to be stored together, while still allowing a more compact, single table-like representation to be produced for serialization.
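The "table-ified" selection described above amounts to projecting each record onto the registered record fields, which can be sketched as (field names here are hypothetical):

```python
def records_to_table(records, record_fields):
    """Keep only the registered fields of each record, as table rows."""
    return [tuple(record[field] for field in record_fields)
            for record in records]

# full records carry a large payload; the truncated table keeps only
# the small, serialization-friendly fields
records = [
    {'cycle_idx': 0, 'weight': 0.5, 'big_payload': [0.0] * 1000},
    {'cycle_idx': 1, 'weight': 0.25, 'big_payload': [0.0] * 1000},
]
table = records_to_table(records, ['cycle_idx', 'weight'])
```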
- init_resampling_record_fields(resampler)[source]¶
Initialize the record fields for this record group.
- Parameters:
resampler (object implementing the Resampler interface) – The resampler which contains the data for which record fields to set.
- init_resampler_record_fields(resampler)[source]¶
Initialize the record fields for this record group.
- Parameters:
resampler (object implementing the Resampler interface) – The resampler which contains the data for which record fields to set.
- init_bc_record_fields(bc)[source]¶
Initialize the record fields for this record group.
- Parameters:
bc (object implementing the BoundaryConditions interface) – The boundary conditions object which contains the data for which record fields to set.
- init_warping_record_fields(bc)[source]¶
Initialize the record fields for this record group.
- Parameters:
bc (object implementing the BoundaryConditions interface) – The boundary conditions object which contains the data for which record fields to set.
- init_progress_record_fields(bc)[source]¶
Initialize the record fields for this record group.
- Parameters:
bc (object implementing the BoundaryConditions interface) – The boundary conditions object which contains the data for which record fields to set.
- new_run(init_walkers, continue_run=None, **kwargs)[source]¶
Initialize a new run.
- Parameters:
- Returns:
run_grp – The group of the newly created run.
- Return type:
h5py.Group
- init_run_resampling(run_idx, resampler)[source]¶
Initialize data for resampling records.
Initializes the run record group as well as the settings for the fields.
This method also creates the decision group for the run.
- Parameters:
run_idx (int)
resampler (object implementing the Resampler interface) – The resampler which contains the data for which record fields to set.
- Returns:
record_grp
- Return type:
h5py.Group
- init_run_resampling_decision(run_idx, resampler)[source]¶
Initialize the decision group for the run resampling records.
- Parameters:
run_idx (int)
resampler (object implementing the Resampler interface) – The resampler which contains the data for which record fields to set.
- init_run_resampler(run_idx, resampler)[source]¶
Initialize data for this record group in a run.
Initializes the run record group as well as the settings for the fields.
- Parameters:
run_idx (int)
resampler (object implementing the Resampler interface) – The resampler which contains the data for which record fields to set.
- Returns:
record_grp
- Return type:
h5py.Group
- init_run_warping(run_idx, bc)[source]¶
Initialize data for this record group in a run.
Initializes the run record group as well as the settings for the fields.
- Parameters:
run_idx (int)
bc (object implementing the BoundaryConditions interface) – The boundary conditions object which contains the data for which record fields to set.
- Returns:
record_grp
- Return type:
h5py.Group
- init_run_progress(run_idx, bc)[source]¶
Initialize data for this record group in a run.
Initializes the run record group as well as the settings for the fields.
- Parameters:
run_idx (int)
bc (object implementing the BoundaryConditions interface) – The boundary conditions object which contains the data for which record fields to set.
- Returns:
record_grp
- Return type:
h5py.Group
- init_run_bc(run_idx, bc)[source]¶
Initialize data for this record group in a run.
Initializes the run record group as well as the settings for the fields.
- Parameters:
run_idx (int)
bc (object implementing the BoundaryConditions interface) – The boundary conditions object which contains the data for which record fields to set.
- Returns:
record_grp
- Return type:
h5py.Group
- init_run_fields_resampling_decision(run_idx, decision_enum_dict)[source]¶
Initialize the decision group for this run.
- add_traj(run_idx, data, weights=None, sparse_idxs=None, metadata=None)[source]¶
Add a full trajectory to a run.
- Parameters:
run_idx (int)
data (dict of str : arraylike) – Mapping of trajectory fields to the data for them to add.
weights (1-D arraylike of float) – The weights of each frame. If None defaults all frames to 1.0.
sparse_idxs (list of int) – Cycle indices the data corresponds to.
metadata (dict of str : value) – Metadata for the trajectory.
- Returns:
traj_grp
- Return type:
h5py.Group
- extend_traj(run_idx, traj_idx, data, weights=None)[source]¶
Extend a trajectory with data for all fields.
- extend_cycle_warping_records(run_idx, cycle_idx, warping_data)[source]¶
Add records for each field for this record group.
- extend_cycle_bc_records(run_idx, cycle_idx, bc_data)[source]¶
Add records for each field for this record group.
- extend_cycle_progress_records(run_idx, cycle_idx, progress_data)[source]¶
Add records for each field for this record group.
- extend_cycle_resampling_records(run_idx, cycle_idx, resampling_data)[source]¶
Add records for each field for this record group.
- extend_cycle_resampler_records(run_idx, cycle_idx, resampler_data)[source]¶
Add records for each field for this record group.
- extend_cycle_run_group_records(run_idx, run_record_key, cycle_idx, fields_data)[source]¶
Extend data for a whole records group.
The cycle index must be provided for the data being appended, since this is done for both sporadic and continual datasets.
- run_contig_records(run_idxs, run_record_key)[source]¶
Get the records for a record group for the contig that is formed by the run indices.
This alters the cycle indices for the records so that they appear to have come from a single run. That is they are the cycle indices of the contig.
- run_records_dataframe(run_idx, run_record_key)[source]¶
Get the records for a record group for a single run in the form of a pandas DataFrame.
- run_contig_records_dataframe(run_idxs, run_record_key)[source]¶
Get the records for a record group for a contig of runs in the form of a pandas DataFrame.
- resampling_records(run_idxs)[source]¶
Get the records of this record group for the contig formed by the run indices.
This alters the cycle indices for the records so that they appear to have come from a single run. That is they are the cycle indices of the contig.
- resampling_records_dataframe(run_idxs)[source]¶
Get the records for this record group for a contig of runs in the form of a pandas DataFrame.
- resampler_records(run_idxs)[source]¶
Get the records of this record group for the contig formed by the run indices.
This alters the cycle indices for the records so that they appear to have come from a single run. That is they are the cycle indices of the contig.
- resampler_records_dataframe(run_idxs)[source]¶
Get the records for this record group for a contig of runs in the form of a pandas DataFrame.
- warping_records(run_idxs)[source]¶
Get the records of this record group for the contig formed by the run indices.
This alters the cycle indices for the records so that they appear to have come from a single run. That is they are the cycle indices of the contig.
- warping_records_dataframe(run_idxs)[source]¶
Get the records for this record group for a contig of runs in the form of a pandas DataFrame.
- bc_records(run_idxs)[source]¶
Get the records of this record group for the contig formed by the run indices.
This alters the cycle indices for the records so that they appear to have come from a single run. That is they are the cycle indices of the contig.
- bc_records_dataframe(run_idxs)[source]¶
Get the records for this record group for a contig of runs in the form of a pandas DataFrame.
- progress_records(run_idxs)[source]¶
Get the records of this record group for the contig formed by the run indices.
This alters the cycle indices for the records so that they appear to have come from a single run. That is they are the cycle indices of the contig.
- progress_records_dataframe(run_idxs)[source]¶
Get the records for this record group for a contig of runs in the form of a pandas DataFrame.
- run_resampling_panel(run_idx)[source]¶
Generate a resampling panel from the resampling records of a run.
- run_contig_resampling_panel(run_idxs)[source]¶
Generate a resampling panel from the resampling records of a contig, which is a series of runs.
- add_run_observable(run_idx, observable_name, data, sparse_idxs=None)[source]¶
Add a trajectory sub-field in the compound field “observables” for a single run.
- Parameters:
run_idx (int)
observable_name (str) – What to name the observable subfield.
data (arraylike of shape (n_trajs, feature_vector_shape[0], ...)) – The data for all of the trajectories that will be set to this observable field.
sparse_idxs (list of int, optional) – If not None, specifies the cycle indices this data corresponds to.
- add_traj_observable(observable_name, data, sparse_idxs=None)[source]¶
Add a trajectory sub-field in the compound field “observables” for an entire file, on a trajectory basis.
- Parameters:
observable_name (str) – What to name the observable subfield.
data (list of arraylike) – The data for each run are the elements of this argument. Each element is an arraylike of shape (n_traj_frames, feature_vector_shape[0],…) where n_traj_frames is the number of frames in that trajectory.
sparse_idxs (list of list of int, optional) – If not None, specifies the cycle indices this data corresponds to. First by run, then by trajectory.
- add_observable(observable_name, data, sparse_idxs=None)[source]¶
Add a trajectory sub-field in the compound field “observables” for an entire file, on a compound run and trajectory basis.
- Parameters:
observable_name (str) – What to name the observable subfield.
data (list of list of arraylike) – The data for each run are the elements of this argument. Each element is a list of the trajectory observable arraylikes of shape (n_traj_frames, feature_vector_shape[0],…).
sparse_idxs (list of list of int, optional) – If not None, specifies the cycle indices this data corresponds to. First by run, then by trajectory.
- compute_observable(func, fields, args, map_func=<class 'map'>, traj_sel=None, save_to_hdf5=None, idxs=False, return_results=True)[source]¶
Compute an observable on the trajectory data according to a function. Optionally save that data in the observables data group for the trajectory.
- Parameters:
func (callable) – The function to apply to the trajectory fields (by cycle). Must accept a dictionary mapping string trajectory field names to a feature vector for that cycle and return an arraylike. May accept other positional arguments as well.
fields (list of str) – A list of trajectory field names to pass to the mapped function.
args (tuple) – A single tuple of arguments which will be expanded and passed to the mapped function for every evaluation.
map_func (callable) – The mapping function. The implementation of how to map the computation function over the data. Default is the python builtin map function. Can be a parallel implementation for example.
traj_sel (list of tuple, optional) – If not None, a list of trajectory identifier tuple (run_idx, traj_idx) to restrict the computation to.
save_to_hdf5 (None or string, optional) – If not None, a string that specifies the name of the observables sub-field that the computed values will be saved to.
idxs (bool) – If True will return the trajectory identifier tuple (run_idx, traj_idx) along with other return values.
return_results (bool) – If True will return the results of the mapping. If not using the ‘save_to_hdf5’ option, be sure to use this or results will be lost.
- Returns:
traj_id_tuples (list of tuple of int, if ‘idxs’ option is True) – A list of the tuple identifiers for each trajectory result.
results (list of arraylike, if ‘return_results’ option is True) – A list of arraylike feature vectors for each trajectory.
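The mapping pattern compute_observable describes can be sketched with plain Python (the real method pulls the field data from the HDF5 file; the function and data names here are hypothetical): the mapped function receives a dict of field name to per-cycle data plus the expanded extra args, once per trajectory.

```python
def compute_observable_sketch(func, fields, args, traj_fields_by_traj,
                              map_func=map):
    """Map func over per-trajectory field dicts, like compute_observable."""
    # build one {field: data} dict per trajectory, restricted to the
    # requested fields
    inputs = [
        {field: traj_fields[field] for field in fields}
        for traj_fields in traj_fields_by_traj
    ]
    # map_func can be swapped for a parallel implementation
    return list(map_func(lambda traj_data: func(traj_data, *args), inputs))

# two made-up trajectories with simple scalar "positions"
trajs = [
    {'positions': [1.0, 2.0], 'weights': [0.5, 0.5]},
    {'positions': [3.0, 4.0], 'weights': [1.0, 0.0]},
]
scaled = compute_observable_sketch(
    lambda d, scale: [scale * x for x in d['positions']],
    ['positions'],
    (10.0,),
    trajs,
)
```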
- get_traj_field(run_idx, traj_idx, field_path, frames=None, masked=True)[source]¶
Returns a numpy array for the given trajectory field.
You can control how sparse fields are returned using the masked option. When True (the default) a masked numpy array is returned, so that you can tell which cycles the values come from; when False an unmasked array of the data is returned, which carries no cycle information.
- Parameters:
run_idx (int)
traj_idx (int)
field_path (str) – Name of the trajectory field to get
frames (None or list of int) – If not None, a list of the frame indices of the trajectory to return values for.
masked (bool) – If true will return sparse field values as masked arrays, otherwise just returns the compacted data.
- Returns:
field_data – The data for the trajectory field.
- Return type:
arraylike
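The masked=True behavior can be sketched in pure Python (the real method returns a numpy masked array; here None stands in for a masked element, and the function name is hypothetical): the compact sparse data is expanded to the full cycle length, with cycles lacking data marked as missing.

```python
def expand_sparse_field(compact_data, cycle_idxs, n_cycles):
    """Expand compact sparse data to full cycle length, marking gaps."""
    full = [None] * n_cycles  # None plays the role of a masked element
    for value, cycle_idx in zip(compact_data, cycle_idxs):
        full[cycle_idx] = value
    return full

# a field with data only at cycles 0 and 3, out of 5 cycles total
masked_like = expand_sparse_field([10.0, 20.0], [0, 3], 5)
```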
- get_trace_fields(frame_tups, fields, same_order=True)[source]¶
Get trajectory field data for the frames specified by the trace.
- Parameters:
frame_tups (list of tuple of int) – The trace values. Each tuple is of the form (run_idx, traj_idx, frame_idx).
fields (list of str) – The names of the fields to get for each frame.
same_order (bool) – (Default = True) If True will ensure that the results will be sorted exactly as the order of the frame_tups were. If False will return them in an arbitrary implementation determined order that should be more efficient.
- Returns:
trace_fields – Mapping of the field names to the array of feature vectors for the trace.
- Return type:
dict of str : arraylike
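The gather that get_trace_fields performs can be sketched as follows (the nested-list data layout runs[run_idx][traj_idx][frame_idx] is a stand-in for the HDF5 layout): with same_order the results preserve the order of the input tuples.

```python
def get_trace_fields_sketch(runs, frame_tups, fields):
    """Collect per-field values for (run_idx, traj_idx, frame_idx) tuples."""
    trace_fields = {field: [] for field in fields}
    for run_idx, traj_idx, frame_idx in frame_tups:
        frame = runs[run_idx][traj_idx][frame_idx]
        for field in fields:
            # appending in input order corresponds to same_order=True
            trace_fields[field].append(frame[field])
    return trace_fields

# one run, one trajectory, two frames of made-up data
runs = [[[{'positions': 'r0t0f0'}, {'positions': 'r0t0f1'}]]]
trace = get_trace_fields_sketch(runs, [(0, 0, 1), (0, 0, 0)], ['positions'])
```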
- get_run_trace_fields(run_idx, frame_tups, fields)[source]¶
Get trajectory field data for the frames specified by the trace within a single run.
- Parameters:
- Returns:
trace_fields – Mapping of the field names to the array of feature vectors for the trace.
- Return type:
dict of str : arraylike
- get_contig_trace_fields(contig_trace, fields)[source]¶
Get field data for all trajectories of a contig for the frames specified by the contig trace.
- Parameters:
- Returns:
contig_fields – of shape (n_cycles, n_trajs, field_feature_shape[0],…) Mapping of the field names to the array of feature vectors for contig trace.
- Return type:
dict of str : arraylike
- iter_trajs_fields(fields, idxs=False, traj_sel=None)[source]¶
Generator for iterating over the fields of trajectories in a file.
- Parameters:
- Yields:
traj_identifier (tuple of int if ‘idxs’ option is True) – Tuple identifying the trajectory the data belongs to (run_idx, traj_idx).
fields_data (dict of str : arraylike) – Mapping of the field name to the array of feature vectors of that field for this trajectory.
- traj_fields_map(func, fields, args, map_func=<class 'map'>, idxs=False, traj_sel=None)[source]¶
Function for mapping work onto field of trajectories.
- Parameters:
func (callable) – The function to apply to the trajectory fields (by cycle). Must accept a dictionary mapping string trajectory field names to a feature vector for that cycle and return an arraylike. May accept other positional arguments as well.
fields (list of str) – A list of trajectory field names to pass to the mapped function.
args (None or tuple) – A single tuple of arguments which will be passed to the mapped function for every evaluation.
map_func (callable) – The mapping function. The implementation of how to map the computation function over the data. Default is the python builtin map function. Can be a parallel implementation for example.
traj_sel (list of tuple, optional) – If not None, a list of trajectory identifier tuple (run_idx, traj_idx) to restrict the computation to.
idxs (bool) – If True will return the trajectory identifier tuple (run_idx, traj_idx) along with other return values.
- Returns:
traj_id_tuples (list of tuple of int, if ‘idxs’ option is True) – A list of the tuple identifiers for each trajectory result.
results (list of arraylike) – A list of arraylike feature vectors for each trajectory.
- to_mdtraj(run_idx, traj_idx, frames=None, alt_rep=None)[source]¶
Convert a trajectory to an mdtraj Trajectory object.
Works if the right trajectory fields are defined. Minimally this is a representation, including the ‘positions’ field or an ‘alt_rep’ subfield.
Will also set the unitcell lengths and angle if the ‘box_vectors’ field is present.
Will also set the time for the frames if the ‘time’ field is present, although this is likely not useful since walker segments have the time reset.
- trace_to_mdtraj(trace, alt_rep=None)[source]¶
Generate an mdtraj Trajectory from a trace of frames from the runs.
Uses the default fields for positions (unless an alternate representation is specified) and box vectors which are assumed to be present in the trajectory fields.
The time value for the mdtraj trajectory is set to the cycle indices for each trace frame.
This is useful for converting WepyHDF5 data to common molecular dynamics data formats accessible through the mdtraj library.
- Parameters:
- Returns:
traj
- Return type:
mdtraj.Trajectory
- run_trace_to_mdtraj(run_idx, trace, alt_rep=None)[source]¶
Generate an mdtraj Trajectory from a trace of frames from the runs.
Uses the default fields for positions (unless an alternate representation is specified) and box vectors which are assumed to be present in the trajectory fields.
The time value for the mdtraj trajectory is set to the cycle indices for each trace frame.
This is useful for converting WepyHDF5 data to common molecular dynamics data formats accessible through the mdtraj library.
- Parameters:
- Returns:
traj
- Return type:
mdtraj.Trajectory
- _choose_rep_path(alt_rep)[source]¶
Given a positions specification string, gets the field name/path for it.
- Parameters:
alt_rep (str) – The short name (non relative path) for a representation of the positions.
- Returns:
rep_path (str) – The relative field path to that representation.
E.g. if you give it ‘positions’ or None it will simply return ‘positions’, whereas if you ask for ‘all_atoms’ it will return ‘alt_reps/all_atoms’.
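The path resolution described above amounts to a small branch, sketched here (a standalone illustration, not the private method itself):

```python
def choose_rep_path(alt_rep):
    """Resolve a short representation name to its relative field path."""
    if alt_rep is None or alt_rep == 'positions':
        # the main representation lives directly at 'positions'
        return 'positions'
    # alternative representations are subfields of the 'alt_reps' group
    return 'alt_reps/{}'.format(alt_rep)
```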
- traj_fields_to_mdtraj(traj_fields, alt_rep='positions')[source]¶
Create an mdtraj.Trajectory from a traj_fields dictionary.
- Parameters:
- Returns:
traj (mdtraj.Trajectory object)
This is mainly a convenience function to retrieve the correct topology for the positions, which is then passed to the generic traj_fields_to_mdtraj function.