Introduction & Features

An academic paper describing various aspects of the design and usage of wepy is currently under peer review.

Weighted Ensemble (WE)

The weighted ensemble (WE) algorithm is a general strategy for simulating rare or long-timescale events in stochastic systems []. It runs an ensemble of independent simulations (individually called ‘walkers’) and, at specific times, stops all of the walkers and examines them to identify any behaviors of interest that may have occurred. Each pause in the simulation is called a ‘cycle’. Walkers that are “interesting” have more simulation effort put into them, while walkers of less interest are dropped or simulated less. This is typically achieved by “cloning” high-value walkers into many copies, giving them more chances to exhibit otherwise rare behaviors. Cloning is usually accompanied by the removal of other walkers (typically called merging or squashing) so that computational resources are not diluted.

This is similar to importance sampling methods; however, we are interested not only in the behaviors of the walker simulations but also in estimates of their probability. To achieve this, the process of cloning and merging must always preserve the original weight distribution. Such a process is said to be a “resampling” process, and the cloning and merging approach does this by construction. The outcome is that the “weights” of the walkers are always accounted for, and in the limit of a fully converged simulation these weights correspond to the stationary probabilities.
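The cloning and merging steps described above can be illustrated with a minimal Python sketch. This is not wepy's API: the `clone` and `merge` helpers and the `(state, weight)` tuple representation are assumptions made purely for illustration. A clone splits a walker's weight evenly among its copies, while a merge keeps one state (chosen with probability proportional to weight) and assigns it the summed weight, so the total weight of the ensemble is unchanged.

```python
import random


def clone(walker, n_copies=2):
    """Split one walker into n_copies children that share its weight equally."""
    state, weight = walker
    return [(state, weight / n_copies) for _ in range(n_copies)]


def merge(walkers, rng=random):
    """Merge several walkers into one.

    The surviving state is drawn with probability proportional to weight,
    and the survivor carries the sum of all merged weights.
    """
    states = [state for state, _ in walkers]
    weights = [weight for _, weight in walkers]
    survivor = rng.choices(states, weights=weights, k=1)[0]
    return (survivor, sum(weights))


# Hypothetical ensemble of four equally weighted walkers.
walkers = [("A", 0.25), ("B", 0.25), ("C", 0.25), ("D", 0.25)]

# Suppose "D" is the interesting walker: clone it, and merge "A" and "B"
# so that the number of walkers stays constant.
resampled = clone(walkers[3]) + [merge(walkers[:2])] + [walkers[2]]

total_weight = sum(weight for _, weight in resampled)
print(resampled)
print(total_weight)
```

The ensemble still holds four walkers and the weights still sum to one, which is the property that lets the weights be read as probabilities.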

While the computational effort needed to reach convergence may be excessive for systems with high-dimensional state spaces, WE has other advantages over so-called biased methods. Namely, the underlying “laws” of propagation (e.g. Hamiltonians, force fields, etc.) are never modified, which means that the individual trajectories are always “correct”. This feature may not be of interest to everyone, but it is a major advantage in fields where the specific details of the “mechanism” of a process matter. For instance, a major use case of WE (and of wepy itself) is to observe “transition states” (kinetic bottleneck structures in biomolecular processes) at all-atom resolution. This is useful for scientists designing drugs that affect the structure in these particular states.

Features

  • State of the art WE resamplers: WExplore [] and REVO []

  • Super fast molecular dynamics via OpenMM []

  • Purpose built HDF5 storage format for WE data with extensive API: WepyHDF5

  • Analysis routines for:

    • free energy profiles

    • rate calculations

    • computing trajectory observables

    • extracting linear trajectories from clone-merge trees

    • conformation state networks such as Markov State Models (MSMs)

    • aggregating multiple runs

  • Orchestration framework for managing large numbers of simulations, with simulation checkpointing, recovery, and continuations.

  • Expert friendly: Fully-featured framework for building and customizing simulations for exactly what you need.

  • Leverage the entire Python ecosystem; you’re never limited to an old version of an embedded interpreter.

  • No complex ad hoc configuration files; everything is Python.
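As a rough illustration of the free energy profile analysis listed above, the sketch below bins a one-dimensional observable by walker weight and converts each bin's probability p into a free energy using F = -kT ln(p). The function name `free_energy_profile`, its arguments, and the sample data are hypothetical and do not reflect wepy's own analysis API.

```python
import math


def free_energy_profile(values, weights, bin_edges, kT=1.0):
    """Return one free energy per bin, F = -kT * ln(p).

    p is the fraction of total walker weight falling in each bin.
    Bins are half-open: [edge_i, edge_{i+1}). Empty bins get +infinity.
    """
    n_bins = len(bin_edges) - 1
    probs = [0.0] * n_bins
    for x, w in zip(values, weights):
        for i in range(n_bins):
            if bin_edges[i] <= x < bin_edges[i + 1]:
                probs[i] += w
                break
    total = sum(probs)
    return [
        -kT * math.log(p / total) if p > 0 else math.inf
        for p in probs
    ]


# Hypothetical observable values (e.g. a distance) and walker weights.
values = [0.1, 0.2, 0.6, 0.9]
weights = [0.5, 0.25, 0.125, 0.125]

profile = free_energy_profile(values, weights, bin_edges=[0.0, 0.5, 1.0])
print(profile)
```

Here the first bin holds three quarters of the weight and therefore has the lower free energy; results are in units of kT unless a different `kT` is passed.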

Getting Started

Once you have wepy installed you can check out the quickstart to get a rough idea of how it works.

Then you can read the user's guide, head on to the tutorials, or run the examples.

For a complete description of the modules and their components check out the API documentation.

Compatibility

Tested successfully with:

Python    OpenMM    Pass
3.6       7.4.1
3.6       7.3.1
3.7       7.4.1
3.7       7.3.1
3.7       7.5.1
3.8       7.4.1
3.8       7.3.1

See the noxfile.py for full test matrix.

Contributed wepy libraries and other useful resources

Here is a list of packages that are not in the main wepy repository but may be of interest to users of wepy.

These include things like:

  • distance metrics and boundary conditions for different kinds of systems

  • resampler and runner prototypes

  • related analysis or utility libraries

They are:

geomm

Purely functional library for common numerical routines in computational biology and chemistry, with no dependency on specific file or topology formats.

wepy-developer-resources

Unofficial and miscellaneous materials related to wepy, including talks, workshops, contributed tutorials etc. May be out of date.

mastic

Library for doing general purpose “profiling” of intermolecular interactions. Useful for computing observables an experimental chemist understands. Also useful for building distance metrics.

mdtraj

Excellent library with optimized code for numerical routines of interest in computational biology and chemistry. Differs from geomm in that it relies on its own topology format. The WepyHDF5 JSON topology format is borrowed from this library. Used in wepy as a utility writer for commonly used formats like PDBs, DCDs, etc.

openmmtools

Contributed components for OpenMM. Contains some ready-made test systems that are very convenient for testing and prototyping components in wepy.

openmm-systems

A friendly fork of openmmtools that just provides the test systems for ease of installation. We depend on this for our examples and testing.

CSNAnalysis

Small library for aiding in the analysis of conformation state networks (CSNs), which can be generated from wepy data.

Alternatives

wepy is not the only WE framework package. Other packages have different scopes and features. I have tried to provide a fair comparison of wepy to them to help potential users make an informed decision. If you feel a package is misrepresented, contact the wepy devs or submit a pull request with your desired changes.

WESTPA

Weighted ensemble package in Python 2.7. More reliant on, and integrated with, Unix-like operating systems, providing modularity through shell scripting and Python modules [].

As an older project it has support for more MD engines (and non-MD stochastic sampling engines, e.g. BioNetGen) and is currently better suited for running simulations on large numbers of CPUs in a clustered environment.

Supports WE algorithms closer to the original paper by Huber and Kim, with a focus on static tessellation of conformational space.

Has some support for adaptive binning algorithms like WExplore, but it is a little more challenging to develop radically different resamplers like REVO, which have no concept of bins at all.

AWE: Accelerated Weighted Ensemble

Another Python 2 library with a focus on the Accelerated WE resampling algorithm and integration with a Work Queue library for distributed jobs [].

Bibliography