wepy.analysis.network module

Module for imposing a kinetically connected network structure on weighted ensemble simulation data.

exception wepy.analysis.network.MacroStateNetworkError[source]

Bases: Exception

Errors specific to MacroStateNetwork requirements.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class wepy.analysis.network.BaseMacroStateNetwork(contig_tree, assg_field_key=None, assignments=None, transition_lag_time=2)[source]

Bases: object

A base class for the MacroStateNetwork which doesn’t contain a WepyHDF5 object. This makes the object suitable for serialization, and it can be reattached to a WepyHDF5 later. For this functionality see the ‘MacroStateNetwork’ class.

BaseMacroStateNetwork can also be thought of as just a way of mapping macrostate properties to the underlying microstate data.

The network itself is a networkx directed graph.

Upon construction the nodes will be a value called the ‘node_id’ which is the label/assignment for the node. This either comes from an explicit labelling (the ‘assignments’ argument) or from the labels/assignments from the contig tree (from the ‘assg_field_key’ argument).

Nodes have the following attributes after construction:

  • node_id :: Same as the actual node value.

  • node_idx :: An extra index that is used for ‘internal’ ordering of the nodes in a consistent manner. Used for example in any method which constructs matrices from edges and ensures they are all the same.

  • assignments :: An index trace over the contig_tree dataset used to construct the network. This is how the individual microstates are indexed for each node.

  • num_samples :: The total number of microstates that a node has. This is the length of the ‘assignments’ attribute.
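The node attributes listed above can be illustrated with a small stand-in that uses plain dictionaries instead of a networkx graph. This is only a sketch: the nested `assignments` list, the labels, and the choice of sorted order for ‘node_idx’ are assumptions made for illustration, not the library's actual construction.

```python
from collections import defaultdict

# Hypothetical per-frame labels, indexed as assignments[run_idx][traj_idx][cycle_idx].
assignments = [
    [["A", "A", "B"], ["B", "C", "C"]],  # run 0 with two trajectories
]

# Collect an index trace (run_idx, traj_idx, cycle_idx) for every label.
traces = defaultdict(list)
for run_idx, run in enumerate(assignments):
    for traj_idx, traj in enumerate(run):
        for cycle_idx, label in enumerate(traj):
            traces[label].append((run_idx, traj_idx, cycle_idx))

# Build the four attributes described above for each node.
# Sorting the labels to assign node_idx is an assumption of this sketch.
nodes = {}
for node_idx, node_id in enumerate(sorted(traces)):
    nodes[node_id] = {
        "node_id": node_id,
        "node_idx": node_idx,
        "assignments": traces[node_id],
        "num_samples": len(traces[node_id]),
    }

print(nodes["B"])
```

Each node keeps only indices into the dataset, which is why ‘num_samples’ is simply the length of the trace.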

Additionally, there are auxiliary node attributes that may be added by various methods. All of these are prefixed with a single underscore ‘_’, and user-set attributes should avoid this prefix.

These auxiliary attributes also make use of namespacing, where namespaces are similar to file paths and are separated by ‘/’ characters.

Additionally the auxiliary groups are typically managed such that they remain consistent across all of the nodes and have metadata queryable from the BaseMacroStateNetwork object. In contrast user defined node attributes are not restricted to this structure.

The auxiliary groups are:

  • ‘_groups’ :: used to mark nodes as belonging to a higher level group.

  • ‘_observables’ :: used for scalar values that are calculated from the underlying microstate structures, as opposed to more operational values describing the network itself. By virtue of being scalar these are also compatible with output to tabular formats.
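The path-like namespacing of auxiliary attributes can be sketched with two small helpers. The helper names `aux_key` and `in_namespace` and the group name ‘bound’ are hypothetical; only the ‘/’ separator and the ‘_groups’ and ‘_observables’ namespaces come from the description above.

```python
def aux_key(namespace, name):
    """Join a namespace and a name with '/', e.g. '_observables/total_weight'."""
    return f"{namespace}/{name}"


def in_namespace(attrs, namespace):
    """Return the attributes stored under a namespace, keyed by their bare names."""
    prefix = namespace + "/"
    return {k[len(prefix):]: v for k, v in attrs.items() if k.startswith(prefix)}


# A stand-in node attribute dictionary mixing default and auxiliary attributes.
node_attrs = {
    "node_id": "A",
    "num_samples": 2,
    aux_key("_observables", "total_weight"): 0.75,
    aux_key("_groups", "bound"): True,  # 'bound' is a made-up group name
}

observables = in_namespace(node_attrs, "_observables")
groups = in_namespace(node_attrs, "_groups")
print(observables, groups)
```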

Edge values are simply 2-tuples of node_ids where the first value is the source and the second value is the target. Edges have the following attributes following initialization:

  • ‘weighted_counts’ :: The weighted sum of all the transitions for an edge. This is a floating point number.

  • ‘unweighted_counts’ :: The unweighted sum of all the transitions for an edge. This is an ordinary count and is an integer.

  • ‘all_transition’ :: An array of floats holding the weight of every individual transition for an edge. This is useful for doing more advanced statistics for a given edge.
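The relationship between the three edge attributes can be sketched as follows. The list of transitions and their weights is made up for illustration, and plain dictionaries stand in for the networkx edge data.

```python
from collections import defaultdict

# Hypothetical observed transitions: (source node_id, target node_id, walker weight).
transitions = [
    ("A", "A", 0.5),
    ("A", "B", 0.25),
    ("A", "B", 0.125),
    ("B", "C", 0.125),
]

# Group the individual transition weights by edge (source, target).
weights_by_edge = defaultdict(list)
for source, target, weight in transitions:
    weights_by_edge[(source, target)].append(weight)

# Derive the three attributes described above from the grouped weights.
edges = {
    edge_id: {
        "weighted_counts": sum(weights),
        "unweighted_counts": len(weights),
        "all_transition": weights,
    }
    for edge_id, weights in weights_by_edge.items()
}

print(edges[("A", "B")])
```

Only pairs with at least one observed transition appear as edges, so ("C", "A") is absent here.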

A network object can be used as a stateful container for calculated values over the nodes and edges and has methods to support this. However, there is no standard way to serialize this data beyond the generic python techniques like pickle.

Create a network of macrostates from the simulation microstates using a field in the trajectory data or precomputed assignments.

Either ‘assg_field_key’ or ‘assignments’ must be given, but not both.

The ‘transition_lag_time’ defaults to 2, which is the natural connection between microstates. The lag time can be increased to vary the kinetic accuracy of transition probabilities generated through Markov State Modelling.

The ‘transition_lag_time’ must be given as an integer greater than 1.
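The effect of the lag time can be sketched on a single linear sequence of labels. This assumes the lag time is the window length in frames, so that a value of 2 pairs consecutive frames; the function `lagged_transitions` is hypothetical and ignores the cloning and merging that the real sliding windows over a contig tree have to handle.

```python
def lagged_transitions(labels, lag_time=2):
    """Yield (source, target) pairs taken from the ends of sliding windows.

    Each window spans `lag_time` consecutive frames of `labels`.
    """
    if not isinstance(lag_time, int) or lag_time < 2:
        raise ValueError("lag_time must be an integer greater than 1")
    for start in range(len(labels) - lag_time + 1):
        window = labels[start:start + lag_time]
        yield window[0], window[-1]


labels = ["A", "A", "B", "C"]  # made-up assignments along one trajectory

pairs_lag2 = list(lagged_transitions(labels, 2))
pairs_lag3 = list(lagged_transitions(labels, 3))
print(pairs_lag2)
print(pairs_lag3)
```

With a longer lag the intermediate frame is skipped, so different edges can appear, such as ("A", "C") in the second result.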

Parameters
  • contig_tree (ContigTree object) –

  • assg_field_key (str, conditionally optional on 'assignments') – The field in the WepyHDF5 dataset you want to generate macrostates for.

  • assignments (list of list of array_like of dim (n_traj_frames, observable_shape[0], ..), conditionally optional on ‘assg_field_key’) – List of assignments for all frames in each run, where each element of the outer list is for a run, the elements of these lists are lists for each trajectory which are arraylikes of shape (n_traj, observable_shape[0], …).


ASSIGNMENTS = 'assignments'

Key for the microstates that are assigned to a macrostate.

_key_init(contig_tree)[source]

Initialize the assignments structures given the field key to use.

_assignments_init(assignments)[source]

Given the assignments structure sets up the other necessary structures.

Parameters

assignments (list of list of array_like of dim (n_traj_frames, observable_shape[0], ..), conditionally optional on ‘assg_field_key’) – List of assignments for all frames in each run, where each element of the outer list is for a run, the elements of these lists are lists for each trajectory which are arraylikes of shape (n_traj, observable_shape[0], …).

_init_transition_counts(contig_tree, transition_lag_time)[source]

Given the lag time get the transitions between microstates for the network using the sliding windows algorithm.

This will create a directed edge between nodes that had at least one transition, no matter the weight.

See the main class docstring for a description of the fields.

contig_tree should be unopened.

node_id_to_idx(assg_key)[source]

Convert a node_id (which is the assignment value) to a canonical index.

Parameters

assg_key (node_id) –

Returns

node_idx

Return type

int

node_idx_to_id(node_idx)[source]

Convert a node index to its node id.

Parameters

node_idx (int) –

Returns

node_id

Return type

node_id

node_id_to_idx_dict()[source]

Generate a full mapping of node_ids to node_idxs.

node_idx_to_id_dict()[source]

Generate a full mapping of node_idxs to node_ids.
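The two mappings are inverses of each other, which a short stand-in makes concrete. The node_ids here are made up, and assigning indices by list position is an assumption of this sketch rather than the library's ordering rule.

```python
# Hypothetical node_ids; they need not be integers or form a contiguous range.
node_ids = ["10", "25", "7"]

# Stand-ins for the dictionaries returned by the two methods above.
id_to_idx = {node_id: idx for idx, node_id in enumerate(node_ids)}
idx_to_id = {idx: node_id for node_id, idx in id_to_idx.items()}

# Converting in one direction and back recovers the original value.
round_trip = [idx_to_id[id_to_idx[node_id]] for node_id in node_ids]
print(id_to_idx, idx_to_id)
```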

property graph

The networkx.DiGraph of the macrostate network.

property num_states

The number of states in the network.

property node_ids

A list of the node_ids.

property contig_tree

The underlying ContigTree

property assg_field_key

The string key of the field used to make macro states from the WepyHDF5 dataset.

Raises

MacroStateNetworkError – If this wasn’t used to construct the MacroStateNetwork.

get_node_attributes(node_id)[source]

Returns the node attributes of the macrostate.

Parameters

node_id (node_id) –

Returns

macrostate_attrs

Return type

dict

get_node_attribute(node_id, attribute_key)[source]

Return the value for a specific node and attribute.

Parameters
  • node_id (node_id) –

  • attribute_key (str) –

Returns

Return type

node_attribute

get_nodes_attribute(attribute_key)[source]

Get a dictionary mapping nodes to a specific attribute.

node_assignments(node_id)[source]

Return the microstates assigned to this macrostate as a run trace.

Parameters

node_id (node_id) –

Returns

node_assignments – Run trace of the nodes assigned to this macrostate.

Return type

list of tuples of ints (run_idx, traj_idx, cycle_idx)

set_nodes_attribute(key, values_dict)[source]

Set node attributes for the key and values for each node.

Parameters
  • key (str) –

  • values_dict (dict of node_id: values) –

property node_groups
set_node_group(group_name, node_ids)[source]
_set_group_nodes_attribute(group_name, group_node_ids)[source]
property observables

The list of available observables.

node_observables(node_id)[source]

Dictionary of observables for each node_id.

set_nodes_observable(observable_name, node_values)[source]
get_edge_attributes(edge_id)[source]

Returns the edge attributes of the macrostate.

Parameters

edge_id (edge_id) –

Returns

edge_attrs

Return type

dict

get_edge_attribute(edge_id, attribute_key)[source]

Return the value for a specific edge and attribute.

Parameters
  • edge_id (edge_id) –

  • attribute_key (str) –

Returns

Return type

edge_attribute

get_edges_attribute(attribute_key)[source]

Get a dictionary mapping edges to a specific attribute.

property layouts
node_layouts(node_id)[source]

Dictionary of layouts for each node_id.

set_nodes_layout(layout_name, node_values)[source]
write_gexf(filepath, exclude_node_fields=None, exclude_edge_fields=None, layout=None)[source]

Writes a graph file in the gexf format of the network.

Parameters

filepath (str) –

nodes_to_records(extra_attributes=('_observables/total_weight'))[source]
nodes_to_dataframe(extra_attributes=('_observables/total_weight'))[source]

Make a dataframe of the nodes and their attributes.

Not all attributes will be added as they are not relevant to a table style representation anyhow.

The columns will be:

  • node_id

  • node_idx

  • num_samples

  • groups (as booleans) which is anything in the ‘_groups’ namespace

  • observables : anything in the ‘_observables’ namespace and will assume to be scalars

And anything in the ‘extra_attributes’ argument.
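The column selection described above can be sketched as a function that flattens one node's attribute dictionary into a table row. The function name `node_record`, the column naming, and the example attributes are assumptions; the real method may name or order its columns differently.

```python
def node_record(attrs, extra_attributes=()):
    """Flatten one node's attributes into a table-friendly record."""
    # Default columns that are always present.
    record = {key: attrs[key] for key in ("node_id", "node_idx", "num_samples")}
    for key, value in attrs.items():
        if key.startswith("_groups/"):
            record[key] = bool(value)  # group membership as a boolean
        elif key.startswith("_observables/"):
            record[key] = value  # observables are assumed to be scalars
    # Anything else must be requested explicitly.
    for key in extra_attributes:
        record[key] = attrs[key]
    return record


attrs = {
    "node_id": "A",
    "node_idx": 0,
    "num_samples": 2,
    "assignments": [(0, 0, 0), (0, 0, 1)],  # not tabular, so left out
    "_groups/bound": 1,  # made-up group
    "_observables/total_weight": 0.75,
    "color": "red",  # made-up user attribute
}

record = node_record(attrs, extra_attributes=("color",))
print(record)
```

A list of such records can be handed to a dataframe constructor, which is one plausible way the records and dataframe methods relate.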

edges_to_records(extra_attributes=None)[source]

Make records of the edges and their attributes.

Not all attributes will be added as they are not relevant to a table style representation anyhow.

The columns will be:

  • edge_id

  • source

  • target

  • weighted_counts

  • unweighted_counts

edges_to_dataframe(extra_attributes=None)[source]

Make a dataframe of the edges and their attributes.

Not all attributes will be added as they are not relevant to a table style representation anyhow.

The columns will be:

  • edge_id

  • source

  • target

  • weighted_counts

  • unweighted_counts
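The edge columns listed above can be sketched in the same spirit. The function `edge_records` and the example edge data are hypothetical stand-ins, not the library's implementation.

```python
def edge_records(edges, extra_attributes=None):
    """Turn a mapping of edge_id -> attributes into a list of flat records."""
    extra = list(extra_attributes or [])
    records = []
    for (source, target), attrs in edges.items():
        record = {
            "edge_id": (source, target),
            "source": source,
            "target": target,
            "weighted_counts": attrs["weighted_counts"],
            "unweighted_counts": attrs["unweighted_counts"],
        }
        # Extra columns are only added when requested.
        for key in extra:
            record[key] = attrs[key]
        records.append(record)
    return records


# Made-up edge data using the default edge attributes.
edges = {
    ("A", "B"): {
        "weighted_counts": 0.375,
        "unweighted_counts": 2,
        "all_transition": [0.25, 0.125],
    },
}

records = edge_records(edges)
print(records)
```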

node_map(func, map_func=<class 'map'>)[source]

Map a function over the nodes.

The function should take as its first argument a node_id and the second argument a dictionary of the node attributes. This will not give access to the underlying trajectory data in the HDF5, to do this use the ‘node_fields_map’ function.

Extra arguments are not supported. Use ‘functools.partial’ to make functions with arguments for all data.
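A minimal sketch of this calling convention, using a plain dictionary in place of the network. The free function `node_map` here is a stand-in, and the way it passes two parallel iterables to `map_func` is an assumption about the map interface rather than the library's code.

```python
from functools import partial


def node_map(nodes, func, map_func=map):
    """Apply func(node_id, node_attrs) to every node and collect the results."""
    node_ids = list(nodes)
    node_attrs = [nodes[node_id] for node_id in node_ids]
    values = map_func(func, node_ids, node_attrs)
    return dict(zip(node_ids, values))


def scaled_samples(node_id, node_attrs, scale=1.0):
    """Example function with an extra argument that must be bound beforehand."""
    return node_attrs["num_samples"] * scale


nodes = {"A": {"num_samples": 2}, "B": {"num_samples": 4}}  # made-up nodes

# Bind the extra argument with functools.partial before mapping.
halved = node_map(nodes, partial(scaled_samples, scale=0.5))
print(halved)
```

Any callable with the same interface as the builtin `map`, for example one backed by a worker pool, could be supplied as `map_func`.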

Parameters
  • func (callable) – The function to map over the nodes.

  • map_func (callable) – The mapping function, implementing the map interface

Returns

node_values – The mapping of node_ids to the values computed by the mapped func.

Return type

dict of node_id : values

edge_attribute_to_matrix(attribute_key, fill_value=nan)[source]

Convert scalar edge attributes to an asymmetric matrix.

This will always return matrices of size (num_nodes, num_nodes).

Additionally, matrices for the same network will always have the same indexing, which is according to the ‘node_idx’ attribute of each node.

For example if you have a matrix like:

>>> msn = MacroStateNetwork(...)
>>> mat = msn.edge_attribute_to_matrix('unweighted_counts')

Then, for example, the node with node_id of ‘10’ having a ‘node_idx’ of 0 will always be the first element for each dimension. Using this example the self edge ‘10’->‘10’ can be accessed from the matrix like:

>>> mat[0,0]

For another node (‘node_id’ ‘25’) having ‘node_idx’ 4, we can get the edge from ‘10’->‘25’ like:

>>> mat[0,4]

The matrix is indexed by ‘node_idx’ rather than ‘node_id’ because a ‘node_id’ does not necessarily have to be an integer, and even if the node_ids are integers they don’t necessarily form a contiguous range from 0 to N.

To get the ‘node_id’ for a ‘node_idx’ use the method ‘node_idx_to_id’.

>>> msn.node_idx_to_id(0)
10

Parameters
  • attribute_key (str) – The key of the edge attribute the matrix should be made of.

  • fill_value (Any) – The value to put in the array for non-existent edges. Must be a numpy dtype compatible with the dtype of the attribute value.

Returns

edge_matrix – Asymmetric matrix of dim (n_macrostates, n_macrostates). The 0-th axis corresponds to the ‘source’ node and the 1-st axis corresponds to the ‘target’ nodes, i.e. the dimensions mean: (source, target).

Return type

numpy.ndarray
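The indexing scheme can be sketched without the library by filling a nested list from a dictionary of edge values. The function `edge_attribute_matrix`, the node_ids other than ‘10’ and ‘25’, and the counts are made up for illustration, and the real method returns a numpy.ndarray rather than a list of lists.

```python
import math


def edge_attribute_matrix(node_ids, edge_values, fill_value=math.nan):
    """Build a (source, target) matrix ordered by the position of each node_id."""
    id_to_idx = {node_id: idx for idx, node_id in enumerate(node_ids)}
    n = len(node_ids)
    # Start from the fill value so that non-existent edges are marked.
    matrix = [[fill_value] * n for _ in range(n)]
    for (source, target), value in edge_values.items():
        matrix[id_to_idx[source]][id_to_idx[target]] = value
    return matrix


# Position in this list plays the role of node_idx: '10' -> 0 and '25' -> 4.
node_ids = ["10", "3", "8", "12", "25"]
counts = {("10", "10"): 7, ("10", "25"): 2}

mat = edge_attribute_matrix(node_ids, counts, fill_value=0)
mat_nan = edge_attribute_matrix(node_ids, counts)
print(mat[0][0], mat[0][4], mat[4][0])
```

The reverse edge ‘25’->‘10’ was never observed, so its entry holds the fill value, which shows why the matrix is generally asymmetric.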

class wepy.analysis.network.MacroStateNetwork(contig_tree, base_network=None, assg_field_key=None, assignments=None, transition_lag_time=2)[source]

Bases: object

Provides an abstraction over weighted ensemble data in the form of a kinetically connected network.

The MacroStateNetwork refers to any grouping of the so-called “micro” states that were observed during simulation, i.e. trajectory frames, and not necessarily to macrostates in the usual sense used in statistical mechanics, although it is a perfect vehicle for working with such macrostates.

Because weighted ensemble simulations produce walker trajectories, there is a natural way to generate the edges between the macrostate nodes in the network. These edges are determined automatically and a lag time can also be specified, which is useful in the creation of Markov State Models.

This class provides transparent access to an underlying ‘WepyHDF5’ dataset. If you wish to have a simple serializable network that does not reference a ‘WepyHDF5’ dataset, see the ‘BaseMacroStateNetwork’ class, which you can construct standalone or access as the instance attached in the ‘base_network’ attribute of an object of this class.

For a description of all of the default node and edge attributes which are set after construction see the docstring of the ‘BaseMacroStateNetwork’ class.

Warning

This class is not serializable as it references a ‘WepyHDF5’ object. Either construct a ‘BaseMacroStateNetwork’ or use the attached instance in the ‘base_network’ attribute.

For documentation of the following arguments see the constructor docstring of the ‘BaseMacroStateNetwork’ class:

  • contig_tree

  • assg_field_key

  • assignments

  • transition_lag_time

The other arguments are documented here. This is primarily the optional ‘base_network’ argument, a ‘BaseMacroStateNetwork’ instance which this class associates with a ‘WepyHDF5’ dataset for access to the microstate data etc.

Parameters

base_network (BaseMacroStateNetwork object) – An already constructed network, which will avoid recomputing all in-memory network values again for this object.

_set_base_network_to_self(base_network)[source]
open(mode=None)[source]
close()[source]
property graph

The networkx.DiGraph of the macrostate network.

property num_states

The number of states in the network.

property node_ids

A list of the node_ids.

property assg_field_key

The string key of the field used to make macro states from the WepyHDF5 dataset.

Raises

MacroStateNetworkError – If this wasn’t used to construct the MacroStateNetwork.

property base_network
property wepy_h5

The WepyHDF5 source object for which the contig tree is being constructed.

state_to_mdtraj(node_id, alt_rep=None)[source]

Generate an mdtraj.Trajectory object from a macrostate.

By default uses the “main_rep” in the WepyHDF5 object. Alternative representations of the topology can be specified.

Parameters
  • node_id (node_id) –

  • alt_rep (str) – (Default value = None)

Returns

traj

Return type

mdtraj.Trajectory

state_to_traj_fields(node_id, alt_rep=None)[source]
states_to_traj_fields(node_ids, alt_rep=None)[source]
get_node_fields(node_id, fields)[source]

Return the trajectory fields for all the microstates in the specified macrostate.

Parameters
  • node_id (node_id) –

  • fields (list of str) – Field names to retrieve.

Returns

fields – A dictionary mapping the names of the fields to an array of the field. Like fields of a trace.

Return type

dict of str: array_like

iter_nodes_fields(fields)[source]

Iterate over all nodes and return the field values for all the microstates for each.

Parameters

fields (list of str) –

Returns

nodes_fields – A dictionary with an entry for each node. Each node has its own dictionary of node fields for each microstate.

Return type

dict of node_id: (dict of field: array_like)

microstate_weights()[source]

Returns the weights of each microstate on the basis of macrostates.

Returns

microstate_weights

Return type

dict of node_id: ndarray

macrostate_weights()[source]

Compute the total weight of each macrostate.

Returns

macrostate_weights

Return type

dict of node_id: float
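The relationship between the two weight mappings can be sketched in a couple of lines. The per-microstate weights are made up, and plain lists stand in for the ndarrays returned by ‘microstate_weights’.

```python
# Hypothetical output of microstate_weights(): one sequence of weights per node_id.
microstate_weights = {
    "A": [0.5, 0.25],
    "B": [0.125, 0.125],
}

# The total weight of a macrostate is the sum over its microstates.
macrostate_weights = {
    node_id: sum(weights) for node_id, weights in microstate_weights.items()
}
print(macrostate_weights)
```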

set_macrostate_weights()[source]

Compute the macrostate weights and set them as node attributes ‘total_weight’.

node_fields_map(func, fields, map_func=<class 'map'>)[source]

Map a function over the nodes and microstate fields.

The function should take as its arguments:

  1. node_id

  2. dictionary of all the node attributes

  3. fields dictionary mapping trajectory field names to arrays of their values (the output of MacroStateNetwork.get_node_fields)

This will give access to the underlying trajectory data in the HDF5 which can be requested with the fields argument. The behaviour is very similar to the WepyHDF5.compute_observable function with the added input data to the mapped function being all of the macrostate node attributes.

Extra arguments are not supported. Use ‘functools.partial’ to make functions with arguments for all data.
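The three-argument convention can be sketched with stand-ins for the network and for field retrieval. The free function `node_fields_map`, the helper `fake_get_node_fields`, and the in-memory `FAKE_FIELDS` data are all hypothetical; the real method reads the fields from the HDF5 dataset.

```python
# Made-up field data standing in for what would be read from the HDF5 file.
FAKE_FIELDS = {
    "A": {"weights": [0.5, 0.25]},
    "B": {"weights": [0.25]},
}


def fake_get_node_fields(node_id, fields):
    """Stand-in for get_node_fields: map each field name to its values."""
    return {field: FAKE_FIELDS[node_id][field] for field in fields}


def node_fields_map(nodes, get_node_fields, func, fields, map_func=map):
    """Apply func(node_id, node_attrs, node_fields) to every node."""
    node_ids = list(nodes)
    node_attrs = [nodes[node_id] for node_id in node_ids]
    node_fields = [get_node_fields(node_id, fields) for node_id in node_ids]
    values = map_func(func, node_ids, node_attrs, node_fields)
    return dict(zip(node_ids, values))


def mean_weight(node_id, node_attrs, fields):
    """Average microstate weight, using both node attributes and field data."""
    return sum(fields["weights"]) / node_attrs["num_samples"]


nodes = {"A": {"num_samples": 2}, "B": {"num_samples": 1}}  # made-up nodes
result = node_fields_map(nodes, fake_get_node_fields, mean_weight, ["weights"])
print(result)
```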

Parameters
  • func (callable) – The function to map over the nodes.

  • fields (iterable of str) – The microstate (trajectory) fields to provide to the mapped function.

  • map_func (callable) – The mapping function, implementing the map interface

Returns

node_values – The mapping of node_ids to the values computed by the mapped func.

Return type

dict of node_id : values
