wepy.analysis.network module¶
Module that allows for imposing a kinetically connected network structure on weighted ensemble simulation data.
- exception wepy.analysis.network.MacroStateNetworkError[source]¶
Bases:
Exception
Errors specific to MacroStateNetwork requirements.
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class wepy.analysis.network.BaseMacroStateNetwork(contig_tree, assg_field_key=None, assignments=None, transition_lag_time=2)[source]¶
Bases:
object
A base class for the MacroStateNetwork which doesn’t contain a WepyHDF5 object. Useful for serialization of the object and can then be reattached later to a WepyHDF5. For this functionality see the ‘MacroStateNetwork’ class.
BaseMacroStateNetwork can also be thought of as just a way of mapping macrostate properties to the underlying microstate data.
The network itself is a networkx directed graph.
Upon construction the nodes will be a value called the ‘node_id’ which is the label/assignment for the node. This either comes from an explicit labelling (the ‘assignments’ argument) or from the labels/assignments from the contig tree (from the ‘assg_field_key’ argument).
Nodes have the following attributes after construction:
- node_id :: Same as the actual node value.
- node_idx :: An extra index that is used for ‘internal’ ordering of the nodes in a consistent manner. Used for example in any method which constructs matrices from edges and ensures they are all the same.
- assignments :: An index trace over the contig_tree dataset used to construct the network. This is how the individual microstates are indexed for each node.
- num_samples :: The total number of microstates that a node has. Is the length of the ‘assignments’ attribute.
Additionally, there are auxiliary node attributes that may be added by various methods. All of these are prefixed with a single underscore ‘_’ and any user set values should avoid this.
These auxiliary attributes also make use of namespacing, where namespaces are similar to file paths and are separated by ‘/’ characters.
Additionally the auxiliary groups are typically managed such that they remain consistent across all of the nodes and have metadata queryable from the BaseMacroStateNetwork object. In contrast user defined node attributes are not restricted to this structure.
The auxiliary groups are:
- ‘_groups’ :: used to mark nodes as belonging to a higher level group.
- ‘_observables’ :: used for scalar values that are calculated from the underlying microstate structures, as opposed to more operational values describing the network itself. By virtue of being scalar these are also compatible with output to tabular formats.
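The namespacing convention for auxiliary attributes can be illustrated with plain string handling. This is a minimal sketch and not wepy code: the helper `split_namespace`, the key `_groups/bound`, and the example attribute dictionary are hypothetical and only show how '/'-separated keys might be filtered.

```python
def split_namespace(attribute_key):
    """Split a namespaced key such as '_observables/total_weight'.

    Returns (namespace, name); keys without a '/' get namespace None.
    """
    if "/" not in attribute_key:
        return None, attribute_key
    namespace, name = attribute_key.split("/", 1)
    return namespace, name


# Hypothetical node attribute dictionary, for illustration only
node_attrs = {
    "node_id": 10,
    "num_samples": 3,
    "_groups/bound": True,
    "_observables/total_weight": 0.25,
}

# Collect the scalar observables by filtering on the namespace
observables = {
    split_namespace(key)[1]: value
    for key, value in node_attrs.items()
    if split_namespace(key)[0] == "_observables"
}
```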
Edge values are simply 2-tuples of node_ids where the first value is the source and the second value is the target. Edges have the following attributes following initialization:
- ‘weighted_counts’ :: The weighted sum of all the transitions for an edge. This is a floating point number.
- ‘unweighted_counts’ :: The unweighted sum of all the transitions for an edge. This is a normal count and is a whole integer.
- ‘all_transition’ :: This is an array of floats of the weight for every individual transition for an edge. This is useful for doing more advanced statistics for a given edge.
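One way to read the three edge attributes together is sketched below. It assumes that ‘weighted_counts’ is the sum of the per-transition weights and that ‘unweighted_counts’ is their number; that relationship is inferred from the descriptions above and has not been checked against the wepy source. The weights are made up.

```python
# Hypothetical per-transition weights observed for one edge (source -> target)
all_transition = [0.125, 0.25, 0.125]

edge_attrs = {
    "all_transition": all_transition,
    # weighted sum of the transitions: a float
    "weighted_counts": sum(all_transition),
    # plain count of the transitions: an integer
    "unweighted_counts": len(all_transition),
}
```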
A network object can be used as a stateful container for calculated values over the nodes and edges and has methods to support this. However, there is no standard way to serialize this data beyond the generic python techniques like pickle.
Create a network of macrostates from the simulation microstates using a field in the trajectory data or precomputed assignments.
Either ‘assg_field_key’ or ‘assignments’ must be given, but not both.
The ‘transition_lag_time’ defaults to 2, which is the natural connection between microstates. The lag time can be increased to vary the kinetic accuracy of transition probabilities generated through Markov State Modelling.
The ‘transition_lag_time’ must be given as an integer greater than 1.
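The argument constraints described above can be expressed as a small check. This is only a sketch of the stated rules: the function `check_network_args` is hypothetical, and raising `ValueError` is an assumption (the class may well raise a different exception type).

```python
def check_network_args(assg_field_key=None, assignments=None, transition_lag_time=2):
    """Validate constructor-style arguments according to the rules above."""
    # exactly one of the two assignment sources must be given
    if (assg_field_key is None) == (assignments is None):
        raise ValueError("Give either 'assg_field_key' or 'assignments', but not both")
    # the lag time must be an integer greater than 1
    if not isinstance(transition_lag_time, int) or transition_lag_time < 2:
        raise ValueError("'transition_lag_time' must be an integer greater than 1")
    return True
```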
- Parameters:
contig_tree (ContigTree object)
assg_field_key (str, conditionally optional on 'assignments') – The field in the WepyHDF5 dataset you want to generate macrostates for.
assignments (list of list of array_like of dim (n_traj_frames, observable_shape[0], ...),) –
conditionally optional on ‘assg_field_key’
List of assignments for all frames in each run, where each element of the outer list is for a run, the elements of these lists are lists for each trajectory which are arraylikes of shape (n_traj, observable_shape[0], …).
- ASSIGNMENTS = 'assignments'¶
Key for the microstates that are assigned to a macrostate.
- _assignments_init(assignments)[source]¶
Given the assignments structure sets up the other necessary structures.
- Parameters:
assignments (list of list of array_like of dim (n_traj_frames, observable_shape[0], ...),) –
conditionally optional on ‘assg_field_key’
List of assignments for all frames in each run, where each element of the outer list is for a run, the elements of these lists are lists for each trajectory which are arraylikes of shape (n_traj, observable_shape[0], …).
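The nested run / trajectory / frame layout can be turned into per-node index traces roughly as follows. This sketch assumes scalar, hashable labels per frame and uses a hypothetical helper `group_assignments`; it shows the idea behind the ‘assignments’ and ‘num_samples’ node attributes and is not the library's implementation.

```python
from collections import defaultdict


def group_assignments(assignments):
    """Group frames by label into (run_idx, traj_idx, cycle_idx) traces."""
    node_traces = defaultdict(list)
    for run_idx, run in enumerate(assignments):
        for traj_idx, traj in enumerate(run):
            for cycle_idx, label in enumerate(traj):
                node_traces[label].append((run_idx, traj_idx, cycle_idx))
    return dict(node_traces)


# One run with two trajectories of three frames each; labels are made up
assignments = [[["A", "A", "B"], ["A", "B", "B"]]]
node_traces = group_assignments(assignments)

# num_samples is the length of each node's trace
num_samples = {node_id: len(trace) for node_id, trace in node_traces.items()}
```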
- _init_transition_counts(contig_tree, transition_lag_time)[source]¶
Given the lag time get the transitions between microstates for the network using the sliding windows algorithm.
This will create a directed edge between nodes that had at least one transition, no matter the weight.
See the main class docstring for a description of the fields.
contig_tree should be unopened.
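A simplified picture of sliding-window transition counting is sketched below for a single linear trajectory. It assumes the lag time is the number of frames in the window (so 2 means consecutive frames) and that each transition is weighted by the walker weight at the end of the window; both are assumptions for illustration. The real method works on a contig tree, which this sketch does not model, and `count_transitions` is a hypothetical name.

```python
from collections import defaultdict


def count_transitions(labels, weights, lag_time=2):
    """Count lagged transitions along one linear trajectory.

    A window of `lag_time` frames slides over the trajectory; the first
    frame of each window is the source and the last frame is the target.
    """
    weighted = defaultdict(float)
    unweighted = defaultdict(int)
    for start in range(len(labels) - lag_time + 1):
        end = start + lag_time - 1
        edge = (labels[start], labels[end])
        # assumption: weight the transition by the weight at the window end
        weighted[edge] += weights[end]
        unweighted[edge] += 1
    return dict(weighted), dict(unweighted)


labels = ["A", "A", "B", "A"]
weights = [0.5, 0.5, 0.25, 0.25]
weighted, unweighted = count_transitions(labels, weights, lag_time=2)
```

Every pair that occurs at least once becomes an edge key, regardless of its weight, which mirrors the statement that an edge is created for any observed transition.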
- node_id_to_idx(assg_key)[source]¶
Convert a node_id (which is the assignment value) to a canonical index.
- Parameters:
assg_key (node_id)
- Returns:
node_idx
- Return type:
- node_idx_to_id(node_idx)[source]¶
Convert a node index to its node id.
- Parameters:
node_idx (int)
- Returns:
node_id
- Return type:
node_id
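The two conversion methods amount to a pair of inverse lookups. A minimal sketch with made-up node ids follows; the dictionary names are hypothetical and the ordering used by the library is not specified here.

```python
# node_ids need not be integers or contiguous; these labels are made up
node_ids = [10, 25, "unbound"]

# assign each node_id a stable position, then build the inverse lookup
id_to_idx = {node_id: idx for idx, node_id in enumerate(node_ids)}
idx_to_id = {idx: node_id for node_id, idx in id_to_idx.items()}
```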
- property graph¶
The networkx.DiGraph of the macrostate network.
- property num_states¶
The number of states in the network.
- property node_ids¶
A list of the node_ids.
- property contig_tree¶
The underlying ContigTree
- property assg_field_key¶
The string key of the field used to make macro states from the WepyHDF5 dataset.
- Raises:
MacroStateNetworkError – If this wasn’t used to construct the MacroStateNetwork.
- get_node_attributes(node_id)[source]¶
Returns the node attributes of the macrostate.
- Parameters:
node_id (node_id)
- Returns:
macrostate_attrs
- Return type:
- get_node_attribute(node_id, attribute_key)[source]¶
Return the value for a specific node and attribute.
- Parameters:
node_id (node_id)
attribute_key (str)
- Return type:
node_attribute
- node_assignments(node_id)[source]¶
Return the microstates assigned to this macrostate as a run trace.
- Parameters:
node_id (node_id)
- Returns:
node_assignments – Run trace of the nodes assigned to this macrostate.
- Return type:
list of tuples of ints (run_idx, traj_idx, cycle_idx)
- set_nodes_attribute(key, values_dict)[source]¶
Set node attributes for the key and values for each node.
- property node_groups¶
- property observables¶
The list of available observables.
- get_edge_attributes(edge_id)[source]¶
Returns the attributes of the edge.
- Parameters:
edge_id (edge_id)
- Returns:
edge_attrs
- Return type:
- get_edge_attribute(edge_id, attribute_key)[source]¶
Return the value for a specific edge and attribute.
- Parameters:
edge_id (edge_id)
attribute_key (str)
- Return type:
edge_attribute
- property layouts¶
- write_gexf(filepath, exclude_node_fields=None, exclude_edge_fields=None, layout=None)[source]¶
Writes a graph file in the gexf format of the network.
- Parameters:
filepath (str)
- nodes_to_dataframe(extra_attributes=('_observables/total_weight',))[source]¶
Make a dataframe of the nodes and their attributes.
Not all attributes will be added as they are not relevant to a table style representation anyhow.
The columns will be:
node_id
node_idx
num_samples
groups (as booleans) which is anything in the ‘_groups’ namespace
observables : anything in the ‘_observables’ namespace, which are assumed to be scalars
And anything in the ‘extra_attributes’ argument.
- edges_to_records(extra_attributes=None)[source]¶
Make records of the edges and their attributes.
Not all attributes will be added as they are not relevant to a table style representation anyhow.
The columns will be:
edge_id
source
target
weighted_counts
unweighted_counts
- edges_to_dataframe(extra_attributes=None)[source]¶
Make a dataframe of the edges and their attributes.
Not all attributes will be added as they are not relevant to a table style representation anyhow.
The columns will be:
edge_id
source
target
weighted_counts
unweighted_counts
- node_map(func, map_func=<class 'map'>)[source]¶
Map a function over the nodes.
The function should take as its first argument a node_id and the second argument a dictionary of the node attributes. This will not give access to the underlying trajectory data in the HDF5, to do this use the ‘node_fields_map’ function.
Extra args are not supported; use ‘functools.partial’ to make functions with arguments for all data.
- Parameters:
func (callable) – The function to map over the nodes.
map_func (callable) – The mapping function, implementing the map interface
- Returns:
node_values – The mapping of node_ids to the values computed by the mapped func.
- Return type:
dict of node_id : values
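The calling convention and the ‘functools.partial’ advice can be sketched with a stand-in mapper. The function `node_map` below is a hypothetical free function that mimics the documented behaviour over a plain dictionary; it is not the method itself, and `samples_above` and the node data are invented.

```python
from functools import partial


def node_map(node_attributes, func, map_func=map):
    """Stand-in mapper: apply func(node_id, attrs) to every node."""
    node_ids = list(node_attributes)
    values = map_func(func, node_ids, (node_attributes[n] for n in node_ids))
    return dict(zip(node_ids, values))


def samples_above(node_id, node_attrs, threshold=0):
    """Example mapped function; the extra argument is bound with partial."""
    return node_attrs["num_samples"] > threshold


# Hypothetical node attributes keyed by node_id
node_attributes = {10: {"num_samples": 5}, 25: {"num_samples": 1}}
result = node_map(node_attributes, partial(samples_above, threshold=2))
```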
- edge_attribute_to_matrix(attribute_key, fill_value=nan)[source]¶
Convert scalar edge attributes to an asymmetric matrix.
This will always return matrices of size (num_nodes, num_nodes).
Additionally, matrices for the same network will always have the same indexing, which is according to the ‘node_idx’ attribute of each node.
For example if you have a matrix like:
>>> msn = MacroStateNetwork(...)
>>> mat = msn.edge_attribute_to_matrix('unweighted_counts')
Then, for example, the node with node_id of ‘10’ having a ‘node_idx’ of 0 will always be the first element for each dimension. Using this example the self edge ‘10’->’10’ can be accessed from the matrix like:
>>> mat[0,0]
For another node (‘node_id’ ‘25’) having ‘node_idx’ 4, we can get the edge from ‘10’->’25’ like:
>>> mat[0,4]
This is because ‘node_id’ does not necessarily have to be an integer, and even if they are integers they don’t necessarily have to be a contiguous range from 0 to N.
To get the ‘node_id’ for a ‘node_idx’ use the method ‘node_idx_to_id’.
>>> msn.node_idx_to_id(0) == 10
- Parameters:
attribute_key (str) – The key of the edge attribute the matrix should be made of.
fill_value (Any) – The value to put in the array for non-existent edges. Must be a numpy dtype compatible with the dtype of the attribute value.
- Returns:
edge_matrix – Asymmetric matrix of dim (n_macrostates, n_macrostates). The 0-th axis corresponds to the ‘source’ node and the 1-st axis corresponds to the ‘target’ nodes, i.e. the dimensions mean: (source, target).
- Return type:
numpy.ndarray
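The (source, target) indexing and the role of the fill value can be sketched without the library. The example below uses nested lists instead of a numpy.ndarray to stay dependency-free; `edges_to_matrix`, `id_to_idx` and the edge values are hypothetical.

```python
import math


def edges_to_matrix(edge_values, id_to_idx, fill_value=math.nan):
    """Place scalar edge values into an (n, n) grid indexed by node_idx."""
    n = len(id_to_idx)
    matrix = [[fill_value] * n for _ in range(n)]
    for (source, target), value in edge_values.items():
        # axis 0 is the source node, axis 1 is the target node
        matrix[id_to_idx[source]][id_to_idx[target]] = value
    return matrix


id_to_idx = {10: 0, 25: 1}
edge_values = {(10, 10): 4, (10, 25): 2}
mat = edges_to_matrix(edge_values, id_to_idx)
```

Entries for edges that do not exist, such as 25 to 10 here, keep the fill value, so the matrix is generally not symmetric.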
- class wepy.analysis.network.MacroStateNetwork(contig_tree, base_network=None, assg_field_key=None, assignments=None, transition_lag_time=2)[source]¶
Bases:
object
Provides an abstraction over weighted ensemble data in the form of a kinetically connected network.
The MacroStateNetwork refers to any grouping of the so called “micro” states that were observed during simulation, i.e. trajectory frames, and not necessarily in the usual sense used in statistical mechanics. Although it is the perfect vehicle for working with such macrostates.
Because of the walker trajectories in weighted ensemble, there is a natural way to generate the edges between the macrostate nodes in the network. These edges are determined automatically and a lag time can also be specified, which is useful in the creation of Markov State Models.
This class provides transparent access to an underlying ‘WepyHDF5’ dataset. If you wish to have a simple serializable network that does not reference a ‘WepyHDF5’ dataset, see the ‘BaseMacroStateNetwork’ class, which you can construct standalone or access as the instance attached as the ‘base_network’ attribute of an object of this class.
For a description of all of the default node and edge attributes which are set after construction see the docstring for the ‘BaseMacroStateNetwork’ class.
Warning
This class is not serializable as it references a ‘WepyHDF5’ object. Either construct a ‘BaseMacroStateNetwork’ or use the attached instance in the ‘base_network’ attribute.
For documentation of the following arguments see the constructor docstring of the ‘BaseMacroStateNetwork’ class:
contig_tree
assg_field_key
assignments
transition_lag_time
The other arguments are documented here. This is primarily the optional ‘base_network’ argument. This is a ‘BaseMacroStateNetwork’ instance, which allows you to associate it with a ‘WepyHDF5’ dataset for access to the microstate data etc.
- Parameters:
base_network (BaseMacroStateNetwork object) – An already constructed network, which will avoid recomputing all in-memory network values again for this object.
- property graph¶
The networkx.DiGraph of the macrostate network.
- property num_states¶
The number of states in the network.
- property node_ids¶
A list of the node_ids.
- property assg_field_key¶
The string key of the field used to make macro states from the WepyHDF5 dataset.
- Raises:
MacroStateNetworkError – If this wasn’t used to construct the MacroStateNetwork.
- property base_network¶
- property wepy_h5¶
The WepyHDF5 source object for which the contig tree is being constructed.
- state_to_mdtraj(node_id, alt_rep=None)[source]¶
Generate an mdtraj.Trajectory object from a macrostate.
By default uses the “main_rep” in the WepyHDF5 object. Alternative representations of the topology can be specified.
- Parameters:
node_id (node_id)
alt_rep (str) – (Default value = None)
- Returns:
traj
- Return type:
mdtraj.Trajectory
- get_node_fields(node_id, fields)[source]¶
Return the trajectory fields for all the microstates in the specified macrostate.
- iter_nodes_fields(fields)[source]¶
Iterate over all nodes and return the field values for all the microstates for each.
- microstate_weights()[source]¶
Returns the weights of each microstate on the basis of macrostates.
- Returns:
microstate_weights
- Return type:
dict of node_id: ndarray
- macrostate_weights()[source]¶
Compute the total weight of each macrostate.
- Returns:
macrostate_weights
- Return type:
dict of node_id: float
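The relationship between the two weight methods can be pictured as a per-node sum. This sketch uses plain lists and made-up weights, whereas the documented return type for microstate weights is an ndarray per node.

```python
# Hypothetical microstate weights grouped by macrostate (node_id)
microstate_weights = {"A": [0.125, 0.25], "B": [0.5, 0.125]}

# The total weight of each macrostate is the sum over its microstates
macrostate_weights = {
    node_id: float(sum(weights))
    for node_id, weights in microstate_weights.items()
}
```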
- set_macrostate_weights()[source]¶
Compute the macrostate weights and set them as node attributes ‘total_weight’.
- node_fields_map(func, fields, map_func=<class 'map'>)[source]¶
Map a function over the nodes and microstate fields.
The function should take as its arguments:
1. node_id
2. dictionary of all the node attributes
3. fields dictionary mapping traj field names. (The output of MacroStateNetwork.get_node_fields)
This will give access to the underlying trajectory data in the HDF5 which can be requested with the fields argument. The behaviour is very similar to the WepyHDF5.compute_observable function with the added input data to the mapped function being all of the macrostate node attributes.
Extra args are not supported; use ‘functools.partial’ to make functions with arguments for all data.
- Parameters:
func (callable) – The function to map over the nodes.
fields (iterable of str) – The microstate (trajectory) fields to provide to the mapped function.
map_func (callable) – The mapping function, implementing the map interface
- Returns:
node_values – The mapping of node_ids to the values computed by the mapped func.
- Return type:
dict of node_id : values
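A function suitable for this kind of mapping takes the three arguments listed above. The sketch below calls such a function directly on stand-in data rather than through the method. The field name "weights", the flat list shape, and the function `mean_weight` are assumptions for illustration; real field arrays from the dataset may have a different shape.

```python
def mean_weight(node_id, node_attrs, fields):
    """Example mapped function: average the microstate weights of a node."""
    weights = fields["weights"]
    return sum(weights) / len(weights)


# Hypothetical stand-ins for what the mapper would pass in for one node
node_attrs = {"num_samples": 2}
fields = {"weights": [0.25, 0.75]}
value = mean_weight(10, node_attrs, fields)
```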