wepy.resampling.resamplers.revo module

class wepy.resampling.resamplers.revo.REVOResampler(merge_dist=None, char_dist=None, distance=None, init_state=None, weights=True, pmin=1e-12, pmax=0.1, dist_exponent=4, seed=None, **kwargs)

Bases: wepy.resampling.resamplers.clone_merge.CloneMergeResampler

Resampler implementing the REVO algorithm.

You can find more detailed information in the paper “REVO: Resampling of ensembles by variation optimization”, but briefly:

REVO is a weighted ensemble (WE) based enhanced sampling algorithm that uses cloning and merging to create ensembles of diverse trajectories without defining any regions. Instead, it optimizes a measure of “variation” that depends on the pairwise distances between the walkers and on their weights.

REVO solves this optimization problem with a greedy algorithm that, at each step, selects the best walkers for resampling operations (cloning and merging) in order to maximize the “trajectory variation”.

The trajectory variation is defined as

\[V = \sum_{i} V_i = \sum_i \sum_{j}(\frac{d_{ij}}{d_0}) ^{\alpha}\phi_i\phi_j\]

where

\(V_i\) : the trajectory variation value of walker i

\(d_{ij}\) : the distance between walkers i and j according to the distance metric

\(\alpha\) : modulates the influence of the distances in the variation calculation

\(d_0\) : the characteristic distance, used to make the equation unitless

\(\phi\) : a non-negative function that measures the relative importance of a walker and is referred to as a “novelty function”. Here it is a function of a walker’s weight.
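As a minimal sketch of the formula above (not wepy's implementation), the function below evaluates V and the per-walker V_i from a square distance matrix, a list of precomputed novelty values phi, the characteristic distance d_0 (char_dist) and the exponent alpha. The function name and its arguments are illustrative assumptions.

```python
def trajectory_variation(distance_matrix, phi, char_dist, alpha):
    """Return (V, [V_1, ..., V_n]) for V = sum_i sum_j (d_ij / d_0)^alpha * phi_i * phi_j.

    Hypothetical helper: phi holds one precomputed novelty value per walker.
    """
    n = len(phi)
    walker_variations = [0.0] * n
    for i in range(n):
        for j in range(n):
            # The j == i term is zero because d_ii = 0 (for alpha > 0).
            walker_variations[i] += (distance_matrix[i][j] / char_dist) ** alpha * phi[i] * phi[j]
    return sum(walker_variations), walker_variations


dist = [[0.0, 2.0, 4.0],
        [2.0, 0.0, 2.0],
        [4.0, 2.0, 0.0]]
V, V_i = trajectory_variation(dist, phi=[1.0, 1.0, 1.0], char_dist=2.0, alpha=2)
print(V, V_i)  # 12.0 [5.0, 2.0, 5.0]
```

Because the double sum runs over ordered pairs, each pair (i, j) contributes twice, exactly as the equation is written. The middle walker, which is closest to the others, gets the smallest V_i.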

Furthermore, REVO needs the following parameters:

pmin: The minimum statistical weight. REVO does not clone walkers with a weight less than pmin.

pmax: The maximum statistical weight. It prevents the accumulation of too much weight in one walker.

merge_dist: The merge-distance threshold. The distance between merged walkers should be less than this value.
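These three parameters act as guards on individual resampling moves. The hypothetical helpers below sketch one way to express them. In particular, checking the weight each copy would have after cloning is an assumption about how the pmin rule is applied, not a statement of wepy's exact condition.

```python
def can_clone(weight, num_copies, pmin):
    """Assumed pmin rule: every copy must keep a weight of at least pmin after cloning."""
    return weight / (num_copies + 1) >= pmin


def can_merge(weight_i, weight_j, distance_ij, merge_dist, pmax):
    """Assumed merge rule: walkers closer than merge_dist whose combined weight stays <= pmax."""
    return distance_ij < merge_dist and weight_i + weight_j <= pmax


print(can_clone(0.1, 1, pmin=1e-12))         # True
print(can_merge(0.01, 0.02, 0.5, 1.0, 0.1))  # True
print(can_merge(0.06, 0.06, 0.5, 1.0, 0.1))  # False: 0.12 would exceed pmax
```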

The resample function, called during every cycle, takes the ensemble of walkers and performs the following steps:

  • Calculates the pairwise all-to-all distance matrix using the distance metric

  • Decides which walkers should be merged or cloned

  • Applies the cloning and merging decisions to get the resampled walkers

  • Creates the resampler data, which includes

    • distance_matrix : the calculated all-to-all distance matrix

    • num_walkers : the number of walkers. The number of walkers is kept constant throughout the resampling.

    • variation : the final value of the trajectory variation

    • images : the images of the walkers, as defined by the distance object

    • image_shape : the shape of the image

The algorithm saves the cloning and merging information in the resampling records.

Only the net clones and merges are recorded in the resampling records.

Constructor for the REVO Resampler.

Parameters
  • dist_exponent (int) – The distance exponent that modifies distance and weight novelty relative to each other in the variation equation.

  • merge_dist (float) – The merge distance threshold. Units should be the same as the distance metric.

  • char_dist (float) – The characteristic distance value. It can be estimated by running a single dynamics cycle and then averaging the distances between all pairs of walkers. Units should be the same as the distance metric.

  • distance (object implementing Distance) – The distance metric to compare walkers.

  • weights (bool) – Turns the weight novelty on or off when calculating the variation equation. When weights is False, the value of the novelty function is set to 1 for all walkers.

  • init_state (WalkerState object) – Used for automatically determining the state image shape.

  • seed (None or int, optional) – The random seed. If None, the system’s random seed will be used.
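The char_dist description suggests estimating d_0 from one dynamics cycle. Assuming you already have the all-to-all distance matrix for the walkers at the end of that cycle, a minimal sketch of the averaging step (the helper name is hypothetical) is:

```python
from itertools import combinations


def estimate_char_dist(distance_matrix):
    """Average the distinct pairwise distances d_ij (i < j) of a square distance matrix."""
    pairs = list(combinations(range(len(distance_matrix)), 2))
    if not pairs:
        raise ValueError("need at least two walkers")
    return sum(distance_matrix[i][j] for i, j in pairs) / len(pairs)


dist = [[0.0, 2.0, 4.0],
        [2.0, 0.0, 2.0],
        [4.0, 2.0, 0.0]]
print(estimate_char_dist(dist))  # (2 + 4 + 2) / 3
```

The diagonal is excluded so that the zero self-distances do not pull the average down; the result would then be passed as char_dist.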

RESAMPLING_FIELDS = ('decision_id', 'target_idxs', 'step_idx', 'walker_idx')

String names of fields produced in this record group.

Resampling records are typically used to report on the details of how walkers are resampled for a given resampling step.

Warning

This is a critical component of many other parts of the wepy framework and probably shouldn’t be altered by most developers.

This is where the information about the cloning and merging of walkers is given. Since this is most of the value proposition of wepy as a tool, getting rid of it will render most of the framework useless.

But, sticking to the ‘loosely coupled, tightly integrated’ mantra, you are free to modify these fields. This would be useful for implementing resampling strategies that do not follow basic cloning and merging. Just beware that most of the lineage-based analysis will be broken without implementing a new Decision class.

RESAMPLING_SHAPES = ((1,), Ellipsis, (1,), (1,))

Numpy-style shapes of all fields produced in records.

There should be the same number of elements as there are in the corresponding ‘FIELDS’ class constant.

Each entry should either be:

  1. A tuple of ints that specify the shape of the field element array.

  2. Ellipsis, indicating that the field is variable length and limited to being a rank one array (e.g. (3,) or (1,)).

  3. None, indicating that the first instance of this field will not be known until runtime. Any field that is returned by a record-producing method will automatically be interpreted as None if not specified here.

Note that the shapes must be tuples and not simple integers for rank-1 arrays.

Option 2 (Ellipsis) will result in the special h5py datatype ‘vlen’ and should not be used for large datasets for efficiency reasons.
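To make the three options concrete, the hypothetical helper below classifies one entry of a SHAPES constant and rejects a bare integer, which the note above rules out. It is an illustration only and is not part of wepy.

```python
def describe_shape_spec(spec):
    """Classify one entry of a *_SHAPES constant (hypothetical helper)."""
    if spec is None:
        return "determined at runtime"
    if spec is Ellipsis:
        return "variable-length rank-1 (h5py vlen)"
    if isinstance(spec, tuple) and all(isinstance(dim, int) for dim in spec):
        return f"fixed shape {spec}"
    raise ValueError(f"invalid shape spec {spec!r}; use a tuple such as (1,), not an int")


RESAMPLING_FIELDS = ('decision_id', 'target_idxs', 'step_idx', 'walker_idx')
RESAMPLING_SHAPES = ((1,), Ellipsis, (1,), (1,))
for field, spec in zip(RESAMPLING_FIELDS, RESAMPLING_SHAPES):
    print(field, "->", describe_shape_spec(spec))
```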

RESAMPLING_DTYPES = (<class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>)

Specifies the numpy dtypes to be used for records.

There should be the same number of elements as there are in the corresponding ‘FIELDS’ class constant.

Each entry should either be:

  1. A numpy.dtype object.

  2. None, indicating that the first instance of this field will not be known until runtime. Any field that is returned by a record-producing method will automatically be interpreted as None if not specified here.

RESAMPLING_RECORD_FIELDS = ('decision_id', 'target_idxs', 'step_idx', 'walker_idx')

Optional; names of fields to be selected for a truncated representation of the record group.

These entries should be strings that are also contained in the ‘FIELDS’ class constant.

While there are no strict constraints on which fields can be added here, you should only choose fields whose features could fit into a plaintext CSV or similar format.

RESAMPLER_FIELDS = ('num_walkers', 'distance_matrix', 'variation', 'image_shape', 'images')

String names of fields produced in this record group.

Resampler records are typically used to report on changes in the state of the resampler.

Notes

These fields are not critical to the proper functioning of the rest of the wepy framework and can be modified freely.

However, reporters specific to this resampler will probably make use of these records.

RESAMPLER_SHAPES = ((1,), Ellipsis, (1,), Ellipsis, Ellipsis)

Numpy-style shapes of all fields produced in records.

There should be the same number of elements as there are in the corresponding ‘FIELDS’ class constant.

Each entry should either be:

  1. A tuple of ints that specify the shape of the field element array.

  2. Ellipsis, indicating that the field is variable length and limited to being a rank one array (e.g. (3,) or (1,)).

  3. None, indicating that the first instance of this field will not be known until runtime. Any field that is returned by a record-producing method will automatically be interpreted as None if not specified here.

Note that the shapes must be tuples and not simple integers for rank-1 arrays.

Option 2 (Ellipsis) will result in the special h5py datatype ‘vlen’ and should not be used for large datasets for efficiency reasons.

RESAMPLER_DTYPES = (<class 'int'>, <class 'float'>, <class 'float'>, <class 'int'>, None)

Specifies the numpy dtypes to be used for records.

There should be the same number of elements as there are in the corresponding ‘FIELDS’ class constant.

Each entry should either be:

  1. A numpy.dtype object.

  2. None, indicating that the first instance of this field will not be known until runtime. Any field that is returned by a record-producing method will automatically be interpreted as None if not specified here.

RESAMPLER_RECORD_FIELDS = ('variation',)

Optional; names of fields to be selected for a truncated representation of the record group.

These entries should be strings that are also contained in the ‘FIELDS’ class constant.

While there are no strict constraints on which fields can be added here, you should only choose fields whose features could fit into a plaintext CSV or similar format.
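As an illustration of what a truncated representation can look like, the sketch below writes only the RESAMPLER_RECORD_FIELDS (here just 'variation') to CSV and drops array-valued fields such as distance_matrix. The record dictionary and the helper are hypothetical; wepy's own reporters may handle this differently.

```python
import csv
import io

RESAMPLER_RECORD_FIELDS = ('variation',)


def write_truncated_records(records, record_fields, stream):
    """Write only record_fields of each record dict as CSV rows (hypothetical helper)."""
    writer = csv.DictWriter(stream, fieldnames=list(record_fields), extrasaction='ignore')
    writer.writeheader()
    for record in records:
        writer.writerow(record)


# A made-up resampler record; fields not listed in record_fields are ignored.
records = [{'num_walkers': 2,
            'distance_matrix': [[0.0, 2.0], [2.0, 0.0]],
            'variation': 12.0}]
buf = io.StringIO()
write_truncated_records(records, RESAMPLER_RECORD_FIELDS, buf)
print(buf.getvalue())
```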

resampler_field_dtypes()

Finds out the datatype of the image.

Returns

  • datatypes (tuple of datatype) – The resampler field datatypes, including the type of the resampler image.

_novelty(walker_weight, num_walker_copy)

Calculates the novelty function value.

Parameters
  • walker_weight (float) – The weight of the walker.

  • num_walker_copy (int) – The number of copies of the walker.

Returns

  • novelty (float) – The calculated value of novelty for the given walker.
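The docstring does not spell out the functional form of the novelty. The sketch below assumes a logarithmic form, phi = log10(w / n) - log10(pmin / 100), where w / n is the weight carried by each of the n copies, and returns 1 when weight novelty is turned off. Treat the exact formula and the function name as assumptions rather than wepy's code.

```python
import math


def novelty(walker_weight, num_walker_copy, pmin=1e-12, weights=True):
    """Hypothetical novelty function phi for one walker (assumed log-weight form)."""
    if walker_weight <= 0 or num_walker_copy <= 0:
        # A walker with no remaining copies contributes nothing to the variation.
        return 0.0
    if not weights:
        # Weight novelty turned off: every existing walker counts equally.
        return 1.0
    # Log of the per-copy weight, shifted by log10(pmin / 100) so phi stays positive
    # for any per-copy weight above pmin / 100.
    return math.log10(walker_weight / num_walker_copy) - math.log10(pmin / 100)


print(novelty(0.5, 1))                 # larger weight, larger phi
print(novelty(0.5, 2))                 # splitting into two copies lowers phi per copy
print(novelty(0.5, 1, weights=False))  # 1.0
```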

_calcvariation(walker_weights, num_walker_copies, distance_matrix)

Calculates the variation value.

Parameters
  • walker_weights (list of float) – The weights of all walkers. The sum of all weights should be 1.0.

  • num_walker_copies (list of int) – The number of copies of each walker. 0 means the walker no longer exists, 1 means there is one copy of this walker, and >1 means it should be cloned into this number of walkers.

  • distance_matrix (list of arraylike of shape (num_walkers)) – The pairwise distances between all walkers.

Returns

  • variation (float) – The calculated variation value.

  • walker_variations (arraylike of shape (num_walkers)) – The Vi value of each walker.

decide(walker_weights, num_walker_copies, distance_matrix)

Optimizes the trajectory variation by making decisions for resampling.

Parameters
  • walker_weights (list of float) – The weights of all walkers. The sum of all weights should be 1.0.

  • num_walker_copies (list of int) – The number of copies of each walker. 0 means the walker no longer exists, 1 means there is one copy of this walker, and >1 means it should be cloned into this number of walkers.

  • distance_matrix (list of arraylike of shape (num_walkers)) – The pairwise distances between all walkers.

Returns

  • variation (float) – The optimized value of the trajectory variation.

  • resampling_data (list of dict of str: value) – The resampling records resulting from the decisions.
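The greedy optimization can be sketched as follows. This is a simplified illustration under stated assumptions, not wepy's decide(). It uses an assumed log-weight novelty, phi = log10(w / n) - log10(pmin / 100). It takes the clone candidate to be the walker with the largest V_i that passes a pmin check, and pairs the single-copy walker with the smallest V_i with its closest partner inside merge_dist. It keeps the heavier walker of a merged pair (the real resampler may choose differently) and stops as soon as a move fails to increase V. All function names are hypothetical.

```python
import math


def novelty(weight, copies, pmin):
    # Assumed log-weight novelty; walkers with no copies contribute nothing.
    if weight <= 0 or copies <= 0:
        return 0.0
    return math.log10(weight / copies) - math.log10(pmin / 100)


def variation(weights, copies, dist, char_dist, alpha, pmin):
    """Return (V, V_i), where V sums over every copy of every walker."""
    n = len(weights)
    phi = [novelty(weights[k], copies[k], pmin) for k in range(n)]
    v_i = [sum(copies[j] * (dist[i][j] / char_dist) ** alpha * phi[i] * phi[j]
               for j in range(n) if j != i)
           for i in range(n)]
    return sum(copies[i] * v_i[i] for i in range(n)), v_i


def greedy_decide(weights, dist, char_dist, merge_dist,
                  alpha=4, pmin=1e-12, pmax=0.1):
    """Simplified greedy clone/merge loop; returns (copies, weights, V)."""
    n = len(weights)
    weights, copies = list(weights), [1] * n
    best_v, v_i = variation(weights, copies, dist, char_dist, alpha, pmin)
    while True:
        # Clone candidate: largest V_i among walkers whose copies would keep weight >= pmin.
        clonable = [i for i in range(n)
                    if copies[i] >= 1 and weights[i] / (copies[i] + 1) >= pmin]
        if not clonable:
            break
        c = max(clonable, key=lambda i: v_i[i])
        # Merge candidate: the single-copy walker (other than c) with the smallest V_i ...
        singles = [i for i in range(n) if copies[i] == 1 and i != c]
        if len(singles) < 2:
            break
        m = min(singles, key=lambda i: v_i[i])
        # ... paired with its closest partner within merge_dist whose combined weight <= pmax.
        partners = [j for j in singles
                    if j != m and dist[m][j] < merge_dist and weights[m] + weights[j] <= pmax]
        if not partners:
            break
        p = min(partners, key=lambda j: dist[m][j])
        # Simplification: the heavier walker keeps the merged weight.
        keep, squash = (m, p) if weights[m] >= weights[p] else (p, m)
        trial_w, trial_c = list(weights), list(copies)
        trial_w[keep] += trial_w[squash]
        trial_w[squash], trial_c[squash] = 0.0, 0
        trial_c[c] += 1
        trial_v, trial_vi = variation(trial_w, trial_c, dist, char_dist, alpha, pmin)
        if trial_v <= best_v:
            break  # the proposed clone/merge pair would not increase V
        weights, copies, best_v, v_i = trial_w, trial_c, trial_v, trial_vi
    return copies, weights, best_v


positions = [0, 1, 2, 100]
dist = [[abs(a - b) for b in positions] for a in positions]
copies, new_weights, V = greedy_decide([0.25] * 4, dist, char_dist=10, merge_dist=5, pmax=0.6)
print(copies, new_weights)
```

With these toy inputs, the isolated walker 3 is cloned and walker 1 is merged into its neighbor, walker 2. After that, no further move is possible. Each accepted step adds one clone and removes one walker, so the total number of walkers and the total weight stay constant.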

_all_to_all_distance(walkers)

Calculates the pairwise all-to-all distances between walkers.

Parameters

walkers (list of walkers) – The walkers to compare.

Returns

  • distance_matrix (list of arraylike of shape (num_walkers)) – The pairwise distance matrix.

  • images (list of image object) – The images of the walkers.
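Here is a minimal sketch of the all-to-all step. It assumes the distance object can be reduced to two plain functions: one mapping a walker to its image and one measuring the distance between two images. The helper and argument names are hypothetical. The distance is assumed to be symmetric, so only the upper triangle is computed.

```python
import math


def all_to_all_distance(walkers, image, image_distance):
    """Return (distance_matrix, images) for a list of walkers (hypothetical helper)."""
    images = [image(walker) for walker in walkers]
    n = len(images)
    distance_matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumes d(i, j) == d(j, i), so each pair is evaluated once.
            d = image_distance(images[i], images[j])
            distance_matrix[i][j] = distance_matrix[j][i] = d
    return distance_matrix, images


# Toy walkers whose "image" is just a 2D position.
walkers = [{'pos': (0.0, 0.0)}, {'pos': (3.0, 4.0)}, {'pos': (6.0, 8.0)}]
matrix, images = all_to_all_distance(walkers, lambda w: w['pos'], math.dist)
print(matrix)
```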

resample(walkers)

Resamples walkers based on the REVO algorithm.

Parameters

walkers (list of walkers) – The walkers to resample.

Returns

  • resampled_walkers (list of walkers) – The resampled walkers.

  • resampling_data (list of dict of str: value) – The resampling records resulting from the decisions.

  • resampler_data (list of dict of str: value) – The resampler records resulting from the resampler actions.

CYCLE_DTYPES = (<class 'int'>, <class 'int'>)

Data types of the cycle fields.

CYCLE_FIELDS = ('step_idx', 'walker_idx')

The fields that get added to the decision record for all resampling records. They place a record within a single destructured listing of records for one cycle of resampling, using the step and walker indices.
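One way to picture how step_idx and walker_idx place a record is to flatten the per-step lists of decision records into a single listing for the cycle and tag each record with both indices. This is a hypothetical sketch, and the decision_id values are placeholders rather than wepy's decision enumeration.

```python
def add_cycle_fields(step_records):
    """Flatten per-step lists of decision records, adding step_idx and walker_idx."""
    flat = []
    for step_idx, walker_records in enumerate(step_records):
        for walker_idx, record in enumerate(walker_records):
            # Copy the record so the caller's dictionaries are not mutated.
            flat.append({**record, 'step_idx': step_idx, 'walker_idx': walker_idx})
    return flat


# One resampling step with two walkers; decision_id values are placeholders.
cycle = [[{'decision_id': 1, 'target_idxs': (0,)},
          {'decision_id': 2, 'target_idxs': (1, 2)}]]
for record in add_cycle_fields(cycle):
    print(record)
```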

CYCLE_RECORD_FIELDS = ('step_idx', 'walker_idx')

Optional; names of fields to be selected for a truncated representation of the record group.

CYCLE_SHAPES = ((1,), (1,))

Data shapes of the cycle fields.

DEBUG_MODES = (True, False)
DECISION

alias of wepy.resampling.decisions.clone_merge.MultiCloneMergeDecision

_check_resampled_walkers(resampled_walkers)

Check constraints on resampled walkers.

Raises errors when constraints are violated.

Parameters

resampled_walkers (list of Walker objects) – The walkers produced by resampling.

_init_walker_actions(n_walkers)

Returns a list of default resampling records for a single resampling step.

Parameters

n_walkers (int) – The number of walkers to generate records for

Returns

decision_records – A list of default decision records for one step of resampling.

Return type

list of dict of str: value

_resample_cleanup(**kwargs)

Common cleanup stuff for resamplers.

Unsets the number of walkers for this round of resampling.

_resample_init(walkers, **kwargs)

Common initialization stuff for resamplers.

Sets the number of walkers in this round of resampling.

Parameters

walkers (list of Walker objects) – The walkers to be resampled.

_set_resampling_num_walkers(num_walkers)

Sets the concrete constraints on the number of walkers, given a number of walkers and the max and min settings.

Parameters

num_walkers (int) – The number of walkers.

_unset_resampling_num_walkers()
assign_clones(merge_groups, walker_clone_nums)

Converts two convenient data structures into a list of almost-normalized resampling records.

The two data structures, merge_groups and walker_clone_nums, are convenient to construct.

Each is a list whose number of elements equals the number of walkers that resampling will act on.

Each element of the merge_groups is a list-like of integers indicating the indices of the walkers that will be merged into this one (i.e. squashed). A non-empty collection indicates a KEEP_MERGE decision.

Each element of the walker_clone_nums is simply an integer specifying how many clones to make of this walker.

These data structures simply declare requirements on what the actual decision records must achieve. The actual placement of walkers in slots (indices) is unspecified and immaterial.

Parameters
  • merge_groups (list of list of int) – The specification of which walkers will be squashed and merged.

  • walker_clone_nums (list of int) – The number of clones to make for each walker.

Returns

walker_actions – List of resampling record like dictionaries. These are not completely normalized for consumption by reporters, since they don’t have the right list-like wrappers.

Return type

list of dict of str: values
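To illustrate the two input structures, the hypothetical checker below validates a merge_groups / walker_clone_nums pair against the requirements described above. It assumes, as for REVO, that the total number of walkers stays constant and that walker_clone_nums counts the extra copies to create. It does not reproduce assign_clones itself.

```python
def check_merge_clone_spec(merge_groups, walker_clone_nums):
    """Validate merge_groups and walker_clone_nums (hypothetical helper); return True if consistent."""
    n = len(merge_groups)
    if len(walker_clone_nums) != n:
        raise ValueError("both structures need one entry per walker")
    squashed = [idx for group in merge_groups for idx in group]
    if len(set(squashed)) != len(squashed):
        raise ValueError("a walker can be squashed into only one KEEP_MERGE walker")
    for keep_idx, group in enumerate(merge_groups):
        if keep_idx in group:
            raise ValueError(f"walker {keep_idx} cannot be merged into itself")
    for idx in squashed:
        if merge_groups[idx] or walker_clone_nums[idx]:
            raise ValueError(f"squashed walker {idx} cannot keep a merge or be cloned")
    # Assumes a constant walker count: each squashed walker frees one slot for one new clone.
    if len(squashed) != sum(walker_clone_nums):
        raise ValueError("number of squashed walkers must equal the number of new clones")
    return True


# Four walkers: walker 1 is squashed into walker 0 (KEEP_MERGE), and walker 2 is cloned once.
print(check_merge_clone_spec([[1], [], [], []], [0, 0, 1, 0]))  # True
```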

debug_off()
debug_on()
property decision

The decision class for this resampler.

property is_debug_on
max_num_walkers()

Gets the maximum number of walkers currently allowed.

property max_num_walkers_setting

The specification for the maximum number of walkers for the resampler.

min_num_walkers()

Gets the minimum number of walkers currently allowed.

property min_num_walkers_setting

The specification for the minimum number of walkers for the resampler.

property pmax
property pmin
resampler_field_names()

Access the class level FIELDS constant for this record group.

resampler_field_shapes()

Access the class level SHAPES constant for this record group.

resampler_fields()

Returns a list of zipped field specs.

Returns

record_specs – A list of the specs for each field; a spec is a tuple of the form (field_name, shape_spec, dtype_spec).

Return type

list of tuple
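Assuming resampler_fields() zips the FIELDS, SHAPES and DTYPES constants in order, the specs for the REVO constants listed on this page would look like the output of this sketch. The helper is hypothetical. REVO's resampler_field_dtypes() presumably fills in the datatype of the image, which is None in the class constant.

```python
# Constants copied from this page; the image dtype is None until it is known at runtime.
RESAMPLER_FIELDS = ('num_walkers', 'distance_matrix', 'variation', 'image_shape', 'images')
RESAMPLER_SHAPES = ((1,), Ellipsis, (1,), Ellipsis, Ellipsis)
RESAMPLER_DTYPES = (int, float, float, int, None)


def zipped_field_specs(fields, shapes, dtypes):
    """Return [(field_name, shape_spec, dtype_spec), ...] (hypothetical helper)."""
    if not len(fields) == len(shapes) == len(dtypes):
        raise ValueError("FIELDS, SHAPES and DTYPES must have the same length")
    return list(zip(fields, shapes, dtypes))


for spec in zipped_field_specs(RESAMPLER_FIELDS, RESAMPLER_SHAPES, RESAMPLER_DTYPES):
    print(spec)
```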

resampler_record_field_names()

Access the class level RECORD_FIELDS constant for this record group.

resampling_field_dtypes()

Access the class level DTYPES constant for this record group.

resampling_field_names()

Access the class level FIELDS constant for this record group.

resampling_field_shapes()

Access the class level SHAPES constant for this record group.

resampling_fields()

Returns a list of zipped field specs.

Returns

record_specs – A list of the specs for each field; a spec is a tuple of the form (field_name, shape_spec, dtype_spec).

Return type

list of tuple

resampling_record_field_names()

Access the class level RECORD_FIELDS constant for this record group.

set_debug_mode(mode)
Parameters

mode