wepy.reporter.hdf5 module

class wepy.reporter.hdf5.WepyHDF5Reporter(save_fields=None, topology=None, units=None, sparse_fields=None, feature_shapes=None, feature_dtypes=None, n_dims=None, main_rep_idxs=None, all_atoms_rep_freq=None, alt_reps=None, resampler=None, boundary_conditions=None, resampling_fields=None, decision_enum_dict=None, resampler_fields=None, warping_fields=None, progress_fields=None, bc_fields=None, resampling_records=None, resampler_records=None, warping_records=None, bc_records=None, progress_records=None, swmr_mode=False, **kwargs)[source]

Bases: wepy.reporter.reporter.FileReporter

Reporter for generating an HDF5 format (WepyHDF5) data file from simulations.

This is the most important reporter, as it is the principal output format for storing weighted ensemble simulation data.

Files generated with this reporter can be opened using the wepy.hdf5.WepyHDF5 class.

Constructor for the WepyHDF5Reporter.

Parameters
  • save_fields (tuple of str, default: None) – A selection of fields from the walker states to be stored. Allows for the ignoring of some states. If None, an attempt is made to save all fields from the states.

  • topology (str) – JSON string representing topology of system being simulated.

  • units (dict of str: str, optional) – Mapping of trajectory field names to string specs for units.

  • sparse_fields (list of str, optional) – List of trajectory fields that should be initialized as sparse.

  • feature_shapes (dict of str: shape_spec, optional) – Mapping of trajectory fields to their shape spec for initialization.

  • feature_dtypes (dict of str: dtype_spec, optional) – Mapping of trajectory fields to their dtype spec for initialization.

  • n_dims (int, default: 3) – Set the number of spatial dimensions for the default positions trajectory field.

  • alt_reps (dict of str: tuple of (list of int, int), optional) – Specifies that there will be ‘alt_reps’ of positions each named by the keys of this mapping and containing the indices in each value list as the first value of the tuple and the second value being the frequency at which this field gets saved. Setting all_atoms_rep_freq is the equivalent of setting an entry {‘all_atoms’ : ([…], all_atoms_rep_freq)}.

  • main_rep_idxs (list of int, optional) – The indices of atom positions to save as the main ‘positions’ trajectory field. Defaults to all atoms.

  • all_atoms_rep_freq (int, optional) – The frequency at which to set an ‘alt_rep’ for all of the atoms in a simulation. Will be set as the field ‘alt_rep/all_atoms’.

  • resampler (Resampler object, optional but recommended) – The resampler being used for the simulation. It is used as a convenient container for a variety of constants needed for specifying data for the resampling records. If it is not given, the following Other Parameters must be specified manually: resampling_fields, decision_enum_dict, resampler_fields, resampling_records, resampler_records.

  • boundary_conditions (BoundaryConditions object, optional but recommended) – The boundary conditions being used for the simulation. It is used as a convenient container for a variety of constants needed for specifying data for the warping and progress records. If it is not given, the following Other Parameters must be specified manually: warping_fields, progress_fields, bc_fields, warping_records, bc_records, progress_records.

  • swmr_mode (bool) – Whether to open the HDF5 file in single-writer multi-reader (SWMR) mode.

Other Parameters
  • resampling_fields (list of str) – The names of the fields for resampling records

  • decision_enum_dict (dict of str : int) – Mapping of the names of resampling decision enum to their integer values.

  • resampler_fields (list of str) – The names of the fields for the resampler records

  • warping_fields (list of str) – The names of the fields for the warping records

  • progress_fields (list of str) – The names of the fields for the progress records

  • bc_fields (list of str) – The names of the fields for the boundary condition records.

  • resampling_records (list of str, optional) – Names of the resampling_fields that will be used in table-like views.

  • resampler_records (list of str, optional) – Names of the resampler_fields that will be used in table-like views.

  • warping_records (list of str, optional) – Names of the warping_fields that will be used in table-like views.

  • bc_records (list of str, optional) – Names of the bc_fields that will be used in table-like views.

  • progress_records (list of str, optional) – Names of the progress_fields that will be used in table-like views.
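As a rough illustration of the alt_reps structure described above, the following sketch builds the mapping in plain Python without importing wepy. The atom count, the index lists, the frequencies, and the helper name expand_all_atoms_rep are hypothetical; only the (list of int, int) tuple layout and the 'all_atoms' key are taken from the parameter descriptions.

```python
# Hypothetical system size and atom selection, for illustration only.
N_ATOMS = 10
ligand_idxs = [0, 1, 2]

# Each alt_rep maps a name to (atom indices, save frequency).
alt_reps = {
    "ligand": (ligand_idxs, 1),
}


def expand_all_atoms_rep(alt_reps, n_atoms, all_atoms_rep_freq, key="all_atoms"):
    """Return a copy of alt_reps that includes the entry all_atoms_rep_freq
    is documented to be equivalent to.

    This helper is an assumption made for the example, not part of wepy.
    """
    expanded = dict(alt_reps)
    if all_atoms_rep_freq is not None:
        expanded[key] = (list(range(n_atoms)), all_atoms_rep_freq)
    return expanded


expanded = expand_all_atoms_rep(alt_reps, N_ATOMS, all_atoms_rep_freq=100)
print(sorted(expanded))
```

The original alt_reps mapping is left unmodified, so the same selection can be reused with a different all-atoms frequency.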

ALL_ATOMS_REP_KEY = 'all_atoms'
FILE_ORDER = ('wepy_hdf5_path',)

Specify an ordering of file paths. Should be customized.

SUGGESTED_EXTENSIONS = ('wepy.h5',)

Suggested extensions for file paths for use with the automatic reparametrization feature. Should be customized.

DEFAULT_MODE = 'x'

The default mode for opening files if none is specified (create the file if it doesn’t exist, fail if it does).

DEFAULT_SUGGESTED_EXTENSION = 'report'

The default file extension used for files during dynamic reparametrization, if none is specified.

MODES = ('x', 'w', 'w-', 'r', 'r+')

Valid modes accepted for files.

SUGGESTED_FILENAME_TEMPLATE = '{config}{narration}{reporter_class}.{ext}'

Template to use for dynamic reparametrization of file path names.

The fields in the template are:

config : indicator of the runtime configuration used

narration : freeform description of the instance

reporter_class : the name of the class that produced the output. When no specific name is given for a file report generated from a reporter, this is used to disambiguate, along with the extension.

ext : The file extension, for multiple files produced from one reporter this should be sufficient to disambiguate the files.

The ‘config’ and ‘narration’ should be the same across all reporters in the same simulation manager, and the ‘narration’ is considered optional.
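The template is an ordinary Python format string, so filling it can be sketched with str.format. The helper name suggested_filename and the field values below are hypothetical, and how wepy itself chooses separators between fields is not stated here; in this sketch the values carry their own trailing underscores.

```python
SUGGESTED_FILENAME_TEMPLATE = '{config}{narration}{reporter_class}.{ext}'


def suggested_filename(config, reporter_class, ext, narration=""):
    """Fill the filename template; narration is optional and defaults to empty.

    Hypothetical helper for illustration, not part of wepy.
    """
    return SUGGESTED_FILENAME_TEMPLATE.format(
        config=config,
        narration=narration,
        reporter_class=reporter_class,
        ext=ext,
    )


# Example values are assumptions chosen only to show the substitution.
name = suggested_filename("gpu_", "WepyHDF5Reporter", "wepy.h5", narration="test_")
print(name)
```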

_validate_mode(mode)

Check if the mode spec is a valid one.

Parameters

mode (str) –

Returns

valid

Return type

bool
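A minimal sketch of what such a check might look like, assuming validation is a simple membership test against MODES. The standalone function below is an illustration and is not copied from the wepy source.

```python
# Values as listed for the class attributes above.
MODES = ('x', 'w', 'w-', 'r', 'r+')
DEFAULT_MODE = 'x'


def validate_mode(mode):
    """Return True if the mode spec is one of the accepted modes."""
    return mode in MODES


print(validate_mode(DEFAULT_MODE))
print(validate_mode('a'))
```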

property file_path

For single file path reporters the file path to that file spec.

property file_paths

The file paths for this reporter, in order.

property mode

For single file path reporters the mode of that file.

property modes

The modes for the files, in order.

reparametrize(file_paths, modes)

Set the file paths and modes for all files in the reporter.

Parameters
  • file_paths (list of str) – New file paths for each file, in order.

  • modes (list of str) – New modes for each file, in order.

set_mode(file_idx, mode)

Set the mode for a single indexed file.

Parameters
  • file_idx (int) – Index in the listing of files.

  • mode (str) – The new mode spec.

set_path(file_idx, path)

Set the path for a single indexed file.

Parameters
  • file_idx (int) – Index in the listing of files.

  • path (str) – The new path to set for this file

init(continue_run=None, init_walkers=None, **kwargs)[source]

Initialization routines for the reporter at simulation runtime.

Initialize I/O connections including file descriptors, database connections, timers, stdout/stderr etc.

Void method for reporter base class.

Reporters can expect to have the following keyword arguments passed to them during a simulation by the sim_manager in this call.

Parameters
  • init_walkers (list of Walker objects) – The initial walkers for the simulation.

  • runner (Runner object) – The runner that will be used in the simulation.

  • resampler (Resampler object) – The resampler that will be used in the simulation.

  • boundary_conditions (BoundaryConditions object) – The boundary conditions that will be used in the simulation.

  • work_mapper (WorkMapper object) – The work mapper that will be used in the simulation.

  • reporters (list of Reporter objects) – The list of reporters that are in the simulation.

  • continue_run (int) – The index of the run that is being continued within this same file.

cleanup(**kwargs)[source]

Teardown routines for the reporter at the end of the simulation.

Use to cleanly and safely close I/O connections or other cleanup I/O.

Use to close file descriptors, database connections etc.

Reporters can expect to have the following keyword arguments passed to them during a simulation by the sim_manager.

Parameters
  • runner (Runner object) – The runner at the end of the simulation

  • work_mapper (WorkMapper object) – The work mapper at the end of the simulation

  • resampler (Resampler object) – The resampler at the end of the simulation

  • boundary_conditions (BoundaryConditions object) – The boundary conditions at the end of the simulation

  • reporters (list of Reporter objects) – The list of reporters at the end of the simulation

report(new_walkers=None, cycle_idx=None, warp_data=None, bc_data=None, progress_data=None, resampling_data=None, resampler_data=None, **kwargs)[source]

Given data concerning the main simulation components state, perform I/O operations to persist that data.

Void method for reporter base class.

Reporters can expect to have the following keyword arguments passed to them during a simulation by the sim_manager.

Parameters
  • cycle_idx (int) –

  • new_walkers (list of Walker objects) – List of walkers that were produced from running their dynamics by the runner.

  • warp_data (list of dict of str : value) – List of dict-like records for each warping event from the last cycle.

  • bc_data (list of dict of str : value) – List of dict-like records specifying the changes to the state of the boundary conditions in the last cycle.

  • progress_data (dict of str : list) – A record indicating the progress values for each walker in the last cycle.

  • resampling_data (list of dict of str : value) – List of records specifying the resampling to occur at this cycle.

  • resampler_data (list of dict of str : value) – List of records specifying the changes to the state of the resampler in the last cycle.

  • n_segment_steps (int) – The number of dynamics steps that were completed in the last cycle

  • worker_segment_times (dict of int : list of float) – Mapping worker index to the times they took for each segment they processed.

  • cycle_runner_time (float) – Total time runner took in last cycle.

  • cycle_bc_time (float) – Total time boundary conditions took in last cycle.

  • cycle_resampling_time (float) – Total time resampler took in last cycle.

  • resampled_walkers (list of Walker objects) – List of walkers that were produced from the new_walkers from applying resampling and boundary conditions.
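The init, report, and cleanup methods together form a call pattern in which each method takes the keyword arguments it needs and absorbs the rest through **kwargs. The sketch below imitates that pattern with a hypothetical in-memory class and a hand-written driver loop; it does not use wepy, and the walker placeholders and the progress field name min_distances are invented for the example.

```python
class InMemoryReporter:
    """Hypothetical stand-in that follows the init/report/cleanup pattern."""

    def __init__(self):
        self.records = []
        self.is_open = False

    def init(self, continue_run=None, init_walkers=None, **kwargs):
        # A real reporter would open its file or connection here.
        self.is_open = True

    def report(self, cycle_idx=None, new_walkers=None, progress_data=None, **kwargs):
        # Keyword arguments this reporter does not use land in **kwargs.
        self.records.append({
            "cycle_idx": cycle_idx,
            "n_walkers": len(new_walkers),
            "progress": progress_data,
        })

    def cleanup(self, **kwargs):
        # A real reporter would close its file or connection here.
        self.is_open = False


# Hand-written driver standing in for the sim_manager.
walkers = ["walker_0", "walker_1"]  # placeholders, not Walker objects
reporter = InMemoryReporter()
reporter.init(init_walkers=walkers)
for cycle_idx in range(3):
    reporter.report(
        cycle_idx=cycle_idx,
        new_walkers=walkers,
        progress_data={"min_distances": [0.5, 0.7]},
        n_segment_steps=1000,  # ignored by this reporter
    )
reporter.cleanup()
print(len(reporter.records))
```

Accepting **kwargs in every method lets a reporter ignore arguments it has no use for, which is why the extra n_segment_steps argument passes through without error.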

_report_warping(cycle_idx, warping_data)[source]

Method to write warping specific information.

Parameters
  • cycle_idx (int) –

  • warping_data (list of dict of str : value) – List of dict-like records for each warping event from the last cycle.

_report_bc(cycle_idx, bc_data)[source]

Method to write boundary condition update specific information.

Parameters
  • cycle_idx (int) –

  • bc_data (list of dict of str : value) – List of dict-like records specifying the changes to the state of the boundary conditions in the last cycle.

_report_resampler(cycle_idx, resampler_data)[source]

Method to write resampler update specific information.

Parameters
  • cycle_idx (int) –

  • resampler_data (list of dict of str : value) – List of records specifying the changes to the state of the resampler in the last cycle.

_report_resampling(cycle_idx, resampling_data)[source]

Method to write resampling specific information.

Parameters
  • cycle_idx (int) –

  • resampling_data (list of dict of str : value) – List of records specifying the resampling to occur at this cycle.

_report_progress(cycle_idx, progress_data)[source]

Method to write progress specific information.

Parameters
  • cycle_idx (int) –

  • progress_data (dict of str : list) – A record indicating the progress values for each walker in the last cycle.