Analysis

Analysis back-ends

Cross correlation

functions

Helper functions. Uses tslearn.cycc

mesmerize.analysis.math.cross_correlation.ncc_c(x: numpy.ndarray, y: numpy.ndarray) → numpy.ndarray[source]

Must pass 1D array to both x and y

Parameters:
  • x – Input array [x1, x2, x3, … xn]
  • y – Input array [y1, y2, y3, … yn]
Returns:

Returns the normalized cross correlation function (as an array) of the two input vector arguments “x” and “y”

Return type:

np.ndarray
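As a rough illustration of what a normalized cross-correlation function looks like, the sketch below builds one with plain NumPy. It is not the library's implementation (which relies on tslearn.cycc); the name ncc_sketch and the handling of all-zero inputs are assumptions made for this example.

```python
import numpy as np

def ncc_sketch(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Coefficient-normalized cross-correlation of two 1D arrays (illustrative only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or y.ndim != 1:
        raise ValueError("x and y must both be 1D arrays")
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        # Assumption: return zeros rather than dividing by zero for all-zero inputs
        return np.zeros(x.size + y.size - 1)
    # mode="full" evaluates every possible shift: len(x) + len(y) - 1 values
    return np.correlate(x, y, mode="full") / denom

x = np.sin(np.linspace(0, 4 * np.pi, 50))
cc = ncc_sketch(x, x)  # correlating a curve with itself peaks at the middle index
```

For two inputs of length n the result has 2n - 1 entries, and the middle entry corresponds to zero shift.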

mesmerize.analysis.math.cross_correlation.get_omega(x: numpy.ndarray = None, y: numpy.ndarray = None, cc: numpy.ndarray = None) → int[source]

Pass either a 1D array to both “x” and “y”, or a cross-correlation function (as an array) to “cc”

Parameters:
  • x – Input array [x1, x2, x3, … xn]
  • y – Input array [y1, y2, y3, … yn]
  • cc – cross-correlation function represented as an array [c1, c2, c3, … cn]
Returns:

Index (x-axis position) of the global maximum of the cross-correlation function

Return type:

int

mesmerize.analysis.math.cross_correlation.get_lag(x: numpy.ndarray = None, y: numpy.ndarray = None, cc: numpy.ndarray = None) → float[source]

Pass either a 1D array to both “x” and “y”, or a cross-correlation function (as an array) to “cc”

Parameters:
  • x – Input array [x1, x2, x3, … xn]
  • y – Input array [y1, y2, y3, … yn]
  • cc – cross-correlation function represented as an array [c1, c2, c3, … cn]
Returns:

Position of the maximum of the cross-correlation function with respect to the middle point of the cross-correlation function

Return type:

float

mesmerize.analysis.math.cross_correlation.get_epsilon(x: numpy.ndarray = None, y: numpy.ndarray = None, cc: numpy.ndarray = None) → float[source]

Pass either a 1D array to both “x” and “y”, or a cross-correlation function (as an array) to “cc”

Parameters:
  • x – Input array [x1, x2, x3, … xn]
  • y – Input array [y1, y2, y3, … yn]
  • cc – cross-correlation function represented as an array [c1, c2, c3, … cn]
Returns:

Magnitude of the global maximum of the cross-correlation function

Return type:

float
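The three quantities described above (omega, lag and epsilon) can all be read off a single cross-correlation array. The following is a minimal sketch, assuming that the "middle point" is the index (len(cc) - 1) / 2 and that lag is expressed in samples; the helper name omega_lag_epsilon is hypothetical, and the sign and unit conventions used by the library may differ.

```python
import numpy as np

def omega_lag_epsilon(cc: np.ndarray):
    """Return (omega, lag, epsilon) for a 1D cross-correlation array (illustrative only)."""
    cc = np.asarray(cc, dtype=float)
    omega = int(np.argmax(cc))    # index of the global maximum
    middle = (cc.size - 1) / 2    # assumed zero-shift position
    lag = float(omega - middle)   # signed offset of the maximum from the middle
    epsilon = float(cc[omega])    # height of the global maximum
    return omega, lag, epsilon

cc = np.array([0.1, 0.3, 0.2, 0.9, 0.4])
omega, lag, epsilon = omega_lag_epsilon(cc)
```

In this toy array the maximum sits one position to the right of the middle, so the lag is one sample under the stated assumptions.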

mesmerize.analysis.math.cross_correlation.get_lag_matrix(curves: numpy.ndarray = None, ccs: numpy.ndarray = None) → numpy.ndarray[source]

Get a 2D matrix of lags. Can pass either a 2D array of 1D curves or cross-correlations

Parameters:
  • curves – 2D array of 1D curves
  • ccs – 2D array of 1D cross-correlation functions represented by arrays
Returns:

2D matrix of lag values, shape is [n_curves, n_curves]

Return type:

np.ndarray

mesmerize.analysis.math.cross_correlation.get_epsilon_matrix(curves: numpy.ndarray = None, ccs: numpy.ndarray = None) → numpy.ndarray[source]

Get a 2D matrix of maxima. Can pass either a 2D array of 1D curves or cross-correlations

Parameters:
  • curves – 2D array of 1D curves
  • ccs – 2D array of 1D cross-correlation functions represented by arrays
Returns:

2D matrix of maxima values, shape is [n_curves, n_curves]

Return type:

np.ndarray

mesmerize.analysis.math.cross_correlation.compute_cc_data(curves: numpy.ndarray) → mesmerize.analysis.math.cross_correlation.CC_Data[source]

Compute cross-correlation data (cc functions, lag and maxima matrices)

Parameters:
  • curves – input curves as a 2D array, shape is [n_samples, curve_size]
Returns:

cross correlation data for the input curves as a CC_Data instance

Return type:

CC_Data

mesmerize.analysis.math.cross_correlation.compute_ccs(a: numpy.ndarray) → numpy.ndarray[source]

Compute cross-correlations between all 1D curves in a 2D input array

Parameters:
  • a – 2D input array of 1D curves, shape is [n_samples, curve_size]
Return type:

np.ndarray
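To make the shapes concrete, here is a small NumPy-only sketch of all-pairs cross-correlation followed by the lag and maxima matrices. It is an assumption-laden stand-in, not the library code: pairwise_ccs is a hypothetical name, normalization uses the product of vector norms, and lag is measured in samples from the middle index.

```python
import numpy as np

def pairwise_ccs(curves: np.ndarray) -> np.ndarray:
    """Normalized cross-correlation between every pair of rows (illustrative only)."""
    curves = np.asarray(curves, dtype=float)
    n, m = curves.shape
    ccs = np.zeros((n, n, 2 * m - 1))
    norms = np.linalg.norm(curves, axis=1)
    for i in range(n):
        for j in range(n):
            denom = norms[i] * norms[j]
            if denom > 0:  # leave zeros for all-zero curves
                ccs[i, j] = np.correlate(curves[i], curves[j], mode="full") / denom
    return ccs

t = np.linspace(0, 2 * np.pi, 40)
curves = np.vstack([np.sin(t), np.cos(t), np.sin(2 * t)])

ccs = pairwise_ccs(curves)                                 # [n_curves, n_curves, func_length]
epsilon_matrix = ccs.max(axis=2)                           # [n_curves, n_curves]
lag_matrix = ccs.argmax(axis=2) - (ccs.shape[2] - 1) / 2   # [n_curves, n_curves]
```

Each curve correlates perfectly with itself at zero shift, so the diagonal of the maxima matrix is 1 and the diagonal of the lag matrix is 0.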

CC_Data

Data container

Warning

All arguments to CC_Data MUST be of type numpy.ndarray for the data to be saveable as an hdf5 file. Set numpy.unicode as the dtype for the curve_uuids and labels arrays. If the dtype is 'O' (object), the to_hdf5() method will fail.
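A quick way to check and fix the dtype before constructing a CC_Data instance is sketched below. It assumes a NumPy release in which the numpy.unicode alias is not available, where passing str as the dtype yields a fixed-width unicode array; the variable names are illustrative.

```python
import numpy as np

# Arrays built from mixed Python objects often end up with dtype 'O'
labels_obj = np.array(["control", "treated"], dtype=object)

# Convert to a fixed-width unicode dtype (kind 'U') before saving
labels_str = labels_obj.astype(str)

kind_before = labels_obj.dtype.kind  # 'O' (object)
kind_after = labels_str.dtype.kind   # 'U' (unicode)
```

The same conversion would apply to the curve_uuids array.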

class mesmerize.analysis.math.cross_correlation.CC_Data(ccs: numpy.ndarray = None, lag_matrix: numpy.ndarray = None, epsilon_matrix: numpy.ndarray = None, curve_uuids: numpy.ndarray = None, labels: numpy.ndarray = None)[source]
__init__(ccs: numpy.ndarray = None, lag_matrix: numpy.ndarray = None, epsilon_matrix: numpy.ndarray = None, curve_uuids: numpy.ndarray = None, labels: numpy.ndarray = None)[source]

Object for organizing cross-correlation data

types must be numpy.ndarray to be compatible with hdf5

Parameters:
  • ccs (np.ndarray) – array of cross-correlation functions, shape: [n_curves, n_curves, func_length]
  • lag_matrix (np.ndarray) – the lag matrix, shape: [n_curves, n_curves]
  • epsilon_matrix (np.ndarray) – the maxima matrix, shape: [n_curves, n_curves]
  • curve_uuids (np.ndarray) – uuids (str representation) for each of the curves, length: n_curves
  • labels (np.ndarray) – labels for each curve, length: n_curves
ccs = None

array of cross-correlation functions

lag_matrix = None

lag matrix

curve_uuids = None

uuids for each curve

labels = None

labels for each curve

get_threshold_matrix(matrix_type: str, lag_thr: float, max_thr: float, lag_thr_abs: bool = True) → numpy.ndarray[source]

Get lag or maxima matrix with thresholds applied. Values outside the threshold are set to NaN

Parameters:
  • matrix_type – one of ‘lag’ or ‘maxima’
  • lag_thr – lag threshold
  • max_thr – maxima threshold
  • lag_thr_abs – threshold with the absolute value of lag
Returns:

the requested matrix with the thresholds applied to it.

Return type:

np.ndarray
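The thresholding idea can be sketched as a boolean mask applied with NaN fill. The direction of each comparison is an assumption made for this example (entries are kept when the lag is at most lag_thr and the maximum is at least max_thr); the source only states that values outside the threshold become NaN. The function name threshold_sketch is hypothetical.

```python
import numpy as np

def threshold_sketch(lag_matrix, epsilon_matrix, matrix_type,
                     lag_thr, max_thr, lag_thr_abs=True):
    """Mask a lag or maxima matrix with NaN where thresholds are not met (illustrative only)."""
    lag = np.asarray(lag_matrix, dtype=float)
    eps = np.asarray(epsilon_matrix, dtype=float)
    lag_test = np.abs(lag) if lag_thr_abs else lag
    # Assumed convention: keep small lags and large maxima
    keep = (lag_test <= lag_thr) & (eps >= max_thr)
    if matrix_type == "lag":
        source = lag
    elif matrix_type == "maxima":
        source = eps
    else:
        raise ValueError("matrix_type must be 'lag' or 'maxima'")
    return np.where(keep, source, np.nan)

lag_m = np.array([[0.0, -5.0], [5.0, 0.0]])
eps_m = np.array([[1.0, 0.8], [0.8, 1.0]])
masked_lag = threshold_sketch(lag_m, eps_m, "lag", lag_thr=3, max_thr=0.5)
masked_signed = threshold_sketch(lag_m, eps_m, "lag", lag_thr=3, max_thr=0.5,
                                 lag_thr_abs=False)
```

With lag_thr_abs=False the comparison is signed, so a lag of -5 passes a threshold of 3 while a lag of +5 does not.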

classmethod from_dict(d: dict)[source]

Load data from a dict

to_hdf5(path: str)[source]

Save as an HDF5 file

Parameters:
  • path – path to save the hdf5 file to; the file must not already exist.

classmethod from_hdf5(path: str)[source]

Load cross-correlation data from an hdf5 file

Parameters:
  • path – path to the hdf5 file

Clustering metrics

mesmerize.analysis.clustering_metrics.get_centerlike(cluster_members: numpy.ndarray, metric: Union[str, callable, None] = None, dist_matrix: Optional[numpy.ndarray] = None) → Tuple[numpy.ndarray, int][source]

Finds the 1D time-series within a cluster that is the most centerlike

Parameters:
  • cluster_members – 2D numpy array in the form [n_samples, 1D time_series]
  • metric – Metric to use for pairwise distance calculation, simply passed to sklearn.metrics.pairwise_distances
  • dist_matrix – Distance matrix of the cluster members
Returns:

The cluster member which is most centerlike, and its index in the cluster_members array
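One common way to define the most centerlike member is the one with the smallest total distance to all other members (a medoid). The sketch below uses that definition with Euclidean distance computed in NumPy; both choices are assumptions for illustration, and centerlike_sketch is a hypothetical name rather than the library function.

```python
import numpy as np

def centerlike_sketch(cluster_members: np.ndarray):
    """Return (member, index) of the row with the smallest summed distance (illustrative only)."""
    members = np.asarray(cluster_members, dtype=float)
    # Pairwise Euclidean distances, shape [n_samples, n_samples]
    diffs = members[:, None, :] - members[None, :, :]
    dist_matrix = np.sqrt((diffs ** 2).sum(axis=2))
    index = int(np.argmin(dist_matrix.sum(axis=1)))
    return members[index], index

members = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
center, index = centerlike_sketch(members)
```

Here the middle point is selected because it is closest, in total, to the other two.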

mesmerize.analysis.clustering_metrics.get_cluster_radius(cluster_members: numpy.ndarray, metric: Union[str, callable, None] = None, dist_matrix: Optional[numpy.ndarray] = None, centerlike_index: Optional[int] = None) → float[source]

Returns the cluster radius according to chosen distance metric

Parameters:
  • cluster_members – 2D numpy array in the form [n_samples, 1D time_series]
  • metric – Metric to use for pairwise distance calculation, simply passed to sklearn.metrics.pairwise_distances
  • dist_matrix – Distance matrix of the cluster members
  • centerlike_index – Index of the centerlike cluster member within the cluster_members array
Returns:

The cluster radius: the average distance between the most centerlike member and all other members
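Given a precomputed distance matrix and the index of the centerlike member, the radius described above reduces to a mean over one row. This sketch assumes the centerlike member's zero distance to itself is excluded from the average and that a single-member cluster has radius 0; cluster_radius_sketch is a hypothetical name.

```python
import numpy as np

def cluster_radius_sketch(dist_matrix: np.ndarray, centerlike_index: int) -> float:
    """Mean distance from the centerlike member to every other member (illustrative only)."""
    d = np.asarray(dist_matrix, dtype=float)
    # Drop the self-distance so it does not pull the average down
    others = np.delete(d[centerlike_index], centerlike_index)
    return float(others.mean()) if others.size else 0.0

dist_matrix = np.array([[0.0, 1.0, 2.0],
                        [1.0, 0.0, 1.0],
                        [2.0, 1.0, 0.0]])
radius = cluster_radius_sketch(dist_matrix, centerlike_index=1)
```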

mesmerize.analysis.clustering_metrics.davies_bouldin_score(data: numpy.ndarray, cluster_labels: numpy.ndarray, metric: Union[str, callable]) → float[source]

Adapted from sklearn.metrics.davies_bouldin_score to use any distance metric

Parameters:
  • data – Data that was used for clustering, [n_samples, 1D time_series]
  • metric – Metric to use for pairwise distance calculation, simply passed to sklearn.metrics.pairwise_distances
  • cluster_labels – Cluster labels
Returns:

Davies-Bouldin score computed with the chosen distance metric (for example, EMD)
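For orientation, a Davies-Bouldin style score with an arbitrary metric can be sketched as follows. This example assumes that each cluster is represented by its most centerlike member (instead of a centroid) and that its scatter is the mean distance from that member to the others; whether the library makes the same substitutions is not stated in this section. The names davies_bouldin_sketch and euclidean are hypothetical.

```python
import numpy as np

def davies_bouldin_sketch(data, cluster_labels, metric) -> float:
    """Davies-Bouldin style score using a callable metric(a, b) (illustrative only)."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(cluster_labels)
    centers, radii = [], []
    for label in np.unique(labels):
        members = data[labels == label]
        # Pairwise distances inside the cluster under the chosen metric
        dists = np.array([[metric(a, b) for b in members] for a in members])
        idx = int(np.argmin(dists.sum(axis=1)))  # most centerlike member
        centers.append(members[idx])
        radii.append(dists[idx].sum() / max(len(members) - 1, 1))
    k = len(centers)
    if k < 2:
        raise ValueError("need at least two clusters")
    ratios = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i != j:
                sep = metric(centers[i], centers[j])
                ratios[i, j] = (radii[i] + radii[j]) / sep if sep > 0 else np.inf
    # Worst-case similarity for each cluster, averaged over clusters
    return float(ratios.max(axis=1).mean())

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
score = davies_bouldin_sketch(data, labels, euclidean)
```

Lower scores indicate clusters that are compact relative to the distance between them; the two well-separated pairs in this toy data give a small value.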