xclim.sdba package
SDBA submodule
Submodules
xclim.sdba._adjustment module
Adjustment Algorithms
This file defines the different steps, to be wrapped into the Adjustment objects.
- xclim.sdba._adjustment._extremes_train_1d(ref, hist, ref_params, *, q_thresh, cluster_thresh, dist, N)[source]
Train for method ExtremeValues, only for 1D input along time.
- xclim.sdba._adjustment._fit_cluster_and_cdf(data, thresh, dist, cluster_thresh)[source]
Fit 1D cluster maximums and immediately compute CDF.
- xclim.sdba._adjustment._fit_on_cluster(data, thresh, dist, cluster_thresh)[source]
Extract clusters on 1D data and fit “dist” on the maximums.
- xclim.sdba._adjustment.npdf_transform(ds, **kwargs)[source]
N-pdf transform : Iterative univariate adjustment in random rotated spaces.
- Parameters:
ds (xr.Dataset) –
- Dataset variables:
ref : Reference multivariate timeseries hist : simulated timeseries on the reference period sim : Simulated timeseries on the projected period. rot_matrices : Random rotation matrices.
**kwargs – pts_dim : multivariate dimension name base : Adjustment class base_kws : Kwargs for initialising the adjustment object adj_kws : Kwargs of the adjust call n_escore : Number of elements to include in the e_score test (0 for all, < 0 to skip)
- Return type:
Dataset
- Returns:
xr.Dataset – Dataset with scenh, scens and escores DataArrays, where scenh and scens are hist and sim respectively after adjustment according to ref. If n_escore is negative, escores will be filled with NaNs.
xclim.sdba._processing module
Compute Functions Submodule
Here are defined the functions wrapped by map_blocks or map_groups. The user-facing, metadata-handling functions should be defined in processing.py.
xclim.sdba.adjustment module
Adjustment Methods
- class xclim.sdba.adjustment.BaseAdjustment(*args, _trained=False, **kwargs)[source]
Bases:
ParametrizableWithDataset
Base class for adjustment objects.
Children classes should implement the train and / or the adjust method.
This base class defined the basic input and output checks. It should only be used for a real adjustment if neither TrainAdjust nor Adjust fit the algorithm.
- _allow_diff_calendars = True
- _attribute = '_xclim_adjustment'
- classmethod _check_inputs(*inputs, group)[source]
Raise an error if there are chunks along the main dimension.
Also raises if
BaseAdjustment._allow_diff_calendars
is False and calendars differ.
- class xclim.sdba.adjustment.DetrendedQuantileMapping(*args, _trained=False, **kwargs)[source]
Bases:
TrainAdjust
Detrended Quantile Mapping bias-adjustment.
The algorithm follows these steps, 1-3 being the ‘train’ and 4-6, the ‘adjust’ steps.
A scaling factor that would make the mean of hist match the mean of ref is computed.
ref and hist are normalized by removing the “dayofyear” mean.
Adjustment factors are computed between the quantiles of the normalized ref and hist.
sim is corrected by the scaling factor, and either normalized by “dayofyear” and detrended group-wise or directly detrended per “dayofyear”, using a linear fit (modifiable).
Values of detrended sim are matched to the corresponding quantiles of normalized hist and corrected accordingly.
The trend is put back on the result.
\[F^{-1}_{ref}\left\{F_{hist}\left[\frac{\overline{hist}\cdot sim}{\overline{sim}}\right]\right\}\frac{\overline{sim}}{\overline{hist}}\]where \(F\) is the cumulative distribution function (CDF) and \(\overline{xyz}\) is the linear trend of the data. This equation is valid for multiplicative adjustment. Based on the DQM method of [Cannon et al., 2015].
- Parameters:
Train step
nquantiles (int or 1d array of floats) – The number of quantiles to use. See
equally_spaced_nodes()
. An array of quantiles [0, 1] can also be passed. Defaults to 20 quantiles.kind ({‘+’, ‘’}*) – The adjustment kind, either additive or multiplicative. Defaults to “+”.
group (Union[str, Grouper]) – The grouping information. See
xclim.sdba.base.Grouper
for details. Default is “time”, meaning a single adjustment group along dimension “time”.Adjust step
interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.
detrend (int or BaseDetrend instance) – The method to use when detrending. If an int is passed, it is understood as a PolyDetrend (polynomial detrending) degree. Defaults to 1 (linear detrending)
extrapolation ({‘constant’, ‘nan’}) – The type of extrapolation to use. See
xclim.sdba.utils.extrapolate_qm()
for details. Defaults to “constant”.
References
Cannon, Sobie, and Murdock [2015]
- _allow_diff_calendars = False
- class xclim.sdba.adjustment.EmpiricalQuantileMapping(*args, _trained=False, **kwargs)[source]
Bases:
TrainAdjust
Empirical Quantile Mapping bias-adjustment.
Adjustment factors are computed between the quantiles of ref and sim. Values of sim are matched to the corresponding quantiles of hist and corrected accordingly.
\[F^{-1}_{ref} (F_{hist}(sim))\]where \(F\) is the cumulative distribution function (CDF) and mod stands for model data.
- Variables:
step (Adjust) –
nquantiles (int or 1d array of floats) – The number of quantiles to use. Two endpoints at 1e-6 and 1 - 1e-6 will be added. An array of quantiles [0, 1] can also be passed. Defaults to 20 quantiles.
kind ({'+', '*'}) – The adjustment kind, either additive or multiplicative. Defaults to “+”.
group (Union[str, Grouper]) – The grouping information. See
xclim.sdba.base.Grouper
for details. Default is “time”, meaning an single adjustment group along dimension “time”.step –
interp ({'nearest', 'linear', 'cubic'}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “nearset”.
extrapolation ({'constant', 'nan'}) – The type of extrapolation to use. See
xclim.sdba.utils.extrapolate_qm()
for details. Defaults to “constant”.
References
Déqué [2007]
- _allow_diff_calendars = False
- class xclim.sdba.adjustment.ExtremeValues(*args, _trained=False, **kwargs)[source]
Bases:
TrainAdjust
Adjustment correction for extreme values.
The tail of the distribution of adjusted data is corrected according to the bias between the parametric Generalized Pareto distributions of the simulated and reference data [Roy et al., 2021]. The distributions are composed of the maximal values of clusters of “large” values. With “large” values being those above cluster_thresh. Only extreme values, whose quantile within the pool of large values are above q_thresh, are re-adjusted. See Notes.
This adjustment method should be considered experimental and used with care.
- Parameters:
Train step
cluster_thresh (Quantity (str with units)) – The threshold value for defining clusters.
q_thresh (float) – The quantile of “extreme” values, [0, 1[. Defaults to 0.95.
ref_params (xr.DataArray, optional) – Distribution parameters to use instead of fitting a GenPareto distribution on ref.
Adjust step
scen (DataArray) – This is a second-order adjustment, so the adjust method needs the first-order adjusted timeseries in addition to the raw “sim”.
interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “linear”.
extrapolation ({‘constant’, ‘nan’}) – The type of extrapolation to use. See
extrapolate_qm()
for details. Defaults to “constant”.frac (float) – Fraction where the cutoff happens between the original scen and the corrected one. See Notes, ]0, 1]. Defaults to 0.25.
power (float) – Shape of the correction strength, see Notes. Defaults to 1.0.
Notes
Extreme values are extracted from ref, hist and sim by finding all “clusters”, i.e. runs of consecutive values above cluster_thresh. The q_thresh`th percentile of these values is taken on `ref and hist and becomes thresh, the extreme value threshold. The maximal value of each cluster, if it exceeds that new threshold, is taken and Generalized Pareto distributions are fitted to them, for both ref and hist. The probabilities associated with each of these extremes in hist is used to find the corresponding value according to ref’s distribution. Adjustment factors are computed as the bias between those new extremes and the original ones.
In the adjust step, a Generalized Pareto distributions is fitted on the cluster-maximums of sim and it is used to associate a probability to each extreme, values over the thresh compute in the training, without the clustering. The adjustment factors are computed by interpolating the trained ones using these probabilities and the probabilities computed from hist.
Finally, the adjusted values (\(C_i\)) are mixed with the pre-adjusted ones (scen, \(D_i\)) using the following transition function:
\[V_i = C_i * \tau + D_i * (1 - \tau)\]Where \(\tau\) is a function of sim’s extreme values (unadjusted, \(S_i\)) and of arguments
frac
(\(f\)) andpower
(\(p\)):\[\tau = \left(\frac{1}{f}\frac{S - min(S)}{max(S) - min(S)}\right)^p\]Code based on an internal Matlab source and partly ib the biascorrect_extremes function of the julia package “ClimateTools.jl” [Roy et al., 2021].
Because of limitations imposed by the lazy computing nature of the dask backend, it is not possible to know the number of cluster extremes in ref and hist at the moment the output data structure is created. This is why the code tries to estimate that number and usually overestimates it. In the training dataset, this translated into a quantile dimension that is too large and variables af and px_hist are assigned NaNs on extra elements. This has no incidence on the calculations themselves but requires more memory than is useful.
References
Roy, Smith, Kelman, Nolet-Gravel, Saba, Thomet, TagBot, and Forget [2021] Roy, Rondeau-Genesse, Jalbert, and Fournier [2021]
- class xclim.sdba.adjustment.LOCI(*args, _trained=False, **kwargs)[source]
Bases:
TrainAdjust
Local Intensity Scaling (LOCI) bias-adjustment.
This bias adjustment method is designed to correct daily precipitation time series by considering wet and dry days separately [Schmidli et al., 2006].
Multiplicative adjustment factors are computed such that the mean of hist matches the mean of ref for values above a threshold.
The threshold on the training target ref is first mapped to hist by finding the quantile in hist having the same exceedance probability as thresh in ref. The adjustment factor is then given by
\[s = \frac{\left \langle ref: ref \geq t_{ref} \right\rangle - t_{ref}}{\left \langle hist : hist \geq t_{hist} \right\rangle - t_{hist}}\]In the case of precipitations, the adjustment factor is the ratio of wet-days intensity.
For an adjustment factor s, the bias-adjustment of sim is:
\[sim(t) = \max\left(t_{ref} + s \cdot (hist(t) - t_{hist}), 0\right)\]- Variables:
step (Adjust) –
group (Union[str, Grouper]) – The grouping information. See
xclim.sdba.base.Grouper
for details. Default is “time”, meaning a single adjustment group along dimension “time”.thresh (str) – The threshold in ref above which the values are scaled.
step –
interp ({'nearest', 'linear', 'cubic'}) – The interpolation method to use then interpolating the adjustment factors. Defaults to “linear”.
References
Schmidli, Frei, and Vidale [2006]
- _allow_diff_calendars = False
- class xclim.sdba.adjustment.NpdfTransform(*args, _trained=False, **kwargs)[source]
Bases:
Adjust
N-dimensional probability density function transform.
This adjustment object combines both training and adjust steps in the adjust class method.
A multivariate bias-adjustment algorithm described by Cannon [2018], as part of the MBCn algorithm, based on a color-correction algorithm described by Pitie et al. [2005].
This algorithm in itself, when used with QuantileDeltaMapping, is NOT trend-preserving. The full MBCn algorithm includes a reordering step provided here by
xclim.sdba.processing.reordering()
.See notes for an explanation of the algorithm.
- Parameters:
base (BaseAdjustment) – An univariate bias-adjustment class. This is untested for anything else than QuantileDeltaMapping.
base_kws (dict, optional) – Arguments passed to the training of the univariate adjustment.
n_escore (int) – The number of elements to send to the escore function. The default, 0, means all elements are included. Pass -1 to skip computing the escore completely. Small numbers result in less significant scores, but the execution time goes up quickly with large values.
n_iter (int) – The number of iterations to perform. Defaults to 20.
pts_dim (str) – The name of the “multivariate” dimension. Defaults to “multivar”, which is the normal case when using
xclim.sdba.base.stack_variables()
.adj_kws (dict, optional) – Dictionary of arguments to pass to the adjust method of the univariate adjustment.
rot_matrices (xr.DataArray, optional) – The rotation matrices as a 3D array (‘iterations’, <pts_dim>, <anything>), with shape (n_iter, <N>, <N>). If left empty, random rotation matrices will be automatically generated.
Notes
The historical reference (\(T\), for “target”), simulated historical (\(H\)) and simulated projected (\(S\)) datasets are constructed by stacking the timeseries of N variables together. The algorithm is broken into the following steps:
Rotate the datasets in the N-dimensional variable space with \(\mathbf{R}\), a random rotation NxN matrix.
\[\tilde{\mathbf{T}} = \mathbf{T}\mathbf{R} \ \tilde{\mathbf{H}} = \mathbf{H}\mathbf{R} \ \tilde{\mathbf{S}} = \mathbf{S}\mathbf{R}\]2. An univariate bias-adjustment \(\mathcal{F}\) is used on the rotated datasets. The adjustments are made in additive mode, for each variable \(i\).
\[\hat{\mathbf{H}}_i, \hat{\mathbf{S}}_i = \mathcal{F}\left(\tilde{\mathbf{T}}_i, \tilde{\mathbf{H}}_i, \tilde{\mathbf{S}}_i\right)\]The bias-adjusted datasets are rotated back.
\[\begin{split}\mathbf{H}' = \hat{\mathbf{H}}\mathbf{R} \\ \mathbf{S}' = \hat{\mathbf{S}}\mathbf{R}\end{split}\]These three steps are repeated a certain number of times, prescribed by argument
n_iter
. At each iteration, a new random rotation matrix is generated.The original algorithm [Pitie et al., 2005], stops the iteration when some distance score converges. Following cite:t:sdba-cannon_multivariate_2018 and the MBCn implementation in Cannon [2020], we instead fix the number of iterations.
As done by cite:t:sdba-cannon_multivariate_2018, the distance score chosen is the “Energy distance” from Szekely and Rizzo [2004]. (see:
xclim.sdba.processing.escore()
).The random matrices are generated following a method laid out by Mezzadri [2007].
This is only part of the full MBCn algorithm, see Statistical Downscaling and Bias-Adjustment for an example on how to replicate the full method with xclim. This includes a standardization of the simulated data beforehand, an initial univariate adjustment and the reordering of those adjusted series according to the rank structure of the output of this algorithm.
References
Cannon [2018], Cannon [2020], Mezzadri [2007], Pitie, Kokaram, and Dahyot [2005], Szekely and Rizzo [2004]
- class xclim.sdba.adjustment.PrincipalComponents(*args, _trained=False, **kwargs)[source]
Bases:
TrainAdjust
Principal component adjustment.
This bias-correction method maps model simulation values to the observation space through principal components [Hnilica et al., 2017]. Values in the simulation space (multiple variables, or multiple sites) can be thought of as coordinate along axes, such as variable, temperature, etc. Principal components (PC) are a linear combinations of the original variables where the coefficients are the eigenvectors of the covariance matrix. Values can then be expressed as coordinates along the PC axes. The method makes the assumption that bias-corrected values have the same coordinates along the PC axes of the observations. By converting from the observation PC space to the original space, we get bias corrected values. See Notes for a mathematical explanation.
Warning
Be aware that principal components is meant here as the algebraic operation defining a coordinate system based on the eigenvectors, not statistical principal component analysis.
- Variables:
group (Union[str, Grouper]) – The main dimension and grouping information. See Notes. See
xclim.sdba.base.Grouper
for details. The adjustment will be performed on each group independently. Default is “time”, meaning a single adjustment group along dimension “time”.best_orientation ({'simple', 'full'}) – Which method to use when searching for the best principal component orientation. See
best_pc_orientation_simple()
andbest_pc_orientation_full()
. “full” is more precise, but it is much slower.crd_dim (str) – The data dimension along which the multiple simulation space dimensions are taken. For a multivariate adjustment, this usually is “multivar”, as returned by sdba.stack_variables. For a multisite adjustment, this should be the spatial dimension. The training algorithm currently doesn’t support any chunking along either crd_dim. group.dim and group.add_dims.
Notes
The input data is understood as a set of N points in a \(M\)-dimensional space.
\(M\) is taken along crd_dim.
\(N\) is taken along the dimensions given through group : (the main dim but also, if requested, the add_dims and window).
The principal components (PC) of hist and ref are used to defined new coordinate systems, centered on their respective means. The training step creates a matrix defining the transformation from hist to ref:
\[scen = e_{R} + \mathrm{\mathbf{T}}(sim - e_{H})\]Where:
\[\mathrm{\mathbf{T}} = \mathrm{\mathbf{R}}\mathrm{\mathbf{H}}^{-1}\]\(\mathrm{\mathbf{R}}\) is the matrix transforming from the PC coordinates computed on ref to the data coordinates. Similarly, \(\mathrm{\mathbf{H}}\) is transform from the hist PC to the data coordinates (\(\mathrm{\mathbf{H}}\) is the inverse transformation). \(e_R\) and \(e_H\) are the centroids of the ref and hist distributions respectively. Upon running the adjust step, one may decide to use \(e_S\), the centroid of the sim distribution, instead of \(e_H\).
References
- class xclim.sdba.adjustment.QuantileDeltaMapping(*args, _trained=False, **kwargs)[source]
Bases:
EmpiricalQuantileMapping
Quantile Delta Mapping bias-adjustment.
Adjustment factors are computed between the quantiles of ref and hist. Quantiles of sim are matched to the corresponding quantiles of hist and corrected accordingly.
\[sim\frac{F^{-1}_{ref}\left[F_{sim}(sim)\right]}{F^{-1}_{hist}\left[F_{sim}(sim)\right]}\]where \(F\) is the cumulative distribution function (CDF). This equation is valid for multiplicative adjustment. The algorithm is based on the “QDM” method of [Cannon et al., 2015].
- Parameters:
Train step
nquantiles (int or 1d array of floats) – The number of quantiles to use. See
equally_spaced_nodes()
. An array of quantiles [0, 1] can also be passed. Defaults to 20 quantiles.kind ({‘+’, ‘’}*) – The adjustment kind, either additive or multiplicative. Defaults to “+”.
group (Union[str, Grouper]) – The grouping information. See
xclim.sdba.base.Grouper
for details. Default is “time”, meaning a single adjustment group along dimension “time”.Adjust step
interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.
extrapolation ({‘constant’, ‘nan’}) – The type of extrapolation to use. See
xclim.sdba.utils.extrapolate_qm()
for details. Defaults to “constant”.Extra diagnostics
—————–
In adjustment
quantiles (The quantile of each value of sim. The adjustment factor is interpolated using this as the “quantile” axis on ds.af.)
References
Cannon, Sobie, and Murdock [2015]
- class xclim.sdba.adjustment.Scaling(*args, _trained=False, **kwargs)[source]
Bases:
TrainAdjust
Scaling bias-adjustment.
Simple bias-adjustment method scaling variables by an additive or multiplicative factor so that the mean of hist matches the mean of ref.
- Parameters:
Train step
group (Union[str, Grouper]) – The grouping information. See
xclim.sdba.base.Grouper
for details. Default is “time”, meaning an single adjustment group along dimension “time”.kind ({‘+’, ‘’}*) – The adjustment kind, either additive or multiplicative. Defaults to “+”.
Adjust step
interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use then interpolating the adjustment factors. Defaults to “nearest”.
- _allow_diff_calendars = False
xclim.sdba.base module
Base Classes and Developer Tools
- class xclim.sdba.base.Grouper(group, window=1, add_dims=None)[source]
Bases:
Parametrizable
Grouper inherited class for parameterizable classes.
- ADD_DIMS = '<ADD_DIMS>'
- DIM = '<DIM>'
- PROP = '<PROP>'
- _repr_hide_params = ['dim', 'prop']
- apply(func, da, main_only=False, **kwargs)[source]
Apply a function group-wise on DataArrays.
- Parameters:
func (Callable or str) – The function to apply to the groups, either a callable or a xr.core.groupby.GroupBy method name as a string. The function will be called as func(group, dim=dims, **kwargs). See main_only for the behaviour of dims.
da (xr.DataArray or dict[str, xr.DataArray] or xr.Dataset) – The DataArray on which to apply the function. Multiple arrays can be passed through a dictionary. A dataset will be created before grouping.
main_only (bool) – Whether to call the function with the main dimension only (if True) or with all grouping dims (if False, default) (including the window and dimensions given through add_dims). The dimensions used are also written in the “group_compute_dims” attribute. If all the input arrays are missing one of the ‘add_dims’, it is silently omitted.
**kwargs – Other keyword arguments to pass to the function.
- Returns:
DataArray or Dataset – Attributes “group”, “group_window” and “group_compute_dims” are added.
If the function did not reduce the array:
The output is sorted along the main dimension.
The output is rechunked to match the chunks on the input If multiple inputs with differing chunking were given as inputs, the chunking with the smallest number of chunks is used.
If the function reduces the array:
If there is only one group, the singleton dimension is squeezed out of the output
The output is rechunked as to have only 1 chunk along the new dimension.
Notes
For the special case where a Dataset is returned, but only some of its variable where reduced by the grouping, xarray’s GroupBy.map will broadcast everything back to the ungrouped dimensions. To overcome this issue, function may add a “_group_apply_reshape” attribute set to True on the variables that should be reduced and these will be re-grouped by calling da.groupby(self.name).first().
- property freq
Format a frequency string corresponding to the group.
For use with xarray’s resampling functions.
- get_coordinate(ds=None)[source]
Return the coordinate as in the output of group.apply.
Currently, only implemented for groupings with prop == month or dayofyear. For prop == dayfofyear, a ds (Dataset or DataArray) can be passed to infer the max day of year from the available years and calendar.
- get_index(da, interp=None)[source]
Return the group index of each element along the main dimension.
- Parameters:
da (xr.DataArray or xr.Dataset) – The input array/dataset for which the group index is returned. It must have Grouper.dim as a coordinate.
interp (bool, optional) – If True, the returned index can be used for interpolation. Only value for month grouping, where integer values represent the middle of the month, all other days are linearly interpolated in between.
- Returns:
xr.DataArray – The index of each element along Grouper.dim. If Grouper.dim is time and Grouper.prop is None, a uniform array of True is returned. If Grouper.prop is a time accessor (month, dayofyear, etc), an numerical array is returned, with a special case of month and interp=True. If Grouper.dim is not time, the dim is simply returned.
- group(da=None, main_only=False, **das)[source]
Return a xr.core.groupby.GroupBy object.
More than one array can be combined to a dataset before grouping using the das kwargs. A new window dimension is added if self.window is larger than 1. If Grouper.dim is ‘time’, but ‘prop’ is None, the whole array is grouped together.
When multiple arrays are passed, some of them can be grouped along the same group as self. They are broadcast, merged to the grouping dataset and regrouped in the output.
- property prop_name
Create a significant name for the grouping.
- class xclim.sdba.base.Parametrizable[source]
Bases:
dict
Helper base class resembling a dictionary.
This object is _completely_ defined by the content of its internal dictionary, accessible through item access (self[‘attr’]) or in self.parameters. When serializing and restoring this object, only members of that internal dict are preserved. All other attributes set directly with self.attr = value will not be preserved upon serialization and restoration of the object with [json]pickle dictionary. Other variables set with self.var = data will be lost in the serialization process. This class is best serialized and restored with jsonpickle.
- _repr_hide_params = []
- property parameters
All parameters as a dictionary. Read-only.
- class xclim.sdba.base.ParametrizableWithDataset[source]
Bases:
Parametrizable
Parametrizeable class that also has a ds attribute storing a dataset.
- _attribute = '_xclim_parameters'
- xclim.sdba.base.duck_empty(dims, sizes, dtype='float64', chunks=None)[source]
Return an empty DataArray based on a numpy or dask backend, depending on the chunks argument.
- xclim.sdba.base.map_blocks(reduces=None, **outvars)[source]
Decorator for declaring functions and wrapping them into a map_blocks.
Takes care of constructing the template dataset. Dimension order is not preserved. The decorated function must always have the signature:
func(ds, **kwargs)
, where ds is a DataArray or a Dataset. It must always output a dataset matching the mapping passed to the decorator.- Parameters:
reduces (sequence of strings) – Name of the dimensions that are removed by the function.
**outvars – Mapping from variable names in the output to their new dimensions. The placeholders
Grouper.PROP
,Grouper.DIM
andGrouper.ADD_DIMS
can be used to signifygroup.prop
,``group.dim`` andgroup.add_dims
respectively. If an output keeps a dimension that another loses, that dimension name must be given inreduces
and in the list of new dimensions of the first output.
- xclim.sdba.base.map_groups(reduces=None, main_only=False, **out_vars)[source]
Decorator for declaring functions acting only on groups and wrapping them into a map_blocks.
This is the same as map_blocks but adds a call to group.apply() in the mapped func and the default value of reduces is changed.
The decorated function must have the signature:
func(ds, dim, **kwargs)
. Where ds is a DataAray or Dataset, dim is the group.dim (and add_dims). The group argument is stripped from the kwargs, but must evidently be provided in the call.- Parameters:
reduces (sequence of str) – Dimensions that are removed from the inputs by the function. Defaults to [Grouper.DIM, Grouper.ADD_DIMS] if main_only is False, and [Grouper.DIM] if main_only is True. See
map_blocks()
.main_only (bool) – Same as for
Grouper.apply()
.- **out_vars – Mapping from variable names in the output to their new dimensions.
The placeholders
Grouper.PROP
,Grouper.DIM
andGrouper.ADD_DIMS
can be used to signifygroup.prop
,``group.dim`` andgroup.add_dims
, respectively.If an output keeps a dimension that another loses, that dimension name must be given in reduces and in the list of new dimensions of the first output.
See also
- xclim.sdba.base.parse_group(func, kwargs=None, allow_only=None)[source]
Parse the kwargs given to a function to set the group arg with a Grouper object.
This function can be used as a decorator, in which case the parsing and updating of the kwargs is done at call time. It can also be called with a function from which extract the default group and kwargs to update, in which case it returns the updated kwargs.
If allow_only is given, an exception is raised when the parsed group is not within that list.
- Return type:
Callable
xclim.sdba.detrending module
Detrending Objects Utilities
- class xclim.sdba.detrending.BaseDetrend(*, group='time', kind='+', **kwargs)[source]
Bases:
ParametrizableWithDataset
Base class for detrending objects.
Defines three methods:
fit(da) : Compute trend from da and return a new _fitted_ Detrend object. detrend(da) : Return detrended array. retrend(da) : Puts trend back on da.
A fitted Detrend object is unique to the trend coordinate of the object used in fit, (usually ‘time’). The computed trend is stored in
Detrend.ds.trend
.Subclasses should implement
_get_trend_group()
or_get_trend()
. The first will be called in agroup.apply(..., main_only=True)
, and should return a single DataArray. The second allows the use of functions wrapped inmap_groups()
and should also return a single DataArray.The subclasses may reimplement
_detrend
and_retrend
.- _get_trend(da)[source]
Compute the trend along the self.group.dim as found on da.
If da is a DataArray (and has a dtype attribute), the trend is cast to have the same dtype.
Notes
This method applies _get_trend_group with self.group.
- fit(da)[source]
Extract the trend of a DataArray along a specific dimension.
Returns a new object that can be used for detrending and retrending. Fitted objects are unique to the fitted coordinate used.
- property fitted
Return whether instance is fitted.
- class xclim.sdba.detrending.LoessDetrend(group='time', kind='+', f=0.2, niter=1, d=0, weights='tricube', equal_spacing=None, skipna=True)[source]
Bases:
BaseDetrend
Detrend time series using a LOESS regression.
The fit is a piecewise linear regression. For each point, the contribution of all neighbors is weighted by a bell-shaped curve (gaussian) with parameters sigma (std). The x-coordinate of the DataArray is scaled to [0,1] before the regression is computed.
- Parameters:
group (str or Grouper) – The grouping information. See
xclim.sdba.base.Grouper
for details. The fit is performed along the group’s main dim.kind ({’’, ‘+’}*) – The way the trend is removed or added, either additive or multiplicative.
d ([0, 1]) – Order of the local regression. Only 0 and 1 currently implemented.
f (float) – Parameter controlling the span of the weights, between 0 and 1.
niter (int) – Number of robustness iterations to execute.
weights ([“tricube”, “gaussian”]) – Shape of the weighting function: “tricube” : a smooth top-hat like curve, f gives the span of non-zero values. “gaussian” : a gaussian curve, f gives the span for 95% of the values.
skipna (bool) – If True (default), missing values are not included in the loess trend computation and thus are not propagated. The output will have the same missing values as the input.
Notes
LOESS smoothing is computationally expensive. As it relies on a loop on gridpoints, it can be useful to use smaller than usual chunks. Moreover, it suffers from heavy boundary effects. As a rule of thumb, the outermost N * f/2 points should be considered dubious. (N is the number of points along each group)
- class xclim.sdba.detrending.MeanDetrend(*, group='time', kind='+', **kwargs)[source]
Bases:
BaseDetrend
Simple detrending removing only the mean from the data, quite similar to normalizing.
- class xclim.sdba.detrending.NoDetrend(*, group='time', kind='+', **kwargs)[source]
Bases:
BaseDetrend
Convenience class for polymorphism. Does nothing.
- class xclim.sdba.detrending.PolyDetrend(group='time', kind='+', degree=4, preserve_mean=False)[source]
Bases:
BaseDetrend
Detrend time series using a polynomial regression.
- Parameters:
group (Union[str, Grouper]) – The grouping information. See
xclim.sdba.base.Grouper
for details. The fit is performed along the group’s main dim.kind ({’’, ‘+’}*) – The way the trend is removed or added, either additive or multiplicative.
degree (int) – The order of the polynomial to fit.
preserve_mean (bool) – Whether to preserve the mean when de/re-trending. If True, the trend has its mean removed before it is used.
- class xclim.sdba.detrending.RollingMeanDetrend(group='time', kind='+', win=30, weights=None, min_periods=None)[source]
Bases:
BaseDetrend
Detrend time series using a rolling mean.
- Parameters:
group (str or Grouper) – The grouping information. See
xclim.sdba.base.Grouper
for details. The fit is performed along the group’s main dim.kind ({’’, ‘+’}*) – The way the trend is removed or added, either additive or multiplicative.
win (int) – The size of the rolling window. Units are the steps of the grouped data, which means this detrending is best use with either group=’time’ or group=’time.dayofyear’. Other grouping will have large jumps included within the windows and :py`:class:LoessDetrend might offer a better solution.
weights (sequence of floats, optional) – Sequence of length win. Defaults to None, which means a flat window.
min_periods (int, optional) – Minimum number of observations in window required to have a value, otherwise the result is NaN. See
xarray.DataArray.rolling()
. Defaults to None, which sets it equal to win. Setting both weights and this is not implemented yet.
Notes
As for the
LoessDetrend
detrending, important boundary effects are to be expected.
xclim.sdba.loess module
LOESS Smoothing Submodule
- xclim.sdba.loess._gaussian_weighting(x)[source]
Kernel function for loess with a gaussian shape.
The span f covers 95% of the gaussian.
- xclim.sdba.loess._loess_nb(x, y, f=0.5, niter=2, weight_func=CPUDispatcher(<function _tricube_weighting>), reg_func=CPUDispatcher(<function _linear_regression>), dx=0, skipna=True)[source]
1D Locally weighted regression: fits a nonparametric regression curve to a scatter plot.
The arrays x and y contain an equal number of elements; each pair (x[i], y[i]) defines a data point in the scatter plot. The function returns the estimated (smooth) values of y. Originally proposed in Cleveland [1979].
Users should call utils.loess_smoothing. See that function for the main documentation.
- Parameters:
x (np.ndarray) – X-coordinates of the points.
y (np.ndarray) – Y-coordinates of the points.
f (float) – Parameter controlling the shape of the weight curve. Behavior depends on the weighting function.
niter (int) – Number of robustness iterations to execute.
weight_func (numba func) – Numba function giving the weights when passed abs(x - xi) / hi
dx (float) – The spacing of the x coordinates. If above 0, this enables the optimization for equally spaced x coordinates. Must be 0 if spacing is unequal (default).
skipna (bool) – If True (default), remove NaN values before computing the loess. The output has the same missing values as the input.
References
Cleveland [1979]
Code adapted from: Gramfort [2015]
- xclim.sdba.loess.loess_smoothing(da, dim='time', d=1, f=0.5, niter=2, weights='tricube', equal_spacing=None, skipna=True)[source]
Locally weighted regression in 1D: fits a nonparametric regression curve to a scatter plot.
Returns a smoothed curve along given dimension. The regression is computed for each point using a subset of neighbouring points as given from evaluating the weighting function locally. Follows the procedure of Cleveland [1979].
- Parameters:
da (xr.DataArray) – The data to smooth using the loess approach.
dim (str) – Name of the dimension along which to perform the loess.
d ([0, 1]) – Degree of the local regression.
f (float) – Parameter controlling the shape of the weight curve. Behavior depends on the weighting function, but it usually represents the span of the weighting function in reference to x-coordinates normalized from 0 to 1.
niter (int) – Number of robustness iterations to execute.
weights ([“tricube”, “gaussian”] or callable) – Shape of the weighting function, see notes. The user can provide a function or a string: “tricube” : a smooth top-hat like curve. “gaussian” : a gaussian curve, f gives the span for 95% of the values.
equal_spacing (bool, optional) – Whether to use the equal spacing optimization. If None (the default), it is activated only if the x-axis is equally-spaced. When activated, dx = x[1] - x[0].
skipna (bool) – If True (default), skip missing values (as marked by NaN). The output will have the same missing values as the input.
Notes
As stated in Cleveland [1979], the weighting function \(W(x)\) should respect the following conditions:
\(W(x) > 0\) for \(|x| < 1\)
\(W(-x) = W(x)\)
\(W(x)\) is non-increasing for \(x \ge 0\)
\(W(x) = 0\) for \(|x| \ge 0\)
If a Callable is provided, it should only accept the 1D np.ndarray \(x\) which is an absolute value function going from 1 to 0 to 1 around \(x_i\), for all values where \(x - x_i < h_i\) with \(h_i\) the distance of the rth nearest neighbor of \(x_i\), \(r = f * size(x)\).
References
Cleveland [1979]
Code adapted from: Gramfort [2015]
xclim.sdba.measures module
Measures Submodule
SDBA diagnostic tests are made up of properties and measures. Measures compare adjusted simulations to a reference, through statistical properties or directly. This framework for the diagnostic tests was inspired by the VALUE project.
- class xclim.sdba.measures.StatisticalMeasure(**kwds)[source]
Bases:
Indicator
Base indicator class for statistical measures used when validating bias-adjusted outputs.
Statistical measures use input data where the time dimension was reduced, usually by the computation of a
xclim.sdba.properties.StatisticalProperty
instance. They usually take two arrays as input: “sim” and “ref”, “sim” being measured against “ref”. The two arrays must have identical coordinates on their common dimensions.Statistical measures are generally unit-generic. If the inputs have different units, “sim” is converted to match “ref”.
- classmethod _ensure_correct_parameters(parameters)[source]
Ensure the parameters are correctly set and ordered.
Sets the correct variable default to be sure.
- _preprocess_and_checks(das, params)[source]
Perform parent’s checks and also check convert units so that sim matches ref.
- realm = 'generic'
- class xclim.sdba.measures.StatisticalPropertyMeasure(**kwds)[source]
Bases:
Indicator
Base indicator class for statistical properties that include the comparison measure, used when validating bias-adjusted outputs.
StatisticalPropertyMeasure objects combine the functionalities of
xclim.sdba.properties.StatisticalProperty
andxclim.sdba.properties.StatisticalMeasure
.Statistical properties usually reduce the time dimension and sometimes more dimensions (for example in spatial properties), sometimes adding a grouping dimension according to the passed value of group (e.g.: group=’time.month’ means the loss of the time dimension and the addition of a month one).
Statistical measures usually take two arrays as input: “sim” and “ref”, “sim” being measured against “ref”.
Statistical property-measures are generally unit-generic. If the inputs have different units, “sim” is converted to match “ref”.
- classmethod _ensure_correct_parameters(parameters)[source]
Ensure the parameters are correctly set and ordered.
Sets the correct variable default to be sure.
- _preprocess_and_checks(das, params)[source]
Perform parent’s checks and also check convert units so that sim matches ref.
- allowed_groups = None
A list of allowed groupings. A subset of dayofyear, week, month, season or group. The latter stands for no temporal grouping.
- aspect = None
marginal, temporal, multivariate or spatial.
- Type:
The aspect the statistical property studies
- realm = 'generic'
- xclim.sdba.measures._annual_cycle_correlation(sim, ref, window=15, group='time')[source]
Annual cycle correlation.
Pearson correlation coefficient between the smooth day-of-year averaged annual cycles of the simulation and the reference. In the smooth day-of-year averaged annual cycles, each day-of-year is averaged over all years and over a window of days around that day.
- Parameters:
sim (xr.DataArray) – data from the simulation (a time-series for each grid-point)
ref (xr.DataArray) – data from the reference (observations) (a time-series for each grid-point)
window (int) – Size of window around each day of year around which to take the mean. E.g. If window=31, Jan 1st is averaged over from December 17th to January 16th.
group (str) – Compute the property and measure for each temporal groups individually. Currently not implemented.
- Return type:
DataArray
- Returns:
xr.DataArray, [dimensionless] – Annual cycle correlation
- xclim.sdba.measures._bias(sim, ref)[source]
Bias.
The bias is the simulation minus the reference.
- Parameters:
sim (xr.DataArray) – data from the simulation (one value for each grid-point)
ref (xr.DataArray) – data from the reference (observations) (one value for each grid-point)
- Return type:
DataArray
- Returns:
xr.DataArray, [same as ref] – Absolute bias
- xclim.sdba.measures._circular_bias(sim, ref)[source]
Circular bias.
Bias considering circular time series. E.g. The bias between doy 365 and doy 1 is 364, but the circular bias is -1.
- Parameters:
sim (xr.DataArray) – data from the simulation (one value for each grid-point)
ref (xr.DataArray) – data from the reference (observations) (one value for each grid-point)
- Return type:
DataArray
- Returns:
xr.DataArray, [days] – Circular bias
- xclim.sdba.measures._mae(sim, ref, group='time')[source]
Mean absolute error.
The mean absolute error on the time dimension between the simulation and the reference.
- Parameters:
sim (xr.DataArray) – data from the simulation (a time-series for each grid-point)
ref (xr.DataArray) – data from the reference (observations) (a time-series for each grid-point)
group (str) – Compute the property and measure for each temporal groups individually. Currently not implemented.
- Return type:
DataArray
- Returns:
xr.DataArray, [same as ref] – Mean absolute error
- xclim.sdba.measures._ratio(sim, ref)[source]
Ratio.
The ratio is the quotient of the simulation over the reference.
- Parameters:
sim (xr.DataArray) – data from the simulation (one value for each grid-point)
ref (xr.DataArray) – data from the reference (observations) (one value for each grid-point)
- Return type:
DataArray
- Returns:
xr.DataArray, [dimensionless] – Ratio
- xclim.sdba.measures._relative_bias(sim, ref)[source]
Relative Bias.
The relative bias is the simulation minus reference, divided by the reference.
- Parameters:
sim (xr.DataArray) – data from the simulation (one value for each grid-point)
ref (xr.DataArray) – data from the reference (observations) (one value for each grid-point)
- Return type:
DataArray
- Returns:
xr.DataArray, [dimensionless] – Relative bias
- xclim.sdba.measures._rmse(sim, ref, group='time')[source]
Root mean square error.
The root mean square error on the time dimension between the simulation and the reference.
- Parameters:
sim (xr.DataArray) – Data from the simulation (a time-series for each grid-point)
ref (xr.DataArray) – Data from the reference (observations) (a time-series for each grid-point)
group (str) – Compute the property and measure for each temporal groups individually. Currently not implemented.
- Return type:
DataArray
- Returns:
xr.DataArray, [same as ref] – Root mean square error
- xclim.sdba.measures._scorr(sim, ref, *, dims=None, group='time')[source]
Spatial correllogram.
Compute the inter-site correlations of each array, compute the difference in correlations and sum. Taken from Vrac (2018). The spatial and temporal dimensions are reduced.
- Parameters:
sim (xr.DataArray) – data from the simulation (a time-series for each grid-point)
ref (xr.DataArray) – data from the reference (observations) (a time-series for each grid-point)
dims (sequence of strings, optional) – Name of the spatial dimensions. If None (default), all dimensions except ‘time’ are used.
group (str) – Compute the property and measure for each temporal groups individually. Currently not implemented.
- Returns:
xr.DataArray, [dimensionless] – Sum of the inter-site correlation differences.
- xclim.sdba.measures.annual_cycle_correlation(sim='sim', ref='ref', *, window=15, group='time', ds=None)
Annual cycle correlation. (realm: generic)
Pearson correlation coefficient between the smooth day-of-year averaged annual cycles of the simulation and the reference. In the smooth day-of-year averaged annual cycles, each day-of-year is averaged over all years and over a window of days around that day.
Based on indice
_annual_cycle_correlation()
.- Parameters:
sim (str or DataArray) – data from the simulation (a time-series for each grid-point) Default : ds.sim.
ref (str or DataArray) – data from the reference (observations) (a time-series for each grid-point) Default : ds.ref.
window (number) – Size of window around each day of year around which to take the mean. E.g. If window=31, Jan 1st is averaged over from December 17th to January 16th. Default : 15.
group (Any) – Compute the property and measure for each temporal groups individually. Currently not implemented. Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
annual_cycle_correlation (DataArray) – Annual cycle correlation
- xclim.sdba.measures.bias(sim='sim', ref='ref', *, ds=None)
Bias. (realm: generic)
The bias is the simulation minus the reference.
Based on indice
_bias()
.- Parameters:
sim (str or DataArray) – data from the simulation (one value for each grid-point) Default : ds.sim.
ref (str or DataArray) – data from the reference (observations) (one value for each grid-point) Default : ds.ref.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
bias (DataArray) – Absolute bias
- xclim.sdba.measures.circular_bias(sim='sim', ref='ref', *, ds=None)
Circular bias. (realm: generic)
Bias considering circular time series. E.g. The bias between doy 365 and doy 1 is 364, but the circular bias is -1.
Based on indice
_circular_bias()
.- Parameters:
sim (str or DataArray) – data from the simulation (one value for each grid-point) Default : ds.sim.
ref (str or DataArray) – data from the reference (observations) (one value for each grid-point) Default : ds.ref.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
circular_bias (DataArray) – Circular bias [days]
- xclim.sdba.measures.mae(sim='sim', ref='ref', *, group='time', ds=None)
Mean absolute error. (realm: generic)
The mean absolute error on the time dimension between the simulation and the reference.
Based on indice
_mae()
.- Parameters:
sim (str or DataArray) – data from the simulation (a time-series for each grid-point) Default : ds.sim.
ref (str or DataArray) – data from the reference (observations) (a time-series for each grid-point) Default : ds.ref.
group (Any) – Compute the property and measure for each temporal groups individually. Currently not implemented. Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
mae (DataArray) – Mean absolute error, with additional attributes: cell_methods: time: mean
- xclim.sdba.measures.ratio(sim='sim', ref='ref', *, ds=None)
Ratio. (realm: generic)
The ratio is the quotient of the simulation over the reference.
Based on indice
_ratio()
.- Parameters:
sim (str or DataArray) – data from the simulation (one value for each grid-point) Default : ds.sim.
ref (str or DataArray) – data from the reference (observations) (one value for each grid-point) Default : ds.ref.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
ratio (DataArray) – Ratio
- xclim.sdba.measures.relative_bias(sim='sim', ref='ref', *, ds=None)
Relative Bias. (realm: generic)
The relative bias is the simulation minus reference, divided by the reference.
Based on indice
_relative_bias()
.- Parameters:
sim (str or DataArray) – data from the simulation (one value for each grid-point) Default : ds.sim.
ref (str or DataArray) – data from the reference (observations) (one value for each grid-point) Default : ds.ref.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
relative_bias (DataArray) – Relative bias
- xclim.sdba.measures.rmse(sim='sim', ref='ref', *, group='time', ds=None)
Root mean square error. (realm: generic)
The root mean square error on the time dimension between the simulation and the reference.
Based on indice
_rmse()
.- Parameters:
sim (str or DataArray) – Data from the simulation (a time-series for each grid-point) Default : ds.sim.
ref (str or DataArray) – Data from the reference (observations) (a time-series for each grid-point) Default : ds.ref.
group (Any) – Compute the property and measure for each temporal groups individually. Currently not implemented. Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
rmse (DataArray) – Root mean square error, with additional attributes: cell_methods: time: mean
- xclim.sdba.measures.scorr(sim='sim', ref='ref', *, dims=None, group='time', ds=None)
Spatial correllogram. (realm: generic)
Compute the inter-site correlations of each array, compute the difference in correlations and sum. Taken from Vrac (2018). The spatial and temporal dimensions are reduced.
Based on indice
_scorr()
.- Parameters:
sim (str or DataArray) – data from the simulation (a time-series for each grid-point) Default : ds.sim.
ref (str or DataArray) – data from the reference (observations) (a time-series for each grid-point) Default : ds.ref.
dims (Any) – Name of the spatial dimensions. If None (default), all dimensions except ‘time’ are used. Default : None.
group (Any) – Compute the property and measure for each temporal groups individually. Currently not implemented. Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
Scorr (DataArray) – Sum of the inter-site correlation differences.
xclim.sdba.nbutils module
Numba-accelerated Utilities
- xclim.sdba.nbutils._autocorrelation(X)[source]
Mean of the NxN pairwise distances of points in X of shape KxN.
Similar to scipy.spatial.distance.pdist(…, ‘euclidean’)
- xclim.sdba.nbutils._correlation(X, Y)[source]
Compute a correlation as the mean of pairwise distances between points in X and Y.
X is KxN and Y is KxM, the result is the mean of the MxN distances. Similar to scipy.spatial.distance.cdist(X, Y, ‘euclidean’)
- xclim.sdba.nbutils._extrapolate_on_quantiles(interp, oldx, oldg, oldy, newx, newg, method='constant')[source]
Apply extrapolation to the output of interpolation on quantiles with a given grouping.
Arguments are the same as _interp_on_quantiles_2D.
- xclim.sdba.nbutils._first_and_last_nonnull(arr)[source]
For each row of arr, get the first and last non NaN elements.
xclim.sdba.processing module
Pre- and Post-Processing Submodule
- xclim.sdba.processing._get_number_of_elements_by_year(time)[source]
Get the number of elements in time in a year by inferring its sampling frequency.
Only calendar with uniform year lengths are supported : 360_day, noleap, all_leap.
- xclim.sdba.processing.adapt_freq(ref, sim, *, group, thresh='0 mm d-1')[source]
Adapt frequency of values under thresh of sim, in order to match ref.
This is useful when the dry-day frequency in the simulations is higher than in the references. This function will create new non-null values for sim/hist, so that adjustment factors are less wet-biased. Based on Themeßl et al. [2012].
- Parameters:
ds (xr.Dataset) – With variables : “ref”, Target/reference data, usually observed data, and “sim”, Simulated data.
dim (str) – Dimension name.
group (str or Grouper) – Grouping information, see base.Grouper
thresh (str) – Threshold below which values are considered zero, a quantity with units.
- Return type:
tuple
[DataArray
,DataArray
,DataArray
]- Returns:
sim_adj (xr.DataArray) – Simulated data with the same frequency of values under threshold than ref. Adjustment is made group-wise.
pth (xr.DataArray) – For each group, the smallest value of sim that was not frequency-adjusted. All values smaller were either left as zero values or given a random value between thresh and pth. NaN where frequency adaptation wasn’t needed.
dP0 (xr.DataArray) – For each group, the percentage of values that were corrected in sim.
Notes
With \(P_0^r\) the frequency of values under threshold \(T_0\) in the reference (ref) and \(P_0^s\) the same for the simulated values, \(\\Delta P_0 = \\frac{P_0^s - P_0^r}{P_0^s}\), when positive, represents the proportion of values under \(T_0\) that need to be corrected.
The correction replaces a proportion \(\\Delta P_0\) of the values under \(T_0\) in sim by a uniform random number between \(T_0\) and \(P_{th}\), where \(P_{th} = F_{ref}^{-1}( F_{sim}( T_0 ) )\) and F(x) is the empirical cumulative distribution function (CDF).
References
Themeßl, Gobiet, and Heinrich [2012]
- xclim.sdba.processing.construct_moving_yearly_window(da, window=21, step=1, dim='movingwin')[source]
Construct a moving window DataArray.
Stack windows of da in a new ‘movingwin’ dimension. Windows are always made of full years, so calendar with non-uniform year lengths are not supported.
Windows are constructed starting at the beginning of da, if number of given years is not a multiple of step, then the last year(s) will be missing as a supplementary window would be incomplete.
- Parameters:
da (xr.Dataset) – A DataArray with a time dimension.
window (int) – The length of the moving window as a number of years.
step (int) – The step between each window as a number of years.
dim (str) – The new dimension name. If given, must also be given to unpack_moving_yearly_window.
- Returns:
xr.DataArray – A DataArray with a new movingwin dimension and a time dimension with a length of 1 window. This assumes downstream algorithms do not make use of the _absolute_ year of the data. The correct timeseries can be reconstructed with
unpack_moving_yearly_window()
. The coordinates of movingwin are the first date of the windows.
- xclim.sdba.processing.escore(tgt, sim, dims=('variables', 'time'), N=0, scale=False)[source]
Energy score, or energy dissimilarity metric, based on Szekely and Rizzo [2004] and Cannon [2018].
- Parameters:
tgt (xr.DataArray) – Target observations.
sim (xr.DataArray) – Candidate observations. Must have the same dimensions as tgt.
dims (sequence of 2 strings) – The name of the dimensions along which the variables and observation points are listed. tgt and sim can have different length along the second one, but must be equal along the first one. The result will keep all other dimensions.
N (int) – If larger than 0, the number of observations to use in the score computation. The points are taken evenly distributed along obs_dim.
scale (bool) – Whether to scale the data before computing the score. If True, both arrays as scaled according to the mean and standard deviation of tgt along obs_dim. (std computed with ddof=1 and both statistics excluding NaN values).
- Return type:
DataArray
- Returns:
xr.DataArray – e-score with dimensions not in dims.
Notes
Explanation adapted from the “energy” R package documentation. The e-distance between two clusters \(C_i\), \(C_j\) (tgt and sim) of size \(n_i,n_j\) proposed by Szekely and Rizzo [2004] is defined by:
\[e(C_i,C_j) = \frac{1}{2}\frac{n_i n_j}{n_i + n_j} \left[2 M_{ij} − M_{ii} − M_{jj}\right]\]where
\[M_{ij} = \frac{1}{n_i n_j} \sum_{p = 1}^{n_i} \sum_{q = 1}^{n_j} \left\Vert X_{ip} − X{jq} \right\Vert.\]\(\Vert\cdot\Vert\) denotes Euclidean norm, \(X_{ip}\) denotes the p-th observation in the i-th cluster.
The input scaling and the factor \(\frac{1}{2}\) in the first equation are additions of Cannon [2018] to the metric. With that factor, the test becomes identical to the one defined by Baringhaus and Franz [2004]. This version is tested against values taken from Alex Cannon’s MBC R package [Cannon, 2020].
References
Baringhaus and Franz [2004], Cannon [2018], Cannon [2020], Szekely and Rizzo [2004]
- xclim.sdba.processing.from_additive_space(data, lower_bound=None, upper_bound=None, trans=None, units=None)[source]
Transform back to the physical space a variable that was transformed with to_additive_space.
Based on Alavoine and Grenier [2022]. If parameters are not present on the attributes of the data, they must be all given are arguments.
- Parameters:
data (xr.DataArray) – A variable that was transformed by
to_additive_space()
.lower_bound (str, optional) – The smallest physical value of the variable, as a Quantity string. The final data will have no value smaller or equal to this bound. If None (default), the sdba_transform_lower attribute is looked up on data.
upper_bound (str, optional) – The largest physical value of the variable, as a Quantity string. Only relevant for the logit transformation. The final data will have no value larger or equal to this bound. If None (default), the sdba_transform_upper attribute is looked up on data.
trans ({‘log’, ‘logit’}, optional) – The transformation to use. See notes. If None (the default), the sdba_transform attribute is looked up on data.
units (str, optional) – The units of the data before transformation to the additive space. If None (the default), the sdba_transform_units attribute is looked up on data.
- Returns:
xr.DataArray – The physical variable. Attributes are conserved, even if some might be incorrect. Except units which are taken from sdba_transform_units if available. All sdba_transform* attributes are deleted.
Notes
Given a variable that is not usable in an additive adjustment,
to_additive_space()
applied a transformation to a space where additive methods are sensible. Given \(Y\) the transformed variable, \(b_-\) the lower physical bound of that variable and \(b_+\) the upper physical bound, two back-transformations are currently implemented to get \(X\), the physical variable.log
\[X = e^{Y} + b_-\]logit
\[X' = \frac{1}{1 + e^{-Y}} X = X * (b_+ - b_-) + b_-\]
See also
to_additive_space
for the original transformation.
References
Alavoine and Grenier [2022]
- xclim.sdba.processing.jitter(x, lower=None, upper=None, minimum=None, maximum=None)[source]
Replace values under a threshold and values above another by a uniform random noise.
Warning
Not to be confused with R’s jitter, which adds uniform noise instead of replacing values.
- Parameters:
x (xr.DataArray) – Values.
lower (str, optional) – Threshold under which to add uniform random noise to values, a quantity with units. If None, no jittering is performed on the lower end.
upper (str, optional) – Threshold over which to add uniform random noise to values, a quantity with units. If None, no jittering is performed on the upper end.
minimum (str, optional) – Lower limit (excluded) for the lower end random noise, a quantity with units. If None but lower is not None, 0 is used.
maximum (str, optional) – Upper limit (excluded) for the upper end random noise, a quantity with units. If upper is not None, it must be given.
- Return type:
DataArray
- Returns:
xr.DataArray – Same as x but values < lower are replaced by a uniform noise in range (minimum, lower) and values >= upper are replaced by a uniform noise in range [upper, maximum). The two noise distributions are independent.
- xclim.sdba.processing.jitter_over_thresh(x, thresh, upper_bnd)[source]
Replace values greater than threshold by a uniform random noise.
Warning
Not to be confused with R’s jitter, which adds uniform noise instead of replacing values.
- Parameters:
x (xr.DataArray) – Values.
thresh (str) – Threshold over which to add uniform random noise to values, a quantity with units.
upper_bnd (str) – Maximum possible value for the random noise, a quantity with units.
- Return type:
DataArray
- Returns:
xr.DataArray
Notes
If thresh is low, this will change the mean value of x.
- xclim.sdba.processing.jitter_under_thresh(x, thresh)[source]
Replace values smaller than threshold by a uniform random noise.
Warning
Not to be confused with R’s jitter, which adds uniform noise instead of replacing values.
- Parameters:
x (xr.DataArray) – Values.
thresh (str) – Threshold under which to add uniform random noise to values, a quantity with units.
- Return type:
DataArray
- Returns:
xr.DataArray
Notes
If thresh is high, this will change the mean value of x.
- xclim.sdba.processing.normalize(data, norm=None, *, group, kind='+')[source]
Normalize an array by removing its mean.
Normalization if performed group-wise and according to kind.
- Parameters:
data (xr.DataArray) – The variable to normalize.
norm (xr.DataArray, optional) – If present, it is used instead of computing the norm again.
group (str or Grouper) – Grouping information. See
xclim.sdba.base.Grouper
for details..kind ({‘+’, ‘’}*) – If kind is “+”, the mean is subtracted from the mean and if it is ‘*’, it is divided from the data.
- Return type:
tuple
[DataArray
,DataArray
]- Returns:
xr.DataArray – Groupwise anomaly.
norm (xr.DataArray) – Mean over each group.
- xclim.sdba.processing.reordering(ref, sim, group='time')[source]
Reorders data in sim following the order of ref.
The rank structure of ref is used to reorder the elements of sim along dimension “time”, optionally doing the operation group-wise.
- Parameters:
sim (xr.DataArray) – Array to reorder.
ref (xr.DataArray) – Array whose rank order sim should replicate.
group (str) – Grouping information. See
xclim.sdba.base.Grouper
for details.
- Return type:
Dataset
- Returns:
xr.Dataset – sim reordered according to ref’s rank order.
References
Cannon [2018]
- xclim.sdba.processing.stack_variables(ds, rechunk=True, dim='multivar')[source]
Stack different variables of a dataset into a single DataArray with a new “variables” dimension.
Variable attributes are all added as lists of attributes to the new coordinate, prefixed with “_”. Variables are concatenated in the new dimension in alphabetical order, to ensure coherent behaviour with different datasets.
- Parameters:
ds (xr.Dataset) – Input dataset.
rechunk (bool) – If True (default), dask arrays are rechunked with variables : -1.
dim (str) – Name of dimension along which variables are indexed.
- Returns:
xr.DataArray – The transformed variable. Attributes are conserved, even if some might be incorrect, except for units, which are replaced with “”. Old units are stored in sdba_transformation_units. A sdba_transform attribute is added, set to the transformation method. sdba_transform_lower and sdba_transform_upper are also set if the requested bounds are different from the defaults.
Array with variables stacked along dim dimension. Units are set to “”.
- xclim.sdba.processing.standardize(da, mean=None, std=None, dim='time')[source]
Standardize a DataArray by centering its mean and scaling it by its standard deviation.
Either of both of mean and std can be provided if need be.
- Return type:
tuple
[DataArray
|Dataset
,DataArray
,DataArray
]- Returns:
out (xr.DataArray or xr.Dataset) – Standardized data.
mean (xr.DataArray) – Mean.
std (xr.DataArray) – Standard Deviation.
- xclim.sdba.processing.to_additive_space(data, lower_bound, upper_bound=None, trans='log')[source]
Transform a non-additive variable into an additive space by the means of a log or logit transformation.
Based on Alavoine and Grenier [2022].
- Parameters:
data (xr.DataArray) – A variable that can’t usually be bias-adjusted by additive methods.
lower_bound (str) – The smallest physical value of the variable, excluded, as a Quantity string. The data should only have values strictly larger than this bound.
upper_bound (str, optional) – The largest physical value of the variable, excluded, as a Quantity string. Only relevant for the logit transformation. The data should only have values strictly smaller than this bound.
trans ({‘log’, ‘logit’}) – The transformation to use. See notes.
Notes
Given a variable that is not usable in an additive adjustment, this applies a transformation to a space where additive methods are sensible. Given \(X\) the variable, \(b_-\) the lower physical bound of that variable and \(b_+\) the upper physical bound, two transformations are currently implemented to get \(Y\), the additive-ready variable. \(\ln\) is the natural logarithm.
log
\[Y = \ln\left( X - b_- \right)\]Usually used for variables with only a lower bound, like precipitation (pr, prsn, etc) and daily temperature range (dtr). Both have a lower bound of 0.
logit
\[X' = (X - b_-) / (b_+ - b_-) Y = \ln\left(\frac{X'}{1 - X'} \right)\]Usually used for variables with both a lower and a upper bound, like relative and specific humidity, cloud cover fraction, etc.
This will thus produce Infinity and NaN values where \(X == b_-\) or \(X == b_+\). We recommend using
jitter_under_thresh()
andjitter_over_thresh()
to remove those issues.See also
from_additive_space
for the inverse transformation.
jitter_under_thresh
Remove values exactly equal to the lower bound.
jitter_over_thresh
Remove values exactly equal to the upper bound.
References
Alavoine and Grenier [2022]
- xclim.sdba.processing.uniform_noise_like(da, low=1e-06, high=0.001)[source]
Return a uniform noise array of the same shape as da.
Noise is uniformly distributed between low and high. Alternative method to jitter_under_thresh for avoiding zeroes.
- Return type:
DataArray
- xclim.sdba.processing.unpack_moving_yearly_window(da, dim='movingwin', append_ends=True)[source]
Unpack a constructed moving window dataset to a normal timeseries, only keeping the central data.
Unpack DataArrays created with
construct_moving_yearly_window()
and recreate a timeseries data. If append_ends is False, only keeps the central non-overlapping years. The final timeseries will be (window - step) years shorter than the initial one. If append_ends is True, the time points from first and last windows will be included in the final timeseries.The time points that are not in a window will never be included in the final timeseries. The window length and window step are inferred from the coordinates.
- Parameters:
da (xr.DataArray) – As constructed by
construct_moving_yearly_window()
.dim (str) – The window dimension name as given to the construction function.
append_ends (bool) – Whether to append the ends of the timeseries If False, the final timeseries will be (window - step) years shorter than the initial one, but all windows will contribute equally. If True, the year before the middle years of the first window and the years after the middle years of the last window are appended to the middle years. The final timeseries will be the same length as the initial timeseries if the windows span the whole timeseries. The time steps that are not in a window will be left out of the final timeseries.
- xclim.sdba.processing.unstack_variables(da, dim=None)[source]
Unstack a DataArray created by stack_variables to a dataset.
- Parameters:
da (xr.DataArray) – Array holding different variables along dim dimension.
dim (str) – Name of dimension along which the variables are stacked. If not specified (default), dim is inferred from attributes of the coordinate.
- Returns:
xr.Dataset – Dataset holding each variable in an individual DataArray.
xclim.sdba.properties module
Properties Submodule
SDBA diagnostic tests are made up of statistical properties and measures. Properties are calculated on both simulation and reference datasets. They collapse the time dimension to one value.
This framework for the diagnostic tests was inspired by the VALUE project. Statistical Properties is the xclim term for ‘indices’ in the VALUE project.
- class xclim.sdba.properties.StatisticalProperty(**kwds)[source]
Bases:
Indicator
Base indicator class for statistical properties used for validating bias-adjusted outputs.
Statistical properties reduce the time dimension, sometimes adding a grouping dimension according to the passed value of group (e.g.: group=’time.month’ means the loss of the time dimension and the addition of a month one).
Statistical properties are generally unit-generic. To use those indicator in a workflow, it is recommended to wrap them with a virtual submodule, creating one specific indicator for each variable input (or at least for each possible dimensionality).
Statistical properties may restrict the sampling frequency of the input, they usually take in a single variable (named “da” in unit-generic instances).
- classmethod _ensure_correct_parameters(parameters)[source]
Ensure the parameters are correctly set and ordered.
Sets the correct variable default to be sure.
- _preprocess_and_checks(das, params)[source]
Perform parent’s checks and also check if group is allowed.
- allowed_groups = None
A list of allowed groupings. A subset of dayofyear, week, month, season or group. The latter stands for no temporal grouping.
- aspect = None
marginal, temporal, multivariate or spatial.
- Type:
The aspect the statistical property studies
- get_measure()[source]
Get the statistical measure indicator that is best used with this statistical property.
- measure = 'xclim.sdba.measures.BIAS'
The default measure to use when comparing the properties of two datasets. This gives the registry id. See
get_measure()
.
- realm = 'generic'
- xclim.sdba.properties._acf(da, *, lag=1, group='time.season')[source]
Autocorrelation.
Autocorrelation with a lag over a time resolution and averaged over all years.
- Parameters:
da (xr.DataArray) – Variable on which to calculate the diagnostic.
lag (int) – Lag.
group ({‘time.season’, ‘time.month’}) – Grouping of the output. E.g. If ‘time.month’, the autocorrelation is calculated over each month separately for all years. Then, the autocorrelation for all Jan/Feb/… is averaged over all years, giving 12 outputs for each grid point.
- Return type:
DataArray
- Returns:
xr.DataArray, [dimensionless] – Lag-{lag} autocorrelation of the variable over a {group.prop} and averaged over all years.
See also
References
Alavoine and Grenier [2022]
- xclim.sdba.properties._annual_cycle(da, *, stat='absamp', window=31, group='time')[source]
Annual cycle statistics.
A daily climatology is calculated and optionnaly smoothed with a (circular) moving average. The requested statistic is returned.
- Parameters:
da (xr.DataArray) – Variable on which to calculate the diagnostic.
stat ({‘absamp’,’relamp’, ‘phase’, ‘min’, ‘max’, ‘asymmetry’}) –
‘absamp’ is the peak-to-peak amplitude. (max - min). In the same units as the input.
‘relamp’ is a relative percentage. 100 * (max - min) / mean (Recommended for precipitation). Dimensionless.
‘phase’ is the day of year of the maximum.
‘max’ is the maximum. Same units as the input.
‘min’ is the minimum. Same units as the input.
‘asymmetry’ is the length of the period going from the minimum to the maximum. In years between 0 and 1.
window (int) – Size of the window for the moving average filtering. Deactivate this feature by passing window = 1.
- Return type:
DataArray
- Returns:
xr.DataArray, [same units as input or dimensionless or time] – {stat} of the annual cycle.
- xclim.sdba.properties._annual_statistic(da, *, stat='absamp', window=31, group='time')[source]
Annual range statistics.
Compute a statistic on each year of data and return the interannual average. This is similar to the annual cycle, but with the statistic and average operations inverted.
- Parameters:
da (xr.DataArray) – Data.
stat ({‘absamp’, ‘relamp’, ‘phase’}) – The statistic to return.
window (int) – Size of the window for the moving average filtering. Deactivate this feature by passing window = 1.
- Returns:
xr.DataArray, [same units as input or dimensionless] – Average annual {stat}.
- xclim.sdba.properties._corr_btw_var(da1, da2, *, corr_type='Spearman', group='time', output='correlation')[source]
Correlation between two variables.
Spearman or Pearson correlation coefficient between two variables at the time resolution.
- Parameters:
da1 (xr.DataArray) – First variable on which to calculate the diagnostic.
da2 (xr.DataArray) – Second variable on which to calculate the diagnostic.
corr_type ({‘Pearson’,’Spearman’}) – Type of correlation to calculate.
output ({‘correlation’, ‘pvalue’}) – Wheter to return the correlation coefficient or the p-value.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping of the output. Eg. For ‘time.month’, the correlation would be calculated on each month separately, but with all the years together.
- Return type:
DataArray
- Returns:
xr.DataArray, [dimensionless] – {corr_type} correlation coefficient
- xclim.sdba.properties._decorrelation_length(da, *, radius=300, thresh=0.5, dims=None, bins=100, group='time')[source]
Decorrelation length.
Distance from a grid cell where the correlation with its neighbours goes below the threshold. A correlogram is calculated for each grid cell following the method from
xclim.sdba.properties.spatial_correlogram
. Then, we find the first bin closest to the correlation threshold.- Parameters:
da (xr.DataArray) – Data.
radius (float) – Radius (in km) defining the region where correlations will be calculated between a point and its neighbours.
thresh (float) –
- Threshold correlation defining decorrelation.
The decorrelation length is defined as the center of the distance bin that has a correlation closest to this threshold.
dims (sequence of strings) – Name of the spatial dimensions. Once these are stacked, the longitude and latitude coordinates must be 1D.
bins – Same as argument bins from
scipy.stats.binned_statistic()
. If given as a scalar, the equal-width bin limits from 0 to radius are generated here (instead of letting scipy do it) to improve performance.group (str) – Useless for now.
- Returns:
xr.DataArray, [km] – Decorrelation length.
Notes
Calculating this property requires a lot of memory. It will not work with large datasets.
- xclim.sdba.properties._first_eof(da, *, dims=None, kind='+', thresh='1 mm/d', group='time')[source]
First Empirical Orthogonal Function.
Through principal component analysis (PCA), compute the predominant empirical orthogonal function. The temporal dimension is reduced. The Eof is multiplied by the sign of its mean to ensure coherent signs as much as possible. Needs the eofs package to run. Based on an idea from [Vrac, 2018], using an implementation from [Dawson, 2016].
- Parameters:
da (xr.DataArray) – Data.
dims (sequence of string, optional) – Name of the spatial dimensions. If None (default), all dimensions except “time” are used.
kind ({‘+’, ‘’}*) – Variable “kind”. If multiplicative, the zero values are set to very small values and the PCA is performed over the logarithm of the data.
thresh (str) – If kind is multiplicative, this is the “zero” threshold passed to
xclim.sdba.processing.jitter_under_thresh()
.group (str) – Useless for now.
- Returns:
xr.DataArray, [dimensionless] – First empirical orthogonal function
- xclim.sdba.properties._mean(da, *, group='time')[source]
Mean.
Mean over all years at the time resolution.
- Parameters:
da (xr.DataArray) – Variable on which to calculate the diagnostic.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping of the output. E.g. If ‘time.month’, the temporal average is performed separately for each month.
- Return type:
DataArray
- Returns:
xr.DataArray, [same as input] – Mean of the variable.
- xclim.sdba.properties._quantile(da, *, q=0.98, group='time')[source]
Quantile.
Returns the quantile q of the distribution of the variable over all years at the time resolution.
- Parameters:
da (xr.DataArray) – Variable on which to calculate the diagnostic.
q (float) – Quantile to be calculated. Should be between 0 and 1.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping of the output. E.g. If ‘time.month’, the quantile is computed separately for each month.
- Return type:
DataArray
- Returns:
xr.DataArray, [same as input] – Quantile {q} of the variable.
- xclim.sdba.properties._relative_frequency(da, *, op='>=', thresh='1 mm d-1', group='time')[source]
Relative Frequency.
Relative Frequency of days with variable respecting a condition (defined by an operation and a threshold) at the time resolution. The relative frequency is the number of days that satisfy the condition divided by the total number of days.
- Parameters:
da (xr.DataArray) – Variable on which to calculate the diagnostic.
op ({“>”, “<”, “>=”, “<=”}) – Operation to verify the condition. The condition is variable {op} threshold.
thresh (str) – Threshold on which to evaluate the condition.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping on the output. Eg. For ‘time.month’, the relative frequency would be calculated on each month, with all years included.
- Return type:
DataArray
- Returns:
xr.DataArray, [dimensionless] – Relative frequency of values {op} {thresh}.
- xclim.sdba.properties._return_value(da, *, period=20, op='max', method='ML', group='time')[source]
Return value.
Return the value corresponding to a return period. On average, the return value will be exceeded (or not exceed for op=’min’) every return period (e.g. 20 years). The return value is computed by first extracting the variable annual maxima/minima, fitting a statistical distribution to the maxima/minima, then estimating the percentile associated with the return period (eg. 95th percentile (1/20) for 20 years)
- Parameters:
da (xr.DataArray) – Variable on which to calculate the diagnostic.
period (int) – Return period. Number of years over which to check if the value is exceeded (or not for op=’min’).
op ({‘max’,’min’}) – Whether we are looking for a probability of exceedance (‘max’, right side of the distribution) or a probability of non-exceedance (min, left side of the distribution).
method ({“ML”, “PWM”}) – Fitting method, either maximum likelihood (ML) or probability weighted moments (PWM), also called L-Moments. The PWM method is usually more robust to outliers.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping of the output. A distribution of the extremes is done for each group.
- Return type:
DataArray
- Returns:
xr.DataArray, [same as input] – {period}-{group.prop_name} {op} return level of the variable.
- xclim.sdba.properties._skewness(da, *, group='time')[source]
Skewness.
Skewness of the distribution of the variable over all years at the time resolution.
- Parameters:
da (xr.DataArray) – Variable on which to calculate the diagnostic.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping of the output. E.g. If ‘time.month’, the skewness is performed separately for each month.
- Return type:
DataArray
- Returns:
xr.DataArray, [dimensionless] – Skewness of the variable.
See also
- xclim.sdba.properties._spatial_correlogram(da, *, dims=None, bins=100, group='time', method=1)[source]
Spatial correlogram.
Compute the pairwise spatial correlations (Spearman) and averages them based on the pairwise distances. This collapses the spatial and temporal dimensions and returns a distance bins dimension. Needs coordinates for longitude and latitude. This property is heavy to compute and it will need to create a NxN array in memory (outside of dask), where N is the number of spatial points. There are shortcuts for all-nan time-slices or spatial points, but scipy’s nan-omitting algorithm is extremely slow, so the presence of any lone NaN will increase the computation time. Based on an idea from [François et al., 2020].
- Parameters:
da (xr.DataArray) – Data.
dims (sequence of strings) – Name of the spatial dimensions. Once these are stacked, the longitude and latitude coordinates must be 1D.
bins – Same as argument bins from
xarray.DataArray.groupby_bins()
. If given as a scalar, the equal-width bin limits are generated here (instead of letting xarray do it) to improve performance.group (str) – Useless for now.
- Returns:
xr.DataArray, [dimensionless] – Inter-site correlogram as a function of distance.
- xclim.sdba.properties._spell_length_distribution(da, *, method='amount', op='>=', thresh='1 mm d-1', stat='mean', group='time', resample_before_rl=True)[source]
Spell length distribution.
- Statistic of spell length distribution when the variable respects a condition (defined by an operation, a method and
a threshold).
- Parameters:
da (xr.DataArray) – Variable on which to calculate the diagnostic.
method ({‘amount’, ‘quantile’}) – Method to choose the threshold. ‘amount’: The threshold is directly the quantity in {thresh}. It needs to have the same units as {da}. ‘quantile’: The threshold is calculated as the quantile {thresh} of the distribution.
op ({“>”, “<”, “>=”, “<=”}) – Operation to verify the condition for a spell. The condition for a spell is variable {op} threshold.
thresh (str or float) – Threshold on which to evaluate the condition to have a spell. Str with units if the method is “amount”. Float of the quantile if the method is “quantile”.
stat ({‘mean’,’max’,’min’}) – Statistics to apply to the resampled input at the {group} (e.g. 1-31 Jan 1980) and then over all years (e.g. Jan 1980-2010)
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping of the output. E.g. If ‘time.month’, the spell lengths are coputed separately for each month.
resample_before_rl (bool) – Determines if the resampling should take place before or after the run length encoding (or a similar algorithm) is applied to runs.
- Return type:
DataArray
- Returns:
xr.DataArray, [units of the sampling frequency] – {stat} of spell length distribution when the variable is {op} the {method} {thresh}.
- xclim.sdba.properties._transition_probability(da, *, initial_op='>=', final_op='>=', thresh='1 mm d-1', group='time')[source]
Transition probability.
Probability of transition from the initial state to the final state. The states are booleans comparing the value of the day to the threshold with the operator.
The transition occurs when consecutive days are both in the given states.
- Parameters:
da (xr.DataArray) – Variable on which to calculate the diagnostic.
initial_op ({“>”, “gt”, “<”, “lt”, “>=”, “ge”, “<=”, “le”, “==”, “eq”, “!=”, “ne”}) – Operation to verify the condition for the initial state. The condition is variable {op} threshold.
final_op ({“>”, “gt”, “<”, “lt”, “>=”, “ge”, “<=”, “le”, “==”, “eq”, “!=”, “ne”}) – Operation to verify the condition for the final state. The condition is variable {op} threshold.
thresh (str) – Threshold on which to evaluate the condition.
group ({“time”, “time.season”, “time.month”}) – Grouping on the output. e.g. For “time.month”, the transition probability would be calculated on each month, with all years included.
- Return type:
DataArray
- Returns:
xr.DataArray, [dimensionless] – Transition probability of values {initial_op} {thresh} to values {final_op} {thresh}.
- xclim.sdba.properties._trend(da, *, group='time', output='slope')[source]
Linear Trend.
The data is averaged over each time resolution and the inter-annual trend is returned. This function will rechunk along the grouping dimension.
- Parameters:
da (xr.DataArray) – Variable on which to calculate the diagnostic.
output ({‘slope’, ‘pvalue’}) – Attributes of the linear regression to return. ‘slope’ is the slope of the regression line. ‘pvalue’ is for a hypothesis test whose null hypothesis is that the slope is zero, using Wald Test with t-distribution of the test statistic.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping on the output.
- Return type:
DataArray
- Returns:
xr.DataArray, [units of input per year or dimensionless] – {output} of the interannual linear trend.
See also
- xclim.sdba.properties._var(da, *, group='time')[source]
Variance.
Variance of the variable over all years at the time resolution.
- Parameters:
da (xr.DataArray) – Variable on which to calculate the diagnostic.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping of the output. E.g. If ‘time.month’, the variance is performed separately for each month.
- Return type:
DataArray
- Returns:
xr.DataArray, [square of the input units] – Variance of the variable.
- xclim.sdba.properties.acf(da='da', *, lag=1, group='time.season', ds=None)
Autocorrelation. (realm: generic)
Autocorrelation with a lag over a time resolution and averaged over all years.
Based on indice
_acf()
.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
lag (number) – Lag. Default : 1.
group ({‘time.season’, ‘time.month’}) – Grouping of the output. E.g. If ‘time.month’, the autocorrelation is calculated over each month separately for all years. Then, the autocorrelation for all Jan/Feb/… is averaged over all years, giving 12 outputs for each grid point. Default : time.season.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
acf (DataArray) – Lag-{lag} autocorrelation of the variable over a {group.prop} and averaged over all years.
References
Alavoine and Grenier [2022]
- xclim.sdba.properties.annual_cycle_amplitude(da='da', *, window=31, group='time', ds=None)
Annual cycle statistics. (realm: generic)
A daily climatology is calculated and optionnaly smoothed with a (circular) moving average. The requested statistic is returned.
Based on indice
_annual_cycle()
. With injected parameters: stat=absamp.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
window (number) – Size of the window for the moving average filtering. Deactivate this feature by passing window = 1. Default : 31.
group (Any) – Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
annual_cycle_amplitude (DataArray) – {stat} of the annual cycle., with additional attributes: cell_methods: time: mean time: range
- xclim.sdba.properties.annual_cycle_asymmetry(da='da', *, window=31, group='time', ds=None)
Annual cycle statistics. (realm: generic)
A daily climatology is calculated and optionnaly smoothed with a (circular) moving average. The requested statistic is returned.
Based on indice
_annual_cycle()
. With injected parameters: stat=asymmetry.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
window (number) – Size of the window for the moving average filtering. Deactivate this feature by passing window = 1. Default : 31.
group (Any) – Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
annual_cycle_asymmetry (DataArray) – {stat} of the annual cycle. [yr]
- xclim.sdba.properties.annual_cycle_maximum(da='da', *, window=31, group='time', ds=None)
Annual cycle statistics. (realm: generic)
A daily climatology is calculated and optionnaly smoothed with a (circular) moving average. The requested statistic is returned.
Based on indice
_annual_cycle()
. With injected parameters: stat=max.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
window (number) – Size of the window for the moving average filtering. Deactivate this feature by passing window = 1. Default : 31.
group (Any) – Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
annual_cycle_maximum (DataArray) – {stat} of the annual cycle., with additional attributes: cell_methods: time: mean time: max
- xclim.sdba.properties.annual_cycle_minimum(da='da', *, window=31, group='time', ds=None)
Annual cycle statistics. (realm: generic)
A daily climatology is calculated and optionnaly smoothed with a (circular) moving average. The requested statistic is returned.
Based on indice
_annual_cycle()
. With injected parameters: stat=min.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
window (number) – Size of the window for the moving average filtering. Deactivate this feature by passing window = 1. Default : 31.
group (Any) – Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
annual_cycle_minimum (DataArray) – {stat} of the annual cycle., with additional attributes: cell_methods: time: mean time: min
- xclim.sdba.properties.annual_cycle_phase(da='da', *, window=31, group='time', ds=None)
Annual cycle statistics. (realm: generic)
A daily climatology is calculated and optionnaly smoothed with a (circular) moving average. The requested statistic is returned.
Based on indice
_annual_cycle()
. With injected parameters: stat=phase.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
window (number) – Size of the window for the moving average filtering. Deactivate this feature by passing window = 1. Default : 31.
group (Any) – Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
annual_cycle_phase (DataArray) – {stat} of the annual cycle., with additional attributes: cell_methods: time: range
- xclim.sdba.properties.corr_btw_var(da1='da1', da2='da2', *, corr_type='Spearman', output='correlation', group='time', ds=None)
Correlation between two variables. (realm: generic)
Spearman or Pearson correlation coefficient between two variables at the time resolution.
Based on indice
_corr_btw_var()
.- Parameters:
da1 (str or DataArray) – First variable on which to calculate the diagnostic. Default : ds.da1.
da2 (str or DataArray) – Second variable on which to calculate the diagnostic. Default : ds.da2.
corr_type ({‘Pearson’, ‘Spearman’}) – Type of correlation to calculate. Default : Spearman.
output ({‘correlation’, ‘pvalue’}) – Wheter to return the correlation coefficient or the p-value. Default : correlation.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping of the output. Eg. For ‘time.month’, the correlation would be calculated on each month separately, but with all the years together. Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
corr_btw_var (DataArray) – {corr_type} correlation coefficient
- xclim.sdba.properties.decorrelation_length(da='da', *, radius=300, thresh=0.5, dims=None, bins=100, group='time', ds=None)
Decorrelation length. (realm: generic)
Distance from a grid cell where the correlation with its neighbours goes below the threshold. A correlogram is calculated for each grid cell following the method from
xclim.sdba.properties.spatial_correlogram
. Then, we find the first bin closest to the correlation threshold.Based on indice
_decorrelation_length()
.- Parameters:
da (str or DataArray) – Data. Default : ds.da.
radius (Any) – Radius (in km) defining the region where correlations will be calculated between a point and its neighbours. Default : 300.
thresh (Any) – Threshold correlation defining decorrelation. The decorrelation length is defined as the center of the distance bin that has a correlation closest to this threshold. Default : 0.5.
dims (Any) – Name of the spatial dimensions. Once these are stacked, the longitude and latitude coordinates must be 1D. Default : None.
bins (Any) – Same as argument bins from
scipy.stats.binned_statistic()
. If given as a scalar, the equal-width bin limits from 0 to radius are generated here (instead of letting scipy do it) to improve performance. Default : 100.group (Any) – Useless for now. Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
decorrelation_length (DataArray) – Decorrelation length.
Notes
Calculating this property requires a lot of memory. It will not work with large datasets.
- xclim.sdba.properties.first_eof(da='da', *, dims=None, kind='+', thresh='1 mm/d', group='time', ds=None)
First Empirical Orthogonal Function. (realm: generic)
Through principal component analysis (PCA), compute the predominant empirical orthogonal function. The temporal dimension is reduced. The Eof is multiplied by the sign of its mean to ensure coherent signs as much as possible. Needs the eofs package to run. Based on an idea from [Vrac, 2018], using an implementation from [Dawson, 2016].
Based on indice
_first_eof()
.- Parameters:
da (str or DataArray) – Data. Default : ds.da.
dims (Any) – Name of the spatial dimensions. If None (default), all dimensions except “time” are used. Default : None.
kind ({’’, ‘+’}*) – Variable “kind”. If multiplicative, the zero values are set to very small values and the PCA is performed over the logarithm of the data. Default : +.
thresh (Any) – If kind is multiplicative, this is the “zero” threshold passed to
xclim.sdba.processing.jitter_under_thresh()
. Default : 1 mm/d.group (Any) – Useless for now. Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
first_eof (DataArray) – First empirical orthogonal function
- xclim.sdba.properties.mean(da='da', *, group='time', ds=None)
Mean. (realm: generic)
Mean over all years at the time resolution.
Based on indice
_mean()
.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping of the output. E.g. If ‘time.month’, the temporal average is performed separately for each month. Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
mean (DataArray) – Mean of the variable., with additional attributes: cell_methods: time: mean
- xclim.sdba.properties.mean_annual_phase(da='da', *, window=31, group='time', ds=None)
Annual range statistics. (realm: generic)
Compute a statistic on each year of data and return the interannual average. This is similar to the annual cycle, but with the statistic and average operations inverted.
Based on indice
_annual_statistic()
. With injected parameters: stat=phase.- Parameters:
da (str or DataArray) – Data. Default : ds.da.
window (number) – Size of the window for the moving average filtering. Deactivate this feature by passing window = 1. Default : 31.
group (Any) – Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
mean_annual_phase (DataArray) – Average annual {stat}.
- xclim.sdba.properties.mean_annual_range(da='da', *, window=31, group='time', ds=None)
Annual range statistics. (realm: generic)
Compute a statistic on each year of data and return the interannual average. This is similar to the annual cycle, but with the statistic and average operations inverted.
Based on indice
_annual_statistic()
. With injected parameters: stat=absamp.- Parameters:
da (str or DataArray) – Data. Default : ds.da.
window (number) – Size of the window for the moving average filtering. Deactivate this feature by passing window = 1. Default : 31.
group (Any) – Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
mean_annual_range (DataArray) – Average annual {stat}.
- xclim.sdba.properties.mean_annual_relative_range(da='da', *, window=31, group='time', ds=None)
Annual range statistics. (realm: generic)
Compute a statistic on each year of data and return the interannual average. This is similar to the annual cycle, but with the statistic and average operations inverted.
Based on indice
_annual_statistic()
. With injected parameters: stat=relamp.- Parameters:
da (str or DataArray) – Data. Default : ds.da.
window (number) – Size of the window for the moving average filtering. Deactivate this feature by passing window = 1. Default : 31.
group (Any) – Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
mean_annual_relative_range (DataArray) – Average annual {stat}. [%]
- xclim.sdba.properties.quantile(da='da', *, q=0.98, group='time', ds=None)
Quantile. (realm: generic)
Returns the quantile q of the distribution of the variable over all years at the time resolution.
Based on indice
_quantile()
.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
q (number) – Quantile to be calculated. Should be between 0 and 1. Default : 0.98.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping of the output. E.g. If ‘time.month’, the quantile is computed separately for each month. Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
quantile (DataArray) – Quantile {q} of the variable.
- xclim.sdba.properties.relative_annual_cycle_amplitude(da='da', *, window=31, group='time', ds=None)
Annual cycle statistics. (realm: generic)
A daily climatology is calculated and optionnaly smoothed with a (circular) moving average. The requested statistic is returned.
Based on indice
_annual_cycle()
. With injected parameters: stat=relamp.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
window (number) – Size of the window for the moving average filtering. Deactivate this feature by passing window = 1. Default : 31.
group (Any) – Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
relative_annual_cycle_amplitude (DataArray) – {stat} of the annual cycle. [%], with additional attributes: cell_methods: time: mean time: range
- xclim.sdba.properties.relative_frequency(da='da', *, op='>=', thresh='1 mm d-1', group='time', ds=None)
Relative Frequency. (realm: generic)
Relative Frequency of days with variable respecting a condition (defined by an operation and a threshold) at the time resolution. The relative frequency is the number of days that satisfy the condition divided by the total number of days.
Based on indice
_relative_frequency()
.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
op ({‘<’, ‘>’, ‘>=’, ‘<=’}) – Operation to verify the condition. The condition is variable {op} threshold. Default : >=.
thresh (str) – Threshold on which to evaluate the condition. Default : 1 mm d-1.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping on the output. Eg. For ‘time.month’, the relative frequency would be calculated on each month, with all years included. Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
relative_frequency (DataArray) – Relative frequency of values {op} {thresh}.
- xclim.sdba.properties.return_value(da='da', *, period=20, op='max', method='ML', group='time', ds=None)
Return value. (realm: generic)
Return the value corresponding to a return period. On average, the return value will be exceeded (or not exceed for op=’min’) every return period (e.g. 20 years). The return value is computed by first extracting the variable annual maxima/minima, fitting a statistical distribution to the maxima/minima, then estimating the percentile associated with the return period (eg. 95th percentile (1/20) for 20 years)
Based on indice
_return_value()
.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
period (number) – Return period. Number of years over which to check if the value is exceeded (or not for op=’min’). Default : 20.
op ({‘min’, ‘max’}) – Whether we are looking for a probability of exceedance (‘max’, right side of the distribution) or a probability of non-exceedance (min, left side of the distribution). Default : max.
method ({‘PWM’, ‘ML’}) – Fitting method, either maximum likelihood (ML) or probability weighted moments (PWM), also called L-Moments. The PWM method is usually more robust to outliers. Default : ML.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping of the output. A distribution of the extremes is done for each group. Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
return_value (DataArray) – {period}-{group.prop_name} {op} return level of the variable.
- xclim.sdba.properties.skewness(da='da', *, group='time', ds=None)
Skewness. (realm: generic)
Skewness of the distribution of the variable over all years at the time resolution.
Based on indice
_skewness()
.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping of the output. E.g. If ‘time.month’, the skewness is performed separately for each month. Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
skewness (DataArray) – Skewness of the variable.
- xclim.sdba.properties.spatial_correlogram(da='da', *, dims=None, bins=100, group='time', method=1, ds=None)
Spatial correlogram. (realm: generic)
Compute the pairwise spatial correlations (Spearman) and averages them based on the pairwise distances. This collapses the spatial and temporal dimensions and returns a distance bins dimension. Needs coordinates for longitude and latitude. This property is heavy to compute and it will need to create a NxN array in memory (outside of dask), where N is the number of spatial points. There are shortcuts for all-nan time-slices or spatial points, but scipy’s nan-omitting algorithm is extremely slow, so the presence of any lone NaN will increase the computation time. Based on an idea from [François et al., 2020].
Based on indice
_spatial_correlogram()
.- Parameters:
da (str or DataArray) – Data. Default : ds.da.
dims (Any) – Name of the spatial dimensions. Once these are stacked, the longitude and latitude coordinates must be 1D. Default : None.
bins (Any) – Same as argument bins from
xarray.DataArray.groupby_bins()
. If given as a scalar, the equal-width bin limits are generated here (instead of letting xarray do it) to improve performance. Default : 100.group (Any) – Useless for now. Default : time.
method (Any) – Default : 1.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
spatial_correlogram (DataArray) – Inter-site correlogram as a function of distance.
- xclim.sdba.properties.spell_length_distribution(da='da', *, method='amount', op='>=', thresh='1 mm d-1', stat='mean', group='time', resample_before_rl=True, ds=None)
Spell length distribution. (realm: generic)
Statistic of spell length distribution when the variable respects a condition (defined by an operation, a method and a threshold).
Based on indice
_spell_length_distribution()
.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
method ({‘quantile’, ‘amount’}) – Method to choose the threshold. ‘amount’: The threshold is directly the quantity in {thresh}. It needs to have the same units as {da}. ‘quantile’: The threshold is calculated as the quantile {thresh} of the distribution. Default : amount.
op ({‘<’, ‘>’, ‘>=’, ‘<=’}) – Operation to verify the condition for a spell. The condition for a spell is variable {op} threshold. Default : >=.
thresh (str) – Threshold on which to evaluate the condition to have a spell. Str with units if the method is “amount”. Float of the quantile if the method is “quantile”. Default : 1 mm d-1.
stat ({‘min’, ‘mean’, ‘max’}) – Statistics to apply to the resampled input at the {group} (e.g. 1-31 Jan 1980) and then over all years (e.g. Jan 1980-2010) Default : mean.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping of the output. E.g. If ‘time.month’, the spell lengths are coputed separately for each month. Default : time.
resample_before_rl (boolean) – Determines if the resampling should take place before or after the run length encoding (or a similar algorithm) is applied to runs. Default : True.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
spell_length_distribution (DataArray) – {stat} of spell length distribution when the variable is {op} the {method} {thresh}.
- xclim.sdba.properties.transition_probability(da='da', *, initial_op='>=', final_op='>=', thresh='1 mm d-1', group='time', ds=None)
Transition probability. (realm: generic)
Probability of transition from the initial state to the final state. The states are booleans comparing the value of the day to the threshold with the operator.
Based on indice
_transition_probability()
.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
initial_op ({‘eq’, ‘<=’, ‘<’, ‘>’, ‘ge’, ‘==’, ‘le’, ‘lt’, ‘!=’, ‘ne’, ‘>=’, ‘gt’}) – Operation to verify the condition for the initial state. The condition is variable {op} threshold. Default : >=.
final_op ({‘eq’, ‘<=’, ‘<’, ‘>’, ‘ge’, ‘==’, ‘le’, ‘lt’, ‘!=’, ‘ne’, ‘>=’, ‘gt’}) – Operation to verify the condition for the final state. The condition is variable {op} threshold. Default : >=.
thresh (str) – Threshold on which to evaluate the condition. Default : 1 mm d-1.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping on the output. e.g. For “time.month”, the transition probability would be calculated on each month, with all years included. Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
transition_probability (DataArray) – Transition probability of values {initial_op} {thresh} to values {final_op} {thresh}.
- xclim.sdba.properties.trend(da='da', *, output='slope', group='time', ds=None)
Linear Trend. (realm: generic)
The data is averaged over each time resolution and the inter-annual trend is returned. This function will rechunk along the grouping dimension.
Based on indice
_trend()
.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
output ({‘slope’, ‘pvalue’}) – Attributes of the linear regression to return. ‘slope’ is the slope of the regression line. ‘pvalue’ is for a hypothesis test whose null hypothesis is that the slope is zero, using Wald Test with t-distribution of the test statistic. Default : slope.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping on the output. Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
trend (DataArray) – {output} of the interannual linear trend.
- xclim.sdba.properties.var(da='da', *, group='time', ds=None)
Variance. (realm: generic)
Variance of the variable over all years at the time resolution.
Based on indice
_var()
.- Parameters:
da (str or DataArray) – Variable on which to calculate the diagnostic. Default : ds.da.
group ({‘time’, ‘time.season’, ‘time.month’}) – Grouping of the output. E.g. If ‘time.month’, the variance is performed separately for each month. Default : time.
ds (Dataset, optional) – A dataset with the variables given by name. Default : None.
- Returns:
var (DataArray) – Variance of the variable., with additional attributes: cell_methods: time: var
xclim.sdba.utils module
Statistical Downscaling and Bias Adjustment Utilities
- xclim.sdba.utils._pairwise_spearman(da, dims)[source]
Area-averaged pairwise temporal correlation.
With skipna-shortcuts for cases where all times or all points are NaN.
- xclim.sdba.utils.add_cyclic_bounds(da, att, cyclic_coords=True)[source]
Reindex an array to include the last slice at the beginning and the first at the end.
This is done to allow interpolation near the end-points.
- Parameters:
da (xr.DataArray or xr.Dataset) – An array
att (str) – The name of the coordinate to make cyclic
cyclic_coords (bool) – If True, the coordinates are made cyclic as well, if False, the new values are guessed using the same step as their neighbour.
- Return type:
DataArray
|Dataset
- Returns:
xr.DataArray or xr.Dataset – da but with the last element along att prepended and the last one appended.
- xclim.sdba.utils.apply_correction(x, factor, kind=None)[source]
Apply the additive or multiplicative correction/adjustment factors.
If kind is not given, default to the one stored in the “kind” attribute of factor.
- Return type:
DataArray
- xclim.sdba.utils.best_pc_orientation_full(R, Hinv, Rmean, Hmean, hist)[source]
Return best orientation vector for A according to the method of Alavoine and Grenier [2022].
Eigenvectors returned by pc_matrix do not have a defined orientation. Given an inverse transform Hinv, a transform R, the actual and target origins Hmean and Rmean and the matrix of training observations hist, this computes a scenario for all possible orientations and return the orientation that maximizes the Spearman correlation coefficient of all variables. The correlation is computed for each variable individually, then averaged.
This trick is explained in Alavoine and Grenier [2022]. See docstring of
sdba.adjustment.PrincipalComponentAdjustment()
.- Parameters:
R (np.ndarray) – MxM Matrix defining the final transformation.
Hinv (np.ndarray) – MxM Matrix defining the (inverse) first transformation.
Rmean (np.ndarray) – M vector defining the target distribution center point.
Hmean (np.ndarray) – M vector defining the original distribution center point.
hist (np.ndarray) – MxN matrix of all training observations of the M variables/sites.
- Return type:
- Returns:
np.ndarray – M vector of orientation correction (1 or -1).
References
Alavoine and Grenier [2022]
- xclim.sdba.utils.best_pc_orientation_simple(R, Hinv, val=1000)[source]
Return best orientation vector according to a simple test.
Eigenvectors returned by pc_matrix do not have a defined orientation. Given an inverse transform Hinv and a transform R, this returns the orientation minimizing the projected distance for a test point far from the origin.
This trick is inspired by the one exposed in Hnilica et al. [2017]. For each possible orientation vector, the test point is reprojected and the distance from the original point is computed. The orientation minimizing that distance is chosen.
- Parameters:
R (np.ndarray) – MxM Matrix defining the final transformation.
Hinv (np.ndarray) – MxM Matrix defining the (inverse) first transformation.
val (float) – The coordinate of the test point (same for all axes). It should be much greater than the largest furthest point in the array used to define B.
- Return type:
- Returns:
np.ndarray – Mx1 vector of orientation correction (1 or -1).
See also
sdba.adjustment.PrincipalComponentAdjustment
References
Hnilica, Hanel, and Puš [2017]
- xclim.sdba.utils.broadcast(grouped, x, *, group='time', interp='nearest', sel=None)[source]
Broadcast a grouped array back to the same shape as a given array.
- Parameters:
grouped (xr.DataArray) – The grouped array to broadcast like x.
x (xr.DataArray) – The array to broadcast grouped to.
group (str or Grouper) – Grouping information. See
xclim.sdba.base.Grouper
for details.interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use,
sel (dict[str, xr.DataArray]) – Mapping of grouped coordinates to x coordinates (other than the grouping one).
- Return type:
DataArray
- Returns:
xr.DataArray
- xclim.sdba.utils.copy_all_attrs(ds, ref)[source]
Copy all attributes of ds to ref, including attributes of shared coordinates, and variables in the case of Datasets.
- xclim.sdba.utils.ecdf(x, value, dim='time')[source]
Return the empirical CDF of a sample at a given value.
- Parameters:
x (array) – Sample.
value (float) – The value within the support of x for which to compute the CDF value.
dim (str) – Dimension name.
- Return type:
DataArray
- Returns:
xr.DataArray – Empirical CDF.
- xclim.sdba.utils.ensure_longest_doy(func)[source]
Ensure that selected day is the longest day of year for x and y dims.
- Return type:
Callable
- xclim.sdba.utils.equally_spaced_nodes(n, eps=None)[source]
Return nodes with n equally spaced points within [0, 1], optionally adding two end-points.
- Parameters:
n (int) – Number of equally spaced nodes.
eps (float, optional) – Distance from 0 and 1 of added end nodes. If None (default), do not add endpoints.
- Return type:
array
- Returns:
np.array – Nodes between 0 and 1. Nodes can be seen as the middle points of n equal bins.
Warning
Passing a small eps will effectively clip the scenario to the bounds of the reference on the historical period in most cases. With normal quantile mapping algorithms, this can give strange result when the reference does not show as many extremes as the simulation does.
Notes
For n=4, eps=0 : 0—x——x——x——x—1
- xclim.sdba.utils.get_clusters(data, u1, u2, dim='time')[source]
Get cluster count, maximum and position along a given dim.
See get_clusters_1d. Used by adjustment.ExtremeValues.
- Parameters:
data (1D ndarray) – Values to get clusters from.
u1 (float) – Extreme value threshold, at least one value in the cluster must exceed this.
u2 (float) – Cluster threshold, values above this can be part of a cluster.
dim (str) – Dimension name.
- Return type:
Dataset
- Returns:
xr.Dataset –
- With variables,
nclusters : Number of clusters for each point (with dim reduced), int
start : First index in the cluster (dim reduced, new cluster), int
end : Last index in the cluster, inclusive (dim reduced, new cluster), int
maxpos : Index of the maximal value within the cluster (dim reduced, new cluster), int
maximum : Maximal value within the cluster (dim reduced, new cluster), same dtype as data.
For start, end and maxpos, -1 means NaN and should always correspond to a NaN in maximum. The length along cluster is half the size of “dim”, the maximal theoretical number of clusters.
- xclim.sdba.utils.get_clusters_1d(data, u1, u2)[source]
Get clusters of a 1D array.
A cluster is defined as a sequence of values larger than u2 with at least one value larger than u1.
- Parameters:
data (1D ndarray) – Values to get clusters from.
u1 (float) – Extreme value threshold, at least one value in the cluster must exceed this.
u2 (float) – Cluster threshold, values above this can be part of a cluster.
- Return type:
tuple
[array
,array
,array
,array
]- Returns:
(np.array, np.array, np.array, np.array)
References
getcluster of Extremes.jl (Jalbert [2022]).
- xclim.sdba.utils.get_correction(x, y, kind)[source]
Return the additive or multiplicative correction/adjustment factors.
- Return type:
DataArray
- xclim.sdba.utils.interp_on_quantiles(newx, xq, yq, *, group='time', method='linear', extrapolation='constant')[source]
Interpolate values of yq on new values of x.
Interpolate in 2D with
griddata()
if grouping is used, in 1D otherwise, withinterp1d
. Any NaNs in xq or yq are removed from the input map. Similarly, NaNs in newx are left NaNs.- Parameters:
newx (xr.DataArray) – The values at which to evaluate yq. If group has group information, new should have a coordinate with the same name as the group name In that case, 2D interpolation is used.
xq, yq (xr.DataArray) – Coordinates and values on which to interpolate. The interpolation is done along the “quantiles” dimension if group has no group information. If it does, interpolation is done in 2D on “quantiles” and on the group dimension.
group (str or Grouper) – The dimension and grouping information. (ex: “time” or “time.month”). Defaults to “time”.
method ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method.
extrapolation ({‘constant’, ‘nan’}) – The extrapolation method used for values of newx outside the range of xq. See notes.
Notes
Extrapolation methods:
‘nan’ : Any value of newx outside the range of xq is set to NaN.
‘constant’ : Values of newx smaller than the minimum of xq are set to the first value of yq and those larger than the maximum, set to the last one (first and last non-nan values along the “quantiles” dimension). When the grouping is “time.month”, these limits are linearly interpolated along the month dimension.
- xclim.sdba.utils.invert(x, kind=None)[source]
Invert a DataArray either by addition (-x) or by multiplication (1/x).
If kind is not given, default to the one stored in the “kind” attribute of x.
- Return type:
DataArray
- xclim.sdba.utils.map_cdf(ds, *, y_value, dim)[source]
Return the value in x with the same CDF as y_value in y.
This function is meant to be wrapped in a Grouper.apply.
- Parameters:
ds (xr.Dataset) – Variables: x, Values from which to pick, y, Reference values giving the ranking
y_value (float, array) – Value within the support of y.
dim (str) – Dimension along which to compute quantile.
- Returns:
array – Quantile of x with the same CDF as y_value in y.
- xclim.sdba.utils.map_cdf_1d(x, y, y_value)[source]
Return the value in x with the same CDF as y_value in y.
- xclim.sdba.utils.pc_matrix(arr)[source]
Construct a Principal Component matrix.
This matrix can be used to transform points in arr to principal components coordinates. Note that this function does not manage NaNs; if a single observation is null, all elements of the transformation matrix involving that variable will be NaN.
- Parameters:
arr (numpy.ndarray or dask.array.Array) – 2D array (M, N) of the M coordinates of N points.
- Return type:
ndarray
|Array
- Returns:
numpy.ndarray or dask.array.Array – MxM Array of the same type as arr.
- xclim.sdba.utils.rand_rot_matrix(crd, num=1, new_dim=None)[source]
Generate random rotation matrices.
Rotation matrices are members of the SO(n) group, where n is the matrix size (crd.size). They can be characterized as orthogonal matrices with determinant 1. A square matrix \(R\) is a rotation matrix if and only if \(R^t = R^{−1}\) and \(\mathrm{det} R = 1\).
- Parameters:
crd (xr.DataArray) – 1D coordinate DataArray along which the rotation occurs. The output will be square with the same coordinate replicated, the second renamed to new_dim.
num (int) – If larger than 1 (default), the number of matrices to generate, stacked along a “matrices” dimension.
new_dim (str) – Name of the new “prime” dimension, defaults to the same name as crd + “_prime”.
- Return type:
DataArray
- Returns:
xr.DataArray – float, NxN if num = 1, numxNxN otherwise, where N is the length of crd.
References
Mezzadri [2007]
- xclim.sdba.utils.rank(da, dim='time', pct=False)[source]
Ranks data along a dimension.
Replicates xr.DataArray.rank but as a function usable in a Grouper.apply(). Xarray’s docstring is below:
Equal values are assigned a rank that is the average of the ranks that would have been otherwise assigned to all the values within that set. Ranks begin at 1, not 0. If pct, computes percentage ranks.
- Parameters:
da (xr.DataArray) – Source array.
dim (str, hashable) – Dimension over which to compute rank.
pct (bool, optional) – If True, compute percentage ranks, otherwise compute integer ranks.
- Return type:
DataArray
- Returns:
DataArray – DataArray with the same coordinates and dtype ‘float64’.
Notes
The bottleneck library is required. NaNs in the input array are returned as NaNs.