Bias adjustment and downscaling algorithms
xarray data structures allow for relatively straightforward implementations of simple bias-adjustment and downscaling algorithms documented in Adjustment methods. Each algorithm is split into train and adjust components. The train function compares two DataArrays x and y, and creates a dataset storing the transfer information allowing one to go from x to y. This dataset, stored in the adjustment object, can then be used by the adjust method to apply this information to x. x could be the same DataArray used for training, or another DataArray with similar characteristics.
For example, given a daily time series of observations ref, a model simulation over the observational period hist and a model simulation over a future period sim, we would apply a bias-adjustment method such as detrended quantile mapping (DQM) as:
from xclim import sdba
dqm = sdba.adjustment.DetrendedQuantileMapping.train(ref, hist)
scen = dqm.adjust(sim)
Most methods can be applied either additively or multiplicatively. Also, most methods can be applied independently on different time groupings (monthly, seasonally) or according to the day of the year and a rolling window width.
When transfer factors are applied in adjustment, they can be interpolated according to the time grouping. This helps avoid discontinuities in adjustment factors at the beginning of each season or month and is computationally cheaper than computing adjustment factors for each day of the year. (Currently only implemented for monthly grouping.)
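To illustrate the idea, here is a minimal numpy sketch (not xclim's implementation; the helper names, mid-month positions and additive monthly factors are simplifying assumptions) of monthly adjustment factors linearly interpolated to each day of the year, avoiding step discontinuities at month boundaries:

```python
import numpy as np

def monthly_factors(ref, hist, months):
    """Additive adjustment factor per calendar month: mean(ref) - mean(hist)."""
    return np.array(
        [ref[months == m].mean() - hist[months == m].mean() for m in range(1, 13)]
    )

def interp_factors_to_doy(factors, doy):
    """Linearly interpolate 12 monthly factors to day-of-year positions,
    wrapping around the year ends so January blends with December."""
    centers = np.array([15.5, 45.0, 74.5, 105.0, 135.5, 166.0,
                        196.5, 227.5, 258.0, 288.5, 319.0, 349.5])
    xp = np.concatenate([[centers[-1] - 365.0], centers, [centers[0] + 365.0]])
    fp = np.concatenate([[factors[-1]], factors, [factors[0]]])
    return np.interp(doy, xp, fp)
```

With `interp="nearest"` each day would instead simply reuse its month's factor, which is the discontinuous behaviour the interpolation option avoids.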
Application in multivariate settings
When applying univariate adjustment methods to multiple variables, some strategies are recommended to avoid introducing unrealistic artifacts in adjusted outputs.
Minimum and maximum temperature
When adjusting both minimum and maximum temperature, adjustment factors sometimes yield minimum temperatures larger than the maximum temperature on the same day, which is, of course, nonsensical. One way to avoid this is to first adjust maximum temperature using an additive adjustment, then adjust the diurnal temperature range (DTR) using a multiplicative adjustment, and finally determine minimum temperature by subtracting DTR from the maximum temperature ([Thrasher], [Agbazo]).
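The three-step workflow can be sketched with synthetic numpy data. This is an illustrative sketch using simple mean scaling in place of a full quantile-mapping adjustment; the variable names are ours, not xclim's:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 365
tx_ref = 15 + 10 * rng.standard_normal(n)   # reference daily tasmax
dtr_ref = rng.uniform(2.0, 12.0, n)         # reference DTR, strictly positive
tx_hist = tx_ref + 2.0                      # model tasmax with a warm bias
dtr_hist = dtr_ref * 1.3                    # model DTR inflated by 30 %

# 1. Adjust tasmax with an additive factor (mean scaling for illustration)
tx_scen = tx_hist + (tx_ref.mean() - tx_hist.mean())
# 2. Adjust DTR with a multiplicative factor, which keeps it positive
dtr_scen = dtr_hist * (dtr_ref.mean() / dtr_hist.mean())
# 3. Derive tasmin by subtraction: tn <= tx is guaranteed since DTR > 0
tn_scen = tx_scen - dtr_scen
```

Because the multiplicative step cannot make DTR negative, the derived minimum temperature can never exceed the maximum, which is the whole point of the workflow.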
Relative and specific humidity
When adjusting both relative and specific humidity, we want to preserve the relationship between the two. To do this, [Grenier] suggests first adjusting the relative humidity with a multiplicative factor, ensuring values stay within 0-100%, then applying an additive adjustment factor to the surface pressure before estimating the specific humidity from thermodynamic relationships.
Radiation and precipitation
In theory, shortwave radiation should be capped when precipitation is not zero, but no mechanism has yet been proposed to do so; see [Hoffman].
References
- Agbazo
Agbazo, M. N., & Grenier, P. (2019). Characterizing and avoiding physical inconsistency generated by the application of univariate quantile mapping on daily minimum and maximum temperatures over Hudson Bay. International Journal of Climatology, joc.6432. https://doi.org/10.1002/joc.6432
- Grenier
Grenier, P. (2018). Two Types of Physical Inconsistency to Avoid with Univariate Quantile Mapping: A Case Study over North America Concerning Relative Humidity and Its Parent Variables. Journal of Applied Meteorology and Climatology, 57(2), 347–364. https://doi.org/10.1175/JAMC-D-17-0177.1
- Hoffman
Hoffmann, H., & Rath, T. (2012). Meteorologically consistent bias correction of climate time series for agricultural models. Theoretical and Applied Climatology, 110(1–2), 129–141. https://doi.org/10.1007/s00704-012-0618-x
- Thrasher
Thrasher, B., Maurer, E. P., McKellar, C., & Duffy, P. B. (2012). Technical Note: Bias correcting climate model simulated daily temperature extremes with quantile mapping. Hydrology and Earth System Sciences, 16(9), 3309–3314. https://doi.org/10.5194/hess-16-3309-2012
SDBA’s user API
Adjustment methods
- class xclim.sdba.adjustment.DetrendedQuantileMapping(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.adjustment.TrainAdjust
Detrended Quantile Mapping bias-adjustment.
The algorithm follows these steps, 1-3 being the ‘train’ and 4-6, the ‘adjust’ steps.
A scaling factor that would make the mean of hist match the mean of ref is computed.
ref and hist are normalized by removing the “dayofyear” mean.
Adjustment factors are computed between the quantiles of the normalized ref and hist.
sim is corrected by the scaling factor, and either normalized by “dayofyear” and detrended group-wise or directly detrended per “dayofyear”, using a linear fit (modifiable).
Values of detrended sim are matched to the corresponding quantiles of normalized hist and corrected accordingly.
The trend is put back on the result.
\[F^{-1}_{ref}\left\{F_{hist}\left[\frac{\overline{hist}\cdot sim}{\overline{sim}}\right]\right\}\frac{\overline{sim}}{\overline{hist}}\]where \(F\) is the cumulative distribution function (CDF) and \(\overline{xyz}\) is the linear trend of the data. This equation is valid for multiplicative adjustment. Based on the DQM method of [Cannon2015].
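The six steps above can be condensed into a minimal numpy sketch of the additive case (single group, no day-of-year normalization). This is illustrative only; `dqm_additive` is a hypothetical helper, not xclim's implementation:

```python
import numpy as np

def dqm_additive(ref, hist, sim, nq=20, deg=1):
    """Minimal additive DQM sketch: scale, quantile-map residuals, retrend."""
    # 1. Scaling factor aligning the mean of hist with ref
    scale = ref.mean() - hist.mean()
    # 2-3. Adjustment factors between quantiles of the mean-removed series
    q = np.linspace(0.0, 1.0, nq)
    hq = np.quantile(hist - hist.mean(), q)
    af = np.quantile(ref - ref.mean(), q) - hq
    # 4. Scale sim, then detrend it with a polynomial fit (deg=1: linear)
    sim = sim + scale
    t = np.arange(sim.size)
    trend = np.polyval(np.polyfit(t, sim, deg), t)
    resid = sim - trend
    # 5. Find each residual's quantile in normalized hist, apply its factor
    rank = np.interp(resid, hq, q)
    resid_adj = resid + np.interp(rank, q, af)
    # 6. Put the trend back
    return resid_adj + trend
```

If hist is already unbiased relative to ref, the factors vanish and the sketch returns sim unchanged apart from the mean scaling.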
- Parameters
Train step
nquantiles (int or 1d array of floats) – The number of quantiles to use. Two endpoints at 1e-6 and 1 - 1e-6 will be added. An array of quantiles [0, 1] can also be passed. Defaults to 20 quantiles (+2 endpoints).
kind ({‘+’, ‘*’}) – The adjustment kind, either additive or multiplicative. Defaults to “+”.
group (Union[str, Grouper]) – The grouping information. See
xclim.sdba.base.Grouper
for details. Default is “time”, meaning a single adjustment group along dimension “time”.
Adjust step
interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.
detrend (int or BaseDetrend instance) – The method to use when detrending. If an int is passed, it is understood as a PolyDetrend (polynomial detrending) degree. Defaults to 1 (linear detrending)
extrapolation ({‘constant’, ‘nan’}) – The type of extrapolation to use. See
xclim.sdba.utils.extrapolate_qm()
for details. Defaults to “constant”.
References
- Cannon2015(1,2)
Cannon, A. J., Sobie, S. R., & Murdock, T. Q. (2015). Bias correction of GCM precipitation by quantile mapping: How well do methods preserve changes in quantiles and extremes? Journal of Climate, 28(17), 6938–6959. https://doi.org/10.1175/JCLI-D-14-00754.1
- class xclim.sdba.adjustment.EmpiricalQuantileMapping(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.adjustment.TrainAdjust
Empirical Quantile Mapping bias-adjustment.
Adjustment factors are computed between the quantiles of ref and hist. Values of sim are matched to the corresponding quantiles of hist and corrected accordingly.
\[F^{-1}_{ref} (F_{hist}(sim))\]where \(F\) is the cumulative distribution function (CDF).
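The additive version of this mapping fits in a few lines of numpy. This is an illustrative sketch for a single group; `eqm_additive` is a hypothetical helper, not xclim's code:

```python
import numpy as np

def eqm_additive(ref, hist, sim, nq=20):
    """scen = F_ref^{-1}(F_hist(sim)) via per-quantile adjustment factors."""
    q = np.linspace(0.0, 1.0, nq)
    hist_q = np.quantile(hist, q)
    af = np.quantile(ref, q) - hist_q    # additive factor per quantile
    rank = np.interp(sim, hist_q, q)     # empirical CDF of hist at sim values
    return sim + np.interp(rank, q, af)  # interpolated factor applied to sim
```

The `interp` argument of the real class controls how the factor is looked up between the trained quantiles; the sketch uses linear interpolation throughout.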
- Parameters
Train step
nquantiles (int or 1d array of floats) – The number of quantiles to use. Two endpoints at 1e-6 and 1 - 1e-6 will be added. An array of quantiles [0, 1] can also be passed. Defaults to 20 quantiles (+2 endpoints).
kind ({‘+’, ‘*’}) – The adjustment kind, either additive or multiplicative. Defaults to “+”.
group (Union[str, Grouper]) – The grouping information. See
xclim.sdba.base.Grouper
for details. Default is “time”, meaning a single adjustment group along dimension “time”.
Adjust step
interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.
extrapolation ({‘constant’, ‘nan’}) – The type of extrapolation to use. See
xclim.sdba.utils.extrapolate_qm()
for details. Defaults to “constant”.
References
Dequé, M. (2007). Frequency of precipitation and temperature extremes over France in an anthropogenic scenario: Model results and statistical correction according to observed values. Global and Planetary Change, 57(1–2), 16–26. https://doi.org/10.1016/j.gloplacha.2006.11.030
_allow_diff_calendars = False
- class xclim.sdba.adjustment.ExtremeValues(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.adjustment.TrainAdjust
Adjustment correction for extreme values.
The tail of the distribution of adjusted data is corrected according to the bias between the parametric Generalized Pareto distributions of the simulated and reference data [RRJF2021]. The distributions are composed of the maximal values of clusters of “large” values, “large” values being those above cluster_thresh. Only extreme values, whose quantile within the pool of large values is above q_thresh, are re-adjusted. See Notes.
This adjustment method should be considered experimental and used with care.
- Parameters
Train step
cluster_thresh (Quantity (str with units)) – The threshold value for defining clusters.
q_thresh (float) – The quantile of “extreme” values, [0, 1[. Defaults to 0.95.
ref_params (xr.DataArray, optional) – Distribution parameters to use instead of fitting a GenPareto distribution on ref.
Adjust step
scen (DataArray) – This is a second-order adjustment, so the adjust method needs the first-order adjusted timeseries in addition to the raw “sim”.
interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “linear”.
extrapolation ({‘constant’, ‘nan’}) – The type of extrapolation to use. See
extrapolate_qm()
for details. Defaults to “constant”.
frac (float) – Fraction where the cutoff happens between the original scen and the corrected one. See Notes, ]0, 1]. Defaults to 0.25.
power (float) – Shape of the correction strength, see Notes. Defaults to 1.0.
Notes
Extreme values are extracted from ref, hist and sim by finding all “clusters”, i.e. runs of consecutive values above cluster_thresh. The q_thresh-th quantile of these values is computed over ref and hist and becomes thresh, the extreme value threshold. The maximal value of each cluster, if it exceeds that new threshold, is taken, and Generalized Pareto distributions are fitted to them, for both ref and hist. The probabilities associated with each of these extremes in hist are used to find the corresponding value according to ref’s distribution. Adjustment factors are computed as the bias between those new extremes and the original ones.
In the adjust step, a Generalized Pareto distribution is fitted on the cluster maximums of sim and used to associate a probability to each extreme value over the thresh computed in the training, without the clustering. The adjustment factors are computed by interpolating the trained ones using these probabilities and the probabilities computed from hist.
Finally, the adjusted values (\(C_i\)) are mixed with the pre-adjusted ones (scen, \(D_i\)) using the following transition function:
\[V_i = C_i * \tau + D_i * (1 - \tau)\]where \(\tau\) is a function of sim’s extreme values (unadjusted, \(S_i\)) and of the arguments frac (\(f\)) and power (\(p\)):
\[\tau = \left(\frac{1}{f}\frac{S - min(S)}{max(S) - min(S)}\right)^p\]Code based on an internal Matlab source and partly on the biascorrect_extremes function of the Julia package [ClimateTools].
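The transition function translates directly into numpy. This is an illustrative sketch; `blend_extremes` is a hypothetical name, and clipping \(\tau\) to [0, 1] is how the weights saturate for the largest extremes:

```python
import numpy as np

def blend_extremes(scen, corrected, sim, frac=0.25, power=1.0):
    """V_i = C_i * tau + D_i * (1 - tau), with tau built from sim's extremes."""
    tau = ((1.0 / frac) * (sim - sim.min()) / (sim.max() - sim.min())) ** power
    tau = np.clip(tau, 0.0, 1.0)  # weights saturate at 1 for the largest values
    return corrected * tau + scen * (1.0 - tau)
```

With the default frac=0.25, values in the top three quarters of the sim extreme range already receive the fully corrected value.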
Because of limitations imposed by the lazy computing nature of the dask backend, it is not possible to know the number of cluster extremes in ref and hist at the moment the output data structure is created. This is why the code tries to estimate that number and usually overestimates it. In the training dataset, this translates into a quantile dimension that is too large, and the variables af and px_hist are assigned NaNs on the extra elements. This has no effect on the calculations themselves but requires more memory than is useful.
References
- RRJF2021
Roy, P., Rondeau-Genesse, G., Jalbert, J., Fournier, É. 2021. Climate Scenarios of Extreme Precipitation Using a Combination of Parametric and Non-Parametric Bias Correction Methods. Submitted to Climate Services, April 2021.
- ClimateTools
- class xclim.sdba.adjustment.LOCI(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.adjustment.TrainAdjust
Local Intensity Scaling (LOCI) bias-adjustment.
This bias adjustment method is designed to correct daily precipitation time series by considering wet and dry days separately ([Schmidli2006]).
Multiplicative adjustment factors are computed such that the mean of hist matches the mean of ref for values above a threshold.
The threshold on the training target ref is first mapped to hist by finding the quantile in hist having the same exceedance probability as thresh in ref. The adjustment factor is then given by
\[s = \frac{\left \langle ref: ref \geq t_{ref} \right\rangle - t_{ref}}{\left \langle hist : hist \geq t_{hist} \right\rangle - t_{hist}}\]In the case of precipitations, the adjustment factor is the ratio of wet-days intensity.
For an adjustment factor s, the bias-adjustment of sim is:
\[sim(t) = \max\left(t_{ref} + s \cdot (hist(t) - t_{hist}), 0\right)\]- Parameters
Train step
group (Union[str, Grouper]) – The grouping information. See
xclim.sdba.base.Grouper
for details. Default is “time”, meaning a single adjustment group along dimension “time”.
thresh (str) – The threshold in ref above which the values are scaled.
Adjust step
interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “linear”.
References
- Schmidli2006
Schmidli, J., Frei, C., & Vidale, P. L. (2006). Downscaling from GCM precipitation: A benchmark for dynamical and statistical downscaling methods. International Journal of Climatology, 26(5), 679–689. DOI:10.1002/joc.1287
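The threshold mapping and scaling factor above can be sketched in plain numpy. This is illustrative only; `loci_train`/`loci_adjust` are hypothetical helpers, not xclim's group-aware implementation:

```python
import numpy as np

def loci_train(ref, hist, thresh):
    """Map the wet-day threshold into hist and compute the scaling factor s."""
    p = (ref >= thresh).mean()            # exceedance probability of thresh in ref
    t_hist = np.quantile(hist, 1.0 - p)   # value with the same exceedance in hist
    s = (ref[ref >= thresh].mean() - thresh) / (
        hist[hist >= t_hist].mean() - t_hist)
    return t_hist, s

def loci_adjust(sim, thresh, t_hist, s):
    """sim(t) = max(t_ref + s * (sim(t) - t_hist), 0)."""
    return np.maximum(thresh + s * (sim - t_hist), 0.0)
```

When hist is unbiased, the mapped threshold lands near thresh itself and s is close to one, so the adjustment is nearly an identity (up to the clipping at zero).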
- class xclim.sdba.adjustment.NpdfTransform(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.adjustment.Adjust
N-dimensional probability density function transform.
This adjustment object combines both training and adjustment steps in the adjust class method.
A multivariate bias-adjustment algorithm described by [Cannon18], as part of the MBCn algorithm, based on a color-correction algorithm described by [Pitie05].
This algorithm in itself, when used with QuantileDeltaMapping, is NOT trend-preserving. The full MBCn algorithm includes a reordering step provided here by
xclim.sdba.processing.reordering()
. See Notes for an explanation of the algorithm.
- Parameters
base (BaseAdjustment) – An univariate bias-adjustment class. This is untested for anything else than QuantileDeltaMapping.
base_kws (dict, optional) – Arguments passed to the training of the univariate adjustment.
n_escore (int) – The number of elements to send to the escore function. The default, 0, means all elements are included. Pass -1 to skip computing the escore completely. Small numbers result in less significant scores, but the execution time grows quickly with large values.
n_iter (int) – The number of iterations to perform. Defaults to 20.
pts_dim (str) – The name of the “multivariate” dimension. Defaults to “variables”, which is the normal case when using
xclim.sdba.base.stack_variables()
.
adj_kws (dict, optional) – Dictionary of arguments to pass to the adjust method of the univariate adjustment.
rot_matrices (xr.DataArray, optional) – The rotation matrices as a 3D array (‘iterations’, <pts_dim>, <anything>), with shape (n_iter, <N>, <N>). If left empty, random rotation matrices will be automatically generated.
Notes
The historical reference (\(T\), for “target”), simulated historical (\(H\)) and simulated projected (\(S\)) datasets are constructed by stacking the timeseries of N variables together. The algorithm proceeds in the following steps:
Rotate the datasets in the N-dimensional variable space with \(\mathbf{R}\), a random rotation NxN matrix.
\[\begin{split}\tilde{\mathbf{T}} = \mathbf{T}\mathbf{R} \\ \tilde{\mathbf{H}} = \mathbf{H}\mathbf{R} \\ \tilde{\mathbf{S}} = \mathbf{S}\mathbf{R}\end{split}\]
2. A univariate bias-adjustment \(\mathcal{F}\) is used on the rotated datasets. The adjustments are made in additive mode, for each variable \(i\).
\[\hat{\mathbf{H}}_i, \hat{\mathbf{S}}_i = \mathcal{F}\left(\tilde{\mathbf{T}}_i, \tilde{\mathbf{H}}_i, \tilde{\mathbf{S}}_i\right)\]The bias-adjusted datasets are rotated back.
\[\begin{split}\mathbf{H}' = \hat{\mathbf{H}}\mathbf{R} \\ \mathbf{S}' = \hat{\mathbf{S}}\mathbf{R}\end{split}\]These three steps are repeated a certain number of times, prescribed by argument
n_iter
. At each iteration, a new random rotation matrix is generated. The original algorithm ([Pitie05]) stops iterating when some distance score converges. Following [Cannon18] and the MBCn implementation in [CannonR], we instead fix the number of iterations.
As done by [Cannon18], the distance score chosen is the “Energy distance” from [SkezelyRizzo] (see
xclim.sdba.processing.escore()
). The random matrices are generated following a method laid out by [Mezzadri].
This is only part of the full MBCn algorithm, see Fourth example : Multivariate bias-adjustment with multiple steps - Cannon 2018 for an example on how to replicate the full method with xclim. This includes a standardization of the simulated data beforehand, an initial univariate adjustment and the reordering of those adjusted series according to the rank structure of the output of this algorithm.
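The Haar-uniform rotation matrices described by [Mezzadri] can be generated with a QR decomposition followed by a sign correction, e.g. (an illustrative sketch, not xclim's internal code):

```python
import numpy as np

def random_rotation(n, rng):
    """Haar-uniform random orthogonal matrix via QR (after Mezzadri, 2006)."""
    z = rng.standard_normal((n, n))       # i.i.d. Gaussian entries
    q, r = np.linalg.qr(z)
    # Multiplying each column by the sign of the corresponding diagonal
    # of R removes the sign ambiguity of QR, making the draw uniform
    return q * np.sign(np.diag(r))
```

Without the sign correction, the naive QR output is not uniformly distributed over the orthogonal group, which would bias the rotations used by the iterations.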
References
- Cannon18(1,2,3,4)
Cannon, A. J. (2018). Multivariate quantile mapping bias correction: An N-dimensional probability density function transform for climate model simulations of multiple variables. Climate Dynamics, 50(1), 31–49. https://doi.org/10.1007/s00382-017-3580-6
- CannonR
- Mezzadri
Mezzadri, F. (2006). How to generate random matrices from the classical compact groups. arXiv preprint math-ph/0609050.
- Pitie05(1,2)
Pitie, F., Kokaram, A. C., & Dahyot, R. (2005). N-dimensional probability density function transfer and its application to color transfer. Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1, 2, 1434-1439 Vol. 2. https://doi.org/10.1109/ICCV.2005.166
- SkezelyRizzo(1,2)
Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, November (5)
- class xclim.sdba.adjustment.PrincipalComponents(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.adjustment.TrainAdjust
Principal component adjustment.
This bias-correction method maps model simulation values to the observation space through principal components ([hnilica2017]). Values in the simulation space (multiple variables, or multiple sites) can be thought of as coordinates along axes, such as variable, temperature, etc. Principal components (PC) are linear combinations of the original variables where the coefficients are the eigenvectors of the covariance matrix. Values can then be expressed as coordinates along the PC axes. The method makes the assumption that bias-corrected values have the same coordinates along the PC axes of the observations. By converting from the observation PC space back to the original space, we get bias-corrected values. See Notes for a mathematical explanation.
Note that principal components is meant here as the algebraic operation defining a coordinate system based on the eigenvectors, not statistical principal component analysis.
- Parameters
group (Union[str, Grouper]) – The grouping information. pts_dims can also be given through Grouper’s add_dims argument. See Notes. See
xclim.sdba.base.Grouper
for details. The adjustment will be performed on each group independently. Default is “time”, meaning a single adjustment group along dimension “time”.
crd_dims (Sequence of str, optional) – The data dimension(s) along which the multiple simulation space dimensions are taken. They are flattened into a “coordinate” dimension, see Notes. Default is None, in which case all dimensions shared by ref and hist, except those in pts_dims, are used. The training algorithm currently doesn’t support any chunking along the coordinate and point dimensions.
pts_dims (Sequence of str, optional) – The data dimensions to flatten into the “points” dimension, see Notes. They will be merged with those given through the add_dims property of group.
Notes
The input data is understood as a set of \(N\) points in an \(M\)-dimensional space.
\(N\) is taken along the data coordinates listed in pts_dims and the group (the main dim but also the add_dims).
\(M\) is taken along the data coordinates listed in crd_dims, the default being all except those in pts_dims.
For example, for a 3D matrix of data, say in (lat, lon, time), we could say that all spatial points are independent dimensions of the simulation space by passing
crd_dims=['lat', 'lon']
. For a (5, 5, 365) array, this results in a 25-dimensional space, i.e. \(M = 25\) and \(N = 365\).
Thus, the adjustment is equivalent to a linear transformation of these \(N\) points in an \(M\)-dimensional space.
The principal components (PC) of hist and ref are used to define new coordinate systems, centered on their respective means. The training step creates a matrix defining the transformation from hist to ref:
\[scen = e_{R} + \mathrm{\mathbf{T}}(sim - e_{H})\]Where:
\[\mathrm{\mathbf{T}} = \mathrm{\mathbf{R}}\mathrm{\mathbf{H}}^{-1}\]\(\mathrm{\mathbf{R}}\) is the matrix transforming from the PC coordinates computed on ref to the data coordinates. Similarly, \(\mathrm{\mathbf{H}}\) transforms from the PC coordinates computed on hist to the data coordinates (\(\mathrm{\mathbf{H}}^{-1}\) being the inverse transformation). \(e_R\) and \(e_H\) are the centroids of the ref and hist distributions respectively. Upon running the adjust step, one may decide to use \(e_S\), the centroid of the sim distribution, instead of \(e_H\).
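A minimal numpy sketch of this transform follows. The helper names are hypothetical, and the sketch glosses over the sign and ordering ambiguities of eigenvectors that a robust implementation must handle:

```python
import numpy as np

def pc_train(ref, hist):
    """T = R H^{-1}, with R, H the covariance eigenvector matrices of ref
    and hist (rows of ref/hist are points, columns are coordinates)."""
    _, R = np.linalg.eigh(np.cov(ref, rowvar=False))
    _, H = np.linalg.eigh(np.cov(hist, rowvar=False))
    return R @ np.linalg.inv(H)

def pc_adjust(sim, ref, hist, T):
    """scen = e_R + T (sim - e_H), applied to each point (row)."""
    return ref.mean(axis=0) + (sim - hist.mean(axis=0)) @ T.T
```

When ref and hist coincide, T reduces to the identity and the adjustment is a no-op, which is a useful sanity check.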
References
- hnilica2017(1,2)
Hnilica, J., Hanel, M. and Puš, V. (2017), Multisite bias correction of precipitation data from regional climate models. Int. J. Climatol., 37: 2934-2946. https://doi.org/10.1002/joc.4890
- class xclim.sdba.adjustment.QuantileDeltaMapping(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.adjustment.EmpiricalQuantileMapping
Quantile Delta Mapping bias-adjustment.
Adjustment factors are computed between the quantiles of ref and hist. Quantiles of sim are matched to the corresponding quantiles of hist and corrected accordingly.
\[sim \cdot \frac{F^{-1}_{ref}\left[F_{sim}(sim)\right]}{F^{-1}_{hist}\left[F_{sim}(sim)\right]}\]where \(F\) is the cumulative distribution function (CDF). This equation is valid for multiplicative adjustment. The algorithm is based on the “QDM” method of [Cannon2015].
- Parameters
Train step
nquantiles (int or 1d array of floats) – The number of quantiles to use. Two endpoints at 1e-6 and 1 - 1e-6 will be added. An array of quantiles [0, 1] can also be passed. Defaults to 20 quantiles (+2 endpoints).
kind ({‘+’, ‘*’}) – The adjustment kind, either additive or multiplicative. Defaults to “+”.
group (Union[str, Grouper]) – The grouping information. See
xclim.sdba.base.Grouper
for details. Default is “time”, meaning a single adjustment group along dimension “time”.
Adjust step
interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.
extrapolation ({‘constant’, ‘nan’}) – The type of extrapolation to use. See
xclim.sdba.utils.extrapolate_qm()
for details. Defaults to “constant”.
Extra diagnostics
In adjustment:
quantiles – The quantile of each value of sim. The adjustment factor is interpolated using this as the “quantile” axis on ds.af.
References
- Cannon2015
Cannon, A. J., Sobie, S. R., & Murdock, T. Q. (2015). Bias correction of GCM precipitation by quantile mapping: How well do methods preserve changes in quantiles and extremes? Journal of Climate, 28(17), 6938–6959. https://doi.org/10.1175/JCLI-D-14-00754.1
- class xclim.sdba.adjustment.Scaling(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.adjustment.TrainAdjust
Scaling bias-adjustment.
Simple bias-adjustment method scaling variables by an additive or multiplicative factor so that the mean of hist matches the mean of ref.
- Parameters
Train step
group (Union[str, Grouper]) – The grouping information. See
xclim.sdba.base.Grouper
for details. Default is “time”, meaning a single adjustment group along dimension “time”.
kind ({‘+’, ‘*’}) – The adjustment kind, either additive or multiplicative. Defaults to “+”.
Adjust step
interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.
Pre and post processing
- xclim.sdba.processing.adapt_freq(ref: xarray.DataArray, sim: xarray.DataArray, *, group: Union[xclim.sdba.base.Grouper, str], thresh: str = '0 mm d-1') xarray.Dataset [source]
Adapt frequency of values under thresh of sim, in order to match ref.
This is useful when the dry-day frequency in the simulations is higher than in the references. This function will create new non-null values for sim/hist, so that adjustment factors are less wet-biased. Based on [Themessl2012].
- Parameters
ref (xr.DataArray) – Target/reference data, usually observed data.
sim (xr.DataArray) – Simulated data to adjust.
group (Union[str, Grouper]) – Grouping information, see base.Grouper
thresh (str) – Threshold below which values are considered zero, a quantity with units.
- Returns
sim_adj (xr.DataArray) – Simulated data with the same frequency of values under threshold as ref. Adjustment is made group-wise.
pth (xr.DataArray) – For each group, the smallest value of sim that was not frequency-adjusted. All values smaller were either left as zero values or given a random value between thresh and pth. NaN where frequency adaptation wasn’t needed.
dP0 (xr.DataArray) – For each group, the percentage of values that were corrected in sim.
Notes
With \(P_0^r\) the frequency of values under threshold \(T_0\) in the reference (ref) and \(P_0^s\) the same for the simulated values, \(\Delta P_0 = \frac{P_0^s - P_0^r}{P_0^s}\), when positive, represents the proportion of values under \(T_0\) that need to be corrected.
The correction replaces a proportion \(\Delta P_0\) of the values under \(T_0\) in sim by a uniform random number between \(T_0\) and \(P_{th}\), where \(P_{th} = F_{ref}^{-1}( F_{sim}( T_0 ) )\) and \(F(x)\) is the empirical cumulative distribution function (CDF).
References
- Themessl2012
Themeßl et al. (2012), Empirical-statistical downscaling and error correction of regional climate models and its impact on the climate change signal, Climatic Change, DOI 10.1007/s10584-011-0224-4.
- xclim.sdba.processing.construct_moving_yearly_window(da: xarray.Dataset, window: int = 21, step: int = 1, dim: str = 'movingwin')[source]
Construct a moving window DataArray.
Stacks windows of da in a new ‘movingwin’ dimension. Windows are always made of full years, so calendars with non-uniform year lengths are not supported.
Windows are constructed starting at the beginning of da; if the number of years is not a multiple of step, the last year(s) will be missing, as a supplementary window would be incomplete.
- Parameters
da (xr.DataArray) – A DataArray with a time dimension.
window (int) – The length of the moving window as a number of years.
step (int) – The step between each window as a number of years.
dim (str) – The new dimension name. If given, must also be given to unpack_moving_yearly_window.
- Returns
xr.DataArray – A DataArray with a new movingwin dimension and a time dimension with a length of 1 window. This assumes downstream algorithms do not make use of the _absolute_ year of the data. The correct timeseries can be reconstructed with
unpack_moving_yearly_window()
. The coordinates of movingwin are the first date of the windows.
- xclim.sdba.processing.escore(tgt: xarray.DataArray, sim: xarray.DataArray, dims: Sequence[str] = ('variables', 'time'), N: int = 0, scale: bool = False) xarray.DataArray [source]
Energy score, or energy dissimilarity metric, based on [SkezelyRizzo] and [Cannon18].
- Parameters
tgt (DataArray) – Target observations.
sim (DataArray) – Candidate observations. Must have the same dimensions as tgt.
dims (sequence of 2 strings) – The name of the dimensions along which the variables and observation points are listed. tgt and sim can have different length along the second one, but must be equal along the first one. The result will keep all other dimensions.
N (int) – If larger than 0, the number of observations to use in the score computation. The points are taken evenly distributed along obs_dim.
scale (bool) – Whether to scale the data before computing the score. If True, both arrays are scaled according to the mean and standard deviation of tgt along obs_dim (std computed with ddof=1; both statistics exclude NaN values).
- Returns
xr.DataArray – e-score with dimensions not in dims.
Notes
Explanation adapted from the “energy” R package documentation. The e-distance between two clusters \(C_i\), \(C_j\) (tgt and sim) of size \(n_i, n_j\) proposed by Szekely and Rizzo (2005) is defined by:
\[e(C_i,C_j) = \frac{1}{2}\frac{n_i n_j}{n_i + n_j} \left[2 M_{ij} - M_{ii} - M_{jj}\right]\]where
\[M_{ij} = \frac{1}{n_i n_j} \sum_{p=1}^{n_i} \sum_{q=1}^{n_j} \left\Vert X_{ip} - X_{jq} \right\Vert.\]\(\Vert\cdot\Vert\) denotes the Euclidean norm, and \(X_{ip}\) denotes the p-th observation in the i-th cluster.
The input scaling and the factor \(\frac{1}{2}\) in the first equation are additions of [Cannon18] to the metric. With that factor, the test becomes identical to the one defined by [BaringhausFranz].
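The two equations above admit a direct, if memory-hungry, numpy transcription, including the \(\frac{1}{2}\) factor. This is illustrative only, not xclim's chunk-aware implementation:

```python
import numpy as np

def energy_distance(x, y):
    """e(C_i, C_j) between samples x of shape (n_i, d) and y of shape (n_j, d)."""
    def m(a, b):
        # Mean of all pairwise Euclidean distances between a and b
        d = a[:, None, :] - b[None, :, :]
        return np.sqrt((d ** 2).sum(-1)).mean()
    ni, nj = len(x), len(y)
    return 0.5 * ni * nj / (ni + nj) * (2 * m(x, y) - m(x, x) - m(y, y))
```

The distance is zero when the two samples are identical and grows with any shift between their distributions, which is what makes it usable as a convergence diagnostic.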
References
- BaringhausFranz
Baringhaus, L. and Franz, C. (2004) On a new multivariate two-sample test, Journal of Multivariate Analysis, 88(1), 190–206. https://doi.org/10.1016/s0047-259x(03)00079-4
- xclim.sdba.processing.jitter_over_thresh(x: xarray.DataArray, thresh: str, upper_bnd: str) xarray.Dataset [source]
Replace values greater than a threshold by uniform random noise.
Do not confuse with R’s jitter, which adds uniform noise instead of replacing values.
- Parameters
x (xr.DataArray) – Values.
thresh (str) – Threshold over which to add uniform random noise to values, a quantity with units.
upper_bnd (str) – Maximum possible value for the random noise, a quantity with units.
- Returns
xr.Dataset
Notes
If thresh is low, this will change the mean value of x.
- xclim.sdba.processing.jitter_under_thresh(x: xarray.DataArray, thresh: str)[source]
Replace values smaller than a threshold by uniform random noise.
Do not confuse with R’s jitter, which adds uniform noise instead of replacing values.
- Parameters
x (xr.DataArray) – Values.
thresh (str) – Threshold under which to add uniform random noise to values, a quantity with units.
- Returns
array
Notes
If thresh is high, this will change the mean value of x.
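A minimal numpy equivalent of this jittering follows. It is illustrative only; xclim's version additionally handles thresholds given as quantity strings with units:

```python
import numpy as np

def jitter_under_thresh(x, thresh, rng):
    """Replace values below thresh by uniform noise drawn from [0, thresh)."""
    noise = rng.uniform(0.0, thresh, size=x.shape)
    return np.where(x < thresh, noise, x)
```

This breaks the ties created by long runs of zeros (e.g. dry days in precipitation), which would otherwise make the empirical quantiles degenerate.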
- xclim.sdba.processing.normalize(data: xarray.DataArray, norm: Optional[xarray.DataArray] = None, *, group: Union[xclim.sdba.base.Grouper, str], kind: str = '+') xarray.Dataset [source]
Normalize an array by removing its mean.
Normalization is performed group-wise and according to kind.
- Parameters
data (xr.DataArray) – The variable to normalize.
norm (xr.DataArray, optional) – If present, it is used instead of computing the norm again.
group (Union[str, Grouper]) – Grouping information. See
xclim.sdba.base.Grouper
for details.
kind ({‘+’, ‘*’}) – If kind is “+”, the mean is subtracted from the data; if it is ‘*’, the data are divided by it.
- Returns
xr.DataArray – Groupwise anomaly
- xclim.sdba.processing.reordering(ref: xarray.DataArray, sim: xarray.DataArray, group: str = 'time') xarray.Dataset [source]
Reorders data in sim following the order of ref.
The rank structure of ref is used to reorder the elements of sim along dimension “time”, optionally doing the operation group-wise.
- Parameters
sim (xr.DataArray) – Array to reorder.
ref (xr.DataArray) – Array whose rank order sim should replicate.
group (str) – Grouping information. See
xclim.sdba.base.Grouper
for details.
- Returns
xr.Dataset – sim reordered according to ref’s rank order.
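The rank-based reordering reduces to a few lines of numpy. This sketches the core idea (closely related to the Schaake shuffle), not xclim's group-aware implementation:

```python
import numpy as np

def reorder(sim, ref):
    """Rearrange sim so its rank structure matches ref's."""
    ranks = ref.argsort().argsort()  # rank of each element of ref
    return np.sort(sim)[ranks]       # i-th output takes the rank ref[i] has
```

The output contains exactly the values of sim, but in an order whose ranks replicate ref, which is what restores the trend information in the full MBCn workflow.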
References
Cannon, A. J. (2018). Multivariate quantile mapping bias correction: An N-dimensional probability density function transform for climate model simulations of multiple variables. Climate Dynamics, 50(1), 31–49. https://doi.org/10.1007/s00382-017-3580-6
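The core rank-reordering operation can be sketched in plain Python (without the group-wise handling or xarray machinery):

```python
def reorder(ref, sim):
    # Give sim the rank structure of ref: the position holding the
    # k-th smallest value of ref receives the k-th smallest value of sim.
    sim_sorted = sorted(sim)
    order = sorted(range(len(ref)), key=lambda i: ref[i])
    out = [None] * len(ref)
    for k, i in enumerate(order):
        out[i] = sim_sorted[k]
    return out

result = reorder([3, 1, 2], [10, 30, 20])
```

The output contains exactly the values of sim, but ordered so that its largest value sits where ref is largest, and so on down the ranks.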
- xclim.sdba.processing.stack_variables(ds, rechunk=True, dim='variables')[source]
Stack different variables of a dataset into a single DataArray with a new “variables” dimension.
Variable attributes are all added as lists of attributes to the new coordinate, prefixed with “_”.
- Parameters
ds (xr.Dataset) – Input dataset.
rechunk (bool) – If True (default), dask arrays are rechunked with variables : -1.
dim (str) – Name of dimension along which variables are indexed.
- Returns
xr.DataArray – Array with variables stacked along dim dimension. Units are set to “”.
- xclim.sdba.processing.standardize(da: xarray.DataArray, mean: Optional[xarray.DataArray] = None, std: Optional[xarray.DataArray] = None, dim: str = 'time') Tuple[Union[xarray.DataArray, xarray.Dataset], xarray.DataArray, xarray.DataArray] [source]
Standardize a DataArray by centering its mean and scaling it by its standard deviation.
Either or both of mean and std can be provided if need be.
Returns the standardized data, the mean and the standard deviation.
- xclim.sdba.processing.uniform_noise_like(da: xarray.DataArray, low: float = 1e-06, high: float = 0.001) xarray.DataArray [source]
Return a uniform noise array of the same shape as da.
Noise is uniformly distributed between low and high. Alternative method to jitter_under_thresh for avoiding zeroes.
- xclim.sdba.processing.unpack_moving_yearly_window(da: xarray.DataArray, dim: str = 'movingwin')[source]
Unpack a constructed moving window dataset to a normal timeseries, only keeping the central data.
Unpack DataArrays created with
construct_moving_yearly_window()
and recreate timeseries data. Only the central non-overlapping years are kept. The final timeseries will be (window - step) years shorter than the initial one. The window length and window step are inferred from the coordinates.
- Parameters
da (xr.DataArray) – As constructed by
construct_moving_yearly_window()
.
dim (str) – The window dimension name as given to the construction function.
- xclim.sdba.processing.unstack_variables(da, dim=None)[source]
Unstack a DataArray created by stack_variables to a dataset.
- Parameters
da (xr.DataArray) – Array holding different variables along dim dimension.
dim (str) – Name of dimension along which the variables are stacked. If not specified (default), dim is inferred from attributes of the coordinate.
- Returns
xr.Dataset – Dataset holding each variable in an individual DataArray.
Detrending objects
- class xclim.sdba.detrending.LoessDetrend(group='time', kind='+', f=0.2, niter=1, d=0, weights='tricube', equal_spacing=None)[source]
Bases:
xclim.sdba.detrending.BaseDetrend
Detrend time series using a LOESS regression.
The fit is a piecewise linear regression. For each point, the contribution of all neighbors is weighted by a bell-shaped curve given by the weights parameter. The x-coordinate of the DataArray is scaled to [0, 1] before the regression is computed.
- Parameters
group (Union[str, Grouper]) – The grouping information. See
xclim.sdba.base.Grouper
for details. The fit is performed along the group's main dim.
kind ({'+', '*'}) – The way the trend is removed or added, either additive or multiplicative.
d ([0, 1]) – Order of the local regression. Only 0 and 1 currently implemented.
f (float) – Parameter controlling the span of the weights, between 0 and 1.
niter (int) – Number of robustness iterations to execute.
weights ([“tricube”, “gaussian”]) – Shape of the weighting function: “tricube” : a smooth top-hat like curve, f gives the span of non-zero values. “gaussian” : a gaussian curve, f gives the span for 95% of the values.
Notes
LOESS smoothing is computationally expensive. As it relies on a loop on gridpoints, it can be useful to use smaller than usual chunks. Moreover, it suffers from heavy boundary effects. As a rule of thumb, the outermost N * f/2 points should be considered dubious. (N is the number of points along each group)
- class xclim.sdba.detrending.MeanDetrend(*, group: Union[xclim.sdba.base.Grouper, str] = 'time', kind: str = '+', **kwargs)[source]
Bases:
xclim.sdba.detrending.BaseDetrend
Simple detrending removing only the mean from the data, quite similar to normalizing.
- class xclim.sdba.detrending.NoDetrend(*, group: Union[xclim.sdba.base.Grouper, str] = 'time', kind: str = '+', **kwargs)[source]
Bases:
xclim.sdba.detrending.BaseDetrend
Convenience class for polymorphism. Does nothing.
- class xclim.sdba.detrending.PolyDetrend(group='time', kind='+', degree=4, preserve_mean=False)[source]
Bases:
xclim.sdba.detrending.BaseDetrend
Detrend time series using a polynomial regression.
- Parameters
group (Union[str, Grouper]) – The grouping information. See
xclim.sdba.base.Grouper
for details. The fit is performed along the group's main dim.
kind ({'+', '*'}) – The way the trend is removed or added, either additive or multiplicative.
degree (int) – The order of the polynomial to fit.
preserve_mean (bool) – Whether to preserve the mean when de/re-trending. If True, the trend has its mean removed before it is used.
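An additive polynomial detrend/retrend round trip can be sketched with plain NumPy (the real class additionally handles groups and lazy dask arrays):

```python
import numpy as np

# A series with a linear trend plus oscillations.
x = np.arange(20, dtype=float)
y = 0.1 * x + np.sin(x)

coeffs = np.polyfit(x, y, deg=1)   # fit the trend polynomial
trend = np.polyval(coeffs, x)
detrended = y - trend              # kind='+' removes the trend additively
retrended = detrended + trend      # retrending restores the original data
```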
- class xclim.sdba.detrending.RollingMeanDetrend(group='time', kind='+', win=30, weights=None, min_periods=None)[source]
Bases:
xclim.sdba.detrending.BaseDetrend
Detrend time series using a rolling mean.
- Parameters
group (Union[str, Grouper]) – The grouping information. See
xclim.sdba.base.Grouper
for details. The fit is performed along the group's main dim.
kind ({'+', '*'}) – The way the trend is removed or added, either additive or multiplicative.
win (int) – The size of the rolling window. Units are the steps of the grouped data, which means this detrending is best used with either group='time' or group='time.dayofyear'. Other groupings will have large jumps included within the windows and LoessDetrend might offer a better solution.
weights (sequence of floats, optional) – Sequence of length win. Defaults to None, which means a flat window.
min_periods (int, optional) – Minimum number of observations in window required to have a value, otherwise the result is NaN. See
xarray.DataArray.rolling()
. Defaults to None, which sets it equal to win. Setting both weights and min_periods is not implemented yet.
Notes
As for the
LoessDetrend
detrending, important boundary effects are to be expected.
sdba utilities
- xclim.sdba.utils.add_cyclic_bounds(da: xarray.DataArray, att: str, cyclic_coords: bool = True) Union[xarray.DataArray, xarray.Dataset] [source]
Reindex an array to include the last slice at the beginning and the first at the end.
This is done to allow interpolation near the end-points.
- Parameters
da (Union[xr.DataArray, xr.Dataset]) – An array
att (str) – The name of the coordinate to make cyclic
cyclic_coords (bool) – If True, the coordinates are made cyclic as well, if False, the new values are guessed using the same step as their neighbour.
- Returns
Union[xr.DataArray, xr.Dataset] – da, but with the last element along att prepended and the first one appended.
- xclim.sdba.utils.add_endpoints(da: xarray.DataArray, left: List[Union[int, float, xarray.DataArray, List[int], List[float]]], right: List[Union[int, float, xarray.DataArray, List[int], List[float]]], dim: str = 'quantiles') xarray.DataArray [source]
Add left and right endpoints to a DataArray.
- Parameters
da (DataArray) – Source array.
left ([x, y]) – Values to prepend.
right ([x, y]) – Values to append.
dim (str) – Dimension along which to add endpoints.
- xclim.sdba.utils.apply_correction(x: xarray.DataArray, factor: xarray.DataArray, kind: Optional[str] = None) xarray.DataArray [source]
Apply the additive or multiplicative correction/adjustment factors.
If kind is not given, default to the one stored in the “kind” attribute of factor.
- xclim.sdba.utils.best_pc_orientation(A: numpy.ndarray, Binv: numpy.ndarray, val: float = 1000) numpy.ndarray [source]
Return best orientation vector for A.
Eigenvectors returned by pc_matrix do not have a defined orientation. Given an inverse transform Binv and a transform A, this returns the orientation minimizing the projected distance for a test point far from the origin.
This trick is explained in [hnilica2017]. See documentation of sdba.adjustment.PrincipalComponentAdjustment.
- Parameters
A (np.ndarray) – MxM Matrix defining the final transformation.
Binv (np.ndarray) – MxM Matrix defining the (inverse) first transformation.
val (float) – The coordinate of the test point (same for all axes). It should be much greater than the furthest point in the array used to define B.
- Returns
np.ndarray – Mx1 vector of orientation correction (1 or -1).
- xclim.sdba.utils.broadcast(grouped: xarray.DataArray, x: xarray.DataArray, *, group: Union[str, xclim.sdba.base.Grouper] = 'time', interp: str = 'nearest', sel: Optional[Mapping[str, xarray.DataArray]] = None) xarray.DataArray [source]
Broadcast a grouped array back to the same shape as a given array.
- Parameters
grouped (xr.DataArray) – The grouped array to broadcast like x.
x (xr.DataArray) – The array to broadcast grouped to.
group (Union[str, Grouper]) – Grouping information. See
xclim.sdba.base.Grouper
for details.
interp ({'nearest', 'linear', 'cubic'}) – The interpolation method to use.
sel (Mapping[str, xr.DataArray]) – Mapping of grouped coordinates to x coordinates (other than the grouping one).
- Returns
xr.DataArray
- xclim.sdba.utils.ecdf(x: xarray.DataArray, value: float, dim: str = 'time') xarray.DataArray [source]
Return the empirical CDF of a sample at a given value.
- Parameters
x (array) – Sample.
value (float) – The value within the support of x for which to compute the CDF value.
dim (str) – Dimension name.
- Returns
xr.DataArray – Empirical CDF.
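The empirical CDF reduces to a counting operation; a minimal sketch in plain Python (the real function works along a DataArray dimension):

```python
def ecdf(sample, value):
    # Empirical CDF: the fraction of sample points at or below `value`.
    return sum(1 for v in sample if v <= value) / len(sample)

p = ecdf([1.0, 2.0, 3.0, 4.0], 2.0)
```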
- xclim.sdba.utils.ensure_longest_doy(func: Callable) Callable [source]
Ensure that selected day is the longest day of year for x and y dims.
- xclim.sdba.utils.equally_spaced_nodes(n: int, eps: Optional[float] = 0.0001) numpy.array [source]
Return nodes with n equally spaced points within [0, 1] plus two end-points.
- Parameters
n (int) – Number of equally spaced nodes.
eps (float, None) – Distance from 0 and 1 of end nodes. If None, do not add endpoints.
- Returns
np.array – Nodes between 0 and 1.
Notes
For n=4, eps=0 : 0—x——x——x——x—1
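The node layout in the diagram above can be reproduced with a short sketch (a plain-Python rendering of the idea; the real function returns a np.array):

```python
def equally_spaced_nodes(n, eps=1e-4):
    # n nodes at the centers of n equal slices of [0, 1], plus two
    # end-points at eps and 1 - eps (omitted when eps is None).
    dq = 1.0 / n / 2.0
    nodes = [dq + 2.0 * dq * i for i in range(n)]
    if eps is not None:
        nodes = [eps] + nodes + [1.0 - eps]
    return nodes

nodes = equally_spaced_nodes(4, eps=0)
```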
- xclim.sdba.utils.extrapolate_qm(qf: xarray.DataArray, xq: xarray.DataArray, method: str = 'constant', abs_bounds: Optional[tuple] = (-inf, inf)) Tuple[xarray.DataArray, xarray.DataArray] [source]
Extrapolate quantile adjustment factors beyond the computed quantiles.
- Parameters
qf (xr.DataArray) – Adjustment factors over quantile coordinates.
xq (xr.DataArray) – Coordinates of the adjustment factors. For example, in EQM, these are the values at each quantile.
method ({“constant”, “nan”}) – Extrapolation method. See notes below.
abs_bounds (2-tuple) – The absolute bounds of xq. Defaults to (-inf, inf).
- Returns
qf (xr.Dataset or xr.DataArray) – Extrapolated adjustment factors.
xq (xr.Dataset or xr.DataArray) – Extrapolated x-values.
Notes
Valid values for method are:
‘nan’
Estimating values above or below the computed values will return a NaN.
‘constant’
The adjustment factor above and below the computed values are equal to the last and first values respectively.
- xclim.sdba.utils.get_clusters(data: xarray.DataArray, u1, u2, dim: str = 'time') xarray.Dataset [source]
Get cluster count, maximum and position along a given dim.
See get_clusters_1d. Used by adjustment.ExtremeValues.
- Parameters
data (1D ndarray) – Values to get clusters from.
u1 (float) – Extreme value threshold, at least one value in the cluster must exceed this.
u2 (float) – Cluster threshold, values above this can be part of a cluster.
dim (str) – Dimension name.
- Returns
xr.Dataset –
- With variables,
nclusters : Number of clusters for each point (with dim reduced), int
start : First index in the cluster (dim reduced, new cluster), int
end : Last index in the cluster, inclusive (dim reduced, new cluster), int
maxpos : Index of the maximal value within the cluster (dim reduced, new cluster), int
maximum : Maximal value within the cluster (dim reduced, new cluster), same dtype as data.
For start, end and maxpos, -1 means NaN and should always correspond to a NaN in maximum. The length along cluster is half the size of "dim", the maximal theoretical number of clusters.
- xclim.sdba.utils.get_clusters_1d(data: numpy.ndarray, u1: float, u2: float) Tuple[numpy.array, numpy.array, numpy.array, numpy.array] [source]
Get clusters of a 1D array.
A cluster is defined as a sequence of values larger than u2 with at least one value larger than u1.
- Parameters
data (1D ndarray) – Values to get clusters from.
u1 (float) – Extreme value threshold, at least one value in the cluster must exceed this.
u2 (float) – Cluster threshold, values above this can be part of a cluster.
- Returns
(np.array, np.array, np.array, np.array)
References
getcluster of Extremes.jl (read on 2021-04-20) https://github.com/jojal5/Extremes.jl
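The two-threshold cluster definition above can be sketched in plain Python (list-based; the real function returns numpy arrays with positions and maxima):

```python
def get_clusters_1d(data, u1, u2):
    # A cluster is a run of values > u2 containing at least one value > u1.
    starts, ends, maxima = [], [], []
    i, n = 0, len(data)
    while i < n:
        if data[i] > u2:
            j = i
            while j < n and data[j] > u2:  # extend the run of values > u2
                j += 1
            peak = max(data[i:j])
            if peak > u1:                  # keep only runs exceeding u1
                starts.append(i)
                ends.append(j - 1)
                maxima.append(peak)
            i = j
        else:
            i += 1
    return starts, ends, maxima

starts, ends, maxima = get_clusters_1d([0, 3, 5, 3, 0, 3, 0, 8], u1=4, u2=2)
```

Here the run at indices 5–5 (value 3) exceeds u2 but never u1, so it is not a cluster.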
- xclim.sdba.utils.get_correction(x: xarray.DataArray, y: xarray.DataArray, kind: str) xarray.DataArray [source]
Return the additive or multiplicative correction/adjustment factors.
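get_correction and apply_correction are symmetric train/adjust halves; a scalar sketch (the real functions operate element-wise on DataArrays and store kind as an attribute):

```python
def get_correction(x, y, kind):
    # Train: the factor that maps x onto y.
    return y - x if kind == "+" else y / x

def apply_correction(x, factor, kind):
    # Adjust: apply a previously computed factor to x.
    return x + factor if kind == "+" else x * factor

af = get_correction(1.5, 3.0, "*")        # multiplicative factor
adjusted = apply_correction(5.0, af, "*")
```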
- xclim.sdba.utils.interp_on_quantiles(newx: xarray.DataArray, xq: xarray.DataArray, yq: xarray.DataArray, *, group: Union[str, xclim.sdba.base.Grouper] = 'time', method: str = 'linear', extrapolation: str = 'constant')[source]
Interpolate values of yq on new values of x.
Interpolate in 2D with
griddata()
if grouping is used, in 1D otherwise, with interp1d.
Any NaNs in xq or yq are removed from the input map. Similarly, NaNs in newx are left as NaNs.
- Parameters
newx (xr.DataArray) – The values at which to evaluate yq. If group has group information, newx should have a coordinate with the same name as the group name. In that case, 2D interpolation is used.
xq, yq (xr.DataArray) – Coordinates and values on which to interpolate. The interpolation is done along the “quantiles” dimension if group has no group information. If it does, interpolation is done in 2D on “quantiles” and on the group dimension.
group (Union[str, Grouper]) – The dimension and grouping information. (ex: “time” or “time.month”). Defaults to “time”.
method ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method.
extrapolation ({‘constant’, ‘nan’}) – The extrapolation method used for values of newx outside the range of xq. See notes.
Notes
Extrapolation methods:
‘nan’ : Any value of newx outside the range of xq is set to NaN.
‘constant’ : Values of newx smaller than the minimum of xq are set to the first value of yq and those larger than the maximum, set to the last one (first and last values along the “quantiles” dimension).
- xclim.sdba.utils.invert(x: xarray.DataArray, kind: Optional[str] = None) xarray.DataArray [source]
Invert a DataArray either additively (-x) or multiplicatively (1/x).
If kind is not given, default to the one stored in the “kind” attribute of x.
- xclim.sdba.utils.map_cdf(ds: xarray.Dataset, *, y_value: xarray.DataArray, dim)[source]
Return the value in x with the same CDF as y_value in y.
This function is meant to be wrapped in a Grouper.apply.
- Parameters
ds (xr.Dataset) – Variables: x, Values from which to pick, y, Reference values giving the ranking
y_value (float, array) – Value within the support of y.
dim (str) – Dimension along which to compute quantile.
- Returns
array – Quantile of x with the same CDF as y_value in y.
- xclim.sdba.utils.map_cdf_1d(x, y, y_value)[source]
Return the value in x with the same CDF as y_value in y.
- xclim.sdba.utils.pc_matrix(arr: Union[numpy.ndarray, dask.array.core.Array]) Union[numpy.ndarray, dask.array.core.Array] [source]
Construct a Principal Component matrix.
This matrix can be used to transform points in arr to principal components coordinates. Note that this function does not manage NaNs; if a single observation is null, all elements of the transformation matrix involving that variable will be NaN.
- Parameters
arr (numpy.ndarray or dask.array.Array) – 2D array (M, N) of the M coordinates of N points.
- Returns
numpy.ndarray or dask.array.Array – MxM Array of the same type as arr.
- xclim.sdba.utils.rand_rot_matrix(crd: xarray.DataArray, num: int = 1, new_dim: Optional[str] = None) xarray.DataArray [source]
Generate random rotation matrices.
Rotation matrices are members of the SO(n) group, where n is the matrix size (crd.size). They can be characterized as orthogonal matrices with determinant 1. A square matrix \(R\) is a rotation matrix if and only if \(R^t = R^{-1}\) and \(\mathrm{det} R = 1\).
- Parameters
crd (xr.DataArray) – 1D coordinate DataArray along which the rotation occurs. The output will be square with the same coordinate replicated, the second renamed to new_dim.
num (int) – The number of matrices to generate (default: 1). If larger than 1, the matrices are stacked along a "matrices" dimension.
new_dim (str) – Name of the new “prime” dimension, defaults to the same name as crd + “_prime”.
- Returns
xr.DataArray – float, NxN if num = 1, numxNxN otherwise, where N is the length of crd.
References
Mezzadri, F. (2006). How to generate random matrices from the classical compact groups. arXiv preprint math-ph/0609050.
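The Mezzadri (2006) recipe can be sketched with plain NumPy (a minimal stand-in; the real function wraps the result in a labeled DataArray):

```python
import numpy as np

def rand_rot_matrix(n, seed=None):
    # Random rotation matrix in SO(n): QR-decompose a Gaussian matrix,
    # then fix signs so the distribution is Haar-uniform (Mezzadri, 2006).
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    q = q * np.sign(np.diag(r))     # scale each column by sign(r_jj)
    if np.linalg.det(q) < 0:        # flip one column: reflection -> rotation
        q[:, 0] = -q[:, 0]
    return q

R = rand_rot_matrix(3, seed=0)
```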
- xclim.sdba.utils.rank(da: xarray.DataArray, dim: str = 'time', pct: bool = False) xarray.DataArray [source]
Ranks data along a dimension.
Replicates xr.DataArray.rank but as a function usable in a Grouper.apply(). Xarray’s docstring is below:
Equal values are assigned a rank that is the average of the ranks that would have been otherwise assigned to all of the values within that set. Ranks begin at 1, not 0. If pct, computes percentage ranks.
NaNs in the input array are returned as NaNs.
The bottleneck library is required.
- Parameters
da (xr.DataArray)
dim (str, hashable) – Dimension over which to compute rank.
pct (bool, optional) – If True, compute percentage ranks, otherwise compute integer ranks.
- Returns
DataArray – DataArray with the same coordinates and dtype ‘float64’.
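Average-rank assignment with ties can be sketched in plain Python (NaN handling and the bottleneck backend are omitted):

```python
def rank(values, pct=False):
    # Ranks begin at 1; tied values receive the average of the ranks
    # they would otherwise occupy.
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j < n and values[order[j]] == values[order[i]]:
            j += 1                      # extend the run of tied values
        avg = (i + j + 1) / 2.0         # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    if pct:
        ranks = [r / n for r in ranks]  # percentage ranks
    return ranks

r = rank([10, 20, 20, 30])
```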
- class xclim.sdba.base.Grouper(group: str, window: int = 1, add_dims: Optional[Union[Sequence[str], Set[str]]] = None)[source]
Create the Grouper object.
- Parameters
group (str) – The usual grouping name as xarray understands it. Ex: “time.month” or “time”. The dimension name before the dot is the “main dimension” stored in Grouper.dim and the property name after is stored in Grouper.prop.
window (int) – If larger than 1, a centered rolling window along the main dimension is created when grouping data. Units are the sampling frequency of the data along the main dimension.
add_dims (Optional[Union[Sequence[str], str]]) – Additional dimensions that should be reduced in grouping operations. This behaviour is also controlled by the main_only parameter of the apply method. If any of these dimensions are absent from the dataarrays, they will be omitted.
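Conceptually, a Grouper such as Grouper("time.month") splits the data by a property of the time coordinate and applies a function per group. A minimal stdlib sketch of that idea (the real class handles windows, extra dimensions, and lazy arrays):

```python
from collections import defaultdict
from datetime import date

def group_apply(times, values, prop, func):
    # Split values by a property of the time coordinate (e.g. "month"),
    # then apply func to each group.
    groups = defaultdict(list)
    for t, v in zip(times, values):
        groups[getattr(t, prop)].append(v)
    return {key: func(vals) for key, vals in sorted(groups.items())}

times = [date(2000, 1, 1), date(2000, 1, 15), date(2000, 2, 1)]
means = group_apply(times, [1.0, 3.0, 10.0], "month", lambda v: sum(v) / len(v))
```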
- apply(func: Union[function, str], da: Union[xarray.DataArray, Mapping[str, xarray.DataArray], xarray.Dataset], main_only: bool = False, **kwargs)[source]
Apply a function group-wise on DataArrays.
- Parameters
func (Union[FunctionType, str]) – The function to apply to the groups, either a callable or a xr.core.groupby.GroupBy method name as a string. The function will be called as func(group, dim=dims, **kwargs). See main_only for the behaviour of dims.
da (Union[xr.DataArray, Mapping[str, xr.DataArray], xr.Dataset]) – The DataArray on which to apply the function. Multiple arrays can be passed through a dictionary. A dataset will be created before grouping.
main_only (bool) – Whether to call the function with the main dimension only (if True) or with all grouping dims (if False, default) (including the window and dimensions given through add_dims). The dimensions used are also written in the “group_compute_dims” attribute. If all the input arrays are missing one of the ‘add_dims’, it is silently omitted.
kwargs – Other keyword arguments to pass to the function.
- Returns
DataArray or Dataset – Attributes “group”, “group_window” and “group_compute_dims” are added.
If the function did not reduce the array:
The output is sorted along the main dimension.
The output is rechunked to match the chunks on the input If multiple inputs with differing chunking were given as inputs, the chunking with the smallest number of chunks is used.
If the function reduces the array:
If there is only one group, the singleton dimension is squeezed out of the output
The output is rechunked as to have only 1 chunk along the new dimension.
Notes
For the special case where a Dataset is returned, but only some of its variables were reduced by the grouping, xarray's GroupBy.map will broadcast everything back to the ungrouped dimensions. To overcome this issue, the function may add a "_group_apply_reshape" attribute set to True on the variables that should be reduced, and these will be re-grouped by calling da.groupby(self.name).first().
- get_coordinate(ds=None)[source]
Return the coordinate as in the output of group.apply.
Currently only implemented for groupings with prop == month or dayofyear. For prop == dayofyear, a ds (dataset or dataarray) can be passed to infer the max doy from the available years and calendar.
- get_index(da: Union[xarray.DataArray, xarray.Dataset], interp: Optional[bool] = None)[source]
Return the group index of each element along the main dimension.
- Parameters
da (Union[xr.DataArray, xr.Dataset]) – The input array/dataset for which the group index is returned. It must have Grouper.dim as a coordinate.
interp (bool, optional) – If True, the returned index can be used for interpolation. Only valid for month grouping, where integer values represent the middle of the month; all other days are linearly interpolated in between.
- Returns
xr.DataArray – The index of each element along Grouper.dim. If Grouper.dim is time and Grouper.prop is None, a uniform array of True is returned. If Grouper.prop is a time accessor (month, dayofyear, etc.), a numerical array is returned, with a special case of month and interp=True. If Grouper.dim is not time, the dim is simply returned.
- group(da: Optional[Union[xarray.DataArray, xarray.Dataset]] = None, main_only=False, **das: xarray.DataArray)[source]
Return a xr.core.groupby.GroupBy object.
More than one array can be combined to a dataset before grouping using the das kwargs. A new window dimension is added if self.window is larger than 1. If Grouper.dim is ‘time’, but ‘prop’ is None, the whole array is grouped together.
When multiple arrays are passed, some of them can be grouped along the same group as self. They are broadcast, merged to the grouping dataset and regrouped in the output.
Numba-accelerated utilities
LOESS smoothing
- xclim.sdba.loess.loess_smoothing(da: xarray.DataArray, dim: str = 'time', d: int = 1, f: float = 0.5, niter: int = 2, weights: Union[str, Callable] = 'tricube', equal_spacing: Optional[bool] = None)[source]
Locally weighted regression in 1D: fits a nonparametric regression curve to a scatterplot.
Returns a smoothed curve along the given dimension. The regression is computed for each point using a subset of neighboring points as given from evaluating the weighting function locally. Follows the procedure of [Cleveland1979].
- Parameters
da (xr.DataArray) – The data to smooth using the loess approach.
dim (str) – Name of the dimension along which to perform the loess.
d ([0, 1]) – Degree of the local regression.
f (float) – Parameter controlling the shape of the weight curve. Behavior depends on the weighting function, but it usually represents the span of the weighting function in reference to x-coordinates normalized from 0 to 1.
niter (int) – Number of robustness iterations to execute.
weights ([“tricube”, “gaussian”] or callable) – Shape of the weighting function, see notes. The user can provide a function or a string: “tricube” : a smooth top-hat like curve. “gaussian” : a gaussian curve, f gives the span for 95% of the values.
equal_spacing (bool, optional) – Whether to use the equal spacing optimization. If None (the default), it is activated only if the x axis is equally-spaced. When activated, dx = x[1] - x[0].
Notes
As stated in [Cleveland1979], the weighting function \(W(x)\) should respect the following conditions:
\(W(x) > 0\) for \(|x| < 1\)
\(W(-x) = W(x)\)
\(W(x)\) is nonincreasing for \(x \ge 0\)
\(W(x) = 0\) for \(|x| \ge 1\)
If a callable is provided, it should only accept the 1D np.ndarray \(x\) which is an absolute value function going from 1 to 0 to 1 around \(x_i\), for all values where \(x - x_i < h_i\) with \(h_i\) the distance of the rth nearest neighbor of \(x_i\), \(r = f * size(x)\).
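The default tricube weight satisfies all four conditions; a minimal sketch:

```python
def tricube(x):
    # Tricube weight: W(x) = (1 - |x|**3)**3 for |x| < 1, else 0.
    ax = abs(x)
    return (1.0 - ax**3) ** 3 if ax < 1.0 else 0.0
```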
References
- Cleveland1979(1,2)
Cleveland, W. S., 1979. Robust Locally Weighted Regression and Smoothing Scatterplot, Journal of the American Statistical Association 74, 829–836.
Code adapted from https://gist.github.com/agramfort/850437
Developer tools
Base classes and developer tools
- class xclim.sdba.base.Parametrizable[source]
Bases:
dict
Helper base class resembling a dictionary.
This object is _completely_ defined by the content of its internal dictionary, accessible through item access (self['attr']) or in self.parameters. When serializing and restoring this object with [json]pickle, only members of that internal dict are preserved; all other attributes set directly with self.attr = value will be lost. This class is best serialized and restored with jsonpickle.
- property parameters
All parameters as a dictionary. Read-only.
- class xclim.sdba.base.ParametrizableWithDataset[source]
Bases:
xclim.sdba.base.Parametrizable
Parametrizable class that also has a ds attribute storing a dataset.
- xclim.sdba.base.duck_empty(dims, sizes, dtype='float64', chunks=None)[source]
Return an empty DataArray based on a numpy or dask backend, depending on the chunks argument.
- xclim.sdba.base.map_blocks(reduces=None, **outvars)[source]
Decorator for declaring functions and wrapping them into a map_blocks. It takes care of constructing the template dataset.
Dimension order is not preserved.
The decorated function must always have the signature:
func(ds, **kwargs)
, where ds is a DataArray or a Dataset. It must always output a dataset matching the mapping passed to the decorator.
- Parameters
reduces (sequence of strings) – Name of the dimensions that are removed by the function.
**outvars – Mapping from variable names in the output to their new dimensions. The placeholders Grouper.PROP, Grouper.DIM and Grouper.ADD_DIMS can be used to signify group.prop, group.dim and group.add_dims, respectively. If an output keeps a dimension that another loses, that dimension name must be given in reduces and in the list of new dimensions of the first output.
- xclim.sdba.base.map_groups(reduces=None, main_only=False, **outvars)[source]
Decorator for declaring functions acting only on groups and wrapping them into a map_blocks. See
map_blocks()
.
This is the same as map_blocks but adds a call to group.apply() in the mapped func, and the default value of reduces is changed.
The decorated function must have the signature:
func(ds, dim, **kwargs)
, where ds is a DataArray or Dataset and dim is the group.dim (and add_dims). The group argument is stripped from the kwargs, but must of course be provided in the call.
- Parameters
reduces (sequence of str) – Dimensions that are removed from the inputs by the function. Defaults to [Grouper.DIM, Grouper.ADD_DIMS] if main_only is False, and [Grouper.DIM] if main_only is True. See
map_blocks()
.
main_only (bool) – Same as for
Grouper.apply()
.
- xclim.sdba.base.parse_group(func: Callable, kwargs=None) Callable [source]
Parse the kwargs given to a function to set the group arg with a Grouper object.
This function can be used as a decorator, in which case the parsing and updating of the kwargs is done at call time. It can also be called with a function from which to extract the default group and the kwargs to update, in which case it returns the updated kwargs.
- class xclim.sdba.detrending.BaseDetrend(*, group: Union[xclim.sdba.base.Grouper, str] = 'time', kind: str = '+', **kwargs)[source]
Base class for detrending objects.
Defines three methods:
fit(da) : Compute the trend from da and return a new _fitted_ Detrend object.
detrend(da) : Return the detrended array.
retrend(da) : Put the trend back on da.
A fitted Detrend object is unique to the trend coordinate of the object used in fit (usually 'time'). The computed trend is stored in Detrend.ds.trend.
Subclasses should implement _get_trend_group() or _get_trend(). The first will be called in a group.apply(..., main_only=True) and should return a single DataArray. The second allows the use of functions wrapped in map_groups() and should also return a single DataArray.
Subclasses may also reimplement _detrend and _retrend.
- class xclim.sdba.adjustment.TrainAdjust(*args, _trained=False, **kwargs)[source]
Base class for adjustment objects obeying the train-adjust scheme.
Children classes should implement these methods:
_train(ref, hist, **kwargs) : a classmethod receiving the training target and data, returning a training dataset and parameters to store in the object.
_adjust(sim, **kwargs) : receiving the projected data and some arguments, returning the scen DataArray.
- adjust(sim: xarray.DataArray, *args, **kwargs)[source]
Return bias-adjusted data. Refer to the class documentation for the algorithm details.
- Parameters
sim (DataArray) – Time series to be bias-adjusted, usually a model output.
*args (xr.DataArray) – Other DataArrays needed for the adjustment (usually none).
kwargs – Algorithm-specific keyword arguments, see class doc.
- set_dataset(ds: xarray.Dataset)[source]
Stores an xarray dataset in the ds attribute.
Useful with custom object initialization or if some external processing was performed.
- classmethod train(ref: xarray.DataArray, hist: xarray.DataArray, **kwargs)[source]
Train the adjustment object. Refer to the class documentation for the algorithm details.
- Parameters
ref (DataArray) – Training target, usually a reference time series drawn from observations.
hist (DataArray) – Training data, usually a model output whose biases are to be adjusted.
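The train/adjust scheme can be illustrated with the simplest possible adjustment, mean scaling. This is a conceptual sketch of the pattern, not xclim's implementation (which stores the training dataset in ds and works on DataArrays):

```python
class ScalingAdjust:
    # Minimal train/adjust object: the transfer information is a single
    # mean-scaling factor computed from ref and hist.
    def __init__(self, factor, kind):
        self.factor = factor
        self.kind = kind

    @classmethod
    def train(cls, ref, hist, kind="*"):
        # Compare ref and hist, store what is needed to go from hist to ref.
        mref = sum(ref) / len(ref)
        mhist = sum(hist) / len(hist)
        factor = mref - mhist if kind == "+" else mref / mhist
        return cls(factor, kind)

    def adjust(self, sim):
        # Apply the stored transfer information to sim.
        if self.kind == "+":
            return [v + self.factor for v in sim]
        return [v * self.factor for v in sim]

obj = ScalingAdjust.train([2.0, 4.0], [1.0, 2.0], kind="*")
scen = obj.adjust([5.0])
```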
- class xclim.sdba.adjustment.Adjust(*args, _trained=False, **kwargs)[source]
Adjustment with no intermediate trained object.
Children classes should implement an _adjust classmethod taking the three DataArrays as input and returning the scen dataset/array.
- classmethod adjust(ref: xarray.DataArray, hist: xarray.DataArray, sim: xarray.DataArray, **kwargs)[source]
Return bias-adjusted data. Refer to the class documentation for the algorithm details.
- Parameters
ref (DataArray) – Training target, usually a reference time series drawn from observations.
hist (DataArray) – Training data, usually a model output whose biases are to be adjusted.
sim (DataArray) – Time series to be bias-adjusted, usually a model output.
kwargs – Algorithm-specific keyword arguments, see class doc.