Statistical Downscaling and Bias Adjustment

The xclim.sdba submodule provides bias-adjustment methods and will eventually provide statistical downscaling algorithms. Almost all adjustment algorithms conform to the train-adjust scheme, formalized within TrainAdjust classes. Given a reference time series (ref), historical simulations (hist) and simulations to be adjusted (sim), a bias-adjustment method is applied by first estimating the adjustment factors between the historical simulation and the reference series, then applying these factors to sim, which may be a future simulation:

# Create the adjustment object by training it with reference and model data, plus certain arguments.
Adj = Adjustment.train(ref, hist, group="time.month")
# Get a scenario by applying the adjustment to a simulated time series.
scen = Adj.adjust(sim, interp="linear")
Adj.ds.af  # adjustment factors
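For example, a minimal end-to-end run could look like the following sketch, where EmpiricalQuantileMapping stands in for the generic Adjustment above and the synthetic series are purely illustrative:

import numpy as np
import xarray as xr
from xclim import sdba

# Three years of synthetic daily data on a no-leap calendar.
time = xr.cftime_range("2000-01-01", periods=3 * 365, freq="D", calendar="noleap")
base = 15 + 8 * np.sin(2 * np.pi * np.arange(time.size) / 365)
ref = xr.DataArray(base + np.random.randn(time.size), dims=("time",),
                   coords={"time": time}, attrs={"units": "degC"})
hist = ref + 2 + 0.5 * np.random.randn(time.size)  # model biased warm
hist.attrs["units"] = "degC"  # arithmetic drops attrs; sdba checks units

EQM = sdba.EmpiricalQuantileMapping.train(ref, hist, nquantiles=15, group="time.month")
scen = EQM.adjust(hist, interp="linear")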

The group argument allows adjustment factors to be estimated independently for different periods: the full time series, months, seasons or days of the year. The interp argument then allows interpolation between these adjustment factors, avoiding discontinuities in the bias-adjusted series (only applicable to monthly grouping).

Warning

If grouping according to the day of the year is needed, the xclim.core.calendar submodule contains useful tools to manage the different calendars the input data can have. By default, if two different calendars are passed, the adjustment factors will always be interpolated to the largest day-of-year range, but this can lead to strange values, so we recommend converting the data to a common calendar beforehand.

The same interpolation principle applies to quantiles: for methods extracting adjustment factors by quantile, interpolation is also performed between quantiles. This can help reduce discontinuities in the adjusted time series and possibly reduce the number of quantile bins needed.

Modular approach

This module adopts a modular approach instead of implementing published and named methods directly. A generic bias adjustment process is laid out as follows:

  • preprocessing on ref, hist and sim (using methods in xclim.sdba.processing or xclim.sdba.detrending)

  • creating and training the adjustment object Adj = Adjustment.train(ref, hist, **kwargs) (from xclim.sdba.adjustment)

  • adjustment scen = Adj.adjust(sim, **kwargs)

  • post-processing on scen (for example: re-trending)

The train-adjust approach makes it possible to inspect the trained adjustment object. The training information is stored in the underlying Adj.ds dataset and always includes an af variable holding the adjustment factors. Its layout and the other available variables vary between algorithms; refer to Available methods.

Parameters needed by the training and the adjustment are saved to the Adj.ds dataset as an adj_params attribute. Other parameters, those needed only by the adjustment, are passed in the adjust call and written to the history attribute of the output scenario DataArray.
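Put together, the four steps above could look like the following sketch (the detrending choice and parameter values are illustrative, not prescribed):

from xclim import sdba
from xclim.sdba.detrending import PolyDetrend

# 1. Preprocessing: remove a linear trend from sim (one possible choice).
sim_fit = PolyDetrend(degree=1, group="time").fit(sim)
sim_det = sim_fit.detrend(sim)

# 2. Training on the reference and historical series.
Adj = sdba.EmpiricalQuantileMapping.train(ref, hist, group="time.month")

# 3. Adjustment of the detrended series.
scen_det = Adj.adjust(sim_det, interp="linear")

# 4. Post-processing: put the trend back on the adjusted series.
scen = sim_fit.retrend(scen_det)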

Grouping

For basic time period grouping (months, day of year, season), passing a string to the methods needing it is sufficient. Most methods acting on grouped data also accept an integer window argument to pad the groups with data from adjacent ones. The units of window are the sampling frequency of the main grouping dimension (usually time). For more complex grouping, one can pass an xclim.sdba.base.Grouper directly, as in the sketch below.
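For instance, a day-of-year grouping padded with a 31-day window could be constructed as follows (a sketch; the window width is illustrative):

from xclim import sdba
from xclim.sdba.base import Grouper

# Each day-of-year group is padded with 15 days on either side.
group = Grouper("time.dayofyear", window=31)
Adj = sdba.EmpiricalQuantileMapping.train(ref, hist, group=group)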

Notes for developers

To be scalable and performant, the sdba module makes use of the special decorators xclim.sdba.base.map_blocks() and xclim.sdba.base.map_groups(). However, they have the inconvenience that functions wrapped by them cannot manage xarray attributes (including units) correctly, and their signatures are sometimes wrong and often unclear. For this reason, the module is often divided in two parts: the (decorated) compute functions in a “private” file (e.g. _adjustment.py) and the user-facing functions or objects in the corresponding public file (e.g. adjustment.py). See the sdba-advanced notebook for more information on the reasons for this structure.

Other restrictions: map_blocks will remove any “auxiliary” coordinates before calling the wrapped function and will add them back on exit.

Available methods

Adjustment objects.

class xclim.sdba.adjustment.BaseAdjustment(*args, _trained=False, **kwargs)[source]

Bases: xclim.sdba.base.ParametrizableWithDataset

Base class for adjustment objects.

Child classes should implement the train and/or the adjust method.

This base class defines the basic input and output checks. It should only be used for a real adjustment if neither TrainAdjust nor Adjust fits the algorithm.

__init__(*args, _trained=False, **kwargs)[source]
class xclim.sdba.adjustment.DetrendedQuantileMapping(*args, _trained=False, **kwargs)[source]

Bases: xclim.sdba.adjustment.TrainAdjust

Detrended Quantile Mapping bias-adjustment.

The algorithm follows these steps, 1-3 being the ‘train’ steps and 4-6 the ‘adjust’ steps.

  1. A scaling factor that would make the mean of hist match the mean of ref is computed.

  2. ref and hist are normalized by removing the “dayofyear” mean.

  3. Adjustment factors are computed between the quantiles of the normalized ref and hist.

  4. sim is corrected by the scaling factor, and either normalized by “dayofyear” and detrended group-wise or directly detrended per “dayofyear”, using a linear fit (modifiable).

  5. Values of detrended sim are matched to the corresponding quantiles of normalized hist and corrected accordingly.

  6. The trend is put back on the result.

\[F^{-1}_{ref}\left\{F_{hist}\left[\frac{\overline{hist}\cdot sim}{\overline{sim}}\right]\right\}\frac{\overline{sim}}{\overline{hist}}\]

where \(F\) is the cumulative distribution function (CDF) and \(\overline{xyz}\) is the linear trend of the data. This equation is valid for multiplicative adjustment. Based on the DQM method of [Cannon2015].

Parameters
  • Train step

  • nquantiles (int or 1d array of floats) – The number of quantiles to use. Two endpoints at 1e-6 and 1 - 1e-6 will be added. An array of quantiles in [0, 1] can also be passed. Defaults to 20 quantiles (+2 endpoints).

  • kind ({‘+’, ‘*’}) – The adjustment kind, either additive or multiplicative. Defaults to “+”.

  • group (Union[str, Grouper]) – The grouping information. See xclim.sdba.base.Grouper for details. Default is “time”, meaning a single adjustment group along dimension “time”.

  • Adjust step

  • interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.

  • detrend (int or BaseDetrend instance) – The method to use when detrending. If an int is passed, it is understood as a PolyDetrend (polynomial detrending) degree. Defaults to 1 (linear detrending).

  • extrapolation ({‘constant’, ‘nan’}) – The type of extrapolation to use. See xclim.sdba.utils.extrapolate_qm() for details. Defaults to “constant”.
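Putting these together, a typical DQM invocation might look like this sketch (parameter values are illustrative):

from xclim import sdba

DQM = sdba.DetrendedQuantileMapping.train(ref, hist, nquantiles=50, kind="+", group="time")
scen = DQM.adjust(sim, interp="linear", detrend=1)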

References

Cannon2015

Cannon, A. J., Sobie, S. R., & Murdock, T. Q. (2015). Bias correction of GCM precipitation by quantile mapping: How well do methods preserve changes in quantiles and extremes? Journal of Climate, 28(17), 6938–6959. https://doi.org/10.1175/JCLI-D-14-00754.1

class xclim.sdba.adjustment.EmpiricalQuantileMapping(*args, _trained=False, **kwargs)[source]

Bases: xclim.sdba.adjustment.TrainAdjust

Empirical Quantile Mapping bias-adjustment.

Adjustment factors are computed between the quantiles of ref and hist. Values of sim are matched to the corresponding quantiles of hist and corrected accordingly.

\[F^{-1}_{ref} (F_{hist}(sim))\]

where \(F\) is the cumulative distribution function (CDF).

Parameters
  • Train step

  • nquantiles (int or 1d array of floats) – The number of quantiles to use. Two endpoints at 1e-6 and 1 - 1e-6 will be added. An array of quantiles in [0, 1] can also be passed. Defaults to 20 quantiles (+2 endpoints).

  • kind ({‘+’, ‘*’}) – The adjustment kind, either additive or multiplicative. Defaults to “+”.

  • group (Union[str, Grouper]) – The grouping information. See xclim.sdba.base.Grouper for details. Default is “time”, meaning a single adjustment group along dimension “time”.

  • Adjust step

  • interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.

  • extrapolation ({‘constant’, ‘nan’}) – The type of extrapolation to use. See xclim.sdba.utils.extrapolate_qm() for details. Defaults to “constant”.

References

Dequé, M. (2007). Frequency of precipitation and temperature extremes over France in an anthropogenic scenario: Model results and statistical correction according to observed values. Global and Planetary Change, 57(1–2), 16–26. https://doi.org/10.1016/j.gloplacha.2006.11.030


class xclim.sdba.adjustment.LOCI(*args, _trained=False, **kwargs)[source]

Bases: xclim.sdba.adjustment.TrainAdjust

Local Intensity Scaling (LOCI) bias-adjustment.

This bias adjustment method is designed to correct daily precipitation time series by considering wet and dry days separately ([Schmidli2006]).

Multiplicative adjustment factors are computed such that the mean of hist matches the mean of ref for values above a threshold.

The threshold on the training target ref is first mapped to hist by finding the quantile in hist having the same exceedance probability as thresh in ref. The adjustment factor is then given by

\[s = \frac{\left \langle ref: ref \geq t_{ref} \right\rangle - t_{ref}}{\left \langle hist : hist \geq t_{hist} \right\rangle - t_{hist}}\]

In the case of precipitation, the adjustment factor is the ratio of wet-day intensities.

For an adjustment factor s, the bias-adjustment of sim is:

\[scen(t) = \max\left(t_{ref} + s \cdot (sim(t) - t_{hist}), 0\right)\]
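For instance (a sketch; the threshold and grouping are illustrative):

from xclim import sdba

LOCI = sdba.LOCI.train(ref, hist, group="time.month", thresh="1 mm d-1")
scen = LOCI.adjust(sim, interp="linear")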
Parameters
  • Train step

  • group (Union[str, Grouper]) – The grouping information. See xclim.sdba.base.Grouper for details. Default is “time”, meaning an single adjustment group along dimension “time”.

  • thresh (str) – The threshold in ref above which the values are scaled.

  • Adjust step

  • interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “linear”.

References

Schmidli2006

Schmidli, J., Frei, C., & Vidale, P. L. (2006). Downscaling from GCM precipitation: A benchmark for dynamical and statistical downscaling methods. International Journal of Climatology, 26(5), 679–689. DOI:10.1002/joc.1287

class xclim.sdba.adjustment.NpdfTransform(*args, _trained=False, **kwargs)[source]

Bases: xclim.sdba.adjustment.Adjust

N-dimensional probability density function transform.

This adjustment object combines both the training and adjustment steps in the adjust class method.

A multivariate bias-adjustment algorithm described by [Cannon18], as part of the MBCn algorithm, based on a color-correction algorithm described by [Pitie05].

This algorithm in itself, when used with QuantileDeltaMapping, is NOT trend-preserving. The full MBCn algorithm includes a reordering step provided here by xclim.sdba.processing.reordering().

See notes for an explanation of the algorithm.

Parameters
  • base (BaseAdjustment) – A univariate bias-adjustment class. This is untested for anything other than QuantileDeltaMapping.

  • base_kws (dict, optional) – Arguments passed to the training of the univariate adjustment.

  • n_escore (int) – The number of elements to send to the escore function. The default, 0, means all elements are included. Pass -1 to skip computing the escore completely. Small numbers result in less significant scores, but the execution time increases quickly with large values.

  • n_iter (int) – The number of iterations to perform. Defaults to 20.

  • pts_dim (str) – The name of the “multivariate” dimension. Defaults to “variables”, which is the normal case when using xclim.sdba.base.stack_variables().

  • adj_kws (dict, optional) – Dictionary of arguments to pass to the adjust method of the univariate adjustment.

  • rot_matrices (xr.DataArray, optional) – The rotation matrices as a 3D array (‘iterations’, <pts_dim>, <anything>), with shape (n_iter, <N>, <N>). If left empty, random rotation matrices will be automatically generated.

Notes

The historical reference (\(T\), for “target”), simulated historical (\(H\)) and simulated projected (\(S\)) datasets are constructed by stacking the time series of N variables together. The algorithm goes through the following steps:

  1. Rotate the datasets in the N-dimensional variable space with \(\mathbf{R}\), a random rotation NxN matrix.

\[\begin{split}\tilde{\mathbf{T}} = \mathbf{T}\mathbf{R} \\ \tilde{\mathbf{H}} = \mathbf{H}\mathbf{R} \\ \tilde{\mathbf{S}} = \mathbf{S}\mathbf{R}\end{split}\]

  2. A univariate bias-adjustment \(\mathcal{F}\) is used on the rotated datasets. The adjustments are made in additive mode, for each variable \(i\).

\[\hat{\mathbf{H}}_i, \hat{\mathbf{S}}_i = \mathcal{F}\left(\tilde{\mathbf{T}}_i, \tilde{\mathbf{H}}_i, \tilde{\mathbf{S}}_i\right)\]

  3. The bias-adjusted datasets are rotated back.

\[\begin{split}\mathbf{H}' = \hat{\mathbf{H}}\mathbf{R} \\ \mathbf{S}' = \hat{\mathbf{S}}\mathbf{R}\end{split}\]

These three steps are repeated a certain number of times, prescribed by argument n_iter. At each iteration, a new random rotation matrix is generated.

The original algorithm ([Pitie05]) stops the iteration when some distance score converges. Following [Cannon18] and the MBCn implementation in [CannonR], we instead fix the number of iterations.

As done by [Cannon18], the distance score chosen is the “Energy distance” from [SkezelyRizzo] (see xclim.sdba.processing.escore()).

The random matrices are generated following a method laid out by [Mezzadri].

This is only part of the full MBCn algorithm, see The MBCn algorithm for an example on how to replicate the full method with xclim. This includes a standardization of the simulated data beforehand, an initial univariate adjustment and the reordering of those adjusted series according to the rank structure of the output of this algorithm.
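As a rough usage sketch (assuming stack_variables and unstack_variables are available in xclim.sdba.processing; the input datasets and parameter values are illustrative):

from xclim import sdba

# Stack the variables to be jointly adjusted along a single dimension.
ref = sdba.processing.stack_variables(ds_ref)
hist = sdba.processing.stack_variables(ds_hist)
sim = sdba.processing.stack_variables(ds_sim)

scen = sdba.adjustment.NpdfTransform.adjust(
    ref, hist, sim,
    base=sdba.QuantileDeltaMapping,  # univariate adjustment used internally
    base_kws={"nquantiles": 20, "group": "time"},
    n_iter=20,  # number of random rotations
)
scen = sdba.processing.unstack_variables(scen)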

References

Cannon18

Cannon, A. J. (2018). Multivariate quantile mapping bias correction: An N-dimensional probability density function transform for climate model simulations of multiple variables. Climate Dynamics, 50(1), 31–49. https://doi.org/10.1007/s00382-017-3580-6

Pitie05

Pitie, F., Kokaram, A. C., & Dahyot, R. (2005). N-dimensional probability density function transfer and its application to color transfer. Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1, 2, 1434-1439 Vol. 2. https://doi.org/10.1109/ICCV.2005.166

SkezelyRizzo

Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, November (5)

class xclim.sdba.adjustment.PrincipalComponents(*args, _trained=False, **kwargs)[source]

Bases: xclim.sdba.adjustment.TrainAdjust

Principal component adjustment.

This bias-correction method maps model simulation values to the observation space through principal components ([hnilica2017]). Values in the simulation space (multiple variables, or multiple sites) can be thought of as coordinates along axes, one axis per variable or site. Principal components (PC) are linear combinations of the original variables, where the coefficients are the eigenvectors of the covariance matrix. Values can then be expressed as coordinates along the PC axes. The method assumes that bias-corrected values have the same coordinates along the PC axes as the observations. By converting from the observation PC space back to the original space, we get bias-corrected values. See Notes for a mathematical explanation.

Note that “principal components” is meant here as the algebraic operation defining a coordinate system based on the eigenvectors, not as a statistical principal component analysis.

Parameters
  • group (Union[str, Grouper]) – The grouping information. pts_dims can also be given through Grouper’s add_dims argument. See Notes. See xclim.sdba.base.Grouper for details. The adjustment will be performed on each group independently. Default is “time”, meaning a single adjustment group along dimension “time”.

  • crd_dims (Sequence of str, optional) – The data dimension(s) along which the multiple simulation space dimensions are taken. They are flattened into a “coordinate” dimension, see Notes. Default is None, in which case all dimensions shared by ref and hist, except those in pts_dims, are used. The training algorithm currently doesn’t support any chunking along the coordinate and point dimensions.

  • pts_dims (Sequence of str, optional) – The data dimensions to flatten into the “points” dimension, see Notes. They will be merged with those given through the add_dims property of group.

Notes

The input data is understood as a set of \(N\) points in an \(M\)-dimensional space.

  • \(N\) is taken along the data coordinates listed in pts_dims and the group (the main dim but also the add_dims).

  • \(M\) is taken along the data coordinates listed in crd_dims, the default being all except those in pts_dims.

For example, for a 3D matrix of data, say in (lat, lon, time), we could say that all spatial points are independent dimensions of the simulation space by passing crd_dims=['lat', 'lon']. For a (5, 5, 365) array, this results in a 25-dimensional space, i.e. \(M = 25\) and \(N = 365\).

Thus, the adjustment is equivalent to a linear transformation of these \(N\) points in an \(M\)-dimensional space.

The principal components (PC) of hist and ref are used to define new coordinate systems, centered on their respective means. The training step creates a matrix defining the transformation from hist to ref:

\[scen = e_{R} + \mathrm{\mathbf{T}}(sim - e_{H})\]

Where:

\[\mathrm{\mathbf{T}} = \mathrm{\mathbf{R}}\mathrm{\mathbf{H}}^{-1}\]

\(\mathrm{\mathbf{R}}\) is the matrix transforming from the PC coordinates computed on ref to the data coordinates. Similarly, \(\mathrm{\mathbf{H}}\) transforms from the hist PC coordinates to the data coordinates, so \(\mathrm{\mathbf{H}}^{-1}\) is the inverse transformation. \(e_R\) and \(e_H\) are the centroids of the ref and hist distributions respectively. Upon running the adjust step, one may decide to use \(e_S\), the centroid of the sim distribution, instead of \(e_H\).
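A usage sketch, following the parameters described above and treating all grid points as simulation-space dimensions (the dimension names are illustrative):

from xclim import sdba

# lat/lon span the M "coordinate" axes; time provides the N points.
PCA = sdba.PrincipalComponents.train(ref, hist, group="time", crd_dims=["lat", "lon"])
scen = PCA.adjust(sim)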

References

hnilica2017

Hnilica, J., Hanel, M. and Puš, V. (2017), Multisite bias correction of precipitation data from regional climate models. Int. J. Climatol., 37: 2934-2946. https://doi.org/10.1002/joc.4890

class xclim.sdba.adjustment.QuantileDeltaMapping(*args, _trained=False, **kwargs)[source]

Bases: xclim.sdba.adjustment.EmpiricalQuantileMapping

Quantile Delta Mapping bias-adjustment.

Adjustment factors are computed between the quantiles of ref and hist. Quantiles of sim are matched to the corresponding quantiles of hist and corrected accordingly.

\[sim \cdot \frac{F^{-1}_{ref}\left[F_{sim}(sim)\right]}{F^{-1}_{hist}\left[F_{sim}(sim)\right]}\]

where \(F\) is the cumulative distribution function (CDF). This equation is valid for multiplicative adjustment. The algorithm is based on the “QDM” method of [Cannon2015].
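For example (a sketch; parameter values are illustrative):

from xclim import sdba

QDM = sdba.QuantileDeltaMapping.train(ref, hist, nquantiles=50, kind="*", group="time.month")
scen = QDM.adjust(sim, interp="linear")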

Parameters
  • Train step

  • nquantiles (int or 1d array of floats) – The number of quantiles to use. Two endpoints at 1e-6 and 1 - 1e-6 will be added. An array of quantiles in [0, 1] can also be passed. Defaults to 20 quantiles (+2 endpoints).

  • kind ({‘+’, ‘*’}) – The adjustment kind, either additive or multiplicative. Defaults to “+”.

  • group (Union[str, Grouper]) – The grouping information. See xclim.sdba.base.Grouper for details. Default is “time”, meaning a single adjustment group along dimension “time”.

  • Adjust step

  • interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.

  • extrapolation ({‘constant’, ‘nan’}) – The type of extrapolation to use. See xclim.sdba.utils.extrapolate_qm() for details. Defaults to “constant”.

Extra diagnostics

In adjustment:

  • quantiles – The quantile of each value of sim. The adjustment factor is interpolated using this as the “quantile” axis on ds.af.

References

Cannon, A. J., Sobie, S. R., & Murdock, T. Q. (2015). Bias correction of GCM precipitation by quantile mapping: How well do methods preserve changes in quantiles and extremes? Journal of Climate, 28(17), 6938–6959. https://doi.org/10.1175/JCLI-D-14-00754.1

class xclim.sdba.adjustment.Scaling(*args, _trained=False, **kwargs)[source]

Bases: xclim.sdba.adjustment.TrainAdjust

Scaling bias-adjustment.

Simple bias-adjustment method scaling variables by an additive or multiplicative factor so that the mean of hist matches the mean of ref.
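For example (a sketch; the grouping and kind are illustrative):

from xclim import sdba

Sca = sdba.Scaling.train(ref, hist, group="time.month", kind="*")
scen = Sca.adjust(sim, interp="linear")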

Parameters
  • Train step

  • group (Union[str, Grouper]) – The grouping information. See xclim.sdba.base.Grouper for details. Default is “time”, meaning a single adjustment group along dimension “time”.

  • kind ({‘+’, ‘*’}) – The adjustment kind, either additive or multiplicative. Defaults to “+”.

  • Adjust step

  • interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.

Utilities

SDBA utilities module.

xclim.sdba.utils.equally_spaced_nodes(n: int, eps: Optional[float] = 0.0001) → numpy.array[source]

Return nodes with n equally spaced points within [0, 1] plus two end-points.

Parameters
  • n (int) – Number of equally spaced nodes.

  • eps (float, None) – Distance from 0 and 1 of end nodes. If None, do not add endpoints.

Returns

np.array – Nodes between 0 and 1.

Notes

For n=4, eps=0: 0---x------x------x------x---1
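For instance (a sketch; the exact array layout may differ):

from xclim.sdba.utils import equally_spaced_nodes

# Four inner quantile nodes plus the two endpoints offset by eps from 0 and 1,
# approximately {1e-4, 0.125, 0.375, 0.625, 0.875, 1 - 1e-4}.
nodes = equally_spaced_nodes(4, eps=1e-4)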

Pre- and post-processing for bias adjustment.

xclim.sdba.processing.adapt_freq(ref: xarray.core.dataarray.DataArray, sim: xarray.core.dataarray.DataArray, *, group: Union[xclim.sdba.base.Grouper, str], thresh: str = '0 mm d-1') → xarray.core.dataset.Dataset[source]

Adapt the frequency of values under thresh in sim, in order to match ref.

This is useful when the dry-day frequency in the simulations is higher than in the reference. This function will create new non-null values for sim/hist, so that adjustment factors are less wet-biased. Based on [Themessl2012].

Parameters
  • ref (xr.DataArray) – Target/reference data, usually observed data.

  • sim (xr.DataArray) – Simulated data.

  • group (Union[str, Grouper]) – Grouping information, see base.Grouper.

  • thresh (str) – Threshold below which values are considered zero, a quantity with units.

Returns

  • sim_adj (xr.DataArray) – Simulated data with the same frequency of values under threshold as ref. Adjustment is made group-wise.

  • pth (xr.DataArray) – For each group, the smallest value of sim that was not frequency-adjusted. All values smaller were either left as zero values or given a random value between thresh and pth. NaN where frequency adaptation wasn’t needed.

  • dP0 (xr.DataArray) – For each group, the percentage of values that were corrected in sim.

Notes

With \(P_0^r\) the frequency of values under threshold \(T_0\) in the reference (ref) and \(P_0^s\) the same for the simulated values, \(\Delta P_0 = \frac{P_0^s - P_0^r}{P_0^s}\), when positive, represents the proportion of values under \(T_0\) that need to be corrected.

The correction replaces a proportion \(\Delta P_0\) of the values under \(T_0\) in sim by a uniform random number between \(T_0\) and \(P_{th}\), where \(P_{th} = F_{ref}^{-1}\left( F_{sim}( T_0 ) \right)\) and \(F(x)\) is the empirical cumulative distribution function (CDF).
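A usage sketch, assuming the three outputs listed above are returned as separate arrays (the threshold and grouping are illustrative):

from xclim import sdba

sim_ad, pth, dP0 = sdba.processing.adapt_freq(
    ref, sim, group="time.month", thresh="0.1 mm d-1"
)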

References

Themessl2012

Themeßl et al. (2012), Empirical-statistical downscaling and error correction of regional climate models and its impact on the climate change signal, Climatic Change, DOI 10.1007/s10584-011-0224-4.

xclim.sdba.processing.jitter_under_thresh(x: xarray.core.dataarray.DataArray, thresh: str)[source]

Replace values smaller than the threshold by uniform random noise.

Do not confuse with R’s jitter, which adds uniform noise instead of replacing values.
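For example (a sketch; the variable and threshold are illustrative):

from xclim.sdba.processing import jitter_under_thresh

# Replace near-zero precipitation values by tiny random values, so that
# multiplicative adjustment factors stay finite.
pr_jittered = jitter_under_thresh(pr, thresh="0.01 mm d-1")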

Parameters
  • x (xr.DataArray) – Values.

  • thresh (str) – Threshold under which to add uniform random noise to values, a quantity with units.

Returns

xr.DataArray – Values, with those under thresh replaced by uniform random noise.

Notes

If thresh is high, this will change the mean value of x.