Statistical Downscaling and Bias Adjustment
The xclim.sdba submodule provides bias-adjustment methods and will eventually provide statistical downscaling algorithms. Almost all adjustment algorithms conform to the train-adjust scheme, formalized within TrainAdjust classes. Given a reference time series (ref), historical simulations (hist) and simulations to be adjusted (sim), any bias-adjustment method is applied by first estimating the adjustment factors between the historical simulation and the observation series, and then applying these factors to sim, which can be a future simulation:
# Create the adjustment object by training it with reference and model data, plus certain arguments
Adj = Adjustment.train(ref, hist, group="time.month")
# Get a scenario by applying the adjustment to a simulated timeseries.
scen = Adj.adjust(sim, interp="linear")
Adj.ds.af # adjustment factors.
The group argument allows adjustment factors to be estimated independently for different periods: the full time series, months, seasons or day of the year. The interp argument then allows for interpolation between these adjustment factors to avoid discontinuities in the bias-adjusted series (only applicable for monthly grouping).
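The effect of group can be illustrated without xclim: with group="time.month" and an additive kind, each month simply receives the mean ref - hist difference computed over that month. A toy sketch in plain Python (illustrative only, not the library code; all names are made up for the example):

```python
from collections import defaultdict

def monthly_additive_factors(ref, hist, months):
    """Toy 'train' step: mean (ref - hist) difference per month."""
    diffs = defaultdict(list)
    for r, h, m in zip(ref, hist, months):
        diffs[m].append(r - h)
    return {m: sum(v) / len(v) for m, v in diffs.items()}

def apply_factors(sim, months, af):
    """Toy 'adjust' step: add each value's monthly factor."""
    return [s + af[m] for s, m in zip(sim, months)]

# Two months of toy data: hist is 2 degrees too cold in month 1, 1 degree in month 2.
months = [1, 1, 2, 2]
ref = [10.0, 12.0, 20.0, 22.0]
hist = [8.0, 10.0, 19.0, 21.0]
af = monthly_additive_factors(ref, hist, months)
scen = apply_factors([9.0, 20.0], [1, 2], af)  # -> [11.0, 21.0]
```

With interp="linear", the real algorithms would additionally interpolate between these monthly factors instead of applying them as a step function.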
Warning
If grouping according to the day of the year is needed, the xclim.core.calendar submodule contains useful tools to manage the different calendars that the input data can have. By default, if two different calendars are passed, the adjustment factors will be interpolated to the largest range of days of the year, but this can lead to strange values, so we recommend converting the data to a common calendar beforehand.
The same interpolation principle also applies to quantiles: for methods that extract adjustment factors by quantile, interpolation is performed between quantiles as well. This can help reduce discontinuities in the adjusted time series and may allow fewer quantile bins to be used.
Modular approach
This module adopts a modular approach instead of implementing published and named methods directly. A generic bias adjustment process is laid out as follows:
preprocessing on ref, hist and sim (using methods in xclim.sdba.processing or xclim.sdba.detrending)
creating and training the adjustment object Adj = Adjustment.train(obs, sim, **kwargs) (from xclim.sdba.adjustment)
adjustment scen = Adj.adjust(sim, **kwargs)
post-processing on scen (for example: re-trending)
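The four steps above can be sketched end to end in NumPy, with a simple linear fit standing in for xclim.sdba.detrending and a single additive mean factor standing in for the trained adjustment (all names and data are purely illustrative):

```python
import numpy as np

def linear_trend(x):
    """Least-squares linear fit, returned as fitted values."""
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return slope * t + intercept

rng = np.random.default_rng(0)
t = np.arange(120)
ref = 10 + 0.01 * t + rng.normal(0, 1, t.size)
hist = 12 + 0.01 * t + rng.normal(0, 1, t.size)   # biased warm by ~2
sim = 12 + 0.03 * t + rng.normal(0, 1, t.size)    # future run, stronger trend

# 1. preprocessing: detrend sim, keeping the trend to re-add later
trend = linear_trend(sim)
sim_det = sim - trend + trend.mean()

# 2-3. "train" and "adjust": one additive mean factor (Scaling-like)
af = ref.mean() - hist.mean()
scen_det = sim_det + af

# 4. post-processing: put the trend back
scen = scen_det + trend - trend.mean()
```

Detrending before adjusting and re-trending afterwards ensures the simulated trend is preserved rather than distorted by the correction.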
The train-adjust approach allows inspection of the trained adjustment object. The training information is stored in the underlying Adj.ds dataset, which always includes an af variable holding the adjustment factors. Its layout and the other available variables vary between the different algorithms; refer to Available methods.
Parameters needed by both the training and the adjustment are saved to the Adj.ds dataset as an adj_params attribute. Other parameters, those needed only by the adjustment, are passed in the adjust call and written to the history attribute of the output scenario DataArray.
Grouping
For basic time period grouping (months, day of year, season), passing a string to the methods needing it is sufficient.
Most methods acting on grouped data also accept a window int argument to pad the groups with data from adjacent ones.
Units of window are the sampling frequency of the main grouping dimension (usually time). For more complex grouping,
one can pass a xclim.sdba.base.Grouper
directly.
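As an illustration of the window argument, a day-of-year group padded with a 31-day window gathers the 15 neighbouring days on each side, wrapping around the year. A conceptual sketch (not how Grouper is implemented; a 365-day calendar is assumed):

```python
def windowed_group_indices(doys, doy, window):
    """Indices of all samples whose day of year falls within an odd
    `window` of days centred on `doy`, wrapping around the year."""
    half = window // 2
    wanted = {(doy - 1 + o) % 365 + 1 for o in range(-half, half + 1)}
    return [i for i, d in enumerate(doys) if d in wanted]

doys = list(range(1, 366))
idx = windowed_group_indices(doys, 1, 31)
# Day 1 with a 31-day window gathers days 351-365 and 1-16.
```

Padding groups this way gives each day-of-year factor a larger sample to train on, at the cost of some smoothing between adjacent days.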
Notes for developers
To be scalable and performant, the sdba module makes use of the special decorators xclim.sdba.base.map_blocks() and xclim.sdba.base.map_groups(). However, they have the inconvenience that functions wrapped by them are unable to manage xarray attributes (including units) correctly, and their signatures are sometimes wrong and often unclear. For this reason, the module is often divided into two parts: the (decorated) compute functions in a “private” file (e.g. _adjustment.py) and the user-facing functions or objects in the corresponding public file (e.g. adjustment.py). See the sdba-advanced notebook for more information on the reasons for this split.
Another restriction: map_blocks will remove any “auxiliary” coordinates before calling the wrapped function and add them back on exit.
Available methods
Adjustment objects.
- class xclim.sdba.adjustment.BaseAdjustment(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.base.ParametrizableWithDataset
Base class for adjustment objects.
Child classes should implement the train and / or the adjust method.
This base class defines the basic input and output checks. It should only be used for a real adjustment if neither TrainAdjust nor Adjust fits the algorithm.
- class xclim.sdba.adjustment.DetrendedQuantileMapping(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.adjustment.TrainAdjust
Detrended Quantile Mapping bias-adjustment.
The algorithm follows these steps, 1-3 being the ‘train’ and 4-6 the ‘adjust’ steps.
1. A scaling factor that would make the mean of hist match the mean of ref is computed.
2. ref and hist are normalized by removing the “dayofyear” mean.
3. Adjustment factors are computed between the quantiles of the normalized ref and hist.
4. sim is corrected by the scaling factor, and either normalized by “dayofyear” and detrended group-wise or directly detrended per “dayofyear”, using a linear fit (modifiable).
5. Values of detrended sim are matched to the corresponding quantiles of normalized hist and corrected accordingly.
6. The trend is put back on the result.
\[F^{-1}_{ref}\left\{F_{hist}\left[\frac{\overline{hist}\cdot sim}{\overline{sim}}\right]\right\}\frac{\overline{sim}}{\overline{hist}}\]where \(F\) is the cumulative distribution function (CDF) and \(\overline{xyz}\) is the linear trend of the data. This equation is valid for multiplicative adjustment. Based on the DQM method of [Cannon2015].
- Parameters
Train step
nquantiles (int or 1d array of floats) – The number of quantiles to use. Two endpoints at 1e-6 and 1 - 1e-6 will be added. An array of quantiles [0, 1] can also be passed. Defaults to 20 quantiles (+2 endpoints).
kind ({‘+’, ‘*’}) – The adjustment kind, either additive or multiplicative. Defaults to “+”.
group (Union[str, Grouper]) – The grouping information. See xclim.sdba.base.Grouper for details. Default is “time”, meaning a single adjustment group along dimension “time”.
Adjust step
interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.
detrend (int or BaseDetrend instance) – The method to use when detrending. If an int is passed, it is understood as a PolyDetrend (polynomial detrending) degree. Defaults to 1 (linear detrending)
extrapolation ({‘constant’, ‘nan’}) – The type of extrapolation to use. See
xclim.sdba.utils.extrapolate_qm()
for details. Defaults to “constant”.
References
- Cannon2015(1,2)
Cannon, A. J., Sobie, S. R., & Murdock, T. Q. (2015). Bias correction of GCM precipitation by quantile mapping: How well do methods preserve changes in quantiles and extremes? Journal of Climate, 28(17), 6938–6959. https://doi.org/10.1175/JCLI-D-14-00754.1
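The train and adjust steps above can be compressed into a toy NumPy sketch (additive kind, a single group, linear detrending, no “dayofyear” normalization; an illustration of the idea, not the xclim implementation):

```python
import numpy as np

def dqm_train(ref, hist, quantiles):
    """Steps 1-3: mean scaling factor plus quantile adjustment
    factors between the mean-normalized series (additive kind)."""
    scaling = ref.mean() - hist.mean()
    ref_n, hist_n = ref - ref.mean(), hist - hist.mean()
    af = np.quantile(ref_n, quantiles) - np.quantile(hist_n, quantiles)
    return scaling, af

def dqm_adjust(sim, scaling, af, quantiles):
    """Steps 4-6: scale, detrend, quantile-correct, re-trend."""
    sim = sim + scaling
    t = np.arange(sim.size)
    trend = np.polyval(np.polyfit(t, sim, 1), t)
    det = sim - trend
    ranks = np.argsort(np.argsort(det)) / (det.size - 1)  # empirical CDF of det
    return det + np.interp(ranks, quantiles, af) + trend
```

When hist is a pure shift of ref, the quantile factors vanish and the method reduces to the mean scaling, which is a useful sanity check.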
- class xclim.sdba.adjustment.EmpiricalQuantileMapping(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.adjustment.TrainAdjust
Empirical Quantile Mapping bias-adjustment.
Adjustment factors are computed between the quantiles of ref and sim. Values of sim are matched to the corresponding quantiles of hist and corrected accordingly.
\[F^{-1}_{ref} (F_{hist}(sim))\]where \(F\) is the cumulative distribution function (CDF) and mod stands for model data.
- Parameters
Train step
nquantiles (int or 1d array of floats) – The number of quantiles to use. Two endpoints at 1e-6 and 1 - 1e-6 will be added. An array of quantiles [0, 1] can also be passed. Defaults to 20 quantiles (+2 endpoints).
kind ({‘+’, ‘*’}) – The adjustment kind, either additive or multiplicative. Defaults to “+”.
group (Union[str, Grouper]) – The grouping information. See xclim.sdba.base.Grouper for details. Default is “time”, meaning a single adjustment group along dimension “time”.
Adjust step
interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.
extrapolation ({‘constant’, ‘nan’}) – The type of extrapolation to use. See
xclim.sdba.utils.extrapolate_qm()
for details. Defaults to “constant”.
References
Dequé, M. (2007). Frequency of precipitation and temperature extremes over France in an anthropogenic scenario: Model results and statistical correction according to observed values. Global and Planetary Change, 57(1–2), 16–26. https://doi.org/10.1016/j.gloplacha.2006.11.030
_allow_diff_calendars = False
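The EQM formula reduces to two quantile computations and an interpolation. A minimal additive sketch (illustrative, not the library implementation):

```python
import numpy as np

def eqm_train(ref, hist, nq=20):
    """Adjustment factors between the quantiles of ref and hist."""
    q = np.linspace(0.01, 0.99, nq)
    return q, np.quantile(ref, q) - np.quantile(hist, q)

def eqm_adjust(sim, hist, q, af):
    """Find each sim value's quantile in hist, add the matching factor."""
    sim_q = np.interp(sim, np.quantile(hist, q), q)  # F_hist(sim)
    return sim + np.interp(sim_q, q, af)             # + AF at that quantile
```

The interpolation between quantile bins is what the interp argument controls in the real class; here a linear np.interp stands in for it.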
- class xclim.sdba.adjustment.LOCI(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.adjustment.TrainAdjust
Local Intensity Scaling (LOCI) bias-adjustment.
This bias adjustment method is designed to correct daily precipitation time series by considering wet and dry days separately ([Schmidli2006]).
Multiplicative adjustment factors are computed such that the mean of hist matches the mean of ref for values above a threshold.
The threshold on the training target ref is first mapped to hist by finding the quantile in hist having the same exceedance probability as thresh in ref. The adjustment factor is then given by
\[s = \frac{\left \langle ref: ref \geq t_{ref} \right\rangle - t_{ref}}{\left \langle hist : hist \geq t_{hist} \right\rangle - t_{hist}}\]In the case of precipitations, the adjustment factor is the ratio of wet-days intensity.
For an adjustment factor s, the bias-adjusted series scen is:
\[scen(t) = \max\left(t_{ref} + s \cdot (sim(t) - t_{hist}), 0\right)\]- Parameters
Train step
group (Union[str, Grouper]) – The grouping information. See xclim.sdba.base.Grouper for details. Default is “time”, meaning a single adjustment group along dimension “time”.
thresh (str) – The threshold in ref above which the values are scaled.
Adjust step
interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “linear”.
References
- Schmidli2006
Schmidli, J., Frei, C., & Vidale, P. L. (2006). Downscaling from GCM precipitation: A benchmark for dynamical and statistical downscaling methods. International Journal of Climatology, 26(5), 679–689. DOI:10.1002/joc.1287
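The two LOCI equations translate almost directly into NumPy (a toy single-group sketch, not the xclim code):

```python
import numpy as np

def loci_train(ref, hist, thresh_ref):
    """Map the ref threshold to hist via exceedance probability,
    then form the ratio of above-threshold mean intensities."""
    p_exceed = (ref >= thresh_ref).mean()
    t_hist = np.quantile(hist, 1 - p_exceed)
    s = (ref[ref >= thresh_ref].mean() - thresh_ref) / (
        hist[hist >= t_hist].mean() - t_hist)
    return s, t_hist

def loci_adjust(sim, s, t_ref, t_hist):
    """scen(t) = max(t_ref + s * (sim(t) - t_hist), 0)"""
    return np.maximum(t_ref + s * (sim - t_hist), 0.0)
```

The clipping at zero keeps adjusted precipitation non-negative, and the threshold mapping keeps the wet-day frequency of hist consistent with ref.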
- class xclim.sdba.adjustment.NpdfTransform(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.adjustment.Adjust
N-dimensional probability density function transform.
This adjustment object combines both the training and adjustment steps in the adjust class method.
A multivariate bias-adjustment algorithm described by [Cannon18], as part of the MBCn algorithm, based on a color-correction algorithm described by [Pitie05].
This algorithm in itself, when used with QuantileDeltaMapping, is NOT trend-preserving. The full MBCn algorithm includes a reordering step provided here by xclim.sdba.processing.reordering(). See the notes for an explanation of the algorithm.
- Parameters
base (BaseAdjustment) – A univariate bias-adjustment class. This is untested for anything other than QuantileDeltaMapping.
base_kws (dict, optional) – Arguments passed to the training of the univariate adjustment.
n_escore (int) – The number of elements to send to the escore function. The default, 0, means all elements are included. Pass -1 to skip computing the escore completely. Small numbers result in less significant scores, but the execution time goes up quickly with large values.
n_iter (int) – The number of iterations to perform. Defaults to 20.
pts_dim (str) – The name of the “multivariate” dimension. Defaults to “variables”, which is the normal case when using xclim.sdba.base.stack_variables().
adj_kws (dict, optional) – Dictionary of arguments to pass to the adjust method of the univariate adjustment.
rot_matrices (xr.DataArray, optional) – The rotation matrices as a 3D array (‘iterations’, <pts_dim>, <anything>), with shape (n_iter, <N>, <N>). If left empty, random rotation matrices will be automatically generated.
Notes
The historical reference (\(T\), for “target”), simulated historical (\(H\)) and simulated projected (\(S\)) datasets are constructed by stacking the time series of N variables together. The algorithm proceeds in the following steps:
1. Rotate the datasets in the N-dimensional variable space with \(\mathbf{R}\), a random NxN rotation matrix.
\[\tilde{\mathbf{T}} = \mathbf{T}\mathbf{R} \\ \tilde{\mathbf{H}} = \mathbf{H}\mathbf{R} \\ \tilde{\mathbf{S}} = \mathbf{S}\mathbf{R}\]
2. A univariate bias-adjustment \(\mathcal{F}\) is used on the rotated datasets. The adjustments are made in additive mode, for each variable \(i\).
\[\hat{\mathbf{H}}_i, \hat{\mathbf{S}}_i = \mathcal{F}\left(\tilde{\mathbf{T}}_i, \tilde{\mathbf{H}}_i, \tilde{\mathbf{S}}_i\right)\]3. The bias-adjusted datasets are rotated back.
\[\begin{split}\mathbf{H}' = \hat{\mathbf{H}}\mathbf{R}^{-1} \\ \mathbf{S}' = \hat{\mathbf{S}}\mathbf{R}^{-1}\end{split}\]These three steps are repeated a number of times, prescribed by the n_iter argument. At each iteration, a new random rotation matrix is generated. The original algorithm ([Pitie05]) stops iterating when some distance score converges. Following [Cannon18] and the MBCn implementation in [CannonR], we instead fix the number of iterations.
As done by [Cannon18], the distance score chosen is the “Energy distance” from [SkezelyRizzo] (see xclim.sdba.processing.escore()). The random matrices are generated following a method laid out by [Mezzadri].
This is only part of the full MBCn algorithm, see The MBCn algorithm for an example on how to replicate the full method with xclim. This includes a standardization of the simulated data beforehand, an initial univariate adjustment and the reordering of those adjusted series according to the rank structure of the output of this algorithm.
References
- Cannon18(1,2)
Cannon, A. J. (2018). Multivariate quantile mapping bias correction: An N-dimensional probability density function transform for climate model simulations of multiple variables. Climate Dynamics, 50(1), 31–49. https://doi.org/10.1007/s00382-017-3580-6
- Pitie05(1,2)
Pitie, F., Kokaram, A. C., & Dahyot, R. (2005). N-dimensional probability density function transfer and its application to color transfer. Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1, 2, 1434-1439 Vol. 2. https://doi.org/10.1109/ICCV.2005.166
- SkezelyRizzo
Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, November (5)
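The random rotation matrices can be sketched with the QR-based method attributed to [Mezzadri]: decompose a Gaussian matrix and fix the signs with the diagonal of R so the result is uniformly (Haar) distributed over the orthogonal group. An illustrative sketch, not the xclim generator:

```python
import numpy as np

def random_rotation(n, rng):
    """Haar-distributed orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))  # sign fix makes the measure uniform

rng = np.random.default_rng(42)
R = random_rotation(3, rng)
# R is orthogonal: R @ R.T is the identity (up to floating-point error).
```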
- class xclim.sdba.adjustment.PrincipalComponents(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.adjustment.TrainAdjust
Principal component adjustment.
This bias-correction method maps model simulation values to the observation space through principal components ([hnilica2017]). Values in the simulation space (multiple variables, or multiple sites) can be thought of as coordinates along axes, one per variable or site. Principal components (PC) are linear combinations of the original variables, where the coefficients are the eigenvectors of the covariance matrix. Values can then be expressed as coordinates along the PC axes. The method makes the assumption that bias-corrected values have the same coordinates along the PC axes of the observations. By converting from the observation PC space back to the original space, we obtain bias-corrected values. See the notes for a mathematical explanation.
Note that principal components is meant here as the algebraic operation defining a coordinate system based on the eigenvectors, not statistical principal component analysis.
- Parameters
group (Union[str, Grouper]) – The grouping information. pts_dims can also be given through Grouper’s add_dims argument. See Notes. See xclim.sdba.base.Grouper for details. The adjustment will be performed on each group independently. Default is “time”, meaning a single adjustment group along dimension “time”.
crd_dims (Sequence of str, optional) – The data dimension(s) along which the multiple simulation space dimensions are taken. They are flattened into a “coordinate” dimension, see Notes. Default is None, in which case all dimensions shared by ref and hist, except those in pts_dims, are used. The training algorithm currently doesn’t support any chunking along the coordinate and point dimensions.
pts_dims (Sequence of str, optional) – The data dimensions to flatten into the “points” dimension, see Notes. They will be merged with those given through the add_dims property of group.
Notes
The input data is understood as a set of \(N\) points in an \(M\)-dimensional space.
\(N\) is taken along the data coordinates listed in pts_dims and the group (the main dim but also the add_dims).
\(M\) is taken along the data coordinates listed in crd_dims, the default being all except those in pts_dims.
For example, for a 3D matrix of data, say in (lat, lon, time), we could say that all spatial points are independent dimensions of the simulation space by passing crd_dims=['lat', 'lon']. For a (5, 5, 365) array, this results in a 25-dimensional space, i.e. \(M = 25\) and \(N = 365\).
Thus, the adjustment is equivalent to a linear transformation of these \(N\) points in an \(M\)-dimensional space.
The principal components (PC) of hist and ref are used to define new coordinate systems, centered on their respective means. The training step creates a matrix defining the transformation from hist to ref:
\[scen = e_{R} + \mathrm{\mathbf{T}}(sim - e_{H})\]Where:
\[\mathrm{\mathbf{T}} = \mathrm{\mathbf{R}}\mathrm{\mathbf{H}}^{-1}\]\(\mathrm{\mathbf{R}}\) is the matrix transforming from the PC coordinates computed on ref to the data coordinates. Similarly, \(\mathrm{\mathbf{H}}\) transforms from the hist PC coordinates to the data coordinates (\(\mathrm{\mathbf{H}}^{-1}\) is the inverse transformation). \(e_R\) and \(e_H\) are the centroids of the ref and hist distributions respectively. Upon running the adjust step, one may decide to use \(e_S\), the centroid of the sim distribution, instead of \(e_H\).
References
- hnilica2017
Hnilica, J., Hanel, M. and Puš, V. (2017), Multisite bias correction of precipitation data from regional climate models. Int. J. Climatol., 37: 2934-2946. https://doi.org/10.1002/joc.4890
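The transformation \(\mathbf{T} = \mathbf{R}\mathbf{H}^{-1}\) can be sketched with eigendecompositions of the two covariance matrices (a toy NumPy illustration for a single group; not the xclim implementation):

```python
import numpy as np

def pc_train(ref, hist):
    """ref, hist: (M, N) arrays of N points in an M-dimensional space."""
    e_r = ref.mean(axis=1, keepdims=True)
    e_h = hist.mean(axis=1, keepdims=True)

    def pc_matrix(x):
        # Columns are eigenvectors scaled by sqrt(eigenvalues), so that
        # pc_matrix(x) @ pc_matrix(x).T recovers the covariance matrix.
        w, v = np.linalg.eigh(np.cov(x))
        return v * np.sqrt(np.maximum(w, 0.0))

    T = pc_matrix(ref) @ np.linalg.inv(pc_matrix(hist))
    return T, e_r, e_h

def pc_adjust(sim, T, e_r, e_h):
    """scen = e_R + T (sim - e_H)"""
    return e_r + T @ (sim - e_h)
```

Applied to hist itself, this maps the hist distribution onto the mean and covariance of ref, which is the defining property of the method.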
- class xclim.sdba.adjustment.QuantileDeltaMapping(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.adjustment.EmpiricalQuantileMapping
Quantile Delta Mapping bias-adjustment.
Adjustment factors are computed between the quantiles of ref and hist. Quantiles of sim are matched to the corresponding quantiles of hist and corrected accordingly.
\[sim\frac{F^{-1}_{ref}\left[F_{sim}(sim)\right]}{F^{-1}_{hist}\left[F_{sim}(sim)\right]}\]where \(F\) is the cumulative distribution function (CDF). This equation is valid for multiplicative adjustment. The algorithm is based on the “QDM” method of [Cannon2015].
- Parameters
Train step
nquantiles (int or 1d array of floats) – The number of quantiles to use. Two endpoints at 1e-6 and 1 - 1e-6 will be added. An array of quantiles [0, 1] can also be passed. Defaults to 20 quantiles (+2 endpoints).
kind ({‘+’, ‘*’}) – The adjustment kind, either additive or multiplicative. Defaults to “+”.
group (Union[str, Grouper]) – The grouping information. See xclim.sdba.base.Grouper for details. Default is “time”, meaning a single adjustment group along dimension “time”.
Adjust step
interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.
extrapolation ({‘constant’, ‘nan’}) – The type of extrapolation to use. See xclim.sdba.utils.extrapolate_qm() for details. Defaults to “constant”.
Extra diagnostics
In adjustment
quantiles (The quantile of each value of sim. The adjustment factor is interpolated using this as the “quantile” axis on ds.af.)
References
- Cannon2015
Cannon, A. J., Sobie, S. R., & Murdock, T. Q. (2015). Bias correction of GCM precipitation by quantile mapping: How well do methods preserve changes in quantiles and extremes? Journal of Climate, 28(17), 6938–6959. https://doi.org/10.1175/JCLI-D-14-00754.1
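The multiplicative QDM equation can be sketched in a few NumPy lines (toy single-group version, illustrative only):

```python
import numpy as np

def qdm_adjust(ref, hist, sim, nq=50):
    """sim * F^-1_ref(F_sim(sim)) / F^-1_hist(F_sim(sim)), multiplicative."""
    q = np.linspace(0.005, 0.995, nq)
    tau = np.interp(sim, np.quantile(sim, q), q)   # F_sim(sim): quantile of each value
    num = np.interp(tau, q, np.quantile(ref, q))   # F^-1_ref at that quantile
    den = np.interp(tau, q, np.quantile(hist, q))  # F^-1_hist at that quantile
    return sim * num / den
```

The tau array is the “quantiles” diagnostic mentioned above: the quantile of each sim value, used as the axis on which the adjustment factors are interpolated.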
- class xclim.sdba.adjustment.Scaling(*args, _trained=False, **kwargs)[source]
Bases:
xclim.sdba.adjustment.TrainAdjust
Scaling bias-adjustment.
Simple bias-adjustment method scaling variables by an additive or multiplicative factor so that the mean of hist matches the mean of ref.
- Parameters
Train step
group (Union[str, Grouper]) – The grouping information. See xclim.sdba.base.Grouper for details. Default is “time”, meaning a single adjustment group along dimension “time”.
kind ({‘+’, ‘*’}) – The adjustment kind, either additive or multiplicative. Defaults to “+”.
Adjust step
interp ({‘nearest’, ‘linear’, ‘cubic’}) – The interpolation method to use when interpolating the adjustment factors. Defaults to “nearest”.
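Scaling reduces to a single factor per group; a sketch of both kinds (illustrative, not the library code):

```python
import numpy as np

def scaling_train(ref, hist, kind="+"):
    """Factor making the mean of hist match the mean of ref."""
    if kind == "+":
        return ref.mean() - hist.mean()
    return ref.mean() / hist.mean()

def scaling_adjust(sim, af, kind="+"):
    """Apply the trained factor to sim."""
    return sim + af if kind == "+" else sim * af
```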
Utilities
SDBA utilities module.
- xclim.sdba.utils.equally_spaced_nodes(n: int, eps: Optional[float] = 0.0001) → numpy.array [source]
Return nodes with n equally spaced points within [0, 1] plus two end-points.
- Parameters
n (int) – Number of equally spaced nodes.
eps (float, None) – Distance from 0 and 1 of end nodes. If None, do not add endpoints.
- Returns
np.array – Nodes between 0 and 1.
Notes
For n=4, eps=0 : 0—x——x——x——x—1
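The diagram corresponds to placing the n nodes at the centres of n equal bins and appending the two endpoints; a sketch consistent with the documented behaviour (not the exact source):

```python
import numpy as np

def equally_spaced_nodes_sketch(n, eps=1e-4):
    """n inner nodes at the centres of n equal bins in [0, 1],
    plus endpoints at eps and 1 - eps (omitted when eps is None)."""
    inner = (np.arange(n) + 0.5) / n
    if eps is None:
        return inner
    return np.concatenate([[eps], inner, [1 - eps]])
```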
Pre and post processing for bias adjustment.
- xclim.sdba.processing.adapt_freq(ref: xarray.core.dataarray.DataArray, sim: xarray.core.dataarray.DataArray, *, group: Union[xclim.sdba.base.Grouper, str], thresh: str = '0 mm d-1') → xarray.core.dataset.Dataset [source]
Adapt frequency of values under thresh of sim, in order to match ref.
This is useful when the dry-day frequency in the simulations is higher than in the reference. This function will create new non-null values for sim/hist, so that adjustment factors are less wet-biased. Based on [Themessl2012].
- Parameters
ref (xr.DataArray) – Target/reference data, usually observed data.
sim (xr.DataArray) – Simulated data.
group (Union[str, Grouper]) – Grouping information, see base.Grouper
thresh (str) – Threshold below which values are considered zero, a quantity with units.
- Returns
sim_adj (xr.DataArray) – Simulated data with the same frequency of values under threshold as ref. Adjustment is made group-wise.
pth (xr.DataArray) – For each group, the smallest value of sim that was not frequency-adjusted. All values smaller were either left as zero values or given a random value between thresh and pth. NaN where frequency adaptation wasn’t needed.
dP0 (xr.DataArray) – For each group, the percentage of values that were corrected in sim.
Notes
With \(P_0^r\) the frequency of values under threshold \(T_0\) in the reference (ref) and \(P_0^s\) the same for the simulated values, \(\Delta P_0 = \frac{P_0^s - P_0^r}{P_0^s}\), when positive, represents the proportion of values under \(T_0\) that need to be corrected.
The correction replaces a proportion \(\Delta P_0\) of the values under \(T_0\) in sim by a uniform random number between \(T_0\) and \(P_{th}\), where \(P_{th} = F_{ref}^{-1}( F_{sim}( T_0 ) )\) and \(F(x)\) is the empirical cumulative distribution function (CDF).
References
- Themessl2012
Themeßl et al. (2012), Empirical-statistical downscaling and error correction of regional climate models and its impact on the climate change signal, Climatic Change, DOI 10.1007/s10584-011-0224-4.
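The notes above can be sketched for a single group in NumPy (illustrative; the real function works group-wise on xarray objects, and all names here are made up):

```python
import numpy as np

def adapt_freq_sketch(ref, sim, thresh=0.1, rng=None):
    """Inject random values in (thresh, pth) into a fraction dP0 of
    sim's under-threshold values so its dry frequency matches ref's."""
    if rng is None:
        rng = np.random.default_rng(0)
    p0_ref = (ref < thresh).mean()
    p0_sim = (sim < thresh).mean()
    dP0 = (p0_sim - p0_ref) / p0_sim   # fraction of dry values to correct
    if dP0 <= 0:
        return sim, np.nan, 0.0        # no adaptation needed
    # pth = F_ref^-1(F_sim(thresh)): ref value at sim's dry frequency
    pth = np.quantile(ref, p0_sim)
    out = sim.copy()
    dry = np.flatnonzero(sim < thresh)
    fix = rng.choice(dry, size=round(dP0 * dry.size), replace=False)
    out[fix] = rng.uniform(thresh, pth, fix.size)
    return out, pth, dP0
```

After the correction, the frequency of values under the threshold in the output matches that of the reference, which is exactly what makes subsequent multiplicative adjustment factors less wet-biased.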
- xclim.sdba.processing.jitter_under_thresh(x: xarray.core.dataarray.DataArray, thresh: str)[source]
Replace values smaller than the threshold by uniform random noise.
Do not confuse with R’s jitter, which adds uniform noise instead of replacing values.
- Parameters
x (xr.DataArray) – Values.
thresh (str) – Threshold under which values are replaced by uniform random noise, a quantity with units.
- Returns
xr.DataArray – Values with those under the threshold replaced by uniform random noise.
Notes
If thresh is high, this will change the mean value of x.
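A sketch of the operation: values below the threshold are replaced, not perturbed, by uniform noise (illustrative, not the xclim code):

```python
import numpy as np

def jitter_under_thresh_sketch(x, thresh, rng=None):
    """Replace values below thresh with uniform noise in [0, thresh)."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = x.copy()
    under = x < thresh
    out[under] = rng.uniform(0.0, thresh, under.sum())
    return out
```

This breaks ties among the (often numerous) zero values so that quantile-based methods behave sensibly, at the cost of slightly changing the mean when thresh is high.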