Spatial Analogues

Spatial analogues are maps showing which areas have a present-day climate analogous to the future climate of a given place. This type of map is useful for climate adaptation because it shows how well regions are coping today under specific climate conditions. For example, officials from a city in a temperate region that expects more heatwaves in the future can learn from the experience of another city where heatwaves are already common, leading to more proactive intervention plans for the new climate conditions.

Spatial analogues are estimated by comparing the distribution of climate indices computed at the target location over the future period with the distribution of the same climate indices computed over a reference period for multiple candidate regions. A number of methodological choices thus enter the computation:

  • Climate indices of interest,

  • Metrics measuring the difference between both distributions,

  • Reference data from which to compute the base indices,

  • A future climate scenario to compute the target indices.

The climate indices chosen to compute the spatial analogues are usually annual values of indices relevant to the intended audience of these maps. For example, in the case of the wine grape industry, the climate indices examined could include the length of the frost-free season, growing degree-days, annual winter minimum temperature and annual number of very cold days [Roy2017].

See Spatial Analogues examples.

Methods to compute the (dis)similarity between samples

This module implements all methods described in [Grenier2013] to measure the dissimilarity between two samples, plus the Székely-Rizzo energy distance. Some of these algorithms can be used to test whether two samples were drawn from the same distribution. Here, they are used to find areas with climate conditions analogous to a target climate.

  • Standardized Euclidean distance

  • Nearest Neighbour distance

  • Zech-Aslan energy statistic

  • Székely-Rizzo energy distance

  • Friedman-Rafsky runs statistic

  • Kolmogorov-Smirnov statistic

  • Kullback-Leibler divergence

All methods accept two arrays: the first is the reference sample, of shape (n, D), and the second is the candidate sample, of shape (m, D), where the climate indicators vary along D and the distribution dimension along n or m. All methods output a single float. See their documentation in Analogue metrics API.

Warning

Some methods are scale-invariant and others are not. This is indicated in the docstring of the methods as it can change the results significantly. In most cases, scale-invariance is desirable and inputs may need to be scaled beforehand for scale-dependent methods.

References

Roy2017

Roy, P., Grenier, P., Barriault, E. et al. Climatic Change (2017) 143: 43. https://doi.org/10.1007/s10584-017-1960-x

Grenier2013

Grenier, P., A.-C. Parent, D. Huard, F. Anctil, and D. Chaumont, 2013: An assessment of six dissimilarity metrics for climate analogs. J. Appl. Meteor. Climatol., 52, 733–752, https://doi.org/10.1175/JAMC-D-12-0170.1

xclim.analog.spatial_analogs(target: xr.Dataset, candidates: xr.Dataset, dist_dim: str | Sequence[str] = 'time', method: str = 'kldiv', **kwargs)[source]

Compute dissimilarity statistics between target points and candidate points.

Spatial analogues based on the comparison of climate indices. The algorithm compares the distribution of the reference indices with the distribution of spatially distributed candidate indices and returns a value measuring the dissimilarity between both distributions over the candidate grid.

Parameters
  • target (xr.Dataset) – Dataset of the target indices. Only index variables should be included in the dataset’s data_vars. They should share only the dimension(s) dist_dim with candidates.

  • candidates (xr.Dataset) – Dataset of the candidate indices. Only index variables should be included in the dataset’s data_vars.

  • dist_dim (str) – The dimension over which the distributions are constructed. This can be a multi-index dimension.

  • method ({‘seuclidean’, ‘nearest_neighbor’, ‘zech_aslan’, ‘kolmogorov_smirnov’, ‘friedman_rafsky’, ‘kldiv’}) – Which method to use when computing the dissimilarity statistic.

  • kwargs – Any other parameter passed directly to the dissimilarity method.

Returns

xr.DataArray – The dissimilarity statistic over the union of candidates’ and target’s dimensions. The range depends on the method.
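As a rough illustration of what this computes, here is a minimal numpy sketch of the analogue search loop. The function names and the simple mean-distance metric are illustrative assumptions for this sketch, not xclim's implementation, which vectorizes over the candidate grid with xarray and dispatches to the metrics documented below.

```python
import numpy as np

def mean_distance(x, y):
    """Illustrative dissimilarity: Euclidean distance between the sample means.

    x is the (n, D) reference sample, y the (m, D) candidate sample.
    """
    return float(np.linalg.norm(x.mean(axis=0) - y.mean(axis=0)))

def analog_search(target, candidates, metric=mean_distance):
    """Return one dissimilarity value per candidate cell.

    target: (n, D) indices at the target site over the future period.
    candidates: (cells, m, D) same indices at each candidate cell over the
    reference period.
    """
    return np.array([metric(target, c) for c in candidates])

rng = np.random.default_rng(0)
target = rng.normal([20.0, 5.0], 1.0, size=(30, 2))  # future climate at the target site
near = rng.normal([20.0, 5.0], 1.0, size=(30, 2))    # candidate with a similar climate
far = rng.normal([10.0, 0.0], 1.0, size=(30, 2))     # candidate with a different climate

scores = analog_search(target, np.stack([near, far]))
best = int(scores.argmin())  # lowest dissimilarity -> best analogue (index 0 here)
```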

Analogue metrics API

xclim.analog.friedman_rafsky(x: ndarray, y: ndarray) float[source]

Compute a dissimilarity metric based on the Friedman-Rafsky runs statistic.

The algorithm builds a minimal spanning tree (the subset of edges connecting all points that minimizes the total edge length) then counts the edges linking points from the same distribution. This method is scale-dependent.

Parameters
  • x (np.ndarray (n,d)) – Reference sample.

  • y (np.ndarray (m,d)) – Candidate sample.

Returns

float – Friedman-Rafsky dissimilarity metric ranging from 0 to (m+n-1)/(m+n).
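The procedure can be sketched with scipy's minimum-spanning-tree routine. This is an illustrative reimplementation, not xclim's code; the scaling is chosen to match the range stated above.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist

def friedman_rafsky_sketch(x, y):
    """Fraction of MST edges joining two points from the same sample.

    The pooled sample has n + m points, so the MST has n + m - 1 edges and
    the result lies in [0, (m + n - 1) / (m + n)].
    """
    pooled = np.concatenate([x, y])
    labels = np.concatenate([np.zeros(len(x)), np.ones(len(y))])
    mst = minimum_spanning_tree(cdist(pooled, pooled))
    i, j = mst.nonzero()  # endpoints of the n + m - 1 tree edges
    same = int(np.sum(labels[i] == labels[j]))
    return same / len(pooled)

# Two well-separated samples: the tree crosses between them only once,
# so nearly all edges link points of the same sample.
rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, size=(20, 2))
y = rng.normal(10.0, 1.0, size=(20, 2))
d_separated = friedman_rafsky_sketch(x, y)
```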

References

Friedman J.H. and Rafsky, L.C. (1979) Multivariate generalisations of the Wald-Wolfowitz and Smirnov two-sample tests. Annals of Stat. Vol.7, No. 4, 697-717. https://doi.org/10.1214/aos/1176344722.

xclim.analog.kldiv(x: np.ndarray, y: np.ndarray, *, k: int | Sequence[int] = 1) float | Sequence[float][source]

Compute the Kullback-Leibler divergence between two multivariate samples.

The divergence is estimated from nearest-neighbour distances:

\[\hat{D}(P||Q) = \frac{d}{n} \sum_{i=1}^{n} \log\left(\frac{s_k(x_i)}{r_k(x_i)}\right) + \log\left(\frac{m}{n-1}\right)\]

where \(r_k(x_i)\) and \(s_k(x_i)\) are, respectively, the Euclidean distance to the kth neighbour of \(x_i\) in the x array (excluding \(x_i\) itself) and in the y array. This method is scale-dependent.

Parameters
  • x (np.ndarray (n,d)) – Samples from distribution P, which typically represents the true distribution (reference).

  • y (np.ndarray (m,d)) – Samples from distribution Q, which typically represents the approximate distribution (candidate).

  • k (int or sequence) – The kth neighbours to look for when estimating the density of the distributions. Defaults to 1, which can be noisy.

Returns

float or sequence – The estimated Kullback-Leibler divergence D(P||Q) computed from the distances to the kth neighbour.

Notes

In information theory, the Kullback–Leibler divergence ([perezcruz08]) is a non-symmetric measure of the difference between two probability distributions P and Q, where P is the “true” distribution and Q an approximation. This nuance is important because \(D(P||Q)\) is not equal to \(D(Q||P)\).

For probability distributions P and Q of a continuous random variable, the K–L divergence is defined as:

\[D_{KL}(P||Q) = \int p(x) \log\left(\frac{p(x)}{q(x)}\right) dx\]

This formula assumes we have a representation of the probability densities \(p(x)\) and \(q(x)\). In many cases, we only have samples from the distribution, and most methods first estimate the densities from the samples and then proceed to compute the K-L divergence. In [perezcruz08], the author proposes an algorithm to estimate the K-L divergence directly from the sample using an empirical CDF. Even though the empirical CDFs do not converge to their true values, the paper proves that the K-L divergence estimate almost surely does converge to its true value.
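A compact sketch of this nearest-neighbour estimator with scipy k-d trees (an illustrative reimplementation for integer k, not xclim's code):

```python
import numpy as np
from scipy.spatial import cKDTree

def kldiv_sketch(x, y, k=1):
    """kth nearest-neighbour estimate of D(P||Q) from samples x ~ P, y ~ Q."""
    n, d = x.shape
    m = y.shape[0]
    # r_k: distance from each x_i to its kth neighbour within x, excluding
    # itself (k + 1 hits requested because the closest hit is x_i itself).
    r = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    # s_k: distance from each x_i to its kth neighbour within y.
    s = cKDTree(y).query(x, k=k)[0]
    if k > 1:
        s = s[:, -1]
    return d / n * np.sum(np.log(s / r)) + np.log(m / (n - 1.0))

rng = np.random.default_rng(1)
p = rng.normal(0.0, 1.0, size=(500, 2))       # reference sample
q_same = rng.normal(0.0, 1.0, size=(500, 2))  # same distribution -> divergence near 0
q_far = rng.normal(4.0, 1.0, size=(500, 2))   # shifted distribution -> large divergence
```

Larger k smooths the density estimate, which is why the default k=1 can be noisy.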

References

perezcruz08

Perez-Cruz, F. (2008). Kullback-Leibler divergence estimation of continuous distributions. 2008 IEEE International Symposium on Information Theory, 1666‑1670. https://doi.org/10.1109/ISIT.2008.4595271

xclim.analog.kolmogorov_smirnov(x: ndarray, y: ndarray) float[source]

Compute the Kolmogorov-Smirnov statistic applied to two multivariate samples as described by Fasano and Franceschini.

This method is scale-dependent.

Parameters
  • x (np.ndarray (n,d)) – Reference sample.

  • y (np.ndarray (m,d)) – Candidate sample.

Returns

float – Kolmogorov-Smirnov dissimilarity metric ranging from 0 to 1.
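In one dimension, the statistic this generalizes is the classical two-sample Kolmogorov-Smirnov statistic, available directly in scipy:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, size=500)        # reference sample
y_same = rng.normal(0.0, 1.0, size=500)   # same distribution
y_shift = rng.normal(2.0, 1.0, size=500)  # shifted distribution

d_same = ks_2samp(x, y_same).statistic    # small: similar empirical CDFs
d_shift = ks_2samp(x, y_shift).statistic  # large: well-separated samples
```

The multivariate Fasano-Franceschini version evaluates analogous ECDF-type differences over all quadrant orientations rather than a single axis.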

References

Fasano, G., & Franceschini, A. (1987). A multidimensional version of the Kolmogorov-Smirnov test. Monthly Notices of the Royal Astronomical Society, 225, 155‑170. https://doi.org/10.1093/mnras/225.1.155

xclim.analog.nearest_neighbor(x: ndarray, y: ndarray) ndarray[source]

Compute a dissimilarity metric based on the number of points in the pooled sample whose nearest neighbor belongs to the same distribution.

This method is scale-invariant.

Parameters
  • x (np.ndarray (n,d)) – Reference sample.

  • y (np.ndarray (m,d)) – Candidate sample.

Returns

float – Nearest-Neighbor dissimilarity metric ranging from 0 to 1.
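The idea can be sketched with a k-d tree query on the pooled sample (an illustrative reimplementation, not xclim's code):

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_neighbor_sketch(x, y):
    """Fraction of pooled points whose nearest neighbour is from the same sample.

    Roughly 0.5 when the two samples overlap completely and close to 1 when
    they are well separated.
    """
    pooled = np.concatenate([x, y])
    labels = np.concatenate([np.zeros(len(x)), np.ones(len(y))])
    # k=2 because the closest hit for each point is the point itself.
    _, idx = cKDTree(pooled).query(pooled, k=2)
    return float(np.mean(labels[idx[:, 1]] == labels))

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=(100, 2))
overlapping = rng.normal(0.0, 1.0, size=(100, 2))
separated = rng.normal(8.0, 1.0, size=(100, 2))
```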

References

Henze N. (1988) A Multivariate two-sample test based on the number of nearest neighbor type coincidences. Ann. of Stat., Vol. 16, No.2, 772-783. https://doi.org/10.1214/aos/1176350835.

xclim.analog.seuclidean(x: ndarray, y: ndarray) float[source]

Compute the standardized Euclidean distance between the mean of a multivariate candidate sample and the mean of a reference sample.

This method is scale-invariant.

Parameters
  • x (np.ndarray (n,d)) – Reference sample.

  • y (np.ndarray (m,d)) – Candidate sample.

Returns

float – Standardized Euclidean Distance between the mean of the samples ranging from 0 to infinity.

Notes

This metric considers neither the information from individual points nor the standard deviation of the candidate distribution.
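scipy ships the underlying distance as scipy.spatial.distance.seuclidean; the sketch below applies it to the two sample means, using the pooled per-dimension variance as the scaling (that choice is an assumption of this illustration and may differ from xclim's):

```python
import numpy as np
from scipy.spatial.distance import seuclidean

def seuclidean_sketch(x, y):
    # Per-dimension variance of the pooled sample; the exact standardization
    # used by xclim may differ -- this choice is an assumption of the sketch.
    v = np.concatenate([x, y]).var(axis=0, ddof=1)
    return float(seuclidean(x.mean(axis=0), y.mean(axis=0), v))

x = np.array([[19.0, 4.0], [21.0, 6.0], [20.0, 5.0]])  # reference indices (n=3, d=2)
y_close = x + 0.1   # candidate with a nearly identical mean climate
y_far = x + 10.0    # candidate with a very different mean climate
```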

References

Veloz et al. (2011) Identifying climatic analogs for Wisconsin under 21st-century climate-change scenarios. Climatic Change, https://doi.org/10.1007/s10584-011-0261-z.

xclim.analog.szekely_rizzo(x: ndarray, y: ndarray, *, standardize: bool = True) float[source]

Compute the Székely-Rizzo energy distance dissimilarity metric based on an analogy with Newton’s gravitational potential energy.

This method is scale-invariant when standardize=True (default), scale-dependent otherwise.

Parameters
  • x (ndarray (n,d)) – Reference sample.

  • y (ndarray (m,d)) – Candidate sample.

  • standardize (bool) – If True (default), the standardized Euclidean norm is used instead of the conventional one.

Returns

float – Székely-Rizzo’s energy distance dissimilarity metric ranging from 0 to infinity.

Notes

The e-distance between two variables \(X\), \(Y\) (target and candidates) of sizes \(n,d\) and \(m,d\) proposed by [SR2004] is defined by:

\[e(X, Y) = \frac{n m}{n + m} \left[2\phi_{xy} - \phi_{xx} - \phi_{yy} \right]\]

where

\[\begin{split}\phi_{xy} &= \frac{1}{n m} \sum_{i = 1}^n \sum_{j = 1}^m \left\Vert X_i - Y_j \right\Vert \\ \phi_{xx} &= \frac{1}{n^2} \sum_{i = 1}^n \sum_{j = 1}^n \left\Vert X_i - X_j \right\Vert \\ \phi_{yy} &= \frac{1}{m^2} \sum_{i = 1}^m \sum_{j = 1}^m \left\Vert Y_i - Y_j \right\Vert \\\end{split}\]

and where \(\Vert\cdot\Vert\) denotes the Euclidean norm and \(X_i\) denotes the i-th observation of \(X\). When standardize=False, this corresponds to the \(T\) test of [RS2016] (p. 28) and to the eqdist.e function of the energy R package (with two samples), and gives results twice as large as xclim.sdba.processing.escore(). The standardization was added following the logic of [Grenier2013] to make the metric scale-invariant.
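The formula above translates directly to numpy; this sketch uses the plain Euclidean norm, i.e. the standardize=False variant:

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_distance_sketch(x, y):
    """e(X, Y) = nm/(n+m) * (2*phi_xy - phi_xx - phi_yy) with Euclidean norms."""
    n, m = len(x), len(y)
    phi_xy = cdist(x, y).mean()  # (1/nm) sum of ||X_i - Y_j||
    phi_xx = cdist(x, x).mean()  # (1/n^2) sum of ||X_i - X_j||
    phi_yy = cdist(y, y).mean()  # (1/m^2) sum of ||Y_i - Y_j||
    return n * m / (n + m) * (2 * phi_xy - phi_xx - phi_yy)

x = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
y = x + 5.0  # identical shape, shifted by 5 in each dimension
```

For 1-D samples, scipy.stats.energy_distance computes a closely related quantity.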

References

SR2004

Székely, G. J. and Rizzo, M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, November (5). https://www.researchgate.net/publication/228918499_Testing_for_equal_distributions_in_high_dimension

RS2016

Rizzo, M. L., & Székely, G. J. (2016). Energy distance. Wiley Interdisciplinary Reviews: Computational Statistics, 8(1), 27–38. https://doi.org/10.1002/wics.1375

xclim.analog.zech_aslan(x: ndarray, y: ndarray, *, dmin: float = 1e-12) float[source]

Compute a modified Zech-Aslan energy distance dissimilarity metric based on an analogy with the energy of a cloud of electrical charges.

This method is scale-invariant.

Parameters
  • x (np.ndarray (n,d)) – Reference sample.

  • y (np.ndarray (m,d)) – Candidate sample.

  • dmin (float) – The cut-off for low distances to avoid singularities on identical points.

Returns

float – Zech-Aslan dissimilarity metric ranging from -infinity to infinity.

Notes

The energy measure between two variables \(X\), \(Y\) (target and candidates) of sizes \(n,d\) and \(m,d\) proposed by [AZ03] is defined by:

\[\begin{split}e(X, Y) &= \left[\phi_{xx} + \phi_{yy} - \phi_{xy}\right] \\ \phi_{xy} &= \frac{1}{n m} \sum_{i = 1}^n \sum_{j = 1}^m R\left[SED(X_i, Y_j)\right] \\ \phi_{xx} &= \frac{1}{n^2} \sum_{i = 1}^n \sum_{j = i + 1}^n R\left[SED(X_i, X_j)\right] \\ \phi_{yy} &= \frac{1}{m^2} \sum_{i = 1}^m \sum_{j = i + 1}^m R\left[SED(Y_i, Y_j)\right] \\\end{split}\]

where \(X_i\) denotes the i-th observation of \(X\). \(R\) is a weight function and \(SED(A, B)\) denotes the standardized Euclidean distance.

\[\begin{split}R(r) &= \left\{\begin{array}{r l} -\ln r & \text{for } r > d_{min} \\ -\ln d_{min} & \text{for } r \leq d_{min} \end{array}\right. \\ SED(X_i, Y_j) &= \sqrt{\sum_{k=1}^d \frac{\left(X_i(k) - Y_j(k)\right)^2}{\sigma_x(k)\sigma_y(k)}}\end{split}\]

where \(k\) is a counter over dimensions (indices in the case of spatial analogs) and \(\sigma_x(k)\) is the standard deviation of \(X\) in dimension \(k\). Finally, \(d_{min}\) is a cut-off to avoid poles when \(r \to 0\); it is controllable through the dmin parameter.

This version corresponds to the \(D_{ZAE}\) test of [Grenier2013] (eq. 7), which is a version of \(\phi_{NM}\) from [AZ03] modified by using the standardized Euclidean distance, the log weight function and the choice \(d_{min} = 10^{-12}\).
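Putting the pieces together (the SED scaling, the -log weight and the truncation at \(d_{min}\)), an illustrative reimplementation:

```python
import numpy as np
from scipy.spatial.distance import cdist

def zech_aslan_sketch(x, y, dmin=1e-12):
    """D_ZAE sketch: phi_xx + phi_yy - phi_xy with R(r) = -log(max(r, dmin))."""
    # Dividing each dimension by sqrt(sigma_x * sigma_y) turns the plain
    # Euclidean distance into the SED defined above.
    scale = np.sqrt(x.std(axis=0, ddof=1) * y.std(axis=0, ddof=1))
    xs, ys = x / scale, y / scale

    def weight(r):
        return -np.log(np.clip(r, dmin, None))

    n, m = len(x), len(y)
    phi_xy = weight(cdist(xs, ys)).sum() / (n * m)
    phi_xx = weight(cdist(xs, xs)[np.triu_indices(n, k=1)]).sum() / n**2
    phi_yy = weight(cdist(ys, ys)[np.triu_indices(m, k=1)]).sum() / m**2
    return phi_xx + phi_yy - phi_xy

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, size=(30, 2))
y_similar = rng.normal(0.0, 1.0, size=(30, 2))     # low (more negative) energy
y_different = rng.normal(5.0, 1.0, size=(30, 2))   # higher energy
```

Lower values indicate more similar samples, consistent with the other metrics in this module.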

References

AZ03

Aslan B. and Zech G. (2003) A new class of binning-free, multivariate goodness-of-fit tests: the energy tests. https://doi.org/10.48550/arXiv.hep-ex/0203010

Utilities for developers

xclim.analog.metric(func)[source]

Register a metric function in the metrics mapping and add some preparation/checking code.

All metric functions accept 2D inputs; this decorator reshapes 1D inputs to (n, 1) and (m, 1). Metric outputs are invalid when any non-finite values are present in the inputs.

xclim.analog.standardize(x: np.ndarray, y: np.ndarray) tuple[np.ndarray, np.ndarray][source]

Standardize x and y by the square root of the product of their standard deviations.

Parameters
  • x (np.ndarray) – Array to be compared.

  • y (np.ndarray) – Array to be compared.

Returns

(ndarray, ndarray) – Standardized arrays.
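Equivalently, per dimension (a minimal sketch; the population standard deviation is assumed here):

```python
import numpy as np

def standardize_sketch(x, y):
    """Divide both arrays by sqrt(std(x) * std(y)), computed per dimension."""
    s = np.sqrt(x.std(axis=0) * y.std(axis=0))
    return x / s, y / s

x = np.array([[0.0], [2.0], [4.0]])
y = np.array([[0.0], [4.0], [8.0]])
xs, ys = standardize_sketch(x, y)
# After standardization, the product of the two standard deviations is 1 in
# every dimension, and rescaling both inputs jointly leaves the result
# unchanged -- which is what makes scale-dependent metrics comparable.
```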