xclim.ensembles package
Ensemble tools
This submodule defines some useful methods for dealing with ensembles of climate simulations. In xclim, an “ensemble” is a Dataset or a DataArray where multiple climate realizations or models are concatenated along the realization dimension.
Submodules
xclim.ensembles._base module
Ensembles Creation and Statistics
- xclim.ensembles._base._ens_align_datasets(datasets, multifile=False, resample_freq=None, calendar='default', cal_kwargs=None, **xr_kwargs)[source]
Create a list of aligned xarray Datasets for ensemble Dataset creation.
- Parameters:
datasets (list[xr.Dataset | xr.DataArray | Path | str | list[Path | str]] or str) – List of netcdf file paths or xarray Dataset/DataArray objects . If multifile is True, ‘datasets’ should be a list of lists where each sublist contains input NetCDF files of a xarray multi-file Dataset. DataArrays should have a name, so they can be converted to datasets. If a string, it is assumed to be a glob pattern for finding datasets.
multifile (bool) – If True climate simulations are treated as xarray multi-file datasets before concatenation. Only applicable when ‘datasets’ is a sequence of file paths.
resample_freq (str, optional) – If the members of the ensemble have the same frequency but not the same offset, they cannot be properly aligned. If resample_freq is set, the time coordinate of each member will be modified to fit this frequency.
calendar (str) – The calendar of the time coordinate of the ensemble. For conversions involving ‘360_day’, the align_on=’date’ option is used. See
xclim.core.calendar.convert_calendar()
. ‘default’ is the standard calendar using np.datetime64 objects.**xr_kwargs – Any keyword arguments to be given to xarray when opening the files.
- Return type:
list
[Dataset
]- Returns:
list[xr.Dataset]
- xclim.ensembles._base.create_ensemble(datasets, multifile=False, resample_freq=None, calendar=None, realizations=None, cal_kwargs=None, **xr_kwargs)[source]
Create an xarray dataset of an ensemble of climate simulation from a list of netcdf files.
Input data is concatenated along a newly created data dimension (‘realization’). Returns an xarray dataset object containing input data from the list of netcdf files concatenated along a new dimension (name:’realization’). In the case where input files have unequal time dimensions, the output ensemble Dataset is created for maximum time-step interval of all input files. Before concatenation, datasets not covering the entire time span have their data padded with NaN values. Dataset and variable attributes of the first dataset are copied to the resulting dataset.
- Parameters:
datasets (list or dict or string) – List of netcdf file paths or xarray Dataset/DataArray objects . If multifile is True, ncfiles should be a list of lists where each sublist contains input .nc files of an xarray multifile Dataset. If DataArray objects are passed, they should have a name in order to be transformed into Datasets. A dictionary can be passed instead of a list, in which case the keys are used as coordinates along the new realization axis. If a string is passed, it is assumed to be a glob pattern for finding datasets.
multifile (bool) – If True, climate simulations are treated as xarray multifile Datasets before concatenation. Only applicable when “datasets” is sequence of list of file paths. Default: False.
resample_freq (Optional[str]) – If the members of the ensemble have the same frequency but not the same offset, they cannot be properly aligned. If resample_freq is set, the time coordinate of each member will be modified to fit this frequency.
calendar (str, optional) – The calendar of the time coordinate of the ensemble. By default, the smallest common calendar is chosen. For example, a mixed input of “noleap” and “360_day” will default to “noleap”. ‘default’ is the standard calendar using np.datetime64 objects (xarray’s “standard” with use_cftime=False).
realizations (sequence, optional) – The coordinate values for the new realization axis. If None (default), the new axis has a simple integer coordinate. This argument shouldn’t be used if datasets is a glob pattern as the dataset order is random.
cal_kwargs (dict, optional) – Additional arguments to pass to py:func:xclim.core.calendar.convert_calendar. For conversions involving ‘360_day’, the align_on=’date’ option is used by default.
**xr_kwargs – Any keyword arguments to be given to xr.open_dataset when opening the files (or to xr.open_mfdataset if multifile is True)
- Return type:
Dataset
- Returns:
xr.Dataset – Dataset containing concatenated data from all input files.
Notes
Input netcdf files require equal spatial dimension size (e.g. lon, lat dimensions). If input data contains multiple cftime calendar types they must be at monthly or coarser frequency.
Examples
from pathlib import Path from xclim.ensembles import create_ensemble ens = create_ensemble(temperature_datasets) # Using multifile datasets, through glob patterns. # Simulation 1 is a list of .nc files (e.g. separated by time): datasets = list(Path("/dir").glob("*.nc")) # Simulation 2 is also a list of .nc files: datasets.extend(Path("/dir2").glob("*.nc")) ens = create_ensemble(datasets, multifile=True)
- xclim.ensembles._base.ensemble_mean_std_max_min(ens, min_members=1, weights=None)[source]
Calculate ensemble statistics between a results from an ensemble of climate simulations.
Returns an xarray Dataset containing ensemble mean, standard-deviation, minimum and maximum for input climate simulations.
- Parameters:
ens (xr.Dataset) – Ensemble dataset (see xclim.ensembles.create_ensemble).
min_members (int, optional) – The minimum number of valid ensemble members for a statistic to be valid. Passing None is equivalent to setting min_members to the size of the realization dimension. The default (1) essentially skips this check.
weights (xr.DataArray, optional) – Weights to apply along the ‘realization’ dimension. This array cannot contain missing values.
- Return type:
Dataset
- Returns:
xr.Dataset – Dataset with data variables of ensemble statistics.
Examples
from xclim.ensembles import create_ensemble, ensemble_mean_std_max_min # Create the ensemble dataset: ens = create_ensemble(temperature_datasets) # Calculate ensemble statistics: ens_mean_std = ensemble_mean_std_max_min(ens)
- xclim.ensembles._base.ensemble_percentiles(ens, values=None, keep_chunk_size=None, min_members=1, weights=None, split=True)[source]
Calculate ensemble statistics between a results from an ensemble of climate simulations.
Returns a Dataset containing ensemble percentiles for input climate simulations.
- Parameters:
ens (xr.Dataset or xr.DataArray) – Ensemble Dataset or DataArray (see xclim.ensembles.create_ensemble).
values (Sequence[int], optional) – Percentile values to calculate. Default: (10, 50, 90).
keep_chunk_size (bool, optional) – For ensembles using dask arrays, all chunks along the ‘realization’ axis are merged. If True, the dataset is rechunked along the dimension with the largest chunks, so that the chunks keep the same size (approximately). If False, no shrinking is performed, resulting in much larger chunks. If not defined, the function decides which is best.
min_members (int, optional) – The minimum number of valid ensemble members for a statistic to be valid. Passing None is equivalent to setting min_members to the size of the realization dimension. The default (1) essentially skips this check.
weights (xr.DataArray, optional) – Weights to apply along the ‘realization’ dimension. This array cannot contain missing values. When given, the function uses xarray’s quantile method which is slower than xclim’s NaN-optimized algorithm.
split (bool) – Whether to split each percentile into a new variable or concatenate the output along a new “percentiles” dimension.
- Return type:
DataArray
|Dataset
- Returns:
xr.Dataset or xr.DataArray – If split is True, same type as ens; dataset otherwise, containing data variable(s) of requested ensemble statistics
Examples
from xclim.ensembles import create_ensemble, ensemble_percentiles # Create ensemble dataset: ens = create_ensemble(temperature_datasets) # Calculate default ensemble percentiles: ens_percs = ensemble_percentiles(ens) # Calculate non-default percentiles (25th and 75th) ens_percs = ensemble_percentiles(ens, values=(25, 50, 75)) # If the original array has many small chunks, it might be more efficient to do: ens_percs = ensemble_percentiles(ens, keep_chunk_size=False)
xclim.ensembles._filters module
- xclim.ensembles._filters._concat_hist(da, **hist)[source]
Concatenate historical scenario with future scenarios along time.
- Parameters:
da (xr.DataArray) – Input data where the historical scenario is stored alongside other, future, scenarios.
hist ({str: str}) – Mapping of the scenario dimension name to the historical scenario coordinate, e.g. scenario=”historical”.
- Returns:
xr.DataArray – Data with the historical scenario is stacked in time before each one of the other scenarios.
Notes
Data goes from:
scenario
time
historical
hhhhhhhhhhhhhhhh------------
ssp245
----------------111111111111
ssp370
----------------222222222222
to:
scenario
time
ssp245
hhhhhhhhhhhhhhhh111111111111
ssp370
hhhhhhhhhhhhhhhh222222222222
- xclim.ensembles._filters._model_in_all_scens(da, dimensions=None)[source]
Return data with only simulations that have at least one member in each scenario.
- Parameters:
da (xr.DataArray) – Input data with dimensions for time, member, model and scenario.
dimensions (dict) – Mapping from original dimension names to standard dimension names: scenario, model, member.
- Returns:
xr.DataArray – Data for models that have values for all scenarios.
Notes
In the following example, model C would be filtered out from the data because it has no member for ssp370.
model
members
ssp245 | ssp370
A
1,2,3
1,2,3
B
1
2,3
C
1,2,3
- xclim.ensembles._filters._single_member(da, dimensions=None)[source]
Return data for a single member per model.
- Parameters:
da (xr.DataArray) – Input data with dimensions for time, member, model and scenario.
dimensions (dict) – Mapping from original dimension names to standard dimension names: scenario, model, member.
- Returns:
xr.DataArray – Data with only one member per model.
Notes
In the following example, the original members would be filtered to return only the first member found for each scenario.
model
member
Selected
ssp245 | ssp370
ssp245 | ssp370
A
1,2,3
1,2,3
1
1
B
1,2
2,3
1
2
xclim.ensembles._partitioning module
Uncertainty Partitioning
This module implements methods and tools meant to partition climate projection uncertainties into different components.
- xclim.ensembles._partitioning.fractional_uncertainty(u)[source]
Return the fractional uncertainty.
- Parameters:
u (xr.DataArray) – Array with uncertainty components along the uncertainty dimension.
- Return type:
DataArray
- Returns:
xr.DataArray – Fractional, or relative uncertainty with respect to the total uncertainty.
- xclim.ensembles._partitioning.hawkins_sutton(da, sm=None, weights=None, baseline=('1971', '2000'), kind='+')[source]
Return the mean and partitioned variance of an ensemble based on method from Hawkins & Sutton (2009).
- Parameters:
da (xr.DataArray) – Time series with dimensions ‘time’, ‘scenario’ and ‘model’.
sm (xr.DataArray, optional) – Smoothed time series over time, with the same dimensions as da. By default, this is estimated using a 4th-order polynomial. Results are sensitive to the choice of smoothing function, use this to set another polynomial order, or a LOESS curve.
weights (xr.DataArray, optional) – Weights to be applied to individual models. Should have model dimension.
baseline ((str, str)) – Start and end year of the reference period.
kind ({‘+’, ‘’}*) – Whether the mean over the reference period should be subtracted (+) or divided by (*).
- Return type:
tuple
[DataArray
,DataArray
]- Returns:
xr.DataArray, xr.DataArray – The mean relative to the baseline, and the components of variance of the ensemble. These components are coordinates along the uncertainty dimension: variability, model, scenario, and total.
Notes
To prepare input data, make sure da has dimensions time, scenario and model, e.g. da.rename({“scen”: “scenario”}).
- To reproduce results from Hawkins and Sutton [2009], input data should meet the following requirements:
annual time series starting in 1950 and ending in 2100;
the same models are available for all scenarios.
To get the fraction of the total variance instead of the variance itself, call fractional_uncertainty on the output.
References
- xclim.ensembles._partitioning.hawkins_sutton_09_weighting(da, obs, baseline=('1971', '2000'))[source]
Return weights according to the ability of models to simulate observed climate change.
Weights are computed by comparing the 2000 value to the baseline mean: w_m = 1 / (x_{obs} + | x_{m, 2000} - x_obs | )
- Parameters:
da (xr.DataArray) – Input data over the historical period. Should have a time and model dimension.
obs (float) – Observed change.
baseline ((str, str)) – Baseline start and end year.
- Returns:
xr.DataArray – Weights over the model dimension.
- xclim.ensembles._partitioning.lafferty_sriver(da, sm=None, bb13=False)[source]
Return the mean and partitioned variance of an ensemble based on method from Lafferty and Sriver (2023).
- Parameters:
da (xr.DataArray) – Time series with dimensions ‘time’, ‘scenario’, ‘downscaling’ and ‘model’.
sm (xr.DataArray) – Smoothed time series over time, with the same dimensions as da. By default, this is estimated using a 4th-order polynomial. Results are sensitive to the choice of smoothing function, use this to set another polynomial order, or a LOESS curve.
bb13 (bool) – Whether to apply the Brekke and Barsugli (2013) method to estimate scenario uncertainty, where the variance over scenarios is computed before taking the mean over models and downscaling methods.
- Return type:
tuple
[DataArray
,DataArray
]- Returns:
xr.DataArray, xr.DataArray – The mean relative to the baseline, and the components of variance of the ensemble. These components are coordinates along the uncertainty dimension: variability, model, scenario, downscaling and total.
Notes
To prepare input data, make sure da has dimensions time, scenario, downscaling and model, e.g. da.rename({“experiment”: “scenario”}).
To get the fraction of the total variance instead of the variance itself, call fractional_uncertainty on the output.
References
Lafferty and Sriver [2023]
xclim.ensembles._reduce module
Ensemble Reduction
Ensemble reduction is the process of selecting a subset of members from an ensemble in order to reduce the volume of computation needed while still covering a good portion of the simulated climate variability.
- xclim.ensembles._reduce._calc_rsq(z, method, make_graph, n_sim, random_state, sample_weights)[source]
Sub-function to kmeans_reduce_ensemble. Calculates r-square profile (r-square versus number of clusters.
- xclim.ensembles._reduce._get_nclust(method=None, n_sim=None, rsq=None, max_clusters=None)[source]
Sub-function to kmeans_reduce_ensemble. Determine number of clusters to create depending on various methods.
- xclim.ensembles._reduce.kkz_reduce_ensemble(data, num_select, *, dist_method='euclidean', standardize=True, **cdist_kwargs)[source]
Return a sample of ensemble members using KKZ selection.
The algorithm selects num_select ensemble members spanning the overall range of the ensemble. The selection is ordered, smaller groups are always subsets of larger ones for given criteria. The first selected member is the one nearest to the centroid of the ensemble, all subsequent members are selected in a way maximizing the phase-space coverage of the group. Algorithm taken from Cannon [2015].
- Parameters:
data (xr.DataArray) – Selection criteria data : 2-D xr.DataArray with dimensions ‘realization’ (N) and ‘criteria’ (P). These are the values used for clustering. Realizations represent the individual original ensemble members and criteria the variables/indicators used in the grouping algorithm.
num_select (int) – The number of members to select.
dist_method (str) – Any distance metric name accepted by scipy.spatial.distance.cdist.
standardize (bool) – Whether to standardize the input before running the selection or not. Standardization consists in translation as to have a zero mean and scaling as to have a unit standard deviation.
**cdist_kwargs – All extra arguments are passed as-is to scipy.spatial.distance.cdist, see its docs for more information.
- Return type:
list
- Returns:
list – Selected model indices along the realization dimension.
References
- xclim.ensembles._reduce.kmeans_reduce_ensemble(data, *, method=None, make_graph=True, max_clusters=None, variable_weights=None, model_weights=None, sample_weights=None, random_state=None)[source]
Return a sample of ensemble members using k-means clustering.
The algorithm attempts to reduce the total number of ensemble members while maintaining adequate coverage of the ensemble uncertainty in an N-dimensional data space. K-Means clustering is carried out on the input selection criteria data-array in order to group individual ensemble members into a reduced number of similar groups. Subsequently, a single representative simulation is retained from each group.
- Parameters:
data (xr.DataArray) – Selection criteria data : 2-D xr.DataArray with dimensions ‘realization’ (N) and ‘criteria’ (P). These are the values used for clustering. Realizations represent the individual original ensemble members and criteria the variables/indicators used in the grouping algorithm.
method (dict, optional) – Dictionary defining selection method and associated value when required. See Notes.
max_clusters (int, optional) – Maximum number of members to include in the output ensemble selection. When using ‘rsq_optimize’ or ‘rsq_cutoff’ methods, limit the final selection to a maximum number even if method results indicate a higher value. Defaults to N.
variable_weights (np.ndarray, optional) – An array of size P. This weighting can be used to influence of weight of the climate indices (criteria dimension) on the clustering itself.
model_weights (np.ndarray, optional) – An array of size N. This weighting can be used to influence which realization is selected from within each cluster. This parameter has no influence on the clustering itself.
sample_weights (np.ndarray, optional) – An array of size N. sklearn.cluster.KMeans() sample_weights parameter. This weighting can be used to influence of weight of simulations on the clustering itself. See: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
random_state (int or np.random.RandomState, optional) – sklearn.cluster.KMeans() random_state parameter. Determines random number generation for centroid initialization. Use to make the randomness deterministic. See: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
make_graph (bool) – output a dictionary of input for displays a plot of R² vs. the number of clusters. Defaults to True if matplotlib is installed in runtime environment.
- Return type:
tuple[list, numpy.ndarray, dict]
Notes
Parameters for method in call must follow these conventions:
- rsq_optimize
Calculate coefficient of variation (R²) of cluster results for n = 1 to N clusters and determine an optimal number of clusters that balances cost/benefit tradeoffs. This is the default setting. See supporting information S2 text in Casajus et al. [2016].
method={‘rsq_optimize’:None}
- rsq_cutoff
Calculate Coefficient of variation (R²) of cluster results for n = 1 to N clusters and determine the minimum numbers of clusters needed for R² > val.
val : float between 0 and 1. R² value that must be exceeded by clustering results.
method={‘rsq_cutoff’: val}
- n_clusters
Create a user determined number of clusters.
val : integer between 1 and N
method={‘n_clusters’: val}
- Return type:
tuple
[list
,ndarray
,dict
]- Returns:
list – Selected model indexes (positions)
np.ndarray – KMeans clustering results
dict – Dictionary of input data for creating R² profile plot. ‘None’ when make_graph=False
References
Casajus, Périé, Logan, Lambert, Blois, and Berteaux [2016]
Examples
import xclim from xclim.ensembles import create_ensemble, kmeans_reduce_ensemble from xclim.indices import hot_spell_frequency # Start with ensemble datasets for temperature: ensTas = create_ensemble(temperature_datasets) # Calculate selection criteria -- Use annual climate change Δ fields between 2071-2100 and 1981-2010 normals. # First, average annual temperature: tg = xclim.atmos.tg_mean(tas=ensTas.tas) his_tg = tg.sel(time=slice("1990", "2019")).mean(dim="time") fut_tg = tg.sel(time=slice("2020", "2050")).mean(dim="time") dtg = fut_tg - his_tg # Then, hot spell frequency as second indicator: hs = hot_spell_frequency(tasmax=ensTas.tas, window=2, thresh_tasmax="10 degC") his_hs = hs.sel(time=slice("1990", "2019")).mean(dim="time") fut_hs = hs.sel(time=slice("2020", "2050")).mean(dim="time") dhs = fut_hs - his_hs # Create a selection criteria xr.DataArray: from xarray import concat crit = concat((dtg, dhs), dim="criteria") # Finally, create clusters and select realization ids of reduced ensemble: ids, cluster, fig_data = kmeans_reduce_ensemble( data=crit, method={"rsq_cutoff": 0.9}, random_state=42, make_graph=False ) ids, cluster, fig_data = kmeans_reduce_ensemble( data=crit, method={"rsq_optimize": None}, random_state=42, make_graph=True )
- xclim.ensembles._reduce.make_criteria(ds)[source]
Reshapes the input into a criteria 2D DataArray.
The reshaping preserves the “realization” dimension but stacks all other dimensions and variables into a new “criteria” dimension, as expected by functions
xclim.ensembles._reduce.kkz_reduce_ensemble()
andxclim.ensembles._reduce.kmeans_reduce_ensemble()
.- Parameters:
ds (Dataset or DataArray) – Must at least have a “realization” dimension. All values are considered independent “criterion” for the ensemble reduction. If a Dataset, variables may have different sizes, but all must include the “realization” dimension.
- Returns:
crit (DataArray) – Same data, reshaped. Old coordinates (and variables) are available as a multiindex.
Notes
One can get back to the original dataset with
crit = make_criteria(ds) ds2 = crit.unstack("criteria").to_dataset("variables")
ds2 will have all variables with the same dimensions, so if the original dataset had variables with different dimensions, the added dimensions are filled with NaNs. The to_dataset part can be skipped if the original input was a DataArray.
- xclim.ensembles._reduce.plot_rsqprofile(fig_data)[source]
Create an R² profile plot using kmeans_reduce_ensemble output.
The R² plot allows evaluation of the proportion of total uncertainty in the original ensemble that is provided by the reduced selected.
Examples
>>> from xclim.ensembles import kmeans_reduce_ensemble, plot_rsqprofile >>> is_matplotlib_installed() >>> crit = xr.open_dataset(path_to_ensemble_file).data >>> ids, cluster, fig_data = kmeans_reduce_ensemble( ... data=crit, method={"rsq_cutoff": 0.9}, random_state=42, make_graph=True ... ) >>> plot_rsqprofile(fig_data)
xclim.ensembles._robustness module
Ensemble Robustness Metrics
Robustness metrics are used to estimate the confidence of the climate change signal of an ensemble. This submodule is inspired by and tries to follow the guidelines of the IPCC, more specifically [Collins et al., 2013] (AR5) and On Climate Change (IPCC) [2023] (AR6).
- xclim.ensembles._robustness.change_significance(fut, ref, test='ttest', weights=None, p_vals=False, **kwargs)[source]
Backwards-compatible implementation of
robustness_fractions()
.- Return type:
tuple
[DataArray
|Dataset
,DataArray
|Dataset
] |tuple
[DataArray
|Dataset
,DataArray
|Dataset
,DataArray
|Dataset
|None
]
- xclim.ensembles._robustness.robustness_categories(changed_or_fractions, agree=None, *, categories=None, ops=None, thresholds=None)[source]
Create a categorical robustness map for mapping hatching patterns.
Each robustness category is defined by a double threshold, one on the fraction of members showing significant change (change_frac) and one on the fraction of member agreeing on the sign of change (agree_frac). When the two thresholds are fulfilled, the point is assigned to the given category. The default values for the comparisons are the ones suggested by the IPCC for its “Advanced approach” described in the Cross-Chapter Box 1 of the Atlas of the AR6 WGI report (on Climate Change (IPCC) [2023]).
- Parameters:
changed_or_fractions (xr.Dataset or xr.DataArray) – Either the fraction of members showing significant change as an array or directly the output of
robustness_fractions()
.agree (xr.DataArray, optional) – The fraction of members agreeing on the sign of change. Only needed if the first argument is the changed array.
categories (list of str, optional) – The label of each robustness categories. They are stored in the semicolon separated flag_descriptions attribute as well as in a compressed form in the flag_meanings attribute. If a point is mapped to two categories, priority is given to the first one in this list.
ops (list of tuples of str, optional) – For each category, the comparison operators for change_frac and agree_frac. None or an empty string means the variable is not needed for this category.
thresholds (list of tuples of float, optional) – For each category, the threshold to be used with the corresponding operator. All should be between 0 and 1.
- Return type:
DataArray
- Returns:
xr.DataArray – Categorical (int) array following the flag variables CF conventions. 99 is used as a fill value for points that do not fall in any category.
- xclim.ensembles._robustness.robustness_coefficient(fut, ref)[source]
Robustness coefficient quantifying the robustness of a climate change signal in an ensemble.
Taken from Knutti and Sedláček [2013].
The robustness metric is defined as R = 1 − A1 / A2 , where A1 is defined as the integral of the squared area between two cumulative density functions characterizing the individual model projections and the multimodel mean projection and A2 is the integral of the squared area between two cumulative density functions characterizing the multimodel mean projection and the historical climate. Description taken from Knutti and Sedláček [2013].
A value of R equal to one implies perfect model agreement. Higher model spread or smaller signal decreases the value of R.
- Parameters:
fut (Union[xr.DataArray, xr.Dataset]) – Future ensemble values along ‘realization’ and ‘time’ (nr, nt). Can be a dataset, in which case the coefficient is computed on each variable.
ref (Union[xr.DataArray, xr.Dataset]) – Reference period values along ‘time’ (nt). Same type as fut.
- Return type:
DataArray
|Dataset
- Returns:
xr.DataArray or xr.Dataset – The robustness coefficient, ]-inf, 1], float. Same type as fut or ref.
References
Knutti and Sedláček [2013]
- xclim.ensembles._robustness.robustness_fractions(fut, ref=None, test=None, weights=None, **kwargs)[source]
Robustness statistics qualifying how members of an ensemble agree on the existence of change and on its sign.
- Parameters:
fut (xr.DataArray) – Future period values along ‘realization’ and ‘time’ (…, nr, nt1) or if ref is None, Delta values along realization (…, nr).
ref (xr.DataArray, optional) – Reference period values along realization’ and ‘time’ (…, nr, nt2). The size of the ‘time’ axis does not need to match the one of fut. But their ‘realization’ axes must be identical and the other coordinates should be the same. If None (default), values of fut are assumed to be deltas instead of a distribution across the future period.
test ({ttest, welch-ttest, mannwhitney-utest, brownforsythe-test, ipcc-ar6-c, threshold}, optional) – Name of the statistical test used to determine if there was significant change. See notes.
weights (xr.DataArray) – Weights to apply along the ‘realization’ dimension. This array cannot contain missing values.
**kwargs – Other arguments specific to the statistical test. See notes.
- Return type:
Dataset
- Returns:
xr.Dataset – Same coordinates as fut and ref, but no time and no realization.
Variables:
- changed :
The weighted fraction of valid members showing significant change. Passing test=None yields change_frac = 1 everywhere. Same type as fut.
- positive :
The weighted fraction of valid members showing positive change, no matter if it is significant or not.
- changed_positive :
The weighted fraction of valid members showing significant and positive change (]0, 1]).
- agree :
The weighted fraction of valid members agreeing on the sign of change. It is the maximum between positive and 1 - positive.
- valid :
The weighted fraction of valid members. A member is valid is there are no NaNs along the time axes of fut and ref.
- pvals :
The p-values estimated by the significance tests. Only returned if the test uses pvals. Has the realization dimension.
Notes
The table below shows the coefficient needed to retrieve the number of members that have the indicated characteristics, by multiplying it by the total number of members (fut.realization.size) and by valid_frac, assuming uniform weights. For compactness, we rename the outputs cf, cpf and pf.
Significant change
Non-significant change
Any change
Any direction
cf
1 - cf
1
Positive change
cpf
pf - cpf
pf
Negative change
(cf - cpf)
1 - pf - (cf -cpf)
1 - pf
Available statistical tests are :
- ttest:
Single sample T-test. Same test as used by Tebaldi et al. [2011].
The future values are compared against the reference mean (over ‘time’). Accepts argument p_change (float, default : 0.05) the p-value threshold for rejecting the hypothesis of no significant change.
- welch-ttest:
Two-sided T-test, without assuming equal population variance.
Same significance criterion and argument as ‘ttest’.
- mannwhitney-utest:
Two-sided Mann-Whiney U-test. Same significance criterion and argument as ‘ttest’.
- brownforsythe-test:
Brown-Forsythe test assuming skewed, non-normal distributions.
Same significance criterion and argument as ‘ttest’.
- ipcc-ar6-c:
The advanced approach used in the IPCC Atlas chapter (on Climate Change (IPCC) [2023]).
Change is considered significant if the delta exceeds a threshold related to the internal variability. If pre-industrial data is given in argument ref_pi, the threshold is defined as \(\sqrt{2}*1.645*\sigma_{20yr}\), where \(\sigma_{20yr}\) is the standard deviation of 20-year means computed from non-overlapping periods after detrending with a quadratic fit. Otherwise, when such pre-industrial control data is not available, the threshold is defined in relation to the historical data (ref) as \(\sqrt{\frac{2}{20}}*1.645*\sigma_{1yr}, where :math:\)sigma_{1yr}` is the inter-annual standard deviation measured after linearly detrending the data. See notebook Ensembles for more details.
- threshold :
Change is considered significant when it exceeds an absolute or relative threshold. Accepts one argument, either “abs_thresh” or “rel_thresh”.
- None :
Significant change is not tested. Members showing any positive change are included in the pos_frac output.
References
Tebaldi, Arblaster, and Knutti [2011] On Climate Change (IPCC) [2023]
Example
This example computes the mean temperature in an ensemble and compares two time periods, qualifying significant change through a single sample T-test.
>>> from xclim import ensembles >>> ens = ensembles.create_ensemble(temperature_datasets) >>> tgmean = xclim.atmos.tg_mean(tas=ens.tas, freq="YS") >>> fut = tgmean.sel(time=slice("2020", "2050")) >>> ref = tgmean.sel(time=slice("1990", "2020")) >>> fractions = ensembles.robustness_fractions(fut, ref, test="ttest")