Ensembles
An important aspect of climate models is that they are run multiple times with slightly perturbed initial conditions to see how well they replicate the natural variability of the climate. Through xclim.ensembles, xclim provides an easy interface for computing ensemble statistics across members. Most methods perform checks and conversions on top of simpler xarray methods.
create_ensemble
Our first step is to create an ensemble. This method takes a list of files defining the same variables over the same coordinates and concatenates them into one dataset with an added dimension, realization.
Using xarray, a very simple way of creating an ensemble dataset would be:
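import xarray

# `files` is a list of paths to the member files.
# Recent xarray versions need combine="nested" when concat_dim is given.
xarray.open_mfdataset(files, concat_dim="realization", combine="nested")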
However, this only succeeds when the dimensions of all the files are identical and the calendar type of each netCDF file is the same.
xclim’s create_ensemble() method overcomes these constraints by selecting the time period common to all files and assigning a standard calendar type to the dataset.
Input netCDF files still require equal spatial dimension sizes (e.g. lon, lat dimensions).
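As a rough illustration of what restricting members to their common time period means, the sketch below uses plain xarray on two small in-memory arrays rather than files. The array names and values are made up for the example, and this is not a description of how create_ensemble() is implemented.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two hypothetical members covering different, overlapping periods.
time_a = pd.date_range("2000-01-01", periods=10, freq="YS")  # 2000 to 2009
time_b = pd.date_range("2005-01-01", periods=10, freq="YS")  # 2005 to 2014
member_a = xr.DataArray(
    np.arange(10.0), dims="time", coords={"time": time_a}, name="tas"
)
member_b = xr.DataArray(
    np.arange(10.0) + 100, dims="time", coords={"time": time_b}, name="tas"
)

# join="inner" keeps only the time steps shared by all members.
common = xr.concat([member_a, member_b], dim="realization", join="inner")
n_common = common.sizes["time"]
first_year = int(common.time.dt.year[0])
last_year = int(common.time.dt.year[-1])
```

With the default join="outer", the result would instead span 2000 to 2014 and be padded with NaN where a member has no data.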
Given files all named ens_tas_m[member number].nc, we use glob to get a list of all those files.
[ ]:
from pathlib import Path
import xarray as xr
# Set display to HTML style (for fancy output)
xr.set_options(display_style="html", display_width=50)
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
from xclim import ensembles
# `data_folder` is assumed to be a pathlib.Path to the directory holding the files
ens = ensembles.create_ensemble(data_folder.glob("ens_tas_m*.nc")).load()
ens.close()
[ ]:
plt.style.use("seaborn-v0_8-dark")
plt.rcParams["figure.figsize"] = (13, 5)
ens.tas.plot(hue="realization")
plt.show()
[ ]:
ens.tas # Attributes of the first dataset to be opened are copied to the final output
Ensemble statistics
Beyond creating an ensemble dataset, the xclim.ensembles module contains functions for calculating statistics between realizations.
Ensemble mean, standard-deviation, max & min
In the example below, we use xclim’s ensemble_mean_std_max_min() to calculate statistics across the 10 realizations in our test dataset. Output variables are named by combining the original variable name tas with a suffix indicating the statistic calculated over the realization dimension: _mean, _stdev, _min, _max.
The resulting output now contains 4 derived variables from the original single variable in our ensemble dataset.
[ ]:
ens_stats = ensembles.ensemble_mean_std_max_min(ens)
ens_stats
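For intuition, the same four statistics can be approximated with plain xarray reductions over the realization dimension. The sketch below builds a small synthetic dataset (made-up values) and names the outputs following the suffix pattern described above. Details such as the degrees of freedom used for the standard deviation are an assumption here and may differ from xclim's function.

```python
import numpy as np
import xarray as xr

# Synthetic ensemble: 10 members, 4 time steps (values are invented).
rng = np.random.default_rng(0)
toy = xr.Dataset(
    {"tas": (("realization", "time"), 280 + rng.normal(size=(10, 4)))}
)

# Reduce over the member dimension, one output variable per statistic.
stats = xr.Dataset(
    {
        "tas_mean": toy.tas.mean("realization"),
        "tas_stdev": toy.tas.std("realization"),  # ddof=0 by default
        "tas_max": toy.tas.max("realization"),
        "tas_min": toy.tas.min("realization"),
    }
)
```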
Ensemble percentiles
Here, we use xclim’s ensemble_percentiles() to calculate percentile values across the 10 realizations. The output now has a percentiles dimension instead of realization. Split variables can be created instead by specifying split=True (the variable name tas will be appended with _p{x}). Compared to NumPy’s percentile() and xarray’s quantile(), this method handles datasets with invalid values more efficiently, as well as chunking along the realization dimension (which is automatic when dask arrays are used).
[ ]:
ens_perc = ensembles.ensemble_percentiles(ens, values=[15, 50, 85], split=False)
ens_perc
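For comparison, here is a minimal sketch of the plain xarray counterpart mentioned above, on synthetic data with one missing value. xarray's quantile() takes fractions in [0, 1] rather than percentages, names its output dimension quantile, and skips NaNs by default for float data. The array below is made up for illustration.

```python
import numpy as np
import xarray as xr

# Five hypothetical members, two time steps, one missing value.
values = np.array(
    [[1.0, 10.0], [2.0, 20.0], [3.0, np.nan], [4.0, 40.0], [5.0, 50.0]]
)
toy = xr.DataArray(values, dims=("realization", "time"), name="tas")

# 15th, 50th and 85th percentiles across members, ignoring the NaN.
quant = toy.quantile([0.15, 0.5, 0.85], dim="realization")
median = quant.sel(quantile=0.5)
```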
[ ]:
fig, ax = plt.subplots()
ax.fill_between(
ens_stats.time.values,
ens_stats.tas_min,
ens_stats.tas_max,
alpha=0.3,
label="Min-Max",
)
ax.fill_between(
ens_perc.time.values,
ens_perc.tas.sel(percentiles=15),
ens_perc.tas.sel(percentiles=85),
alpha=0.5,
label="Perc. 15-85",
)
# Hack: skip two colors in the line color cycle so that the lines
# below are drawn in colors different from the filled areas.
ax._get_lines.get_next_color()
ax._get_lines.get_next_color()
ax.plot(ens_stats.time.values, ens_stats.tas_mean, linewidth=2, label="Mean")
ax.plot(
ens_perc.time.values, ens_perc.tas.sel(percentiles=50), linewidth=2, label="Median"
)
ax.legend()
plt.show()
Change significance and model agreement
When communicating climate change through plots of projected change, it is often useful to add information on the statistical significance of the values. A common way to represent this information without overloading the figures is through hatching patterns superimposed on the primary data. Two aspects are usually shown:
change significance: whether most of the ensemble members project a statistically significant climate change signal, in comparison to their internal variability.
model agreement: whether the different ensemble members agree on the sign of the change.
We can then divide the plotted points into categories each with its own hatching pattern, usually leaving the robust data (models agree and enough show a significant change) without hatching.
Xclim provides some tools to help generate these hatching masks. The first is xclim.ensembles.robustness_fractions, which characterizes the change significance and sign agreement across ensemble members. To demonstrate its usage, we’ll first generate some fake annual mean temperature data. Here, ref is the data on the reference period and fut is a future projection. There are 5 different members in the ensemble. We tweaked the generation so that all models agree on significant change in the “south” while agreement and significance of change decrease as we go north.
[ ]:
import numpy as np
import xarray as xr
from matplotlib.patches import Rectangle
xr.set_options(keep_attrs=True)
# Reference period
ref = xr.DataArray(
20 * np.random.random_sample((5, 30, 10, 10)) + 275,
dims=("realization", "time", "lat", "lon"),
coords={
"time": xr.date_range("1990", periods=30, freq="YS"),
"lat": np.arange(40, 50),
"lon": np.arange(-70, -60),
},
attrs={"units": "K"},
)
# Future
fut = xr.DataArray(
20 * np.random.random_sample((5, 30, 10, 10)) + 275,
dims=("realization", "time", "lat", "lon"),
coords={
"time": xr.date_range("2070", periods=30, freq="YS"),
"lat": np.arange(40, 50),
"lon": np.arange(-70, -60),
},
attrs={"units": "K"},
)
# Add change.
fut = fut + xr.concat(
[
xr.DataArray(np.linspace(15, north_delta, num=10), dims=("lat",))
for north_delta in [15, 10, 0, -7, -10]
],
"realization",
)
deltas = (fut.mean("time") - ref.mean("time")).assign_attrs(
long_name="Temperature change"
)
mean_delta = deltas.mean("realization")
deltas.plot(col="realization")
Change significance can be determined in many different ways. Xclim provides some simple and some more complicated statistical tests in robustness_fractions. In this example, we’ll follow the suggestions found in the Cross-Chapter Box 1 of the IPCC Atlas chapter (AR6, WG1). Specifically, we are following Approach C, using the alternative for when pre-industrial control data is not available.
We first compute the different fractions for each robustness aspect.
[ ]:
fractions = ensembles.robustness_fractions(fut, ref, test="ipcc-ar6-c")
fractions
In this output we have:
changed: The fraction of members showing significant change.
positive: The fraction of members showing positive change, whether it is significant or not.
changed_positive: The fraction of members showing significant AND positive change.
agree: The fraction of members agreeing on the sign of change. This is the maximum between positive and 1 - positive.
valid: The fraction of “valid” members. A member is valid if there are no NaNs along the time axes of fut and ref. In our case, it is 1 everywhere.
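The relationships between these fractions can be sketched with NumPy on a toy ensemble at a single grid point. The per-member changes and significance flags below are made up, and the significance test itself is not reproduced; only the bookkeeping is shown.

```python
import numpy as np

# Hypothetical change of 5 members at one grid point, and whether each
# member's change passed some significance test (assumed, not computed).
delta = np.array([2.0, 1.5, 0.3, -0.2, -1.8])
significant = np.array([True, True, False, False, True])

changed = significant.mean()  # members with significant change
positive = (delta > 0).mean()  # members with positive change
changed_positive = (significant & (delta > 0)).mean()  # both conditions
agree = max(positive, 1 - positive)  # share of the dominant sign
```

Here three of five members change significantly, but only two of those are positive, so changed_positive is smaller than both changed and positive.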
For example, here’s the plot of the fraction of members showing significant change.
[ ]:
fractions.changed.plot(figsize=(6, 4))
Xclim provides all this so that users can construct their own robustness maps the way they want. Often, hatching overlays are based on categories defined by some thresholds on the significant change and agreement fractions. The xclim.ensembles.robustness_categories function helps with that common case and defaults to the categories and thresholds used by the IPCC in its Atlas.
[ ]:
robustness = ensembles.robustness_categories(fractions)
robustness
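As a sketch of the thresholding idea, the NumPy snippet below assigns one of three categories from the changed and agree fractions. The thresholds (0.66 on changed, 0.8 on agree), the category numbering and the input values are assumptions made for illustration; the function's documentation is the place to confirm its actual defaults.

```python
import numpy as np

# Hypothetical fractions at four grid points.
changed = np.array([0.9, 0.4, 0.8, 0.66])
agree = np.array([1.0, 0.6, 0.6, 0.8])

# Assumed thresholds, for illustration only.
enough_change = changed >= 0.66
enough_agreement = agree >= 0.8

# 1: robust signal, 2: no robust change, 3: conflicting signal
category = np.select(
    [
        enough_change & enough_agreement,
        ~enough_change,
        enough_change & ~enough_agreement,
    ],
    [1, 2, 3],
)
```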
The output is a categorical map following the “flag variables” CF conventions. Parameters needed for plotting are found in the attributes.
[ ]:
robustness.plot(figsize=(6, 4))
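Because the categories are stored as flag attributes, labels can be recovered generically. The sketch below builds a stand-in array with hypothetical flag_values and flag_descriptions attributes (the attribute names match those used in this notebook's plotting code, the contents are invented) and counts grid points per category.

```python
import numpy as np
import xarray as xr

# Stand-in for a categorical map; values and labels are invented.
cats = xr.DataArray(
    np.array([[1, 1, 2], [3, 2, 1]]),
    dims=("lat", "lon"),
    attrs={
        "flag_values": [1, 2, 3],
        "flag_descriptions": [
            "Robust signal",
            "No change or no signal",
            "Conflicting signal",
        ],
    },
)

# Map each flag value to its description, then count cells per label.
labels = dict(zip(cats.attrs["flag_values"], cats.attrs["flag_descriptions"]))
counts = {labels[v]: int((cats == v).sum()) for v in labels}
```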
Matplotlib doesn’t provide an easy way of plotting categorical data with a proper legend, so our real plotting script is a bit more complicated, but xclim’s output makes it easier.
[ ]:
cmap = mpl.colors.ListedColormap(["none"]) # So we can deactivate pcolor's colormapping
fig, ax = plt.subplots(figsize=(6, 4))
mean_delta.plot(ax=ax)
# For each flag value plot the corresponding hatch.
for val, ha in zip(robustness.flag_values, [None, "\\\\\\", "xxx"]):
ax.pcolor(
robustness.lon,
robustness.lat,
robustness.where(robustness == val),
hatch=ha,
cmap=cmap,
)
ax.legend(
handles=[
Rectangle((0, 0), 2, 2, fill=False, hatch=h, label=lbl)
for h, lbl in zip(["\\\\\\", "xxx"], robustness.flag_descriptions[1:])
],
bbox_to_anchor=(0.0, 1.1),
loc="upper left",
ncols=2,
);