{
"cells": [
{

"cell_type": "code", "execution_count": null, "metadata": {

"nbsphinx": "hidden"

}, "outputs": [], "source": [

"# This cell is not visible when the documentation is built.\n",
"\n",
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"# Create 'toy' criteria selection data (seeded so the example is reproducible)\n",
"np.random.seed(0)\n",
"test = xr.DataArray(np.random.normal(loc=3, scale=1.5, size=50), dims=['realization']).assign_coords(horizon='2041-2070')\n",
"test = xr.concat((test, xr.DataArray(np.random.normal(loc=5.34, scale=2, size=50), dims=['realization']).assign_coords(horizon='2071-2100')), dim='horizon')\n",
"\n",
"ds_crit = xr.Dataset()\n",
"\n",
"ds_crit['delta_annual_tavg'] = test\n",
"test = xr.DataArray(np.random.normal(loc=5, scale=5, size=50), dims=['realization']).assign_coords(horizon='2041-2070')\n",
"test = xr.concat((test, xr.DataArray(np.random.normal(loc=10, scale=8, size=50), dims=['realization']).assign_coords(horizon='2071-2100')), dim='horizon')\n",
"ds_crit['delta_annual_prtot'] = test\n",
"test = xr.DataArray(np.random.normal(loc=0, scale=3, size=50), dims=['realization']).assign_coords(horizon='2041-2070')\n",
"test = xr.concat((test, xr.DataArray(np.random.normal(loc=2, scale=4, size=50), dims=['realization']).assign_coords(horizon='2071-2100')), dim='horizon')\n",
"ds_crit['delta_JJA_prtot'] = test"

]

}, {

"cell_type": "markdown", "metadata": {}, "source": [

"Ensemble-Reduction Techniques\n",
"=============================\n",
"\n",
"`xclim.ensembles` provides ways to reduce the number of candidates in a sample while keeping a reasonable and representative spread of outcomes. Selecting realizations strategically significantly reduces the number of realizations to examine while maintaining a statistical representation of the original dataset. This is particularly useful when computing power or time is a limiting factor.\n",
"\n",
"For more information on the principles and methods behind ensemble reduction techniques, see: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152495 and https://doi.org/10.1175/JCLI-D-14-00636.1\n",
"\n",
"#### Selection criteria\n",
"\n",
"The following example considers a 50-member ensemble with a total of 6 criteria (3 variable deltas × 2 time horizons). Our goal is to reduce the number of members to a more manageable size while preserving the range of uncertainty across our different criteria."

]

}, {

"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [

"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import xarray as xr\n",
"from xclim import ensembles\n",
"\n",
"# Using an xarray dataset of our criteria\n",
"ds_crit"

]

}, {

"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [

"plt.style.use('seaborn-dark')\n",
"plt.rcParams['figure.figsize'] = (13, 5)\n",
"fig = plt.figure(figsize=(11, 9))\n",
"ax = plt.axes(projection='3d')\n",
"\n",
"# One scatter series per horizon, with the three criteria as axes\n",
"for h in ds_crit.horizon:\n",
"    ax.scatter(ds_crit.sel(horizon=h).delta_annual_tavg, ds_crit.sel(horizon=h).delta_annual_prtot,\n",
"               ds_crit.sel(horizon=h).delta_JJA_prtot, label=f'delta {h.values}')\n",
"\n",
"ax.set_xlabel('delta_annual_tavg (C)')\n",
"ax.set_ylabel('delta_annual_prtot (%)')\n",
"ax.set_zlabel('delta_JJA_prtot (%)')\n",
"plt.legend()\n",
"plt.show()"

]

}, {

"cell_type": "markdown", "metadata": {}, "source": [

"Ensemble reduction techniques in xclim require a 2D array with dimensions of `criteria` (values) and `realization` (runs/simulations)."

]

}, {

"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [

"# Create 2d xr.DataArray containing criteria values\n",
"crit = None\n",
"for h in ds_crit.horizon:\n",
"    for v in ds_crit.data_vars:\n",
"        if crit is None:\n",
"            crit = ds_crit[v].sel(horizon=h)\n",
"        else:\n",
"            crit = xr.concat((crit, ds_crit[v].sel(horizon=h)), dim='criteria')\n",
"crit.name = 'criteria'\n",
"crit.shape"

]

}, {

"cell_type": "markdown", "metadata": {}, "source": [

"#### K-Means reduce ensemble\n",
"\n",
"`kmeans_reduce_ensemble` works by grouping realizations into subgroups based on the provided criteria and retaining one representative realization per subgroup.\n",
"\n",
"For a real-world example of the K-means clustering algorithm applied to climate data selection, see: https://doi.org/10.1371/journal.pone.0152495 and https://doi.org/10.1175/JCLI-D-11-00440.1\n",
"\n",
"The following example uses `method = dict(n_clusters=25)` to reduce the original 50 realizations to 25. The function returns the ids (integer indexes) of the selected realizations, which can then be used to select the data from the original ensemble."

]

}, {

"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [

"ids, cluster, fig_data = ensembles.kmeans_reduce_ensemble(data=crit, method={'n_clusters': 25}, random_state=42, make_graph=True)\n",
"ds_crit.isel(realization=ids)"

]

}, {

"cell_type": "markdown", "metadata": {}, "source": [

"With this reduced number, we can now compare the distribution of the selection versus the original ensemble of simulations."

]

}, {

"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [

"plt.style.use('seaborn-dark')\n",
"fig = plt.figure(figsize=(11, 9))\n",
"ax = plt.axes(projection='3d')\n",
"\n",
"# Plot only the realizations retained by the K-means selection\n",
"for h in ds_crit.horizon:\n",
"    ax.scatter(ds_crit.sel(horizon=h, realization=ids).delta_annual_tavg, ds_crit.sel(horizon=h, realization=ids).delta_annual_prtot,\n",
"               ds_crit.sel(horizon=h, realization=ids).delta_JJA_prtot, label=f'delta {h.values} - selected')\n",
"\n",
"ax.set_xlabel('delta_annual_tavg (C)')\n",
"ax.set_ylabel('delta_annual_prtot (%)')\n",
"ax.set_zlabel('delta_JJA_prtot (%)')\n",
"plt.legend()\n",
"plt.show()"

]

}, {

"cell_type": "markdown", "metadata": {}, "source": [

"The function optionally produces a data dictionary for plotting the associated R² profile.\n",
"\n",
"The function `ensembles.plot_rsqprofile` plots this profile, which is used to evaluate the proportion of total variance in the climate realizations that is covered by the selection.\n",
"\n",
"In this case, ~88% of the total variance in the original ensemble is covered by the selection."

]

}, {

"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [

"ensembles.plot_rsqprofile(fig_data)"

]

}, {

"cell_type": "markdown", "metadata": {}, "source": [

"Alternatively, we can use `method = {'rsq_cutoff': float}` or `method = {'rsq_optimize': None}`.\n",
"\n",
"* For example, with `rsq_cutoff` we instead find the number of realizations needed to cover the provided $R^{2}$ value."

]

}, {

"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [

"ids1, cluster1, fig_data1 = ensembles.kmeans_reduce_ensemble(data=crit, method={'rsq_cutoff': 0.75}, random_state=42, make_graph=True)\n",
"ensembles.plot_rsqprofile(fig_data1)\n",
"ds_crit.isel(realization=ids1)"

]

}, {

"cell_type": "markdown", "metadata": {}, "source": [

"#### KKZ reduce ensemble\n",
"\n",
"xclim also provides a similar ensemble reduction algorithm, `ensembles.kkz_reduce_ensemble`. See: https://doi.org/10.1175/JCLI-D-14-00636.1\n",
"\n",
"The main advantage of this algorithm is that it needs fewer realizations than the K-means clustering algorithm to reach the same level of representativeness, because the KKZ method tends to identify members that fall towards the extremes of the criteria values.\n",
"\n",
"This technique also produces nested selection results: increasing the desired selection size does not alter the previous choices, which is not the case for the K-means algorithm."

]

}, {

"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [

"ids = ensembles.kkz_reduce_ensemble(crit, num_select=10)\n",
"ds_crit.isel(realization=ids)"

]

}, {

"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [

"plt.style.use('seaborn-dark')\n",
"fig = plt.figure(figsize=(9, 9))\n",
"ax = plt.axes(projection='3d')\n",
"\n",
"# Plot only the realizations retained by the KKZ selection\n",
"for h in ds_crit.horizon:\n",
"    ax.scatter(ds_crit.sel(horizon=h, realization=ids).delta_annual_tavg, ds_crit.sel(horizon=h, realization=ids).delta_annual_prtot,\n",
"               ds_crit.sel(horizon=h, realization=ids).delta_JJA_prtot, label=f'delta {h.values} - selected')\n",
"\n",
"ax.set_xlabel('delta_annual_tavg (C)')\n",
"ax.set_ylabel('delta_annual_prtot (%)')\n",
"ax.set_zlabel('delta_JJA_prtot (%)')\n",
"plt.legend()\n",
"plt.show()"

]

}, {

"cell_type": "markdown", "metadata": {}, "source": [

"#### KKZ algorithm vs K-Means algorithm\n",
"\n",
"To give a better sense of the differences between nested (KKZ) and unnested (K-Means) results, we can run each algorithm over increasing selection sizes and see which members it chooses at each step."

]

}, {

"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [

"## NESTED results using KKZ\n",
"for n in np.arange(5, 14, 2):\n",
"    ids = ensembles.kkz_reduce_ensemble(crit, num_select=n)\n",
"    print(ids)"

]

}, {

"cell_type": "code", "execution_count": null, "metadata": {

"pycharm": {

"name": "#%%\n"

}

}, "outputs": [], "source": [

"## UNNESTED results using k-means\n",
"for n in np.arange(5, 14, 2):\n",
"    ids, cluster, fig_data = ensembles.kmeans_reduce_ensemble(crit, method={'n_clusters': n}, random_state=42, make_graph=True)\n",
"    print(ids)"

]

}, {

"cell_type": "markdown", "metadata": {

"pycharm": {

"name": "#%% md\n"

}

}, "source": [

"While the nested feature of the KKZ results is typically advantageous, it can sometimes result in unbalanced coverage of the original ensemble. In general, careful consideration and validation of selection results is suggested when `n` is small, regardless of the technique used.\n",
"\n",
"To illustrate, a simple example using only 2 of our criteria shows how the results differ between the two techniques:\n",
"\n",
"* The KKZ algorithm iteratively maximizes the distance from previously selected candidates, potentially resulting in 'off-center' results versus the original ensemble.\n",
"* The K-means algorithm redivides the data space with each iteration, producing results that are consistently centered on the original ensemble but lack coverage in the extremes."

]

}, {

"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [

"df = crit.isel(criteria=[0, 1])\n",
"\n",
"# Use standardized data in the plot so that selection distances are better visualized\n",
"df = (df - df.mean('realization')) / df.std('realization')\n",
"\n",
"# Top row: KKZ selections for n = 1..5\n",
"plt.figure(figsize=(18, 3))\n",
"for n in np.arange(1, 6):\n",
"    plt.subplot(1, 5, n, aspect='equal')\n",
"    plt.scatter(df.isel(criteria=0), df.isel(criteria=1))\n",
"    ids_KKZ = ensembles.kkz_reduce_ensemble(crit.isel(criteria=[0, 1]), num_select=n)\n",
"    plt.scatter(df.isel(criteria=0, realization=ids_KKZ), df.isel(criteria=1, realization=ids_KKZ), s=100)\n",
"    plt.title(f'KKZ={n}')\n",
"    if n == 1:\n",
"        plt.ylabel('standardized delta_annual_prtot')\n",
"    if n == 3:\n",
"        plt.xlabel('standardized delta_annual_tavg')\n",
"plt.suptitle('KKZ selection results')\n",
"\n",
"# Bottom row: K-means selections for n = 1..5\n",
"plt.figure(figsize=(18, 3))\n",
"for n in np.arange(1, 6):\n",
"    plt.subplot(1, 5, n, aspect='equal')\n",
"    plt.scatter(df.isel(criteria=0), df.isel(criteria=1))\n",
"    ids_Kmeans, c, figdata = ensembles.kmeans_reduce_ensemble(crit.isel(criteria=[0, 1]), method={'n_clusters': n}, random_state=42, make_graph=True)\n",
"    plt.scatter(df.isel(criteria=0, realization=ids_Kmeans), df.isel(criteria=1, realization=ids_Kmeans), s=100)\n",
"    plt.title(f'Kmeans={n}')\n",
"    if n == 1:\n",
"        plt.ylabel('standardized delta_annual_prtot')\n",
"    if n == 3:\n",
"        plt.xlabel('standardized delta_annual_tavg')\n",
"plt.suptitle('K-means selection results')\n",
"plt.show()"

]

}, {

"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": []

}

], "metadata": {

"celltoolbar": "Edit Metadata", "kernelspec": {

"display_name": "Python 3", "language": "python", "name": "python3"

}, "language_info": {

"codemirror_mode": {

"name": "ipython", "version": 3

}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8"

}, "toc": {

"base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false

}

}, "nbformat": 4, "nbformat_minor": 4

}