- {
- “cells”: [
- {
“cell_type”: “markdown”, “metadata”: {}, “source”: [
“# Extending xclimn”, “n”, “xclim tries to make it easy for users to add their own indices and indicators. The following goes into details on how to create _indices_ and document them so that xclim can parse most of the metadata directly. We then explain the multiple ways new _Indicators_ can be created and, finally, how we can regroup and structure them in virtual submodules.n”, “n”, “Central to xclim are the Indicators, objects computating indices over climate variables, but xclim also provides other modules:n”, “n”, “n”, “n”, “Where subset is a phantom module, kept for legacy code, as it only redirects the calls to clisops.core.subset.n”, “n”, “This introduction will focus on the Indicator/Indice part of xclim and how one can extend it by implementing new ones.n”, “n”, “n”, “## Indices vs Indicatorsn”, “n”, “Internally and in the documentation, xclim makes a distinction between "indices" and "indicators".n”, ” n”, “### indicen”, “n”, ” * A python function accepting DataArrays and other parameters (usually bultin types)n”, ” * Returns one or several DataArrays. n”, ” * Handles the units : checks input units and set proper CF-compliant output units. But doesn’t usually prescribe specific units, the output will at minimum have the proper dimensionality.n”, ” * Performs no other checks or set any (non-unit) metadata.n”, ” * Accessible through [xclim.indices](../indices.rst).n”, ” n”, “### indicatorn”, “n”, ” * An instance of a subclass of xclim.core.indicator.Indicator that wraps around an indice (stored in its compute property). n”, ” * Returns one or several DataArrays.n”, ” * Handles missing values, performs input data and metadata checks (see [usage](usage.ipynb#Health-checks-and-metadata-attributes)).n”, ” * Always ouputs data in the same units.n”, ” * Adds dynamically generated metadata to the output after computation.n”, ” * Accessible through [xclim.indicators ](../indicators_api.rst)n”, “n”, “Most metadata stored in the Indicators is parsed from the underlying indice documentation, so defining indices with complete documentation and an appropriate signature helps the process. The two next sections go into details on the definition of both objects.n”, “n”, “#### Call sequencen”, “n”, “The following graph shows the steps done when calling an Indicator. Attributes and methods of the Indicator object relating to those steps are listed on the right side.n”, “n”, “”
]
}, {
“cell_type”: “markdown”, “metadata”: {}, “source”: [
“## Defining new indicesn”, “n”, “The annotated example below shows the general template to be followed when defining proper _indices_. In the comments Ind is the indicator instance that would be created from this function.n”, “n”, “<div class="alert alert-info">n”, “n”, “Note that it is not _needed_ to follow these standards when writing indices that will be wrapped in indicators. Problems in parsing will not raise errors at runtime, but will result in Indicators with poorer metadata than expected by most users, especially those that dynamically use indicators in other applications where the code is inaccessible, like web services.n”, ” n”, “</div>n”, “n”, “n”, “n”, “The following code is another example.”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [
“import xarray as xrn”, “import xclim as xcn”, “from xclim.core.units import declare_units, convert_units_ton”, “from xclim.indices.generic import threshold_countn”, “n”, “@declare_units(tasmax="[temperature]", thresh="[temperature]")n”, “def tx_days_compare(tasmax: xr.DataArray, thresh: str = "0 degC", op: str = ‘>’, freq: str = "YS"):n”, ” r"""Number of days where maximum daily temperature. is above or under a threshold.n”, ” n”, ” The daily maximum temperature is compared to a threshold using a given operator and the numbern”, ” of days where the condition is true is returned.n”, ” n”, ” It assumes a daily input.n”, “n”, ” Parametersn”, ” ———-n”, ” tasmax : xarray.DataArray n”, ” Maximum daily temperature.n”, ” thresh : strn”, ” Threshold temperature to compare to.n”, ” op : {‘>’, ‘<’}n”, ” The operator to use.n”, ” # A fixed set of choices can be imposed. Only strings, numbers, booleans or None are accepted.n”, ” freq : strn”, ” Resampling frequency.n”, “n”, ” Returnsn”, ” ——-n”, ” xarray.DataArray, [temperature]n”, ” Maximum value of daily maximum temperature.n”, “n”, ” Notesn”, ” —–n”, ” Let \(TX_{ij}\) be the maximum temperature at day \(i\) of period \(j\). Then the maximumn”, ” daily maximum temperature for period \(j\) is:n”, “n”, ” .. math::n”, “n”, ” TXx_j = max(TX_{ij})n”, ” n”, ” Referencesn”, ” ———-n”, ” Smith, John and Tremblay, Robert, An dummy citation for examples in documentation. J. RTD. (2020).n”, ” """n”, ” thresh = convert_units_to(thresh, tasmax)n”, ” out = threshold_count(tasmax, op, thresh, freq)n”, ” out.attrs[‘units’] = "days"n”, ” return out”
]
}, {
“cell_type”: “markdown”, “metadata”: {}, “source”: [
“### Naming and conventionsn”, “n”, “Variable names should correspond to CMIP6 variables, whenever possible. The file xclim/data/variables.yml lists all variables that xclim can use when generating indicators from yaml files (see below), and new indices should try to reflect these also. For new variables, the xclim.testing.get_all_CMIP6_variables function downloads the official table of CMIP6 variables and puts everything in a dictionary. If possible, use variables names from this list, add them to variables.yml as needed.n”, “n”, “### Generic functions for common operationsn”, “n”, “The [xclim.indices.generic](../indices.rst#generic-indices-submodule) submodule contains useful functions for common computations (like threshold_count or select_resample_op) and many basic indice functions, as defined by [clix-meta](https://github.com/clix-meta/clix-meta). In order to reduce duplicate code, their use is recommended for xclim’s indices. As previously said, the units handling has to be made explicitly when non trivial, [xclim.core.units](../api.rst#module-xclim.core.units) also exposes a few helpers for that (like convert_units_to, to_agg_units or rate2amount).n”, “n”, “## Defining new indicatorsn”, “n”, “xclim’s Indicators are instances of (subclasses of) xclim.core.indicator.Indicator. While they are the central to xclim, their construction can be somewhat tricky as a lot happens backstage. Essentially, they act as self-aware functions, taking a set of input variables (DataArrays) and parameters (usually strings, integers or floats), performing some health checks on them and returning one or multiple DataArrays, with CF-compliant (and potentially translated) metadata attributes, masked according to a given missing value set of rules.n”, “They define the following key attributes:n”, “n”, “* the identifier, as string that uniquely identifies the indicator,n”, “* the realm, one of "atmos", "land", "seaIce" or "ocean", classifying the domain of use of the indicator.n”, “* the compute function that returns one or more DataArrays, the "indice",n”, “* the cfcheck and datacheck methods that make sure the inputs are appropriate and valid.n”, “* the missing function that masks elements based on null values in the input.n”, “* all metadata attributes that will be attributed to the output and that document the indicator:n”, ” - Indicator-level attribute are : title, abstract, keywords, references and notes.n”, ” - Ouput variables attributes (respecting CF conventions) are: var_name, standard_name, long_name, units, cell_methods, description and comment. n”, “n”, “Output variables attributes are regrouped in Indicator.cf_attrs and input parameters are documented in Indicator.parameters.n”, “n”, “A particularity of Indicators is that each instance corresponds to a single class: when creating a new indicator, a new class is automatically created. This is done for easy construction of indicators based on others, like shown further down.n”, “n”, “See the [class documentation](../api.rst#module-xclim.core.indicator) for more info on the meaning of each attribute. The [indicators](https://github.com/Ouranosinc/xclim/tree/master/xclim/indicators) module contains over 50 examples of indicators to draw inspiration from.n”, “n”, “### Metadata parsing vs explicit settingn”, “n”, “As explained above, most metadata can be parsed from the indice’s signature and docstring. Otherwise, it can always be set when creating a new Indicator instance or a new subclass. When _creating_ an indicator, output metadata attributes can be given as strings, or list of strings in the case of indicator returning multiple outputs. However, they are stored in the cf_attrs list of dictionaries on the instance.n”, “n”, “### Internationalization of metadatan”, “n”, “xclim offers the possibility to translate the main Indicator metadata field and automatically add the translations to the outputs. The mechnanic is explained in the [Internationalization](../internationalization.rst) page.n”, “n”, “### Inputs and checksn”, “n”, “There are two ways that xclim uses to decide which input arguments of the indicator’s call function are considered _variables_ and which are _parameters_. n”, “n”, “- The nvar indicator integer attribute sets the number of arguments that are sent to the datacheck and cfcheck methods (in the signature’s order).n”, “- The annotations of the underlying indice (the compute method). Arguments annotated with the xarray.DataArray type are considered _variables_ and can be read from the dataset passed in ds.n”, “n”, “### Indicator creationn”, “n”, “There a two ways for creating indicators:n”, “n”, “1) By initializing an existing indicator (sub)classn”, “2) From a dictionaryn”, “n”, “The first method is best when defining indicators in scripts of external modules and are explained here. The second is best used when building virtual modules through YAML files, and is explained further down and in the [submodule doc](../api.rst#module-xclim.core.indicator).n”, “n”, “Creating a new indicator that simply modifies a few metadata output of an existing one is a simple call like:”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [
“from xclim.core.indicator import registryn”, “from xclim.core.utils import wrapped_partialn”, “from xclim.indices import tg_meann”, “n”, “# An indicator based on tg_mean, but returning Celsius and fixed on annual resamplingn”, “tg_mean_c = registry[‘TG_MEAN’](n”, ” identifier=’tg_mean_c’,n”, ” units=’degC’,n”, ” title=’Mean daily mean temperature but in degC’,n”, ” compute=wrapped_partial(tg_mean, freq=’YS’) # We inject the freq arg.n”, “)”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [
“print(tg_mean_c.__doc__)”
]
}, {
“cell_type”: “markdown”, “metadata”: {}, “source”: [
“The registry is a dictionary mapping indicator identifiers (in uppercase) to their class. This way, we could subclass tg_mean to create our new indicator. tg_mean_c is the exact same as atmos.tg_mean, but outputs the result in Celsius instead of Kelvins, has a different title and resamples to "YS". The identifier keyword is here needed in order to differentiate the new indicator from tg_mean itself. If it wasn’t given, a warning would have been raised and further subclassing of tg_mean would have in fact subclassed tg_mean_c, which is not wanted!n”, “n”, “This method of class initialization is good for the cases where only metadata and perhaps the compute function is changed. However, to modify the CF compliance and data checks, we recommend creating a class first:”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [
“class TG_MAX_C(registry[‘TG_MAX’]):n”, ” identifier = "tg_max_c"n”, ” missing = "wmo"n”, ” title = ‘Maximum daily mean temperature’n”, ” units = ‘degC’n”, “n”, ” @staticmethodn”, ” def cfcheck(tas):n”, ” xc.core.cfchecks.check_valid(tas, "standard_name", "air_temperature")n”, ” # Add a very strict check on the long name.n”, ” # glob-like wildcards can be used (here *)n”, ” xc.core.cfchecks.check_valid(tas, "long_name", "Surface * daily temperature")n”, “n”, ” @staticmethodn”, ” def datacheck(tas):n”, ” xc.core.datachecks.check_daily(tas)n”, ” n”, “tg_max_c = TG_MAX_C()”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [
“from xclim.testing import open_datasetn”, “ds = open_dataset(‘ERA5/daily_surface_cancities_1990-1993.nc’)n”, “n”, “ds.tas.attrs[‘long_name’] = ‘Surface average daily temperature’n”, “n”, “with xc.set_options(cf_compliance=’raise’): n”, ” # The data passes the test we implemented ("average" is caught by the *)n”, ” tmaxc = tg_max_c(tas=ds.tas)n”, “tmaxc”
]
}, {
“cell_type”: “markdown”, “metadata”: {}, “source”: [
“A caveat of this method is that the new indicator is added to the registry with a non-trivial name. When an indicator subclass is created in a module outside xclim.indicators, the name of its parent module is prepended to its identifier in the registry. Here, the module is __main__, so:”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [
“‘__main__.TG_MAX_C’ in registry”
]
}, {
“cell_type”: “markdown”, “metadata”: {}, “source”: [
“A simple way to workaround this is to provided a (fake) module name. Passing one of atmos, land, seaIce or ocean will result in a normal entry in the registry. However, one could want to keep the distinction between the newly created indicators and the "official" ones by passing a custom name upon instantiation:”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [
“# Fake module is passed upon instantiationn”, “tg_max_c2 = TG_MAX_C(module =’example’)n”, “print(tg_max_c2.__module__)n”, “print(‘example.TG_MAX_C’ in registry)”
]
}, {
“cell_type”: “markdown”, “metadata”: {}, “source”: [
“One pattern to create multiple indicators is to write a standard subclass that declares all the attributes that are common to indicators, then call this subclass with the custom attributes. See for example in [xclim.indicators.atmos](https://github.com/Ouranosinc/xclim/blob/master/xclim/indicators/atmos/_temperature.py) how indicators based on daily mean temperatures are created from the Tas subclass of the Daily class.”
]
}, {
“cell_type”: “markdown”, “metadata”: {}, “source”: [
“## Virtual modulesn”, “n”, “xclim gives users the ability to generate their own modules from existing indices library. These mappings can help in emulating existing libraries (such as ICCLIM), with the added benefit of CF-compliant metadata, multilingual metadata support, and optimized calculations using federated resources (using Dask). This can be used for example to tailor existing indices with predefined thresholds without having to rewrite indices.n”, “n”, “Presently, xclim is capable of approximating the indices developed in ICCLIM (https://icclim.readthedocs.io/en/latest/intro.html), ANUCLIM (https://fennerschool.anu.edu.au/files/anuclim61.pdf) and clix-meta (https://github.com/clix-meta/clix-meta) and is open to contributions of new indices and library mappings.n”, “n”, “This notebook serves as an example of how one might go about creating their own library of mapped indices. Two ways are possible:n”, “n”, “1. From a YAML file (recommended way)n”, “2. From a mapping (dictionary) of indicatorsn”, “n”, “### YAML filen”, “n”, “The first method is based on the YAML syntax proposed by clix-meta, expanded to xclim’s needs. The full documentation on that syntax is [here](../api.rst#module-xclim.core.indicator). This notebook shows an example different complexities of indicator creation. It creates a minimal python module defining a indice, creates a YAML file with the metadata for several indicators and then parses it into xclim.”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {
“nbsphinx”: “hidden”
}, “outputs”: [], “source”: [
“# Workaround absence of syntax highlighting in notebooksn”, “from pygments.formatters import Terminal256Formattern”, “from pygments.lexers import YamlLexer, PythonLexer, JsonLexern”, “from pygments import highlightn”, “n”, “with open(‘example.py’) as f:n”, ” pydata = f.read()n”, “n”, “with open(‘example.yml’) as f:n”, ” ymldata = f.read()n”, “n”, “with open(‘example.fr.json’) as f:n”, ” jsondata = f.read()n”, ” n”, “highlighted_py = highlight(pydata, PythonLexer(), Terminal256Formatter(style=’manni’))n”, “highlighted_yaml = highlight(ymldata, YamlLexer(), Terminal256Formatter(style=’manni’))n”, “highlighted_json = highlight(jsondata, JsonLexer(), Terminal256Formatter(style=’manni’))”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [
“# These variables were generated by a hidden cell above that syntax-colored them.n”, “print(‘Content of example.py :’)n”, “print(highlighted_py)n”, “print(’\n\nContent of example.yml :’)n”, “print(highlighted_yaml)n”, “print(’\n\nContent of example.fr.json :’)n”, “print(highlighted_json)”
]
}, {
“cell_type”: “markdown”, “metadata”: {}, “source”: [
“example.yml created a module of 4 indicators.n”, “n”, “- RX1day is simply the same as registry[‘RX1DAY’], but with an updated long_name.n”, “- RX5day is based on registry[‘MAX_N_DAY_PRECIPITATION_AMOUNT’], changed the long_name and injects the window and freq arguments.n”, “- R75pdays is based on registry[‘DAYS_OVER_PRECIP_THRESH’], injects the thresh argument and changes the description of the per argument. Passing "data: {per}" tells xclim the value is still to be determined by the user, but other parameter’s metadata field might be changed.n”, “- fd is a more complex example. As there were no base: entry, the Daily class serves as a base. As it is pretty much empty, a lot has to be given explicitly:n”, ” * A list of allowed resampling frequency is passedn”, ” * Many output metadata fields are givenn”, ” * A index_function name if given (here it refers to a function in xclim.indices.generic).n”, ” * Some parameters are injected.n”, ” * The input variable data is mapped to a known variable. Functions in xclim.indices.generic are indeed generic. Here we tell xclim that the data argument is minimal daily temperature. This will set the proper units check, default value and CF-compliance checks.n”, “- R95p is similar to fd but here the index_function is not defined in xclim but rather in example.py.n”, “- R99p is the same as R95p but changes the injected value. In order to avoid rewriting the output metadata, and allowed periods, we based it on R95p : as the latter was defined within the current yaml file, the identifier is prefixed by a dot (.). However, in order to _inject_ a parameter we still need to repeat the index_function name (and retrigger the indice function wrapping process under the hood).n”, “- LPRatio is a version of "liquid precip ratio" where we we force the use of tas (instead of having it an optional variable). We also inject a specific threshold.n”, “n”, “A few ways of prescribing _default_ or _allowed_ periods (resampling frequencies) are shown here. In fd and R95p, only the default value of freq is given. R75pdays will keep the default value in the signature of the underlying indice. RX1day goes in more details by prescribing a default value and a list of _allowed_ values. xclim will be relax and accept any freq values equivalent to those listed here. Finally, RX5day directly injects the freq argument, so that it doesn’t even appear in the docstring.n”, “n”, “Additionnaly, the yaml specified a realm and references to be used on all indices and provided a submodule docstring. Creating the module is then simply:n”, “n”, “Finally, french translations for the main attributes and the new indicaters are given in example.fr.json. Even though new indicator objects are created for each yaml entry, non-specified translations are taken from the base classes if missing in the json file.n”, “n”, “Note that all files are named the same way : example.<ext>, with the translations having an additionnal suffix giving the locale name. In the next cell, we build the module by passing only the path without extension. This absence of extension is what tells xclim to try to parse a module (*.py) and custom translations (*.<locale>.json). Those two could also be read beforehand and passed through the indices= and translations= arguments.”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [
“import xclim as xcn”, “n”, “example = xc.core.indicator.build_indicator_module_from_yaml(‘example’, mode=’raise’)”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [
“print(example.__doc__)n”, “print(’–‘)n”, “print(xc.indicators.example.R99p.__doc__)”
]
}, {
“cell_type”: “markdown”, “metadata”: {}, “source”: [
“Useful for using this technique in large projects, we can iterate over the indicators like so:”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [
“ds2 = ds.assign(per=xc.core.calendar.percentile_doy(ds.pr, window=5, per=75).isel(percentiles=0, drop=True))n”, “n”, “outs = []n”, “with xc.set_options(metadata_locales=’fr’):n”, ” for name, ind in example.iter_indicators():n”, ” print(f’Indicator: {name}’)n”, ” print(f’\tIdentifier: {ind.identifier}’)n”, ” print(f’\tTitle: {ind.title}’)n”, ” out = ind(ds=ds2) # Use all default arguments and variables from the dataset,n”, ” outs.append(out)”
]
}, {
“cell_type”: “markdown”, “metadata”: {}, “source”: [
“out contains all the computed indices, with translated metadata.n”, “Note that this merge doesn’t make much sense with the current list of indicators since they have different frequencies (freq).”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [
“out = xr.merge(outs)n”, “out.attrs = {‘title’: ‘Indicators computed from the example module.’} # Merge puts the attributes of the first variable, we don’t want that.n”, “out”
]
}, {
“cell_type”: “markdown”, “metadata”: {}, “source”: [
“### Mapping of indicatorsn”, “n”, “For more complex mappings, submodules can be constructed from Indicators directly. This is not the recommended way, but can sometimes be a workaround when the YAML version is lacking features.”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [
“from xclim.core.indicator import build_indicator_module, registryn”, “from xclim.core.utils import wrapped_partialn”, “n”, “mapping = dict(n”, ” egg_cooking_season=registry["MAXIMUM_CONSECUTIVE_WARM_DAYS"](n”, ” module=’awesome’,n”, ” compute=wrapped_partial(xc.indices.maximum_consecutive_tx_days, thresh="35 degC"),n”, ” long_name="Season for outdoor egg cooking.",n”, ” ),n”, ” fish_feeling_days=registry["WETDAYS"](n”, ” module=’awesome’,n”, ” compute=wrapped_partial(xc.indices.wetdays, thresh="14.0 mm/day"),n”, ” long_name="Days where we feel we are fishes"n”, ” ),n”, ” sweater_weather=xc.atmos.tg_minn”, “)n”, “n”, “awesome = build_indicator_module(n”, ” name="awesome",n”, ” objs=mapping,n”, ” doc="""n”, ” =========================n”, ” My Awesome Custom indicesn”, ” =========================n”, ” There are only 3 indices that really matter when you come down to brass tacks.n”, ” This mapping library exposes them to users who want to perform real deal n”, ” climate science.n”, ” """,n”, “)”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [
“print(xc.indicators.awesome.__doc__)”
]
}, {
“cell_type”: “code”, “execution_count”: null, “metadata”: {}, “outputs”: [], “source”: [
“# Let’s look at our new awesome modulen”, “print(awesome.__doc__)n”, “for name, ind in awesome.iter_indicators():n”, ” print(f"{name} : {ind}")”
]
}
], “metadata”: {
- “kernelspec”: {
“display_name”: “Python 3”, “language”: “python”, “name”: “python3”
}, “language_info”: {
- “codemirror_mode”: {
“name”: “ipython”, “version”: 3
}, “file_extension”: “.py”, “mimetype”: “text/x-python”, “name”: “python”, “nbconvert_exporter”: “python”, “pygments_lexer”: “ipython3”, “version”: “3.8.8”
}, “toc”: {
“base_numbering”: 1, “nav_menu”: {}, “number_sections”: true, “sideBar”: true, “skip_h1_title”: false, “title_cell”: “Table of Contents”, “title_sidebar”: “Contents”, “toc_cell”: false, “toc_position”: {}, “toc_section_display”: true, “toc_window_display”: false
}
}, “nbformat”: 4, “nbformat_minor”: 4
}