# Extending xclim

xclim tries to make it easy for users to add their own indices and indicators. This page details how to create indices and document them so that xclim can parse most of the metadata directly. It then explains the multiple ways new Indicators can be created and, finally, how they can be regrouped and structured in virtual submodules.

Central to xclim are the Indicators, objects computing indices over climate variables, but xclim also provides other modules.

The subset module is a phantom module, kept for legacy code: it only redirects calls to clisops.core.subset.

This introduction focuses on the indicators and indices of xclim and on how one can extend xclim by implementing new ones.

## Indices vs Indicators

Internally and in the documentation, xclim makes a distinction between “indices” and “indicators”.

### indice

• A python function accepting DataArrays and other parameters (usually builtin types).

• Returns one or several DataArrays.

• Handles the units: checks input units and sets proper CF-compliant output units. It doesn’t usually prescribe specific units, but the output will at minimum have the proper dimensionality.

• Performs no other checks and sets no other (non-unit) metadata.

• Accessible through xclim.indices.

### indicator

• An instance of a subclass of xclim.core.indicator.Indicator that wraps around an indice (stored in its compute property).

• Returns one or several DataArrays.

• Handles missing values, performs input data and metadata checks (see usage).

• Always outputs data in the same units.

• Adds dynamically generated metadata to the output after computation.

• Accessible through xclim.indicators.

Most metadata stored in the Indicators is parsed from the underlying indice documentation, so defining indices with complete documentation and an appropriate signature helps the process. The two next sections go into details on the definition of both objects.
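To give an intuition of what "parsing metadata from the indice documentation" involves, here is a minimal sketch that reads a title and parameter descriptions from a numpydoc-style docstring. It is not xclim's parser: the `parse_numpy_docstring` helper and the `tx_days_above` stub are hypothetical names used only for this illustration, and the sketch handles only the simple layout shown.

```python
import inspect
import re


def parse_numpy_docstring(func):
    """Extract a title and per-parameter descriptions from a numpydoc-style docstring."""
    doc = inspect.getdoc(func) or ""
    lines = doc.splitlines()
    title = lines[0].strip() if lines else ""

    params = {}
    in_params = False
    current = None
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        # A section header is a line followed by a row of dashes.
        is_header = bool(nxt) and set(nxt.strip()) == {"-"}
        if is_header:
            in_params = line.strip() == "Parameters"
            current = None
            continue
        # Skip everything outside "Parameters", blank lines and dash rows.
        if not in_params or set(line.strip()) <= {"-"}:
            continue
        match = re.match(r"^(\w+)\s*:\s*(.*)$", line)
        if match and not line.startswith(" "):
            current = match.group(1)
            params[current] = {"type": match.group(2), "description": ""}
        elif current and line.strip():
            desc = params[current]["description"]
            params[current]["description"] = (desc + " " + line.strip()).strip()
    return {"title": title, "parameters": params}


def tx_days_above(tasmax, thresh="0 degC", freq="YS"):
    """Number of days where the maximum daily temperature is above a threshold.

    Parameters
    ----------
    tasmax : xarray.DataArray
        Maximum daily temperature.
    thresh : str
        Threshold temperature to compare to.
    freq : str
        Resampling frequency.

    Returns
    -------
    xarray.DataArray
        Number of days above the threshold.
    """


meta = parse_numpy_docstring(tx_days_above)
print(meta["title"])
print(meta["parameters"]["thresh"])
```

A docstring that deviates from this layout would simply yield fewer entries, which mirrors the point made above: incomplete documentation leads to poorer metadata rather than to errors.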

#### Call sequence

The following graph shows the steps done when calling an Indicator. Attributes and methods of the Indicator object relating to those steps are listed on the right side.

## Defining new indices

The annotated example below shows the general template to be followed when defining proper indices. In the comments Ind is the indicator instance that would be created from this function.

Note that it is not needed to follow these standards when writing indices that will be wrapped in indicators. Problems in parsing will not raise errors at runtime, but might raise warnings and will result in Indicators with poorer metadata than expected by most users, especially those that dynamically use indicators in other applications where the code is inaccessible, like web services.

The following code is another example.

[1]:
import xarray as xr
import xclim as xc
from xclim.core.units import declare_units, convert_units_to
from xclim.indices.generic import threshold_count

@declare_units(tasmax="[temperature]", thresh="[temperature]")
def tx_days_compare(
    tasmax: xr.DataArray, thresh: str = "0 degC", op: str = ">", freq: str = "YS"
):
    r"""Number of days where maximum daily temperature is above or under a threshold.

    The daily maximum temperature is compared to a threshold using a given operator and the number
    of days where the condition is true is returned.

    It assumes a daily input.

    Parameters
    ----------
    tasmax : xarray.DataArray
        Maximum daily temperature.
    thresh : str
        Threshold temperature to compare to.
    op : {'>', '<'}
        The operator to use.
        # A fixed set of choices can be imposed. Only strings, numbers, booleans or None are accepted.
    freq : str
        Resampling frequency.

    Returns
    -------
    xarray.DataArray, [time]
        Number of days where the maximum daily temperature satisfies the condition.

    Notes
    -----
    Let :math:`TX_{ij}` be the maximum temperature at day :math:`i` of period :math:`j` and
    :math:`T` the threshold. With the ``>`` operator, the number of days for period :math:`j` is:

    .. math::

        N_j = \sum_i [TX_{ij} > T]

    where :math:`[P]` is 1 if :math:`P` is true and 0 otherwise.

    References
    ----------
    Smith, John and Tremblay, Robert, A dummy citation for examples in documentation. J. RTD. (2020).
    """
    thresh = convert_units_to(thresh, tasmax)
    out = threshold_count(tasmax, op, thresh, freq)
    out.attrs["units"] = "days"
    return out
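To make the logic of this indice concrete without xarray, the sketch below reproduces the "compare to a threshold, then count per period" idea with plain Python lists, grouping by calendar year. The `count_days_by_year` helper is a hypothetical illustration, not an xclim function, and it ignores units entirely.

```python
import operator
from collections import Counter
from datetime import date, timedelta

# Map the operator strings accepted by the indice to Python comparison functions.
OPS = {">": operator.gt, "<": operator.lt}


def count_days_by_year(dates, values, thresh, op=">"):
    """Count, for each calendar year, the days where `value op thresh` holds."""
    compare = OPS[op]
    counts = Counter()
    for day, value in zip(dates, values):
        # Adding 0 for a failed comparison still registers the year in the result.
        counts[day.year] += int(compare(value, thresh))
    return dict(counts)


start = date(2000, 12, 30)
dates = [start + timedelta(days=i) for i in range(5)]  # 2000-12-30 to 2001-01-03
tasmax = [-2.0, 1.5, 3.0, -0.5, 0.2]  # daily maxima, in degC

print(count_days_by_year(dates, tasmax, thresh=0.0, op=">"))  # {2000: 1, 2001: 2}
print(count_days_by_year(dates, tasmax, thresh=0.0, op="<"))  # {2000: 1, 2001: 1}
```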

### Naming and conventions

Variable names should correspond to CMIP6 variables, whenever possible. The file xclim/data/variables.yml lists all variables that xclim can use when generating indicators from yaml files (see below), and new indices should try to reflect these also. For new variables, the xclim.testing.get_all_CMIP6_variables function downloads the official table of CMIP6 variables and puts everything in a dictionary. If possible, use variable names from this list and add them to variables.yml as needed.

### Generic functions for common operations

The xclim.indices.generic submodule contains useful functions for common computations (like threshold_count or select_resample_op) and many basic indice functions, as defined by clix-meta. In order to reduce duplicate code, their use is recommended for xclim’s indices. As previously said, units must be handled explicitly when the conversion is non-trivial; xclim.core.units exposes a few helpers for that (like convert_units_to, to_agg_units or rate2amount).
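As an example of a non-trivial unit operation, the sketch below spells out the arithmetic behind turning a precipitation rate into a daily amount. It only illustrates the kind of conversion that a helper such as rate2amount takes care of; the `flux_to_daily_amount_mm` function and its constants are assumptions made for this example and are not part of xclim.

```python
import math

SECONDS_PER_DAY = 86_400
WATER_DENSITY = 1000.0  # kg m-3, assumed density of liquid water


def flux_to_daily_amount_mm(flux_kg_m2_s):
    """Convert a precipitation flux (kg m-2 s-1) to a daily liquid water depth (mm)."""
    kg_per_m2 = flux_kg_m2_s * SECONDS_PER_DAY  # mass accumulated over one day
    metres = kg_per_m2 / WATER_DENSITY  # equivalent water depth
    return metres * 1000.0  # metres to millimetres


amount = flux_to_daily_amount_mm(1e-5)
print(f"{amount:.3f} mm")
```

Note how the dimensionality changes from a rate to a length, which is why the output units cannot simply be copied from the input.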

## Defining new indicators

xclim’s Indicators are instances of (subclasses of) xclim.core.indicator.Indicator. While they are central to xclim, their construction can be somewhat tricky as a lot happens backstage. Essentially, they act as self-aware functions, taking a set of input variables (DataArrays) and parameters (usually strings, integers or floats), performing some health checks on them and returning one or multiple DataArrays, with CF-compliant (and potentially translated) metadata attributes, masked according to a given set of rules on missing values. They define the following key attributes:

• the identifier, a string that uniquely identifies the indicator, usually all caps.

• the realm, one of “atmos”, “land”, “seaIce” or “ocean”, classifying the domain of use of the indicator.

• the compute function that returns one or more DataArrays, the “indice”,

• the cfcheck and datacheck methods that make sure the inputs are appropriate and valid.

• the missing function that masks elements based on null values in the input.

• all metadata attributes that will be attributed to the output and that document the indicator:

• Indicator-level attributes are: title, abstract, keywords, references and notes.

• Output variable attributes (respecting CF conventions) are: var_name, standard_name, long_name, units, cell_methods, description and comment.

Output variables attributes are regrouped in Indicator.cf_attrs and input parameters are documented in Indicator.parameters.
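The role of these attributes can be pictured with a toy class that chains the same kinds of steps: check the inputs, compute, mask on missing values, then attach metadata. This is a simplified sketch under assumptions: `SketchIndicator` is a hypothetical class working on plain lists, and the exact order and behaviour of the real steps in xclim may differ.

```python
import math


class SketchIndicator:
    """Toy stand-in for an indicator; not xclim's Indicator class."""

    identifier = "toy_mean"
    units = "degC"
    long_name = "Mean of the valid input values"

    def datacheck(self, values):
        # Health check on the input data.
        if not values:
            raise ValueError("Input series is empty.")

    def compute(self, values):
        # The "indice": a bare computation with no other checks.
        valid = [v for v in values if not math.isnan(v)]
        return sum(valid) / len(valid) if valid else math.nan

    def missing(self, values):
        # Missing-value rule assumed here: any NaN invalidates the whole period.
        return any(math.isnan(v) for v in values)

    def __call__(self, values):
        self.datacheck(values)
        result = self.compute(values)
        if self.missing(values):
            result = math.nan
        attrs = {"units": self.units, "long_name": self.long_name}
        return result, attrs


indicator = SketchIndicator()
print(indicator([1.0, 2.0, 3.0]))
print(indicator([1.0, math.nan, 3.0]))
```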

A particularity of Indicators is that each instance corresponds to a single class: when creating a new indicator, a new class is automatically created. This is done for easy construction of indicators based on others, like shown further down.
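The "one class per instance" behaviour can be mimicked in a few lines of plain Python. The sketch below is only an illustration of the pattern, assuming a hypothetical `ToyIndicator` class; it does not reproduce how xclim implements it.

```python
class ToyIndicator:
    """Toy class where every instantiation creates a dedicated subclass."""

    identifier = "base"
    units = ""

    def __new__(cls, **overrides):
        # Create a subclass of the class being instantiated; the keyword
        # arguments become class attributes of that new subclass.
        name = overrides.get("identifier", cls.identifier).upper()
        new_cls = type(name, (cls,), dict(overrides))
        return object.__new__(new_cls)

    def __init__(self, **overrides):
        # Everything was already stored on the class in __new__.
        pass


tg_mean = ToyIndicator(identifier="tg_mean", units="K")
# Instantiating the class of an existing instance derives a new indicator from it.
tg_mean_c = type(tg_mean)(identifier="tg_mean_c", units="degC")

print(type(tg_mean).__name__, tg_mean.units)
print(type(tg_mean_c).__name__, tg_mean_c.units)
print(isinstance(tg_mean_c, type(tg_mean)))
```

Because the overrides live on a fresh subclass, the parent indicator is left untouched, which is what makes building indicators from other indicators safe.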

See the class documentation for more info on the meaning of each attribute. The indicators module contains over 50 examples of indicators to draw inspiration from.

### Identifier vs python name

An indicator’s identifier is not the same as the name it has within the python module. For example, xc.atmos.relative_humidity has hurs as its identifier. As explained below, indicator classes can be accessed through xc.core.indicator.registry with their identifier.

### Metadata parsing vs explicit setting

As explained above, most metadata can be parsed from the indice’s signature and docstring. Otherwise, it can always be set when creating a new Indicator instance or a new subclass. When creating an indicator, output metadata attributes can be given as strings, or list of strings in the case of indicator returning multiple outputs. However, they are stored in the cf_attrs list of dictionaries on the instance.
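The relation between the keyword arguments and the stored list of dictionaries can be sketched as follows. The `build_cf_attrs` helper is hypothetical, and broadcasting a single string to all outputs is an assumption of this example rather than documented xclim behaviour.

```python
def build_cf_attrs(n_outputs=1, **metadata):
    """Turn per-attribute values into one attribute dictionary per output."""
    cf_attrs = [{} for _ in range(n_outputs)]
    for key, value in metadata.items():
        # Assumption: a single string applies to every output.
        values = [value] * n_outputs if isinstance(value, str) else list(value)
        if len(values) != n_outputs:
            raise ValueError(f"Expected {n_outputs} values for '{key}', got {len(values)}.")
        for attrs, item in zip(cf_attrs, values):
            attrs[key] = item
    return cf_attrs


print(build_cf_attrs(units="degC"))
print(build_cf_attrs(2, var_name=["R95p", "R95p_days"], units=["m", "days"]))
```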

### Internationalization of metadata

xclim offers the possibility to translate the main Indicator metadata fields and automatically add the translations to the outputs. The mechanism is explained in the Internationalization page.

### Inputs and checks

xclim decides which input arguments of the indicator’s call function are considered variables and which are parameters using the annotations of the underlying indice (the compute method). Arguments annotated with the xarray.DataArray type are considered variables and can be read from the dataset passed in ds.
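The annotation-based split can be illustrated with the standard `inspect` module. In this sketch, `DataArray` is a local stand-in class so that the example runs without xarray, and `split_arguments` is a hypothetical helper, not the function xclim uses.

```python
import inspect


class DataArray:
    """Stand-in for xarray.DataArray so the sketch has no dependencies."""


def split_arguments(func):
    """Separate arguments annotated as DataArray (variables) from the others (parameters)."""
    variables, parameters = [], []
    for name, param in inspect.signature(func).parameters.items():
        target = variables if param.annotation is DataArray else parameters
        target.append(name)
    return variables, parameters


def tx_days_compare(
    tasmax: DataArray, thresh: str = "0 degC", op: str = ">", freq: str = "YS"
):
    ...


print(split_arguments(tx_days_compare))
```

An indice without annotations would have all its arguments classified as parameters in this sketch, which shows why an appropriate signature matters.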

### Indicator creation

There are two ways of creating indicators:

1. By initializing an existing indicator (sub)class

2. From a dictionary

The first method is best when defining indicators in scripts or external modules and is explained here. The second is best used when building virtual modules through YAML files, and is explained further down and in the submodule doc.

Creating a new indicator that simply modifies a few output metadata fields of an existing one is a simple call like:

[2]:
from xclim.core.indicator import registry
from xclim.core.utils import wrapped_partial

# An indicator based on tg_mean, but returning Celsius and fixed on annual resampling
tg_mean_c = registry["TG_MEAN"](
    identifier="tg_mean_c",
    units="degC",
    title="Mean daily mean temperature but in degC",
    parameters=dict(freq="YS"),  # We inject the freq arg.
)
[3]:
print(tg_mean_c.__doc__)
Mean daily mean temperature but in degC (realm: atmos)

Resample the original daily mean temperature series by taking the mean over each period.

This indicator will check for missing values according to the method "from_context".
Based on indice :py:func:~xclim.indices._simple.tg_mean.
With injected parameters: freq=YS.

Parameters
----------
tas : str or DataArray
Mean daily temperature.
Default : ds.tas. [Required units : [temperature]]
ds : Dataset, optional
A dataset with the variables given by name.
Default : None.
indexer :
Indexing parameters to compute the indicator on a temporal subset of the data. It accepts the same arguments as :py:func:xclim.indices.generic.select_time.
Default : None.

Returns
-------
tg_mean : DataArray
Mean daily mean temperature (air_temperature) [K]
cell_methods: time: mean within days time: mean over days
description: {freq} mean of daily mean temperature.

Notes
-----
Let :math:TN_i be the mean daily temperature of day :math:i, then for a period :math:p starting at
day :math:a and finishing on day :math:b:

.. math::

TG_p = \frac{\sum_{i=a}^{b} TN_i}{b - a + 1}

The registry is a dictionary mapping indicator identifiers (in uppercase) to their class. This way, we could subclass tg_mean to create our new indicator. tg_mean_c is the exact same as atmos.tg_mean, but outputs the result in Celsius instead of Kelvins, has a different title and removes control over the freq argument, resampling to “YS”. The identifier keyword is here needed in order to differentiate the new indicator from tg_mean itself. If it wasn’t given, a warning would have been raised and further subclassing of tg_mean would have in fact subclassed tg_mean_c, which is not wanted!

By default, indicator classes are registered in xclim.core.indicator.registry using their identifier, which is prepended by the indicator’s module if that indicator is declared outside xclim. A “child” indicator inherits its module from its parent:

[4]:
tg_mean_c.__module__ == xc.atmos.tg_mean.__module__
[4]:
True

To create indicators with a different module, for example to differentiate them in the registry, two methods can be used: passing module to the constructor, or using conventional class inheritance.

[5]:
# Passing module
tg_mean_c2 = registry["TG_MEAN_C"](module="test")  # we didn't change the identifier!
print(tg_mean_c2.__module__)
"test.TG_MEAN_C" in registry
xclim.indicators.test
[5]:
True
[6]:
# Conventional class inheritance, uses the current module name
class TG_MEAN_C3(registry["TG_MEAN_C"]):
    pass  # nothing to change really

tg_mean_c3 = TG_MEAN_C3()

print(tg_mean_c3.__module__)
"__main__.TG_MEAN_C" in registry
__main__
[6]:
True

While the former method is shorter, the latter is what xclim uses internally as it provides some clean code structure. See the code in the github repo.
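The registry keys seen in the cells above ("TG_MEAN", "test.TG_MEAN_C" and "__main__.TG_MEAN_C") suggest a simple naming rule, sketched below. This is an inference from those outputs only: the `registry_key` function and the `OFFICIAL_MODULES` set are hypothetical and may not match xclim's actual logic.

```python
# Assumption: submodules considered as "declared inside xclim" for this sketch.
OFFICIAL_MODULES = frozenset({"atmos", "land", "seaIce"})


def registry_key(identifier, module):
    """Build a registry key from an identifier and the module of the indicator."""
    prefix = "xclim.indicators."
    short = module[len(prefix):] if module.startswith(prefix) else module
    key = identifier.upper()
    # Indicators from the official submodules are registered without a prefix.
    return key if short in OFFICIAL_MODULES else f"{short}.{key}"


print(registry_key("tg_mean", "xclim.indicators.atmos"))
print(registry_key("tg_mean_c", "xclim.indicators.test"))
print(registry_key("tg_mean_c", "__main__"))
```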

## Virtual modules

xclim gives users the ability to generate their own modules from the existing library of indices. These mappings can help in emulating existing libraries (such as ICCLIM), with the added benefit of CF-compliant metadata, multilingual metadata support, and optimized calculations using federated resources (using Dask). This can be used for example to tailor existing indices with predefined thresholds without having to rewrite indices.

Presently, xclim is capable of approximating the indices developed in ICCLIM (https://icclim.readthedocs.io/en/latest/intro.html), ANUCLIM (https://fennerschool.anu.edu.au/files/anuclim61.pdf) and clix-meta (https://github.com/clix-meta/clix-meta) and is open to contributions of new indices and library mappings.

This notebook serves as an example of how one might go about creating their own library of mapped indices. Two ways are possible:

1. From a YAML file (recommended way)

2. From a mapping (dictionary) of indicators

### YAML file

The first method is based on the YAML syntax proposed by clix-meta, expanded to xclim’s needs. The full documentation on that syntax is here. This notebook shows examples of indicator creation at different levels of complexity. It creates a minimal python module defining an indice, creates a YAML file with the metadata for several indicators and then parses it into xclim.

[8]:
# These variables were generated by a hidden cell above that syntax-colored them.
print("Content of example.py :")
print(highlighted_py)
print("\n\nContent of example.yml :")
print(highlighted_yaml)
print("\n\nContent of example.fr.json :")
print(highlighted_json)
Content of example.py :
import xarray as xr

from xclim.core.units import declare_units, rate2amount

@declare_units(pr="[precipitation]")
def extreme_precip_accumulation_and_days(
    pr: xr.DataArray, perc: float = 95, freq: str = "YS"
):
    """Total precipitation accumulation during extreme events and number of days of such precipitation.

    The perc percentile of the precipitation (including all values, not in a day-of-year manner)
    is computed. Then, for each period, the days where pr is above the threshold are accumulated,
    to get the total precip related to those extreme events.

    Parameters
    ----------
    pr: xr.DataArray
        Precipitation flux (both phases).
    perc: float
        Percentile corresponding to "extreme" precipitation, [0-100].
    freq: str
        Resampling frequency.

    Returns
    -------
    xarray.DataArray
        Precipitation accumulated during events where pr was above the {perc}th percentile of the whole series.
    xarray.DataArray
        Number of days where pr was above the {perc}th percentile of the whole series.
    """
    pr_thresh = pr.quantile(perc / 100, dim="time").drop_vars("quantile")

    extreme_days = pr >= pr_thresh
    pr_extreme = rate2amount(pr).where(extreme_days)

    out1 = pr_extreme.resample(time=freq).sum()
    out1.attrs["units"] = pr_extreme.units

    out2 = extreme_days.resample(time=freq).sum()
    out2.attrs["units"] = "days"
    return out1, out2

Content of example.yml :
doc: |
  ==============
  Example module
  ==============

  This module is an example of YAML generated xclim submodule.
realm: atmos
references: xclim documentation https://xclim.readthedocs.io
indicators:
  RX1day:
    base: rx1day
    cf_attrs:
      long_name: Highest 1-day precipitation amount
  RX5day:
    base: max_n_day_precipitation_amount
    cf_attrs:
      long_name: Highest 5-day precipitation amount
    parameters:
      freq: QS-DEC
      window: 5
  R75pdays:
    base: days_over_precip_thresh
    parameters:
      per:
        description: Daily 75th percentile of wet day precipitation flux.
      thresh: 1 mm/day
  fd:
    compute: count_occurrences
    input:
      data: tasmin
    cf_attrs:
      cell_methods: 'time: minimum within days time: sum over days'
      long_name: Number of Frost Days (Tmin < 0°C)
      standard_name: number_of_days_with_air_temperature_below_threshold
      units: days
      var_name: fd
    parameters:
      condition: <
      threshold: 0 degC
      freq:
        default: YS
    references: ETCCDI
  R95p:
    compute: extreme_precip_accumulation_and_days
    cf_attrs:
      - cell_methods: 'time: sum within days time: sum over days'
        long_name: Annual total PRCP when RR > {perc}th percentile
        units: m
        var_name: R95p
      - long_name: Annual number of days when RR > {perc}th percentile
        units: days
        var_name: R95p_days
    parameters:
      perc: 95
    references: climdex
  R99p:
    base: .R95p
    cf_attrs:
      - var_name: R99p
      - var_name: R99p_days
    parameters:
      perc: 99

Content of example.fr.json :
{
"FD": {
"title": "Nombre de jours de gel",
"long_name" : "Nombre de jours de gel (Tmin < 0°C)",
"description": "Nombre de jours où la température minimale passe sous 0°C."
},
"R95P": {
"title": "Précpitations accumulées lors des jours de fortes pluies (> {perc}e percentile)"
},
"R95P.R95p": {
"long_name": "Accumulation {freq:f} des précipitations lors des jours de fortes pluies (> {perc}e percentile)",
"description": "Épaisseur équivalent des précipitations accumulées lors des jours où la pluie est plus forte que le {perc}e percentile de la série."
},
"R95P.R95p_days": {
"long_name": "Nombre de jours de fortes pluies (> {perc}e percentile)",
"description": "Nombre de jours où la pluie est plus forte que le {perc}e percentile de la série."
},
"R99P.R99p": {
"long_name": "Accumulation {freq:f} des précipitations lors des jours de fortes pluies (> {perc}e percentile)",
"description": "Épaisseur équivalent des précipitations accumulées lors des jours où la pluie est plus forte que le {perc}e percentile de la série."
},
"R99P.R99p_days": {
"long_name": "Nombre de jours de fortes pluies (> {perc}e percentile)",
"description": "Nombre de jours où la pluie est plus forte que le {perc}e percentile de la série."
}
}

example.yml defines a module of 6 indicators.

Values of the base arguments are the identifiers of the associated indicators, and those can be different from their names within the python modules. For example, xc.atmos.relative_humidity has HURS as identifier. One can always access xc.atmos.relative_humidity.identifier to get the correct name to use.

• RX1day is simply the same as registry['RX1DAY'], but with an updated long_name.

• RX5day is based on registry['MAX_N_DAY_PRECIPITATION_AMOUNT'], changes the long_name and injects the window and freq arguments.

• R75pdays is based on registry['DAYS_OVER_PRECIP_THRESH'], injects the thresh argument and changes the description of the per argument.

• fd is a more complex example. As there is no base: entry, the Daily class serves as a base. As it is pretty much empty, a lot has to be given explicitly:

• Many output metadata fields are given.

• A compute function name is given (here it refers to a function in xclim.indices.generic).

• Some parameters are injected, the default for freq is modified.

• The input variable data is mapped to a known variable. Functions in xclim.indices.generic are indeed generic. Here we tell xclim that the data argument is minimum daily temperature. This will set the proper units check, default value and CF-compliance checks.

• R95p is similar to fd but here the compute function is not defined in xclim but rather in example.py. Also, the custom function returns two outputs, so the cf_attrs section is a list of mappings rather than a mapping directly.

• R99p is the same as R95p but changes the injected value. In order to avoid rewriting the output metadata, and allowed periods, we based it on R95p : as the latter was defined within the current yaml file, the identifier is prefixed by a dot (.).
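The inheritance between R95p and R99p can be pictured as a merge of dictionaries. The sketch below reuses a trimmed copy of the two YAML entries and a hypothetical `resolve` helper; it only handles bases defined in the same file (the dot prefix) and is not how xclim builds the indicators.

```python
import copy

# Trimmed copy of the two entries from example.yml.
module_defs = {
    "R95p": {
        "compute": "extreme_precip_accumulation_and_days",
        "cf_attrs": [
            {"var_name": "R95p", "units": "m"},
            {"var_name": "R95p_days", "units": "days"},
        ],
        "parameters": {"perc": 95},
    },
    "R99p": {
        "base": ".R95p",
        "cf_attrs": [{"var_name": "R99p"}, {"var_name": "R99p_days"}],
        "parameters": {"perc": 99},
    },
}


def resolve(name, defs):
    """Return the full definition of an entry, after merging it onto its base."""
    entry = defs[name]
    base = entry.get("base")
    if base is None:
        return copy.deepcopy(entry)
    if not base.startswith("."):
        raise KeyError("This sketch only resolves bases defined in the same file.")
    resolved = resolve(base[1:], defs)  # strip the leading dot
    resolved["parameters"].update(entry.get("parameters", {}))
    # Output attributes are merged output by output.
    for parent_attrs, child_attrs in zip(resolved["cf_attrs"], entry.get("cf_attrs", [])):
        parent_attrs.update(child_attrs)
    return resolved


r99p = resolve("R99p", module_defs)
print(r99p["parameters"])
print(r99p["cf_attrs"])
```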

Additionally, the YAML file specifies a realm and references to be used on all indices and provides a submodule docstring.

Finally, French translations for the main attributes and the new indicators are given in example.fr.json. Even though new indicator objects are created for each YAML entry, non-specified translations are taken from the base classes if missing in the json file.

Note that all files are named the same way: example.<ext>, with the translations having an additional suffix giving the locale name. In the next cell, we build the module by passing only the path without extension. This absence of extension is what tells xclim to try to parse a module (*.py) and custom translations (*.<locale>.json). Those two could also be read beforehand and passed through the indices= and translations= arguments.
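The naming convention can be checked with a small file-discovery sketch based on pathlib. The `find_module_files` helper is hypothetical and only mimics the convention described here; it is not the lookup code used by xclim.

```python
import tempfile
from pathlib import Path


def find_module_files(stem_path):
    """Locate the YAML, python and translation files sharing a common stem."""
    stem_path = Path(stem_path)
    folder, stem = stem_path.parent, stem_path.name
    found = {"yaml": folder / f"{stem}.yml", "indices": None, "translations": {}}
    py_file = folder / f"{stem}.py"
    if py_file.exists():
        found["indices"] = py_file
    for json_file in sorted(folder.glob(f"{stem}.*.json")):
        # "example.fr.json" -> "fr"
        locale = json_file.name[len(stem) + 1 : -len(".json")]
        found["translations"][locale] = json_file
    return found


with tempfile.TemporaryDirectory() as tmp:
    # Create empty files following the example.<ext> convention.
    for filename in ("example.yml", "example.py", "example.fr.json"):
        (Path(tmp) / filename).touch()
    files = find_module_files(Path(tmp) / "example")
    print(files["indices"].name, sorted(files["translations"]))
```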

#### Validation of the YAML file

Using yamale, it is possible to check if the YAML file is valid. xclim ships with a schema file (xclim/data/schema.yml). The file can be located with:

[9]:
from importlib.resources import path

with path("xclim.data", "schema.yml") as f:
    print(f)

And the validation can be executed either in a python session:

[10]:
import yamale

with path("xclim.data", "schema.yml") as f:
    schema = yamale.make_schema(f)
    data = yamale.make_data("example.yml")  # in the current folder
yamale.validate(schema, data)
[10]:

No errors means it passed. The validation can also be run through the command line with:

yamale -s path/to/schema.yml path/to/module.yml

#### Loading the module and computing the indices

[11]:
import xclim as xc

example = xc.core.indicator.build_indicator_module_from_yaml("example", mode="raise")
[12]:
print(example.__doc__)
print("--")
print(xc.indicators.example.R99p.__doc__)
==============
Example module
==============

This module is an example of YAML generated xclim submodule.

--
Total precipitation accumulation during extreme events and number of days of such precipitation. (realm: atmos)

The perc percentile of the precipitation (including all values, not in a day-of-year manner) is computed. Then, for each period, the days where pr is above the threshold are accumulated, to get the total precip related to those extreme events.

This indicator will check for missing values according to the method "from_context".
Based on indice :py:func:~example.extreme_precip_accumulation_and_days.
With injected parameters: perc=99.

Parameters
----------
pr : str or DataArray
Precipitation flux (both phases).
Default : ds.pr. [Required units : [precipitation]]
freq : offset alias (string)
Resampling frequency.
Default : YS.
ds : Dataset, optional
A dataset with the variables given by name.
Default : None.

Returns
-------
R99p : DataArray
Annual total PRCP when RR > {perc}th percentile [m]
cell_methods: time: sum within days time: sum over days
R99p_days : DataArray
Annual number of days when RR > {perc}th percentile [days]

References
----------

This technique is useful in large projects, as we can iterate over the indicators like so:

[13]:
from xclim.testing import open_dataset

ds = open_dataset("ERA5/daily_surface_cancities_1990-1993.nc")
ds2 = ds.assign(
    per=xc.core.calendar.percentile_doy(ds.pr, window=5, per=75).isel(
        percentiles=0, drop=True
    )
)

outs = []
for name, ind in example.iter_indicators():
    print(f"Indicator: {name}")
    print(f"\tIdentifier: {ind.identifier}")
    print(f"\tTitle: {ind.title}")
    out = ind(ds=ds2)  # Use all default arguments and variables from the dataset
    if isinstance(out, tuple):
        outs.extend(out)
    else:
        outs.append(out)
Indicator: RX1day
Identifier: RX1day
Title: Highest 1-day precipitation amount for a period (frequency).
Indicator: RX5day
Identifier: RX5day
Title: Highest precipitation amount cumulated over a n-day moving window.
Indicator: R75pdays
Identifier: R75pdays
Title: Number of wet days with daily precipitation over a given percentile.
Indicator: fd
Identifier: fd
Title: Calculate the number of times some condition is met.
Indicator: R95p
Identifier: R95p
Title: Total precipitation accumulation during extreme events and number of days of such precipitation.
Indicator: R99p
Identifier: R99p
Title: Total precipitation accumulation during extreme events and number of days of such precipitation.

out contains all the computed indices, with translated metadata. Note that this merge doesn’t make much sense with the current list of indicators since they have different frequencies (freq).

[14]:
out = xr.merge(outs)
out.attrs = {
    "title": "Indicators computed from the example module."
}  # Merge puts the attributes of the first variable, we don't want that.
out
[14]:
<xarray.Dataset>
Dimensions:                  (time: 21, location: 5)
Coordinates:
* time                     (time) datetime64[ns] 1989-12-01 ... 1993-12-01
lat                      (location) float32 44.5 45.5 63.75 52.0 48.5
* location                 (location) object 'Halifax' ... 'Victoria'
lon                      (location) float32 -63.5 -73.5 -68.5 -106.8 -123.2
Data variables:
RX1day                   (location, time) float32 nan 61.13 nan ... nan nan
RX5day                   (location, time) float64 nan nan 84.1 ... 26.55 nan
days_over_precip_thresh  (location, time) float64 nan 93.0 nan ... nan nan
fd                       (location, time) float64 nan 92.0 nan ... nan nan
R95p                     (location, time) float64 nan 0.7553 nan ... nan nan
R95p_days                (location, time) float64 nan 24.0 nan ... nan nan
R99p                     (location, time) float64 nan 0.2054 nan ... nan nan
R99p_days                (location, time) float64 nan 4.0 nan ... nan nan
Attributes:
title:    Indicators computed from the example module.

### Mapping of indicators

For more complex mappings, submodules can be constructed from Indicators directly. This is not the recommended way, but can sometimes be a workaround when the YAML version is lacking features.

[15]:
from xclim.core.indicator import build_indicator_module, registry
from xclim.core.utils import wrapped_partial

mapping = dict(
    egg_cooking_season=registry["MAXIMUM_CONSECUTIVE_WARM_DAYS"](
        module="awesome",
        compute=xc.indices.maximum_consecutive_tx_days,
        parameters=dict(thresh="35 degC"),
        long_name="Season for outdoor egg cooking.",
    ),
    fish_feeling_days=registry["WETDAYS"](
        module="awesome",
        compute=xc.indices.wetdays,
        parameters=dict(thresh="14.0 mm/day"),
        long_name="Days where we feel we are fishes",
    ),
    sweater_weather=xc.atmos.tg_min.__class__(module="awesome"),
)

awesome = build_indicator_module(
    name="awesome",
    objs=mapping,
    doc="""
        =========================
        My Awesome Custom indices
        =========================
        There are only 3 indices that really matter when you come down to brass tacks.
        This mapping library exposes them to users who want to perform real deal
        climate science.
        """,
)
[16]:
print(xc.indicators.awesome.__doc__)

=========================
My Awesome Custom indices
=========================
There are only 3 indices that really matter when you come down to brass tacks.
This mapping library exposes them to users who want to perform real deal
climate science.

[17]:
# Let's look at our new awesome module
print(awesome.__doc__)
for name, ind in awesome.iter_indicators():
    print(f"{name} : {ind}")

=========================
My Awesome Custom indices
=========================
There are only 3 indices that really matter when you come down to brass tacks.
This mapping library exposes them to users who want to perform real deal
climate science.

egg_cooking_season : <xclim.indicators.awesome.MAXIMUM_CONSECUTIVE_WARM_DAYS object at 0x7f4e692b4b50>
fish_feeling_days : <xclim.indicators.awesome.WETDAYS object at 0x7f4e692bd190>
sweater_weather : <xclim.indicators.awesome.TG_MIN object at 0x7f4e692bd1d0>