
Customizing and controlling xclim

xclim’s behaviour can be controlled globally or contextually through xclim.set_options, which acts the same way as xarray.set_options. To extend xclim with new indicators, see the Extending xclim notebook.

[1]:
import xarray as xr
import xclim
from xclim.testing import open_dataset

Let’s create fake data with some missing values by masking the 10th, 20th and 30th of every month. This masks 9.7-10% of the data in every month except February, where it is 7.1%.

[2]:
tasmax = (
    xr.tutorial.open_dataset("air_temperature")
    .air.resample(time="D")
    .max(keep_attrs=True)
)
tasmax = tasmax.where(tasmax.time.dt.day % 10 != 0)
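
The masked fractions quoted above can be checked with the standard library alone. This is a minimal sketch that only counts calendar days; masked_fraction is a hypothetical helper and is not part of xclim or xarray.

```python
import calendar


def masked_fraction(year, month):
    """Fraction of days in the month whose day-of-month is a multiple of 10."""
    n_days = calendar.monthrange(year, month)[1]
    n_masked = sum(1 for day in range(1, n_days + 1) if day % 10 == 0)
    return n_masked / n_days


# A 31-day month, February and a 30-day month of 2013.
for month in (1, 2, 4):
    print(month, f"{masked_fraction(2013, month):.1%}")
# 1 9.7%
# 2 7.1%
# 4 10.0%
```

February never reaches a 30th day, so only two of its 28 days are masked.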

Checks

Above, we created fake temperature data from an xarray tutorial dataset that doesn’t have all the standard CF attributes. By default, when triggering a computation with an Indicator from xclim, warnings will be raised:

[3]:
tx_mean = xclim.atmos.tx_mean(tasmax=tasmax, freq="MS")  # compute monthly mean of daily max temperature
/home/docs/checkouts/readthedocs.org/user_builds/xclim/envs/stable/lib/python3.7/site-packages/xclim/core/cfchecks.py:40: UserWarning: Variable does not have a `cell_methods` attribute.
  getattr(vardata, "cell_methods", None), data["cell_methods"]
/home/docs/checkouts/readthedocs.org/user_builds/xclim/envs/stable/lib/python3.7/site-packages/xclim/core/cfchecks.py:43: UserWarning: Variable does not have a `standard_name` attribute.
  check_valid(vardata, "standard_name", data["standard_name"])

Setting cf_compliance to 'log' mutes those warnings and sends them to the log instead.

[4]:
xclim.set_options(cf_compliance="log")

tx_mean = xclim.atmos.tx_mean(tasmax=tasmax, freq="MS")  # compute monthly mean of daily max temperature
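
The difference between warning and logging can be pictured with a stand-alone sketch built on the standard library. Everything here is an assumption made for illustration: handle_check_failure, the "cf_demo" logger name and the log level are hypothetical, and xclim's internal routing may differ.

```python
import logging
import warnings

logger = logging.getLogger("cf_demo")  # hypothetical logger name


def handle_check_failure(message, mode="warn"):
    """Route a failed metadata check according to ``mode`` (hypothetical helper)."""
    if mode == "log":
        # Nothing is shown unless the user configures logging handlers.
        logger.info(message)
    elif mode == "warn":
        warnings.warn(message, UserWarning, stacklevel=2)
    else:
        raise ValueError(f"Unknown mode: {mode!r}")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    handle_check_failure("Variable does not have a `standard_name` attribute.", mode="warn")
    handle_check_failure("Variable does not have a `standard_name` attribute.", mode="log")

print(len(caught))  # 1
```

Only the "warn" call produces a UserWarning; the "log" call is silent unless logging has been configured to display it.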

Adding translated metadata

With the help of its internationalization module (xclim.core.locales), xclim can add translated metadata to the output of the indicators. The metadata is not translated on the fly; instead, translations are written manually for each indicator and metadata field. Currently, all indicators have a French translation, but users can add more languages. See Internationalization and Extending xclim.

In the example below, notice the added long_name_fr and description_fr attributes. Also, using set_options as a context manager makes this configuration transient: it is only valid within the with block.

[5]:
with xclim.set_options(metadata_locales=["fr"]):
    out = xclim.atmos.tx_max(tasmax=tasmax)
out.attrs
[5]:
{'long_name': 'Maximum daily maximum temperature',
 'units': 'K',
 'precision': 2,
 'GRIB_id': 11,
 'GRIB_name': 'TMP',
 'var_desc': 'Air temperature',
 'dataset': 'NMC Reanalysis',
 'level_desc': 'Surface',
 'statistic': 'Individual Obs',
 'parent_stat': 'Other',
 'actual_range': array([185.16, 322.1 ], dtype=float32),
 'cell_methods': ' time: maximum within days time: maximum over days',
 'history': "[2022-01-07 21:37:03] tx_max: TX_MAX(tasmax=air, freq='YS') - xclim version: 0.32.1.",
 'standard_name': 'air_temperature',
 'description': 'Annual maximum of daily maximum temperature.',
 'long_name_fr': 'Maximum de la température journalière',
 'description_fr': 'Maximum annuel de la température journalière maximale.'}
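
How an options object can act both globally and as a transient context can be sketched in a few lines. This is not xclim's implementation: OPTIONS and set_options_sketch are hypothetical names, and the pattern shown (apply on construction, restore on exit) is only one common way to get this behaviour.

```python
OPTIONS = {"metadata_locales": []}  # hypothetical global registry


class set_options_sketch:
    """Apply options immediately; restore the previous values on context exit."""

    def __init__(self, **kwargs):
        # Remember the old values so that __exit__ can restore them.
        self.old = {key: OPTIONS[key] for key in kwargs}
        OPTIONS.update(kwargs)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        OPTIONS.update(self.old)


with set_options_sketch(metadata_locales=["fr"]):
    print(OPTIONS["metadata_locales"])  # ['fr']
print(OPTIONS["metadata_locales"])  # []
```

Calling set_options_sketch(...) without a with statement leaves the new values in place, which corresponds to the global usage.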

Missing values

One can also globally change the missing method.

Change the default missing method to “pct” and set its tolerance to 8%:

[6]:
xclim.set_options(check_missing="pct", missing_options={"pct": {"tolerance": 0.08}})

tx_mean = xclim.atmos.tx_mean(tasmax=tasmax, freq="MS")  # compute monthly mean of daily max temperature
tx_mean.sel(time="2013", lat=75, lon=200)
[6]:
<xarray.DataArray 'tx_mean' (time: 12)>
array([      nan, 242.76694,       nan,       nan,       nan,       nan,
             nan,       nan,       nan,       nan,       nan,       nan],
      dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 2013-01-01 2013-02-01 ... 2013-12-01
    lat      float32 75.0
    lon      float32 200.0
Attributes:
    long_name:      Mean daily maximum temperature
    units:          K
    precision:      2
    GRIB_id:        11
    GRIB_name:      TMP
    var_desc:       Air temperature
    dataset:        NMC Reanalysis
    level_desc:     Surface
    statistic:      Individual Obs
    parent_stat:    Other
    actual_range:   [185.16 322.1 ]
    cell_methods:    time: maximum within days time: mean over days
    history:        [2022-01-07 21:37:03] tx_mean: TX_MEAN(tasmax=air, freq='...
    standard_name:  air_temperature
    description:    Monthly mean of daily maximum temperature.
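
The pattern in the output above follows from comparing each month's masked fraction with the 8% tolerance. The sketch below reproduces that comparison on plain Python lists; month_mask and pct_is_missing are hypothetical helpers, and the inclusive comparison is an assumption rather than a statement about the library's code.

```python
import calendar


def month_mask(year, month):
    """Boolean mask of invalid days: True on the 10th, 20th and 30th."""
    n_days = calendar.monthrange(year, month)[1]
    return [day % 10 == 0 for day in range(1, n_days + 1)]


def pct_is_missing(null, tolerance):
    """Flag a period whose fraction of invalid values reaches the tolerance.

    Assumption: the comparison is inclusive (>=); the library may differ.
    """
    return sum(null) / len(null) >= tolerance


valid_months = [
    month
    for month in range(1, 13)
    if not pct_is_missing(month_mask(2013, month), tolerance=0.08)
]
print(valid_months)  # [2]
```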

Only February is left unmasked, because it is the only month whose masked fraction (7.1%) is below the 8% tolerance. To use the “wmo” method (with its default options) only once, we can do:

[7]:
with xclim.set_options(check_missing="wmo"):
    # compute monthly mean of daily max temperature
    tx_mean = xclim.atmos.tx_mean(tasmax=tasmax, freq="MS")
tx_mean.sel(time="2013", lat=75, lon=200)
[7]:
<xarray.DataArray 'tx_mean' (time: 12)>
array([246.4122 , 242.76694, 250.18001, 260.53598, 268.20145, 274.92004,
       277.01144, 273.31146, 270.30484, 263.94357, 254.68298, 251.45862],
      dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 2013-01-01 2013-02-01 ... 2013-12-01
    lat      float32 75.0
    lon      float32 200.0
Attributes:
    long_name:      Mean daily maximum temperature
    units:          K
    precision:      2
    GRIB_id:        11
    GRIB_name:      TMP
    var_desc:       Air temperature
    dataset:        NMC Reanalysis
    level_desc:     Surface
    statistic:      Individual Obs
    parent_stat:    Other
    actual_range:   [185.16 322.1 ]
    cell_methods:    time: maximum within days time: mean over days
    history:        [2022-01-07 21:37:03] tx_mean: TX_MEAN(tasmax=air, freq='...
    standard_name:  air_temperature
    description:    Monthly mean of daily maximum temperature.

This method checks that there are fewer than nm=5 invalid values in a month and that there are no runs of nc>=4 consecutive invalid values. Since each month has at most three isolated masked days, every month is now valid.
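
The rule just described can be written out on a plain list of booleans. This is a minimal sketch: longest_true_run and wmo_like_is_missing are hypothetical helpers, and the thresholds nm=5 and nc=4 are taken from the sentence above rather than checked against the library's defaults.

```python
def longest_true_run(flags):
    """Length of the longest run of consecutive True values."""
    longest = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest


def wmo_like_is_missing(null, nm=5, nc=4):
    """Flag a month with nm or more invalid days, or a run of nc or more."""
    return sum(null) >= nm or longest_true_run(null) >= nc


# January with the 10th, 20th and 30th masked: three isolated invalid days.
january = [day % 10 == 0 for day in range(1, 32)]
print(wmo_like_is_missing(january))  # False

# Four consecutive invalid days are enough to reject the month.
gap = [False] * 10 + [True] * 4 + [False] * 17
print(wmo_like_is_missing(gap))  # True
```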

Finally, advanced users can register their own method. xclim’s missing methods are in fact implemented as classes. To create a custom missing class, subclass xclim.core.missing.MissingBase and override at least the is_missing method. The method should take a null argument, a count argument and any number of other keyword arguments.

  • null is a DataArrayResample instance of the resampled mask of invalid values in the input DataArray.

  • count is the number of days in each resampled period.

The is_missing method should return a boolean mask, at the same frequency as the indicator output (same as count), where True values are for elements that are considered missing and masked on the output.

When registering the class with the xclim.core.missing.register_missing_method decorator, the keyword arguments will be registered as options for the missing method. One can also implement a validate static method that receives only those options and returns whether they should be considered valid or not.

[8]:
from xclim.core.missing import MissingBase, register_missing_method
from xclim.indices.run_length import longest_run


@register_missing_method("consecutive")
class MissingConsecutive(MissingBase):
    """Any period with a run of max_n or more consecutive missing values is considered invalid."""

    def is_missing(self, null, count, max_n=5):
        return null.map(longest_run, dim="time") >= max_n

    @staticmethod
    def validate(max_n):
        return max_n > 0

The new method is now accessible and usable with:

[9]:
with xclim.set_options(
    check_missing="consecutive", missing_options={"consecutive": {"max_n": 2}}
):
    # compute monthly mean of daily max temperature
    tx_mean = xclim.atmos.tx_mean(tasmax=tasmax, freq="MS")
tx_mean.sel(time="2013", lat=75, lon=200)
[9]:
<xarray.DataArray 'tx_mean' (time: 12)>
array([246.4122 , 242.76694, 250.18001, 260.53598, 268.20145, 274.92004,
       277.01144, 273.31146, 270.30484, 263.94357, 254.68298, 251.45862],
      dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 2013-01-01 2013-02-01 ... 2013-12-01
    lat      float32 75.0
    lon      float32 200.0
Attributes:
    long_name:      Mean daily maximum temperature
    units:          K
    precision:      2
    GRIB_id:        11
    GRIB_name:      TMP
    var_desc:       Air temperature
    dataset:        NMC Reanalysis
    level_desc:     Surface
    statistic:      Individual Obs
    parent_stat:    Other
    actual_range:   [185.16 322.1 ]
    cell_methods:    time: maximum within days time: mean over days
    history:        [2022-01-07 21:37:05] tx_mean: TX_MEAN(tasmax=air, freq='...
    standard_name:  air_temperature
    description:    Monthly mean of daily maximum temperature.