Source code for xclim.core.indicator

# -*- coding: utf-8 -*-
# noqa: D205,D400
"""
Indicators utilities
====================

The `Indicator` class wraps indices computations with pre- and post-processing functionality. Prior to computations,
the class runs data and metadata health checks. After computations, the class masks values that should be considered
missing and adds metadata attributes to the output object.

There are many ways to construct indicators. A good place to start is `this notebook <notebooks/extendxclim.ipynb#Defining-new-indicators>`_.

Dictionary and YAML parser
--------------------------

To construct indicators dynamically, xclim can also use dictionaries and parse them from YAML files.
This is especially useful for generating whole indicator "submodules" from files.
This functionality is based on and extends the work of `clix-meta <https://github.com/clix-meta/clix-meta/>`_.

YAML file structure
~~~~~~~~~~~~~~~~~~~

Indicator-defining yaml files are structured in the following way:

.. code-block:: yaml

    module: <module name>  # Defaults to the file name
    realm: <realm>  # If given here, applies to all indicators that do not give it.
    base: <base indicator class>  # Defaults to "Daily"
    doc: <module docstring>  # Defaults to a minimal header, only valid if the module doesn't already exist.
    indices:
      <identifier>:
        base: <base indicator class>  # Defaults to module-wide base class or "Daily".
                                      # If the name starts with a '.', the base class is taken from the current module (thus an indicator declared _above_)
        realm: <realm>  # Defaults to the module-wide realm or "atmos"
        reference: <references>
        references: <references>  # Plural or singular accepted (for harmonizing clix-meta and xclim)
        keywords: <keywords>
        notes: <notes>
        title: <title>
        abstract: <abstract>
        period: annual  # Becomes the default value for `freq`. Translates to "YS", "QS-DEC", "MS" or "W-SUN". See xclim.core.units.FREQ_NAMES.
        output:
          var_name: <var_name>  # Defaults to "identifier",
          standard_name: <standard_name>
          long_name: <long_name>
          description: <description>
          comment: <comment>
          units: <units>  # Defaults to ""
          cell_methods:
            - <dim1> : <method 1>
            ...

        index_function:
          name: <function name>  # Referring to a function in the passed indices module, xclim.indices.generic or xclim.indices
          # When using Indicator.from_dict, the "name" field can also be a function (instead of a string)
          parameters:  # See below for details on that section.
            <param name>: <param data>  # Simplest case when only injecting is needed.
            <param name>:  # Most complex case where we want to change the parameter's metadata. Also backward compatible with clix-meta.
              kind: <param kind>  # Optional, one of quantity, operator or reducer
              data: <param data>
              units: <param units>
              operator: <param data>
              reducer: <param data>
            ...

        input:
          <var1> : <variable type 1>  # <var1> refers to a name in the function above, see below.
          ...
      ...  # and so on.

All fields are optional. Other fields can be found in the yaml file, but they will not be used by xclim.
In the following, the section under `<identifier>` is referred to as `data`. When creating indicators from
a dictionary, with :py:meth:`Indicator.from_dict`, the input dict must follow the structure of `data`.
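
As an illustration, a dictionary following the structure of `data` might look like the sketch below.
The identifier "hot_days", the function name "count_hot_days" and all metadata values are hypothetical,
chosen only to show the layout. The final call is left as a comment because it assumes that xclim is
installed and that a function of that name exists in the indices module.

.. code-block:: python

    # Hypothetical content of `data`, mirroring the YAML structure above.
    data = {
        "realm": "atmos",
        "title": "Number of hot days",
        "period": "annual",
        "output": {
            "var_name": "hot_days",
            "long_name": "Number of days with mean temperature above {thresh}",
            "units": "days",
            # YAML lists of "<dim>: <method>" entries are read as a list of dicts.
            "cell_methods": [{"time": "sum over days"}],
        },
        "index_function": {
            "name": "count_hot_days",  # hypothetical function name
            "parameters": {"thresh": "25 degC"},  # simplest syntax: injected as-is
        },
        "input": {"tas": "tas"},  # function argument -> official variable name
    }

    # Assuming xclim is installed and the function exists:
    # ind = Indicator.from_dict(data, identifier="hot_days")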

Indicator parameters
~~~~~~~~~~~~~~~~~~~~
`clix-meta` defines three kinds of parameters:

    - "quantity", a quantity with a magnitude and some units, (equivalent to xclim.core.utils.InputKind.QUANTITY_STR)
      The value is given through the magnitude in "data" and units in "units".
    - "operator", one of "<", "<=", ">", ">=", "==", "!=", an operator for conditional computations.
      The value is given in "operator".
    - "reducer", one of "maximum", "minimum", "mean", "sum", a reducing method name.
      The value is given in "reducer".

xclim supports both this syntax and a simpler one where only the "data" key is given.
As YAML is able to cast simple python literals, no passing of "kind" is needed. If a string parameter could be
mistranslated to a boolean or a number, simply use quotes to isolate it. To pass a number sequence, use
the YAML list syntax.
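
The way a parameter entry is turned into an injected value can be sketched as follows. The helper
`resolve_parameter` is hypothetical and not part of xclim. It is a simplified reading of the rules above
and ignores the clix-meta placeholder case, where "data" is itself a mapping.

.. code-block:: python

    def resolve_parameter(param):
        # Hypothetical helper illustrating the rules above; not xclim's API.
        if not isinstance(param, dict):
            # Simple syntax: the YAML-parsed value is used directly.
            return param
        kind = param.get("kind")
        if kind == "quantity":
            # Magnitude in "data", units in "units".
            return f"{param['data']} {param['units']}"
        if kind in ("operator", "reducer"):
            # The value is stored under a key named after the kind.
            return param[kind]
        # Any other mapping: only the "data" key is used.
        return param["data"]


    print(resolve_parameter({"kind": "quantity", "data": 25, "units": "degC"}))  # 25 degC
    print(resolve_parameter({"kind": "operator", "operator": ">"}))  # >
    print(resolve_parameter({"kind": "reducer", "reducer": "mean"}))  # mean
    print(resolve_parameter({"data": [1, 2, 3]}))  # [1, 2, 3]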

Inputs
~~~~~~
As xclim has strict definitions of possible input variables (see :py:data:`xclim.core.yaml.variables`),
the mapping of `data.input` simply links a variable name from the function in `data.index_function.name`
to one of those official variables.
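
A minimal sketch of this mapping is given below. The `VARIABLES` table here is a hypothetical two-entry
excerpt with illustrative descriptions, and `map_inputs` is a made-up helper. The sketch only shows that
each function argument takes the official variable name as its default and inherits that variable's
description and canonical units.

.. code-block:: python

    # Hypothetical excerpt of the official variables table; entries are illustrative.
    VARIABLES = {
        "tas": {"canonical_units": "K", "description": "Mean surface temperature."},
        "pr": {"canonical_units": "kg m-2 s-1", "description": "Precipitation flux."},
    }


    def map_inputs(inputs):
        # Hypothetical helper: build parameter metadata and units from `data.input`.
        params, input_units = {}, {}
        for varname, name in inputs.items():
            params[varname] = {
                "default": name,
                "description": VARIABLES[name]["description"],
            }
            input_units[varname] = VARIABLES[name]["canonical_units"]
        return params, input_units


    params, input_units = map_inputs({"da": "tas"})
    print(params["da"]["default"])  # tas
    print(input_units)  # {'da': 'K'}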

"""
import re
import warnings
import weakref
from collections import OrderedDict, defaultdict
from copy import deepcopy
from inspect import Parameter, _empty, signature  # noqa
from os import PathLike
from pathlib import Path
from types import ModuleType
from typing import Any, Callable, Dict, List, Mapping, Optional, Sequence, Type, Union

import numpy as np
from boltons.funcutils import copy_function, wraps
from xarray import DataArray, Dataset
from yaml import safe_load

from .. import indices
from . import datachecks
from .calendar import parse_offset
from .cfchecks import cfcheck_from_name
from .formatting import (
    AttrFormatter,
    default_formatter,
    gen_call_string,
    generate_indicator_docstring,
    merge_attributes,
    parse_cell_methods,
    parse_doc,
    update_history,
)
from .locales import (
    TRANSLATABLE_ATTRS,
    get_local_attrs,
    get_local_formatter,
    load_locale,
    read_locale_file,
)
from .options import METADATA_LOCALES, MISSING_METHODS, MISSING_OPTIONS, OPTIONS
from .units import FREQ_NAMES, convert_units_to, declare_units, units  # noqa
from .utils import (
    VARIABLES,
    InputKind,
    MissingVariableError,
    infer_kind_from_parameter,
    load_module,
    raise_warn_or_log,
    wrapped_partial,
)

# Indicators registry
registry = dict()  # Main class registry
_indicators_registry = defaultdict(list)  # Private instance registry


class IndicatorRegistrar:
    """Climate Indicator registering object."""

    def __new__(cls):
        """Add subclass to registry."""
        name = cls.__name__.upper()
        module = cls.__module__
        # If the module is not one of xclim's default, prepend the submodule name.
        if module.startswith("xclim.indicators"):
            submodule = module.split(".")[2]
            if submodule not in ["atmos", "land", "ocean", "seaIce"]:
                name = f"{submodule}.{name}"
        else:
            name = f"{module}.{name}"
        if name in registry:
            warnings.warn(
                f"Class {name} already exists and will be overwritten.", stacklevel=1
            )
        registry[name] = cls
        cls._registry_id = name
        return super().__new__(cls)

    def __init__(self):
        _indicators_registry[self.__class__].append(weakref.ref(self))

    @classmethod
    def get_instance(cls):
        """Return first found instance.

        Raises `ValueError` if no instance exists.
        """
        for inst_ref in _indicators_registry[cls]:
            inst = inst_ref()
            if inst is not None:
                return inst
        raise ValueError(
            f"There is no existing instance of {cls.__name__}. "
            "Either none were created or they were all garbage-collected."
        )


class Indicator(IndicatorRegistrar):
    r"""Climate indicator base class.

    Climate indicator object that, when called, computes an indicator and assigns its output a number of
    CF-compliant attributes. Some of these attributes can be *templated*, allowing metadata to reflect
    the value of call arguments.

    Instantiating a new indicator returns an instance but also creates and registers a custom subclass.

    Parameters in `Indicator._cf_names` will be added to the output variable(s). When creating new `Indicators`
    subclasses, if the compute function returns multiple variables, attributes may be given as lists of strings or
    strings. In the latter case, the same value is used on all variables.

    Compared to their base `compute` function, indicators add the possibility of using a dataset as input,
    with the injected argument `ds` in the call signature. All arguments that were indicated by the compute function
    to be DataArrays through annotations will be promoted to also accept strings that correspond to variable names
    in the `ds` dataset.

    Parameters
    ----------
    identifier: str
      Unique ID for class registry, should be a valid slug.
    realm : {'atmos', 'seaIce', 'land', 'ocean'}
      General domain of validity of the indicator. Indicators created outside xclim.indicators must set this attribute.
    compute: func
      The function computing the indicators. It should return one or more DataArray.
    var_name: str or Sequence[str]
      Output variable(s) name(s). May use tags {<tag>}. If the indicator outputs multiple variables,
      var_name *must* be a list of the same length.
    standard_name: str or Sequence[str]
      Variable name (CF).
    long_name: str or Sequence[str]
      Descriptive variable name. Parsed from `compute` docstring if not given.
    units: str or Sequence[str]
      Representative units of the physical quantity (CF).
    cell_methods: str or Sequence[str]
      List of blank-separated words of the form "name: method" (CF).
    description: str or Sequence[str]
      Sentence meant to clarify the qualifiers of the fundamental quantities, such as which
      surface a quantity is defined on or what the flux sign conventions are.
    comment: str or Sequence[str]
      Miscellaneous information about the data or methods used to produce it.
    title: str
      A succinct description of what is in the computed outputs. Parsed from `compute` docstring if None.
    abstract: str
      A long description of what is in the computed outputs. Parsed from `compute` docstring if None.
    keywords: str
      Comma separated list of keywords. Parsed from `compute` docstring if None.
    references: str
      Published or web-based references that describe the data or methods used to produce it. Parsed from
      `compute` docstring if None.
    notes: str
      Notes regarding computing function, for example the mathematical formulation. Parsed from `compute`
      docstring if None.
    missing: {any, wmo, pct, at_least_n, skip, from_context}
      The name of the missing value method. See `xclim.core.missing.MissingBase` to create new custom methods. If
      None, this will be determined by the global configuration (see `xclim.set_options`). Defaults to "from_context".
    freq: {"D", "H", None}
      The expected frequency of the input data. Use None if irrelevant.
    missing_options : dict, None
      Arguments to pass to the `missing` function. If None, this will be determined by the global configuration.
    context: str
      The `pint` unit context, for example use 'hydro' to allow conversion from kg m-2 s-1 to mm/day.
    allowed_periods : Sequence[str], optional
      A list of allowed periods, i.e. base parts of the `freq` parameter. For example, indicators meant to be
      computed annually only will have `allowed_periods=["Y", "A"]`. `None` means "any period" or that the
      indicator doesn't take a `freq` argument.

    Notes
    -----
    All subclasses created are available in the `registry` attribute and can be used to define custom subclasses
    or parse all available instances.
    """

    # Allowed metadata attributes on the output variables
    _cf_names = [
        "var_name",
        "standard_name",
        "long_name",
        "units",
        "cell_methods",
        "description",
        "comment",
    ]

    # metadata fields that are formatted as free text.
    _text_fields = ["long_name", "description", "comment"]

    _funcs = ["compute", "cfcheck", "datacheck"]

    # Will become the class's name
    identifier = None

    missing = "from_context"
    missing_options = None
    context = "none"
    freq = None
    allowed_periods = None

    # Variable metadata (_cf_names, those that can be lists or strings)
    # A developer should access those through cf_attrs on instances
    var_name = None
    standard_name = ""
    long_name = ""
    units = ""
    cell_methods = ""
    description = ""
    comment = ""

    # Global metadata (must be strings, not attributed to the output)
    realm = None
    title = ""
    abstract = ""
    keywords = ""
    references = ""
    notes = ""

    parameters: Mapping[str, Any]
    """A dictionary mapping metadata about the input parameters to the indicator.

    Contains : "default", "description", "kind" and, sometimes, "units" and "choices".
    "kind" refers to the constants of :py:class:`xclim.core.utils.InputKind`.
    """

    cf_attrs: Sequence[Mapping[str, Any]]
    """A list of metadata information for each output of the indicator.

    It minimally contains a "var_name" entry, and may contain : "standard_name", "long_name",
    "units", "cell_methods", "description" and "comment".
    """

    def __new__(cls, **kwds):
        """Create subclass from arguments."""
        identifier = kwds.get("identifier", cls.identifier)
        if identifier is None:
            raise AttributeError("`identifier` has not been set.")

        kwds["var_name"] = kwds.get("var_name", cls.var_name) or identifier

        # Parse and update compute's signature.
        kwds["compute"] = kwds.get("compute", None) or cls.compute
        # Updated to allow string variable names and the ds arg.
        # Parse docstring of the compute function, its signature and its parameters
        kwds["_indcompute"], docmeta, params = _parse_indice(
            kwds["compute"],
            passed=kwds.get("parameters"),
            ds={
                "annotation": Dataset,
                "description": "A dataset with the variables given by name.",
            },
        )

        # The updated signature
        kwds["_sig"] = kwds["_indcompute"].__signature__
        # The input parameters' name
        kwds["_parameters"] = tuple(kwds["_sig"].parameters.keys())

        # All fields parsed by parse_doc except "parameters"
        # i.e. : title, abstract, notes, references, long_name
        for name, value in docmeta.items():
            if not getattr(cls, name):
                # Set if neither the class attr is set nor the kwds attr
                kwds.setdefault(name, value)

        # The input parameters' metadata
        # We dump whatever the base class had and take what was parsed from the current compute function.
        kwds["parameters"] = params

        # By default skip missing values handling if there is no resampling.
        # Don't only check if freq is in current parameters but also if it was injected earlier.
        if "freq" not in params and "freq" not in getattr(
            kwds["compute"], "_injected", {}
        ):
            kwds["missing"] = "skip"

        # Parse kwds to organize cf_attrs
        # Must be done after parsing var_name
        # And before converting callables to staticmethods
        kwds["cf_attrs"] = cls._parse_cf_attrs(kwds)

        # Convert function objects to static methods.
        for key in cls._funcs + cls._cf_names:
            if key in kwds and callable(kwds[key]):
                kwds[key] = staticmethod(kwds[key])

        # Infer realm for built-in xclim instances
        if cls.__module__.startswith(__package__.split(".")[0]):
            xclim_realm = cls.__module__.split(".")[2]
        else:
            xclim_realm = None
        # Priority given to passed realm -> parent's realm -> location of the class declaration (official inds only)
        kwds.setdefault("realm", cls.realm or xclim_realm)
        if kwds["realm"] not in ["atmos", "seaIce", "land", "ocean"]:
            raise AttributeError(
                "Indicator's realm must be given as one of 'atmos', 'seaIce', 'land' or 'ocean'"
            )

        # Create new class object
        new = type(identifier.upper(), (cls,), kwds)

        # Forcing the module is there so YAML-generated submodules are correctly seen by IndicatorRegistrar.
        if "module" in kwds:
            new.__module__ = f"xclim.indicators.{kwds['module']}"
        else:
            # If the module was not forced, set the module to the base class' module.
            # Otherwise all indicators will have module `xclim.core.indicator`.
            new.__module__ = cls.__module__

        # Generate docstring
        new._indcompute.__doc__ = new.__doc__ = generate_indicator_docstring(new)

        # Add the created class to the registry
        # This will create an instance from the new class and call __init__.
        return super().__new__(new)

    @classmethod
    def _parse_cf_attrs(
        cls, kwds: Dict[str, Any]
    ) -> Union[List[Dict[str, str]], List[Dict[str, Union[str, Callable]]]]:
        """CF-compliant metadata attributes for all output variables."""
        # Get number of outputs
        n_outs = (
            len(kwds["var_name"]) if isinstance(kwds["var_name"], (list, tuple)) else 1
        )

        # Populate cf_attrs from attribute set during class creation and __new__
        cf_attrs = [{} for i in range(n_outs)]
        for name in cls._cf_names:
            values = kwds.get(name, getattr(cls, name))
            if not isinstance(values, (list, tuple)):
                values = [values] * n_outs
            elif len(values) != n_outs:
                raise ValueError(
                    f"Attribute {name} has {len(values)} elements but "
                    f"should have {n_outs} according to passed var_name."
                )
            for attrs, value in zip(cf_attrs, values):
                if value:
                    attrs[name] = value
        return cf_attrs

    @classmethod
    def from_dict(
        cls,
        data: dict,
        identifier: str,
        module: Optional[str] = None,
    ):
        """Create an indicator subclass and instance from a dictionary of parameters.

        Parameters
        ----------
        data: dict
          The exact structure of this dictionary is detailed in the submodule documentation.
        identifier : str
          The name of the subclass and internal indicator name.
        module : str
          The module name of the indicator. This is meant to be used only if the indicator
          is part of a dynamically generated submodule, to override the module of the base class.
        """
        # Make cell methods. YAML will generate a list-of-dict structure, put it back in a space-divided string
        if data.get("output", {}).get("cell_methods") is not None:
            cell_methods = parse_cell_methods(data["output"]["cell_methods"])
        else:
            cell_methods = None

        params = {}
        if "input" in data:
            # Override input metadata
            input_units = {}
            for varname, name in data["input"].items():
                # Indicator's new will put the name of the variable as its default,
                # we override this with the real variable name.
                # Also take the canonical units and description from the yaml of official variables.
                # Description overrides the one parsed from the generic compute docstring
                # Canonical units go into the declare_units wrapper.
                params[varname] = {
                    "default": name,
                    "description": VARIABLES[name]["description"],
                }
                input_units[varname] = VARIABLES[name]["canonical_units"]
        else:
            input_units = None

        metadata_placeholders = {}
        if "index_function" in data:
            # Generate compute function
            # data.index_function.name refers to a function in xclim.indices.generic
            # or xclim.indices (in this order of priority). It can also directly be a function.
            # data.index_function.parameters is a list of injected arguments.
            funcname = data["index_function"].get("name")
            if funcname is None:
                # No index function given, reuse the one from the base class.
                compute = cls.compute
            elif callable(funcname):
                compute = funcname
            else:
                compute = getattr(
                    indices.generic, funcname, getattr(indices, funcname, None)  # noqa
                )
                if compute is None:
                    raise ImportError(
                        f"Indice function {funcname} not found in xclim.indices or xclim.indices.generic."
                    )

            injected_params = {}
            # In clix-meta, when there are no parameters, the key is still there with a None value.
            for name, param in (data["index_function"].get("parameters") or {}).items():
                if not isinstance(param, dict):
                    # Simplest case for injecting, passing a value directly.
                    value = param
                # Handle clix-meta cases
                elif param.get("kind") == "quantity" and isinstance(
                    param["data"], (str, int, float)
                ):
                    # A string with units, but not a placeholder (where data is a dict)
                    value = f"{param['data']} {param['units']}"
                elif param.get("kind") in ["reducer", "operator"]:
                    # clix-meta defined kinds :value is stored in a field of the same name as the kind.
                    value = param[param["kind"]]
                else:
                    # All other xclim-defined kinds in "data"
                    value = param["data"]

                if isinstance(value, dict):
                    # User-chosen parameter. placeholder.
                    # It should be a string, this is a bug from clix-meta.
                    value = list(value.keys())[0]
                    # Get default from parent class if possible.
                    default = (
                        getattr(cls, "parameters", {})
                        .get(name, {})
                        .get("default", None)
                    )
                    params[name] = {
                        "default": param.get("default", default),
                        "description": param.get(
                            "description", param.get("standard_name", name)
                        ),
                    }
                    if "units" in param:
                        params[name]["units"] = param["units"]
                        input_units = input_units or {}
                        input_units[name] = param["units"]
                    # We will need to replace placeholders in metadata strings (only for clix-meta indicators)
                    if value != name:
                        metadata_placeholders["{" + value + "}"] = "{" + name + "}"
                else:
                    # Injected parameter
                    injected_params[name] = value

            if input_units is not None:
                compute = declare_units(**input_units)(compute)
            compute = wrapped_partial(compute, **injected_params)
        else:
            compute = None

        # Allowed resampling frequencies
        allowed_periods = None
        if "period" in data:
            if isinstance(data["period"], str):
                deffreq = data["period"]
            elif "default" in data["period"]:
                # old-clix-meta version where multiple allowed values can be listed.
                # Kept in xclim.
                deffreq = data["period"]["default"]
            else:
                # clix-meta when a special season specification exists : xclim doesn't support it.
                deffreq = list(data["period"].keys())[0]
            params["freq"] = {"default": FREQ_NAMES[deffreq][1]}
            if "allowed" in data["period"]:
                allowed_periods = []
                for period_name in data["period"]["allowed"]:
                    allowed_periods.append(FREQ_NAMES[period_name][0])

        kwargs = dict(
            # General
            identifier=identifier,
            module=module,
            realm=data.get("realm"),
            keywords=data.get("keywords"),
            references=data.get("references", data.get("reference")),
            notes=data.get("notes"),
            # Indicator-specific metadata
            title=data.get("title"),
            abstract=data.get("abstract"),
            # Output meta
            var_name=data.get("output", {}).get("var_name", identifier),
            standard_name=data.get("output", {}).get("standard_name"),
            long_name=data.get("output", {}).get("long_name"),
            description=data.get("output", {}).get("description"),
            comment=data.get("output", {}).get("comment"),
            units=data.get("output", {}).get("units"),
            cell_methods=cell_methods,
            # Input data, override defaults given in generic compute's signature.
            parameters=params or None,  # None if an empty dict
            compute=compute,
            # Checks
            allowed_periods=allowed_periods,
        )
        for cf_name in cls._cf_names:
            if isinstance(kwargs[cf_name], str):
                for old, new in metadata_placeholders.items():
                    kwargs[cf_name] = kwargs[cf_name].replace(old, new)

        # Remove kwargs passed as "None", they will be taken from the base class instead.
        # For most parameters it would be ok to pass a None anyway (we figure that out in __new__),
        # but some would not like that.
        return cls(**{k: v for k, v in kwargs.items() if v is not None})

    def __init__(self, **kwds):
        """Run checks and organize the metadata."""
        # keywords of kwds that are class attributes have already been set in __new__
        self._check_identifier(self.identifier)
        if self.missing == "from_context" and self.missing_options is not None:
            raise ValueError(
                "Cannot set `missing_options` with `missing` method being from context."
            )

        # Validate hard-coded missing options
        kls = MISSING_METHODS[self.missing]
        self._missing = kls.execute
        if self.missing_options:
            kls.validate(**self.missing_options)

        # Validation is done : register the instance.
        super().__init__()

        # Update call signature
        self.__call__ = wraps(self._indcompute)(self.__call__)

    def __call__(self, *args, **kwds):
        """Call function of Indicator class."""
        # For convenience
        n_outs = len(self.cf_attrs)

        # Put the variables in `das`, parse them according to the annotations
        #     das : OrderedDict of variables (required + non-None optionals)
        #     params : OrderedDict of parameters INCLUDING unpacked kwargs
        #     all_params: OrderedDict of parameters with PACKED kwargs <- this is needed by _update_attrs and _mask because of `indexer`.
        #                 AND includes injected arguments <- this is needed by update_attrs and missing (when "freq" is injected)
        das, params, all_params = self._parse_variables_from_call(args, kwds)

        # Metadata attributes from templates
        var_id = None
        var_attrs = []
        for attrs in self.cf_attrs:
            if n_outs > 1:
                var_id = attrs["var_name"]
            var_attrs.append(
                self._update_attrs(
                    all_params.copy(), das, attrs, names=self._cf_names, var_id=var_id
                )
            )

        # Pre-computation validation checks on DataArray arguments
        self._bind_call(self.datacheck, **das)
        self._bind_call(self.cfcheck, **das)

        # Check if the period is allowed:
        if (
            self.allowed_periods is not None
            and "freq" in all_params
            and parse_offset(all_params["freq"])[1] not in self.allowed_periods
        ):
            raise ValueError(
                f"Resampling frequency {all_params['freq']} is not allowed for indicator "
                f"{self.identifier} (needs something equivalent to one of {self.allowed_periods})."
            )

        # Compute the indicator values, ignoring NaNs and missing values.
        outs = self.compute(**das, **params)

        if isinstance(outs, DataArray):
            outs = [outs]

        if len(outs) != n_outs:
            raise ValueError(
                f"Indicator {self.identifier} was wrongly defined. Expected {n_outs} outputs, got {len(outs)}."
            )

        # Convert to output units
        outs = [
            convert_units_to(out, attrs.get("units", ""), self.context)
            for out, attrs in zip(outs, var_attrs)
        ]

        # Update variable attributes
        for out, attrs in zip(outs, var_attrs):
            var_name = attrs.pop("var_name")
            out.attrs.update(attrs)
            out.name = var_name

        if self.missing != "skip":
            # Mask results that do not meet criteria defined by the `missing` method.
            # This means all outputs must have the same dimensions as the broadcasted inputs (excluding time)
            mask = self._mask(*das.values(), **all_params)
            outs = [out.where(~mask) for out in outs]

        # Return a single DataArray in case of single output, otherwise a tuple
        if n_outs == 1:
            return outs[0]
        return tuple(outs)

    def _assign_named_args(self, ba):
        """Assign inputs passed as strings from ds."""
        ds = ba.arguments.pop("ds")
        for name, param in self._sig.parameters.items():
            if (
                self.parameters[name]["kind"]
                in (
                    InputKind.VARIABLE,
                    InputKind.OPTIONAL_VARIABLE,
                )
                and isinstance(ba.arguments[name], str)
            ):
                if ds is not None:
                    try:
                        ba.arguments[name] = ds[ba.arguments[name]]
                    except KeyError:
                        raise MissingVariableError(
                            f"For input '{name}', variable '{ba.arguments[name]}' was not found in the input dataset."
                        )
                else:
                    raise ValueError(
                        f"Passing variable names as string requires giving the `ds` dataset (got {name}='{ba.arguments[name]}')"
                    )

    def _parse_variables_from_call(self, args, kwds):
        """Extract variable and optional variables from call arguments."""
        # Bind call arguments to `compute` arguments and set defaults.
        ba = self._sig.bind(*args, **kwds)
        ba.apply_defaults()

        # Assign inputs passed as strings from ds.
        self._assign_named_args(ba)

        das = OrderedDict()
        for name, param in self.parameters.items():
            kind = param["kind"]
            # If a variable, pop the arg
            if kind in (InputKind.VARIABLE, InputKind.OPTIONAL_VARIABLE):
                data = ba.arguments.pop(name)
                # If a non-optional variable OR not None, store the arg
                if kind == InputKind.VARIABLE or data is not None:
                    das[name] = data

        # Remove **kwargs from bind object and put all those params in "kwargs" to be passed to compute.
        params = ba.arguments.copy()
        for param in self._sig.parameters.values():
            if param.kind == param.VAR_KEYWORD:
                kwargs = params.pop(param.name)
                params.update(**kwargs)

        # Add injected kwargs to the all_params
        all_params = ba.arguments
        all_params.update(getattr(self._indcompute, "_injected", {}))

        return das, params, all_params

    def _bind_call(self, func, **das):
        """Call function using `__call__` `DataArray` arguments.

        This will try to bind keyword arguments to `func` arguments. If this fails, `func` is called with positional
        arguments only.

        Notes
        -----
        This method is used to support two main use cases.

        In use case #1, we have two compute functions with arguments in a different order:
            `func1(tasmin, tasmax)` and `func2(tasmax, tasmin)`

        In use case #2, we have two compute functions with arguments that have different names:
            `generic_func(da)` and `custom_func(tas)`

        For each case, we want to define a single `cfcheck` and `datacheck` method that will work with both compute
        functions.

        Passing a dictionary of arguments will solve #1, but not #2.
        """
        # First try to bind arguments to function.
        try:
            ba = signature(func).bind(**das)
        except TypeError:
            # If this fails, simply call the function using positional arguments
            return func(*das.values())
        else:
            # Call the func using bound arguments
            return func(*ba.args, **ba.kwargs)

    @classmethod
    def _get_translated_metadata(
        cls, locale, var_id=None, names=None, append_locale_name=True
    ):
        """Get raw translated metadata for the current indicator and a given locale.

        All available translated metadata from the current indicator and those it is based on are merged,
        with highest priority to the current one.
        """
        var_id = var_id or ""
        if var_id:
            var_id = "." + var_id

        family_tree = []
        cl = cls
        while hasattr(cl, "_registry_id"):
            family_tree.append(cl._registry_id + var_id)
            # The indicator mechanism always has single inheritance.
            cl = cl.__bases__[0]

        return get_local_attrs(
            family_tree,
            locale,
            names=names,
            append_locale_name=append_locale_name,
        )

    @classmethod
    def _update_attrs(cls, args, das, attrs, var_id=None, names=None):
        """Format attributes with the run-time values of `compute` call parameters.

        Cell methods and history attributes are updated, adding to existing values.
        The language of the string is taken from the `OPTIONS` configuration dictionary.

        Parameters
        ----------
        args: Mapping[str, Any]
          Keyword arguments of the `compute` call.
        das: Mapping[str, DataArray]
          Input arrays.
        attrs : Mapping[str, str]
          The attributes to format and update.
        var_id : str
          The identifier to use when requesting the attributes translations.
          Defaults to the class name (for the translations) or the `identifier` field of the class
          (for the history attribute).
          If given, the identifier will be converted to uppercase to get the translation attributes.
          This is meant for multi-outputs indicators.
        names : Sequence[str]
          List of attribute names for which to get a translation.

        Returns
        -------
        dict
          Attributes with {} expressions replaced by call argument values. With updated `cell_methods` and `history`.
          `cell_methods` is not added if `names` is given and does not contain `cell_methods`.
        """
        out = cls._format(attrs, args)
        for locale in OPTIONS[METADATA_LOCALES]:
            out.update(
                cls._format(
                    cls._get_translated_metadata(
                        locale, var_id=var_id, names=names or list(attrs.keys())
                    ),
                    args=args,
                    formatter=get_local_formatter(locale),
                )
            )

        # Get history and cell method attributes from source data
        attrs = defaultdict(str)
        if names is None or "cell_methods" in names:
            attrs["cell_methods"] = merge_attributes(
                "cell_methods", new_line=" ", missing_str=None, **das
            )
            if "cell_methods" in out:
                attrs["cell_methods"] += " " + out.pop("cell_methods")

        # Use of OrderedDict to ensure inputs (das) get listed before parameters (args).
        # In the history attr, call signature will be all keywords
        # and might be in a different order than the real function (but order doesn't really matter with keywords).
        kwargs = OrderedDict(**das)
        kwargs.update(**args)
        attrs["history"] = update_history(
            gen_call_string(cls._registry_id, **kwargs),
            new_name=out.get("var_name"),
            **das,
        )

        attrs.update(out)
        return attrs

    @staticmethod
    def _check_identifier(identifier: str) -> None:
        """Verify that the identifier is a proper slug."""
        if not re.match(r"^[-\w]+$", identifier):
            warnings.warn(
                "The identifier contains non-alphanumeric characters. It could make life "
                "difficult for downstream software reusing this class.",
                UserWarning,
            )

    @classmethod
    def translate_attrs(
        cls, locale: Union[str, Sequence[str]], fill_missing: bool = True
    ):
        """Return a dictionary of unformatted, translated attributes.

        Translatable attributes are defined in :py:const:`xclim.core.locales.TRANSLATABLE_ATTRS`.

        Parameters
        ----------
        locale : Union[str, Sequence[str]]
          The POSIX name of the locale or a tuple of a locale name and a path to a
          json file defining the translations. See `xclim.locale` for details.
        fill_missing : bool
          If True (default), fill the missing attributes with their English values.
        """

        def _translate(var_attrs, names, var_id=None):
            attrs = cls._get_translated_metadata(
                locale,
                var_id=var_id,
                names=names,
                append_locale_name=False,
            )
            if fill_missing:
                for name in names:
                    if name not in attrs and var_attrs.get(name):
                        attrs[name] = var_attrs.get(name)
            return attrs

        # Translate global attrs
        attrs = _translate(
            cls.__dict__,
            # Translate only translatable attrs that are not variable attrs
            set(TRANSLATABLE_ATTRS).difference(set(cls._cf_names)),
        )
        # Translate variable attrs
        attrs["outputs"] = []
        var_id = None
        for var_attrs in cls.cf_attrs:  # Translate for each variable
            if len(cls.cf_attrs) > 1:
                var_id = var_attrs["var_name"]
            attrs["outputs"].append(
                _translate(
                    var_attrs,
                    set(TRANSLATABLE_ATTRS).intersection(cls._cf_names),
                    var_id=var_id,
                )
            )
        return attrs

    def json(self, args=None):
        """Return a serializable dictionary representation of the class.

        Parameters
        ----------
        args : mapping, optional
            Arguments as passed to the call method of the indicator.
            If not given, the default arguments will be used when formatting the attributes.

        Notes
        -----
        This is meant to be used by a third-party library wanting to wrap this class into another interface.
        """
        names = ["identifier", "title", "abstract", "keywords"]
        out = {key: getattr(self, key) for key in names}
        out = self._format(out, args)

        # Format attributes
        out["outputs"] = [self._format(attrs, args) for attrs in self.cf_attrs]
        out["notes"] = self.notes

        # We need to deepcopy, otherwise empty defaults get overwritten!
        # All those tweaks are to ensure proper serialization of the returned dictionary.
        out["parameters"] = deepcopy(self.parameters)
        for param in out["parameters"].values():
            if param["default"] is _empty:
                param.pop("default")
            param["kind"] = param["kind"].value  # Get the int.
            if "choices" in param:  # A set is stored, convert to list
                param["choices"] = list(param["choices"])
        return out
    @classmethod
    def _format(
        cls,
        attrs: dict,
        args: dict = None,
        formatter: AttrFormatter = default_formatter,
    ):
        """Format attributes including {} tags with arguments.

        Parameters
        ----------
        attrs: dict
            Attributes containing tags to replace with arguments' values.
        args : dict, optional
            Function call arguments. If not given, the default arguments will be used when formatting the attributes.
        formatter : AttrFormatter
        """
        # Use defaults
        if args is None:
            args = {k: v["default"] for k, v in cls.parameters.items()}
            args.update(getattr(cls._indcompute, "_injected", {}))

        out = {}
        for key, val in attrs.items():
            mba = {"indexer": "annual"}
            # Add formatting {} around values to be able to replace them with _attrs_mapping using format.
            for k, v in args.items():
                if isinstance(v, dict):
                    if v:
                        dk, dv = v.copy().popitem()
                        if dk == "month":
                            dv = "m{}".format(dv)
                        mba[k] = dv
                elif isinstance(v, units.Quantity):
                    mba[k] = "{:g~P}".format(v)
                elif isinstance(v, (int, float)):
                    mba[k] = "{:g}".format(v)
                else:
                    mba[k] = v

            if callable(val):
                val = val(**mba)

            out[key] = formatter.format(val, **mba)

            if key in cls._text_fields:
                out[key] = out[key].strip().capitalize()

        return out

    def _default_freq(self, **indexer):
        """Return default frequency."""
        if self.freq in ["D", "H"]:
            return indices.generic.default_freq(**indexer)
        return None

    def _mask(self, *args, **kwds):
        """Return the mask for output values, based on the output of the `missing` method."""
        from functools import reduce

        indexer = kwds.get("indexer") or {}
        freq = kwds.get("freq") if "freq" in kwds else self._default_freq(**indexer)

        options = self.missing_options or OPTIONS[MISSING_OPTIONS].get(self.missing, {})

        # We flag periods according to the missing method. Skip variables without a time coordinate.
        miss = (
            self._missing(da, freq, self.freq, options, indexer)
            for da in args
            if "time" in da.coords
        )
        return reduce(np.logical_or, miss)

    # The following static methods are meant to be replaced to define custom indicators.
    @staticmethod
    def compute(*args, **kwds):
        """Compute the indicator.

        This would typically be a function from `xclim.indices`.
        """
        raise NotImplementedError
    @staticmethod
    def cfcheck(**das):
        """Compare metadata attributes to CF-Convention standards.

        Default cfchecks use the specifications in `xclim.core.utils.VARIABLES`,
        assuming the indicator's inputs are using the CMIP6/xclim variable names correctly.
        Variables absent from these default specs are silently ignored.

        When subclassing this method, use functions decorated using `xclim.core.options.cfcheck`.
        """
        for varname, vardata in das.items():
            try:
                cfcheck_from_name(varname, vardata)
            except KeyError:
                # Silently ignore unknown variables.
                pass
    @staticmethod
    def datacheck(**das):
        """Verify that input data is valid.

        When subclassing this method, use functions decorated using `xclim.core.options.datacheck`.

        For example, checks could include:

        - assert temporal frequency is daily
        - assert no precipitation is negative
        - assert no temperature has the same value 5 days in a row
        """
        pass


class Daily(Indicator):
    """Indicator defined for inputs at daily frequency."""

    freq = "D"
    @staticmethod
    def datacheck(**das):  # noqa
        for key, da in das.items():
            if "time" in da.coords and da.time.ndim == 1 and len(da.time) > 3:
                datachecks.check_daily(da)


class Hourly(Indicator):
    """Indicator defined for inputs at strict hourly frequency, meaning 3-hourly inputs would raise an error."""

    freq = "H"
    @staticmethod
    def datacheck(**das):  # noqa
        for key, da in das.items():
            datachecks.check_freq(da, "H")


def _parse_indice(indice: Callable, passed=None, **new_kwargs):
    """Parse an indice function and return corresponding elements needed for constructing an indicator.

    Parameters
    ----------
    indice : Callable
        An indice function, written according to xclim's guidelines.
    new_kwargs :
        Mapping from name to dicts containing the necessary info for injecting new keyword-only
        arguments into the indice_wrapper function. The meta dict can include (all optional):
        `default`, `description`, `annotation`.

    Returns
    -------
    indice_wrapper : callable
        A function with a new signature including the injected args in new_kwargs.
    docmeta : Mapping[str, str]
        A dictionary of the metadata attributes parsed in the docstring.
    params : Mapping[str, Mapping[str, Any]]
        A dictionary of metadata for each input parameter of the indice. The metadata dictionaries
        include the following entries: "default", "description", "kind" and, optionally, "choices" and "units".
        "kind" is one of the constants in :py:class:`xclim.core.utils.InputKind`.
    """
    # Base signature
    sig = signature(indice)
    passed = passed or {}

    # Update
    def _upd_param(param):
        # Required DataArray arguments receive their own name as new default
        #         + the Union[str, DataArray] annotation
        if param.kind in [param.VAR_KEYWORD, param.VAR_POSITIONAL]:
            return param

        xckind = infer_kind_from_parameter(param)

        default = passed.get(param.name, {}).get("default", param.default)
        if xckind == InputKind.OPTIONAL_VARIABLE and (
            default is _empty or isinstance(default, str)
        ):
            # Was wrapped with suggested={param: _empty} OR somehow a variable name was injected (ex: through yaml)
            # It becomes a non-optional variable
            xckind = InputKind.VARIABLE

        if default is _empty:
            if xckind == InputKind.VARIABLE:
                default = param.name
            else:
                # Parameters with no default receive None
                # Because we can't have no-default args _after_ default args
                # and we just set the default on the variables (which are the first args)
                default = None

        # A dict lookup stands in for a switch case
        annots = {
            InputKind.VARIABLE: Union[str, DataArray],
            InputKind.OPTIONAL_VARIABLE: Optional[Union[str, DataArray]],
        }
        annot = annots.get(xckind, param.annotation)

        return Parameter(
            param.name,
            # We keep the kind, except we replace POSITIONAL_ONLY by POSITIONAL_OR_KEYWORD
            max(param.kind, 1),
            default=default,
            annotation=annot,
        )

    # Parse all parameters, replacing annotations and default where needed and possible.
    new_params = list(map(_upd_param, sig.parameters.values()))

    # Injection
    for name, meta in new_kwargs.items():
        # ds argument
        param = Parameter(
            name,
            Parameter.KEYWORD_ONLY,
            default=meta.get("default"),
            annotation=meta.get("annotation"),
        )

        if new_params[-1].kind == Parameter.VAR_KEYWORD:
            new_params.insert(-1, param)
        else:
            new_params.append(param)

    # Create new compute function to be wrapped in __call__
    indice_wrapper = copy_function(indice)
    indice_wrapper.__signature__ = new_sig = sig.replace(parameters=new_params)
    indice_wrapper.__doc__ = indice.__doc__

    # Docstring parsing
    parsed = parse_doc(indice.__doc__)

    # Extract params and pop those not in the signature.
    params = parsed.pop("parameters", {})
    for dropped in set(params.keys()) - set(new_sig.parameters.keys()):
        params.pop(dropped)

    if hasattr(indice, "in_units"):
        # Try to put units
        for var, ustr in indice.in_units.items():
            if var in params:
                params[var]["units"] = ustr

    # Fill default values and annotation in parameter doc
    for name, param in new_sig.parameters.items():
        if name in new_kwargs and "description" in new_kwargs[name]:
            params[name] = {"description": new_kwargs[name]["description"]}
        param_doc = params.setdefault(name, {"description": ""})
        param_doc["default"] = param.default
        param_doc["kind"] = infer_kind_from_parameter(param, "units" in param_doc)
        param_doc.update(passed.get(name, {}))

    return indice_wrapper, parsed, params


def add_iter_indicators(module):
    if not hasattr(module, "iter_indicators"):

        def iter_indicators():
            for indname, ind in module.__dict__.items():
                if isinstance(ind, Indicator):
                    yield indname, ind

        iter_indicators.__doc__ = f"Iterate over the (name, indicator) pairs in the {module.__name__} indicator module."

        module.__dict__["iter_indicators"] = iter_indicators
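The keyword injection step of `_parse_indice` above can be sketched with the standard library alone. This is a minimal illustration, assuming a made-up indice `example_indice` and a hypothetical helper `inject_keyword`. It shows only the placement of a new keyword-only parameter before a trailing `**kwargs`, not the full parsing logic.

```python
from inspect import Parameter, signature


def example_indice(tas, thresh="0 degC", **indexer):
    """Stand-in for an indice function (hypothetical)."""


def inject_keyword(func, name, default=None):
    """Return the signature of `func` with a new keyword-only parameter added."""
    sig = signature(func)
    params = list(sig.parameters.values())
    new_param = Parameter(name, Parameter.KEYWORD_ONLY, default=default)
    # A keyword-only parameter must come before a trailing **kwargs, if any.
    if params and params[-1].kind == Parameter.VAR_KEYWORD:
        params.insert(-1, new_param)
    else:
        params.append(new_param)
    return sig.replace(parameters=params)


new_sig = inject_keyword(example_indice, "ds")
print(new_sig)
```

Inserting before the variadic keyword argument matters because `Signature.replace` validates parameter order and rejects a keyword-only parameter placed after `**kwargs`.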


def build_indicator_module(
    name: str,
    objs: Mapping[str, Indicator],
    doc: Optional[str] = None,
) -> ModuleType:
    """Create or update a module from imported objects.

    The module is inserted as a submodule of `xclim.indicators`.

    Parameters
    ----------
    name : str
        New module name. If it already exists, the module is extended with the passed objects,
        overwriting those with same names.
    objs : dict
        Mapping of the indicators to put in the new module. Keyed by the name they will take in that module.
    doc : str
        Docstring of the new module. Defaults to a simple header. Invalid if the module already exists.

    Returns
    -------
    ModuleType
        An indicator module built from a mapping of Indicators.
    """
    from xclim import indicators

    if hasattr(indicators, name):
        if doc is not None:
            warnings.warn(
                "Passed docstring ignored when extending existing module.", stacklevel=1
            )
        out = getattr(indicators, name)
    else:
        doc = doc or f"{name.capitalize()} indicators\n" + "=" * (len(name) + 11)
        try:
            out = ModuleType(name, doc)
        except TypeError as err:
            raise TypeError(f"Module '{name}' is not properly formatted") from err
        indicators.__dict__[name] = out

    out.__dict__.update(objs)
    add_iter_indicators(out)
    return out
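The create-or-extend pattern used by `build_indicator_module` above can be tried without xclim. The sketch below is only an approximation: `attach_submodule` is a hypothetical helper, and a throwaway `ModuleType` stands in for the `xclim.indicators` package.

```python
from types import ModuleType
from typing import Optional


def attach_submodule(
    parent: ModuleType, name: str, objs: dict, doc: Optional[str] = None
) -> ModuleType:
    """Create `parent.<name>`, or extend it with `objs` if it exists (hypothetical helper)."""
    if hasattr(parent, name):
        # Existing module: reuse it, its docstring is left untouched.
        out = getattr(parent, name)
    else:
        header = f"{name.capitalize()} indicators"
        out = ModuleType(name, doc or header + "\n" + "=" * len(header))
        parent.__dict__[name] = out
    # Objects with the same name are overwritten.
    out.__dict__.update(objs)
    return out


parent = ModuleType("fake_indicators")  # stands in for `xclim.indicators`
mod = attach_submodule(parent, "example", {"first": 1})
same = attach_submodule(parent, "example", {"second": 2})
print(same is mod, mod.first, mod.second)
```

The second call returns the module created by the first one, so objects accumulate in a single namespace.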


def build_indicator_module_from_yaml(
    filename: PathLike,
    name: Optional[str] = None,
    base: Type[Indicator] = Daily,
    doc: Optional[str] = None,
    indices: Optional[Union[Mapping[str, Callable], ModuleType]] = None,
    translations: Optional[Mapping[str, dict]] = None,
    mode: str = "raise",
    realm: Optional[str] = None,
    keywords: Optional[str] = None,
    references: Optional[str] = None,
    notes: Optional[str] = None,
    encoding: str = "UTF8",
) -> ModuleType:
    """Build or extend an indicator module from a YAML file.

    The module is inserted as a submodule of `xclim.indicators`. When given only a base filename (no 'yml' extension),
    this tries to find custom indices in a module of the same name (*.py) and translations in json files
    (*.<lang>.json), see Notes.

    Parameters
    ----------
    filename: PathLike
        Path to a YAML file or to the stem of all module files. See Notes for behaviour when passing a basename only.
    name: str, optional
        The name of the new or existing module, defaults to the basename of the file.
        (e.g: `atmos.yml` -> `atmos`)
    base: Indicator subclass
        The Indicator subclass from which the new indicators are based. Superseded by the class given in the yaml
        file or in individual indicator definitions (see submodule's doc).
    doc : str, optional
        The docstring of the new submodule. Defaults to a very minimal header with the submodule's name.
    indices : Mapping of callables or module, optional
        A mapping or module of indice functions. When creating the indicator, the name in the `index_function` field
        is first sought here, then in xclim.indices.generic and finally in xclim.indices.
    translations : Mapping of dicts, optional
        Translated metadata for the new indicators. Keys of the mapping must be 2-char language tags.
        See Notes and :ref:`Internationalization` for more details.
    mode: {'raise', 'warn', 'ignore'}
        How to deal with broken indice definitions.
    realm: str, optional
    keywords: str, optional
        Comma separated keywords.
    references: str, optional
        Source citations.
    notes: str, optional
        Other indicator attributes that would apply to all indicators in this module.
        Values given here are overridden by the ones given in individual definition, but
        they override the ones given at top-level in the YAML file.
    encoding: str
        The encoding used to open the `.yaml` and `.json` files.
        It defaults to UTF-8, overriding python's mechanism which is machine dependent.

    Returns
    -------
    ModuleType
        A submodule of `xclim.indicators`.

    Notes
    -----
    When the given `filename` has no suffix (usually '.yaml' or '.yml'), the function will try to load
    custom indice definitions from a file with the same name but with a `.py` extension. Similarly,
    it will try to load translations in `*.<lang>.json` files, where `<lang>` is the IETF language tag.

    For example, a set of custom indicators could be fully described by the following files:

        - `example.yml` : defining the indicator's metadata.
        - `example.py` : defining a few indice functions.
        - `example.fr.json` : French translations
        - `example.tlh.json` : Klingon translations.

    See also
    --------
    The doc of :py:mod:`xclim.core.indicator` and of :py:func:`build_indicator_module`.
    """
    filepath = Path(filename)

    if not filepath.suffix:
        # A stem was passed, try to load files
        ymlpath = filepath.with_suffix(".yml")
    else:
        ymlpath = filepath

    # Read YAML file
    with ymlpath.open(encoding=encoding) as f:
        yml = safe_load(f)

    # Load values from top-level in yml.
    # Priority of arguments differs.
    module_name = name or yml.get("module", filepath.stem)
    default_base = registry.get(yml.get("base"), base)
    doc = doc or yml.get("doc")

    # When given as a stem, we try to load indices and translations
    if not filepath.suffix:
        if indices is None:
            try:
                indices = load_module(filepath.with_suffix(".py"))
            except ModuleNotFoundError:
                pass

        if translations is None:
            translations = {}
            for locfile in filepath.parent.glob(filepath.stem + ".*.json"):
                locale = locfile.suffixes[0][1:]
                translations[locale] = read_locale_file(
                    locfile, module=module_name, encoding=encoding
                )

    # Module-wide default values for some attributes
    defkwargs = {
        # Other default argument, only given in case the indicator definition does not give them.
        "realm": realm or yml.get("realm"),
        "keywords": keywords or yml.get("keywords"),
        "references": references or yml.get("references"),
        "notes": notes or yml.get("notes"),
    }

    # Parse the indicators:
    mapping = {}
    for identifier, data in yml["indices"].items():
        try:
            clean_id, data = _cleanup_indicator_dict(
                identifier, data, indices, defkwargs
            )
            if "base" in data:
                if data["base"].startswith("."):
                    # A point means the base has been declared above.
                    base = registry[module_name + data["base"].upper()]
                else:
                    base = registry[data["base"].upper()]
            else:
                base = default_base
            mapping[clean_id] = base.from_dict(
                data, identifier=clean_id, module=module_name
            )
        except Exception as err:
            raise_warn_or_log(
                err, mode, msg=f"Constructing {identifier} failed with {err!r}"
            )

    # Construct module
    mod = build_indicator_module(module_name, objs=mapping, doc=doc)

    # If there are translations, load them
    if translations:
        for locale, locdict in translations.items():
            load_locale(locdict, locale)

    return mod


def _cleanup_indicator_dict(identifier, data, indices, defaults):
    # Assign indice as func.
    # If data.index_function.name refers to a function in `indices`, replace that field by the function.
    indice_name = data.get("index_function", {}).get("name", None)
    if indice_name is not None and indices is not None:
        indice_func = getattr(indices, indice_name, None)
        if indice_func is None and hasattr(indices, "__getitem__"):
            try:
                indice_func = indices[indice_name]
            except KeyError:
                pass

        if indice_func is not None:
            data["index_function"]["name"] = indice_func

    # clix-meta has illegal characters in the identifiers.
    clean_id = identifier.replace("{", "").replace("}", "")

    # Workaround for clix-meta (we name it references, they name it reference)
    data.setdefault("references", data.get("reference"))

    for k, v in defaults.items():
        data.setdefault(k, v)

    return clean_id, data
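The precedence of module-wide attributes described above (individual definition first, then function arguments, then the top level of the YAML file) can be checked with plain dictionaries. This is a simplified sketch: `resolve_defaults`, `clean_identifier` and the sample values are hypothetical, and only three of the attributes are covered.

```python
def resolve_defaults(data: dict, call_args: dict, yaml_top: dict) -> dict:
    """Fill missing attributes of one indicator definition (hypothetical helper)."""
    # Function arguments win over the top level of the YAML file...
    module_defaults = {
        key: call_args.get(key) or yaml_top.get(key)
        for key in ("realm", "keywords", "notes")
    }
    # ...but never over a value given in the indicator definition itself.
    out = dict(data)
    for key, value in module_defaults.items():
        out.setdefault(key, value)
    return out


def clean_identifier(identifier: str) -> str:
    """Strip the curly braces found in some clix-meta identifiers."""
    return identifier.replace("{", "").replace("}", "")


resolved = resolve_defaults(
    data={"realm": "land"},
    call_args={"realm": "atmos", "keywords": "heat"},
    yaml_top={"keywords": "temperature", "notes": "From YAML."},
)
print(clean_identifier("tx{days}"), resolved)
```

Here `realm` keeps the value from the definition, `keywords` comes from the function argument, and `notes` falls back to the YAML top level.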