Health Checks

The Indicator class performs a number of sanity checks on its inputs to make sure that valid data is fed to index computations: cfchecks checks the metadata and datachecks checks the coordinates. Output values are masked where input values are missing or invalid (missing). Finally, users can call the functions of dataflags to explore potential issues with their data (extreme values, suspicious runs, etc.).

CF-Convention Checking

Utilities designed to verify the compliance of metadata with the CF-Convention.

xclim.core.cfchecks.cfcheck_from_name(varname, vardata, attrs=None)[source]

Perform cfchecks on a DataArray using specifications from xclim’s default variables.

xclim.core.cfchecks.check_valid(var, key, expected)[source]

Check that a variable’s attribute has one of the expected values. Raise a ValidationError otherwise.

Data Checks

Utilities designed to check the validity of data inputs.

xclim.core.datachecks.check_common_time(inputs)[source]

Raise an error if the list of inputs doesn’t have a single common frequency.

Raises:

ValidationError

  • if the frequency of any input can’t be inferred

  • if inputs have different frequencies

  • if inputs have a daily or hourly frequency, but they are not given at the same time of day.

Parameters:

inputs (Sequence of xr.DataArray) – Input arrays.
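The three failure conditions can be illustrated with a small standard-library sketch. It is not xclim's implementation: the helper names `infer_step` and `check_common_time_sketch` are hypothetical, plain lists of `datetime` objects stand in for the time coordinates of the input arrays, and `ValueError` stands in for `ValidationError`.

```python
from datetime import datetime, timedelta


def infer_step(times):
    """Return the single spacing between consecutive timestamps, or raise."""
    steps = {later - earlier for earlier, later in zip(times, times[1:])}
    if len(steps) != 1:
        raise ValueError("frequency cannot be inferred")
    return steps.pop()


def check_common_time_sketch(inputs):
    """Hypothetical stand-in for the three checks listed above."""
    steps = {infer_step(times) for times in inputs}
    if len(steps) != 1:
        raise ValueError("inputs have different frequencies")
    step = steps.pop()
    # For daily or hourly data, also compare the offset within one step.
    if step in (timedelta(days=1), timedelta(hours=1)):
        origin = datetime(2000, 1, 1)
        offsets = {(times[0] - origin) % step for times in inputs}
        if len(offsets) != 1:
            raise ValueError("inputs are not given at the same time of day")


midnight = [datetime(2000, 1, day, 0) for day in range(1, 5)]
noon = [datetime(2000, 1, day, 12) for day in range(1, 5)]

check_common_time_sketch([midnight, midnight])  # passes silently
try:
    check_common_time_sketch([midnight, noon])
except ValueError as err:
    message = str(err)  # the two daily series are 12 hours apart
```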

xclim.core.datachecks.check_daily(var)[source]

Raise an error if the series has a frequency other than daily, or is not monotonically increasing.

Notes

This does not check for gaps in series.

xclim.core.datachecks.check_freq(var, freq, strict=True)[source]

Raise an error if the series does not have the expected temporal frequency or is not monotonically increasing.

Parameters:
  • var (xr.DataArray) – Input array.

  • freq (str or sequence of str) – The expected temporal frequencies, using Pandas frequency terminology ({‘Y’, ‘M’, ‘D’, ‘h’, ‘min’, ‘s’, ‘ms’, ‘us’}) and multiples thereof. To test strictly for ‘W’, pass ‘7D’ with strict=True. This ignores the start/end flag and the anchor (ex: ‘YS-JUL’ will validate against ‘Y’).

  • strict (bool) – Whether multiples of the frequencies are considered invalid or not. With strict set to False, a ‘3h’ series will not raise an error if freq is set to ‘h’.

Raises:

ValidationError

  • If the frequency of var is not inferrable.

  • If the frequency of var does not match the requested freq.
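The matching rules described for freq and strict can be sketched with a regular expression. This is only an illustration under stated assumptions, not the code xclim runs: `parse_freq` and `freq_matches` are hypothetical helpers, and the pattern only covers the simple frequency strings mentioned above.

```python
import re

# Optional multiplier, base unit, optional start/end flag, optional anchor.
_FREQ = re.compile(r"^(\d*)([A-Za-z]+?)(S|E)?(-\w+)?$")


def parse_freq(freq):
    """Split e.g. '3h' into (3, 'h') and 'YS-JUL' into (1, 'Y')."""
    match = _FREQ.match(freq)
    if match is None:
        raise ValueError(f"unrecognised frequency: {freq!r}")
    multiplier, base = match.group(1), match.group(2)
    return int(multiplier or 1), base


def freq_matches(series_freq, expected, strict=True):
    """Hypothetical helper: does series_freq satisfy any expected frequency?"""
    if isinstance(expected, str):
        expected = [expected]
    mult, base = parse_freq(series_freq)
    for candidate in expected:
        exp_mult, exp_base = parse_freq(candidate)
        if base != exp_base:
            continue
        # Strict mode needs the same multiplier; otherwise multiples pass.
        if mult == exp_mult or (not strict and mult % exp_mult == 0):
            return True
    return False


print(freq_matches("3h", "h", strict=False))  # True
print(freq_matches("3h", "h"))                # False
print(freq_matches("YS-JUL", "Y"))            # True
```

The start/end flag and the anchor are parsed but discarded, which mirrors the statement that 'YS-JUL' validates against 'Y'.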

Missing Values Identification

Indicators may use different criteria to determine whether a computed indicator value should be considered missing. In some cases, the presence of any missing value in the input time series should result in a missing indicator value for that period. In other cases, a minimum number of valid values or a percentage of missing values should be enforced. The World Meteorological Organisation (WMO) suggests criteria based on the number of consecutive and overall missing values per month.

xclim has a registry of missing value detection algorithms that can be extended by users to customize the behavior of indicators. Once registered, algorithms can be used within indicators by setting the missing attribute of an Indicator subclass. By default, xclim registers the following algorithms:

  • any: A result is missing if any input value is missing.

  • at_least_n: A result is missing if fewer than a given number of valid values are present.

  • pct: A result is missing if more than a given fraction of values are missing.

  • wmo: A result is missing if 11 or more days are missing, or 5 or more consecutive values are missing in a month.

  • skip: Skip missing value detection.

  • from_context: Look-up the missing value algorithm from options settings. See xclim.set_options().
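To make the first three rules concrete, here is a minimal sketch applied to one period of values. It assumes NaN marks a missing value; the function names are hypothetical and the code is not taken from xclim.

```python
import math


def count_missing(values):
    """Count NaN entries, which stand in for missing values here."""
    return sum(1 for v in values if math.isnan(v))


def missing_any_rule(values):
    """'any': missing as soon as one value is missing."""
    return count_missing(values) > 0


def missing_at_least_n_rule(values, n):
    """'at_least_n': missing if fewer than n valid values are present."""
    return len(values) - count_missing(values) < n


def missing_pct_rule(values, tolerance):
    """'pct': missing if the missing fraction exceeds the tolerance."""
    return count_missing(values) / len(values) > tolerance


nan = float("nan")
period = [1.2, nan, 0.0, 3.4, nan, 2.2, 0.1, 0.0, 5.0, 1.1]  # 2 of 10 missing

print(missing_any_rule(period))              # True
print(missing_at_least_n_rule(period, n=8))  # False: 8 valid values
print(missing_pct_rule(period, 0.1))         # True: 20% missing > 10%
```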

To define another missing value algorithm, subclass MissingBase and decorate it with xclim.core.options.register_missing_method().

Note

Corresponding stand-alone functions are also exposed to run the same missing value checks independent from indicator calculations.

xclim.core.missing.missing_any(da, freq, src_timestep=None, **indexer)[source]

Return whether there are missing days in the array.

Parameters:
  • da (xr.DataArray) – Input array.

  • freq (str) – Resampling frequency.

  • src_timestep ({"D", "h", "M"}) – Expected input frequency.

  • indexer ({dim: indexer, }, optional) – Time attribute and values over which to subset the array. For example, use season='DJF' to select winter values, month=1 to select January, or month=[6,7,8] to select summer months. If no indexer is given, all values are considered.

Returns:

xr.DataArray – A boolean array set to True if period has missing values.
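The sketch below illustrates the idea for a monthly resampling frequency, using only the standard library. It assumes that both NaN values and days absent from the series count as missing, and it models the month=... indexer with a hypothetical `months` argument. `missing_any_monthly` is an illustrative name, not part of xclim.

```python
import calendar
import math
from collections import defaultdict
from datetime import date, timedelta


def missing_any_monthly(records, months=None):
    """Flag each (year, month) that lacks a valid value for any of its days.

    records: iterable of (date, value) pairs with daily spacing.
    months: optional collection of month numbers to keep; other months
        are ignored entirely (a stand-in for the month=... indexer).
    """
    valid_days = defaultdict(int)
    seen = set()
    for day, value in records:
        if months is not None and day.month not in months:
            continue
        key = (day.year, day.month)
        seen.add(key)
        if not math.isnan(value):
            valid_days[key] += 1
    # A month is flagged when it has fewer valid days than calendar days.
    return {
        key: valid_days[key] < calendar.monthrange(*key)[1]
        for key in sorted(seen)
    }


start = date(2001, 1, 1)
records = [(start + timedelta(days=i), 1.0) for i in range(59)]  # Jan and Feb
records[40] = (records[40][0], float("nan"))  # 10 February is missing

flags = missing_any_monthly(records)
january_only = missing_any_monthly(records, months=[1])
print(flags)         # {(2001, 1): False, (2001, 2): True}
print(january_only)  # {(2001, 1): False}
```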

xclim.core.missing.at_least_n_valid(da, freq, n=1, src_timestep=None, **indexer)[source]

Return whether there are at least a given number of valid values.

Parameters:
  • da (xr.DataArray) – Input array.

  • freq (str) – Resampling frequency.

  • n (int) – Minimum number of valid values required.

  • src_timestep ({“D”, “h”}) – Expected input frequency.

  • indexer ({dim: indexer, }, optional) – Time attribute and values over which to subset the array. For example, use season='DJF' to select winter values, month=1 to select January, or month=[6,7,8] to select summer months. If no indexer is given, all values are considered.

Returns:

xr.DataArray – A boolean array set to True if period has missing values.

xclim.core.missing.missing_pct(da, freq, tolerance, src_timestep=None, **indexer)[source]

Return whether there are more missing days in the array than a given percentage.

Parameters:
  • da (DataArray) – Input array.

  • freq (str) – Resampling frequency.

  • tolerance (float) – Fraction of missing values that are tolerated [0,1].

  • src_timestep ({"D", "h"}) – Expected input frequency.

  • indexer ({dim: indexer, }, optional) – Time attribute and values over which to subset the array. For example, use season='DJF' to select winter values, month=1 to select January, or month=[6,7,8] to select summer months. If no indexer is given, all values are considered.

Returns:

xr.DataArray – A boolean array set to True if period has missing values.

xclim.core.missing.missing_wmo(da, freq, nm=11, nc=5, src_timestep=None, **indexer)[source]

Return whether a series fails WMO criteria for missing days.

The World Meteorological Organisation recommends that where monthly means are computed from daily values, the monthly mean should be considered missing if either of these two criteria is met:

  • observations are missing for 11 or more days during the month;

  • observations are missing for a period of 5 or more consecutive days during the month.

Stricter criteria are sometimes used in practice, with a tolerance of 5 missing values or 3 consecutive missing values.

Parameters:
  • da (DataArray) – Input array.

  • freq (str) – Resampling frequency.

  • nm (int) – Number of missing values per month that should not be exceeded.

  • nc (int) – Number of consecutive missing values per month that should not be exceeded.

  • src_timestep ({"D"}) – Expected input frequency. Only daily values are supported.

  • indexer ({dim: indexer, }, optional) – Time attribute and values over which to subset the array. For example, use season='DJF' to select winter values, month=1 to select January, or month=[6,7,8] to select summer months. If no indexer is given, all values are considered.

Returns:

xr.DataArray – A boolean array set to True if period has missing values.

Notes

If used at frequencies larger than a month, for example on an annual or seasonal basis, the function will return True if any month within a period is missing.
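The two WMO criteria and the note on longer periods can be sketched as follows. This is an illustration under the assumption that NaN marks a missing day and that "11 or more" and "5 or more" translate to `>=` comparisons. The helper names are hypothetical and the code is not xclim's implementation.

```python
import math


def longest_missing_run(values):
    """Length of the longest run of consecutive NaN values."""
    longest = current = 0
    for v in values:
        current = current + 1 if math.isnan(v) else 0
        longest = max(longest, current)
    return longest


def wmo_month_is_missing(values, nm=11, nc=5):
    """Apply both criteria to the daily values of one month."""
    n_missing = sum(1 for v in values if math.isnan(v))
    return n_missing >= nm or longest_missing_run(values) >= nc


nan = float("nan")
scattered = [nan if i % 3 == 0 else 1.0 for i in range(30)]  # 10 isolated gaps
block = [1.0] * 30
block[10:15] = [nan] * 5  # one gap of 5 consecutive days

print(wmo_month_is_missing(scattered))              # False
print(wmo_month_is_missing(block))                  # True
print(wmo_month_is_missing(scattered, nm=5, nc=3))  # True with stricter limits

# For a period longer than a month, one missing month flags the whole period.
season_is_missing = any(wmo_month_is_missing(m) for m in (scattered, block))
```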

xclim.core.missing.missing_from_context(da, freq, src_timestep=None, **indexer)[source]

Return whether each element of the resampled da should be considered missing according to the currently set options in xclim.set_options.

See also

xclim.set_options, xclim.core.options.register_missing_method

Data Flags

Pseudo-indicators designed to analyse supplied variables for suspicious/erroneous indicator values.

exception xclim.core.dataflags.DataQualityException(flag_array, message='Data quality flags indicate suspicious values. Flags raised are:\\n  - ')[source]

Bases: Exception

Raised when any data evaluation checks are flagged as True.

Variables:
  • flag_array (xarray.Dataset) – Xarray.Dataset of Data Flags.

  • message (str) – Message prepended to the error messages.

flag_array: Dataset = None

xclim.core.dataflags.data_flags(da, ds=None, flags=None, dims='all', freq=None, raise_flags=False)[source]

Evaluate the supplied DataArray for a set of data flag checks.

Test triggers depend on variable name and availability of extra variables within Dataset for comparison. If called with raise_flags=True, will raise a DataQualityException with comments for each failed quality check.

Parameters:
  • da (xarray.DataArray) – The variable to check. Must have a name that is a valid CMIP6 variable name and appears in xclim.core.utils.VARIABLES.

  • ds (xarray.Dataset, optional) – An optional dataset with extra variables needed by some checks.

  • flags (dict, optional) – A dictionary where the keys are the name of the flags to check and the values are parameter dictionaries. The value can be None if there are no parameters to pass (i.e. default will be used). The default, None, means that the data flags list will be taken from xclim.core.utils.VARIABLES.

  • dims ({“all”, None} or str or a sequence of strings) – Dimensions upon which the aggregation should be performed. Default: “all”.

  • freq (str, optional) – Resampling frequency to have data_flags aggregated over periods. Defaults to None, which means the “time” axis is treated as any other dimension (see dims).

  • raise_flags (bool) – Raise exception if any of the quality assessment flags are raised. Default: False.

Return type:

Dataset

Returns:

xarray.Dataset

Examples

To evaluate all applicable data flags for a given variable:

>>> import xarray as xr
>>> from xclim.core.dataflags import data_flags
>>> ds = xr.open_dataset(path_to_pr_file)
>>> flagged = data_flags(ds.pr, ds)
>>> # The next example evaluates only one data flag, passing specific parameters. It also aggregates the flags
>>> # yearly over the "time" dimension only, such that a True means there is a bad data point for that year
>>> # at that location.
>>> flagged = data_flags(
...     ds.pr,
...     ds,
...     flags={"very_large_precipitation_events": {"thresh": "250 mm d-1"}},
...     dims=None,
...     freq="YS",
... )

xclim.core.dataflags.ecad_compliant(ds, dims='all', raise_flags=False, append=True)[source]

Run ECAD compliance tests.

Assert file adheres to ECAD-based quality assurance checks.

Parameters:
  • ds (xarray.Dataset) – Dataset containing variables to be examined.

  • dims ({“all”, None} or str or a sequence of strings) – Dimensions upon which aggregation should be performed. Default: "all".

  • raise_flags (bool) – Raise exception if any of the quality assessment flags are raised, otherwise returns None. Default: False.

  • append (bool) – If True, returns the Dataset with the ecad_qc_flag array appended to data_vars. If False, returns the DataArray of the ecad_qc_flag variable.

Return type:

DataArray | Dataset | None

Returns:

xarray.DataArray or xarray.Dataset or None

xclim.core.dataflags.negative_accumulation_values(da)[source]

Check if variable values are negative for any given day.

Parameters:

da (xarray.DataArray)

Return type:

DataArray

Returns:

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> import xarray as xr
>>> from xclim.core.dataflags import negative_accumulation_values
>>> ds = xr.open_dataset(path_to_pr_file)
>>> flagged = negative_accumulation_values(ds.pr)

xclim.core.dataflags.outside_n_standard_deviations_of_climatology(da, *, n, window=5)[source]

Check if any daily value is outside n standard deviations from the day of year mean.

Parameters:
  • da (xarray.DataArray) – The DataArray being examined.

  • n (int) – Number of standard deviations.

  • window (int) – Moving window used to determine the climatological mean. Default: 5.

Return type:

DataArray

Returns:

xarray.DataArray, [bool]

Notes

A moving window of 5 days is suggested for tas data flag calculations according to ICCLIM data quality standards.

Examples

To gain access to the flag_array:

>>> import xarray as xr
>>> from xclim.core.dataflags import outside_n_standard_deviations_of_climatology
>>> ds = xr.open_dataset(path_to_tas_file)
>>> std_devs = 5
>>> average_over = 5
>>> flagged = outside_n_standard_deviations_of_climatology(
...     ds.tas, n=std_devs, window=average_over
... )

References

Project team ECA&D and KNMI [2013]

xclim.core.dataflags.percentage_values_outside_of_bounds(da)[source]

Check if variable values fall below 0% or rise above 100% for any given day.

Parameters:

da (xarray.DataArray)

Return type:

DataArray

Returns:

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

xclim.core.dataflags.register_methods(variable_name=None)[source]

Register a data flag function.

Argument can be the output variable name template. The template may use any of the stringable input arguments. If not given, the function name is used instead, which may create variable conflicts.
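The naming behaviour can be pictured with a generic decorator. The sketch below is hypothetical throughout: `register_flag`, `registry` and `values_above` are made-up names, and the way the template is filled from keyword arguments is an assumption about what "stringable input arguments" means, not a copy of xclim's code.

```python
import functools

registry = {}  # hypothetical store of registered flag functions


def register_flag(variable_name=None):
    """Register a function and name its output from a template."""

    def decorator(func):
        # Fall back to the function name when no template is given.
        template = variable_name or func.__name__

        @functools.wraps(func)
        def wrapper(values, **kwargs):
            result = func(values, **kwargs)
            # Fill the template with the keyword arguments of this call.
            return template.format(**kwargs), result

        registry[func.__name__] = wrapper
        return wrapper

    return decorator


@register_flag("values_above_{thresh}")
def values_above(values, *, thresh):
    return [v > thresh for v in values]


name, flags = values_above([1, 5, 10], thresh=4)
print(name)   # values_above_4
print(flags)  # [False, True, True]
```

Without a template, two calls with different thresholds would produce the same output name, which is the kind of variable conflict mentioned above.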

xclim.core.dataflags.tas_below_tasmin(tas, tasmin)[source]

Check if tas values are below tasmin values for any given day.

Parameters:
  • tas (xarray.DataArray)

  • tasmin (xarray.DataArray)

Return type:

DataArray

Returns:

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> import xarray as xr
>>> from xclim.core.dataflags import tas_below_tasmin
>>> ds = xr.open_dataset(path_to_tas_file)
>>> flagged = tas_below_tasmin(ds.tas, ds.tasmin)

xclim.core.dataflags.tas_exceeds_tasmax(tas, tasmax)[source]

Check if tas values exceed tasmax values for any given day.

Parameters:
  • tas (xarray.DataArray)

  • tasmax (xarray.DataArray)

Return type:

DataArray

Returns:

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> import xarray as xr
>>> from xclim.core.dataflags import tas_exceeds_tasmax
>>> ds = xr.open_dataset(path_to_tas_file)
>>> flagged = tas_exceeds_tasmax(ds.tas, ds.tasmax)

xclim.core.dataflags.tasmax_below_tasmin(tasmax, tasmin)[source]

Check if tasmax values are below tasmin values for any given day.

Parameters:
  • tasmax (xarray.DataArray)

  • tasmin (xarray.DataArray)

Return type:

DataArray

Returns:

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> import xarray as xr
>>> from xclim.core.dataflags import tasmax_below_tasmin
>>> ds = xr.open_dataset(path_to_tas_file)
>>> flagged = tasmax_below_tasmin(ds.tasmax, ds.tasmin)

xclim.core.dataflags.temperature_extremely_high(da, *, thresh='60 degC')[source]

Check if temperature values exceed thresh (60 degrees Celsius by default) for any given day.

Parameters:
  • da (xarray.DataArray)

  • thresh (str)

Return type:

DataArray

Returns:

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> import xarray as xr
>>> from xclim.core.dataflags import temperature_extremely_high
>>> ds = xr.open_dataset(path_to_tas_file)
>>> temperature = "60 degC"
>>> flagged = temperature_extremely_high(ds.tas, thresh=temperature)

xclim.core.dataflags.temperature_extremely_low(da, *, thresh='-90 degC')[source]

Check if temperature values are below thresh (-90 degrees Celsius by default) for any given day.

Parameters:
  • da (xarray.DataArray)

  • thresh (str)

Return type:

DataArray

Returns:

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> import xarray as xr
>>> from xclim.core.dataflags import temperature_extremely_low
>>> ds = xr.open_dataset(path_to_tas_file)
>>> temperature = "-90 degC"
>>> flagged = temperature_extremely_low(ds.tas, thresh=temperature)

xclim.core.dataflags.values_op_thresh_repeating_for_n_or_more_days(da, *, n, thresh, op='==')[source]

Check if array values repeat at a given threshold for N or more days.

Parameters:
  • da (xarray.DataArray) – The DataArray being examined.

  • n (int) – Number of days needed to trigger flag.

  • thresh (str) – Repeating values to search for that will trigger flag.

  • op ({“>”, “gt”, “<”, “lt”, “>=”, “ge”, “<=”, “le”, “==”, “eq”, “!=”, “ne”}) – Operator used for comparison with thresh.

Return type:

DataArray

Returns:

xarray.DataArray, [bool]
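The check amounts to finding a run of n or more consecutive days on which the comparison with thresh holds. A minimal sketch on a plain list of numbers follows. It ignores units, `op_thresh_repeats` and `OPS` are hypothetical names, and it is not the xclim implementation.

```python
import operator

# Map the accepted operator strings to comparison functions.
OPS = {
    ">": operator.gt, "gt": operator.gt,
    "<": operator.lt, "lt": operator.lt,
    ">=": operator.ge, "ge": operator.ge,
    "<=": operator.le, "le": operator.le,
    "==": operator.eq, "eq": operator.eq,
    "!=": operator.ne, "ne": operator.ne,
}


def op_thresh_repeats(values, *, n, thresh, op="=="):
    """True if `value op thresh` holds for n or more consecutive entries."""
    compare = OPS[op]
    run = 0
    for v in values:
        run = run + 1 if compare(v, thresh) else 0
        if run >= n:
            return True
    return False


pr = [0, 5, 5, 5, 5, 5, 2]  # daily values, units left out for simplicity

print(op_thresh_repeats(pr, n=5, thresh=5, op="eq"))  # True
print(op_thresh_repeats(pr, n=6, thresh=5, op="eq"))  # False
print(op_thresh_repeats(pr, n=6, thresh=1, op="gt"))  # True
```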

Examples

To gain access to the flag_array:

>>> import xarray as xr
>>> from xclim.core.dataflags import values_op_thresh_repeating_for_n_or_more_days
>>> ds = xr.open_dataset(path_to_pr_file)
>>> units = "5 mm d-1"
>>> days = 5
>>> comparison = "eq"
>>> flagged = values_op_thresh_repeating_for_n_or_more_days(
...     ds.pr, n=days, thresh=units, op=comparison
... )

xclim.core.dataflags.values_repeating_for_n_or_more_days(da, *, n)[source]

Check if exact values are found to be repeating for n or more days.

Parameters:
  • da (xarray.DataArray) – The DataArray being examined.

  • n (int) – Number of days to trigger flag.

Return type:

DataArray

Returns:

xarray.DataArray, [bool]
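Here the run is defined by identical consecutive values, with no threshold involved. The idea can be sketched with `itertools.groupby` on a plain list. The helper names are hypothetical and this is not how xclim computes the flag.

```python
from itertools import groupby


def longest_repeat(values):
    """Length of the longest run of identical consecutive values."""
    return max((len(list(group)) for _, group in groupby(values)), default=0)


def values_repeat(values, *, n):
    """True if any exact value repeats for n or more consecutive entries."""
    return longest_repeat(values) >= n


series = [0.0, 1.3, 1.3, 1.3, 1.3, 1.3, 2.0]

print(longest_repeat(series))      # 5
print(values_repeat(series, n=5))  # True
print(values_repeat(series, n=6))  # False
```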

Examples

To gain access to the flag_array:

>>> import xarray as xr
>>> from xclim.core.dataflags import values_repeating_for_n_or_more_days
>>> ds = xr.open_dataset(path_to_pr_file)
>>> flagged = values_repeating_for_n_or_more_days(ds.pr, n=5)

xclim.core.dataflags.very_large_precipitation_events(da, *, thresh='300 mm d-1')[source]

Check if precipitation values exceed thresh (300 mm/day by default) for any given day.

Parameters:
  • da (xarray.DataArray) – The DataArray being examined.

  • thresh (str) – Threshold to search array for that will trigger flag if any day exceeds value.

Return type:

DataArray

Returns:

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> import xarray as xr
>>> from xclim.core.dataflags import very_large_precipitation_events
>>> ds = xr.open_dataset(path_to_pr_file)
>>> rate = "300 mm d-1"
>>> flagged = very_large_precipitation_events(ds.pr, thresh=rate)

xclim.core.dataflags.wind_values_outside_of_bounds(da, *, lower='0 m s-1', upper='46 m s-1')[source]

Check if wind speed values fall below the lower limit or rise above the upper limit for any given day.

Parameters:
  • da (xarray.DataArray) – The DataArray being examined.

  • lower (str) – The lower limit for wind speed.

  • upper (str) – The upper limit for wind speed.

Return type:

DataArray

Returns:

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> from xclim.core.dataflags import wind_values_outside_of_bounds
>>> ceiling, floor = "46 m s-1", "0 m s-1"
>>> flagged = wind_values_outside_of_bounds(
...     sfcWind_dataset, upper=ceiling, lower=floor
... )