Health checks

The Indicator class performs a number of sanity checks on its inputs to make sure valid data is fed to index computations (cfchecks for checks on the metadata and datachecks for checks on the coordinates). Output values are masked where input values are missing or invalid (missing). Finally, users can call the functions of dataflags to explore potential issues with their data (extreme values, suspicious runs, etc.).

CF-Convention checking

Utilities designed to verify the compliance of metadata with the CF-Convention.

xclim.core.cfchecks.cfcheck_from_name(varname, vardata): Perform cfchecks on a DataArray using specifications from xclim’s default variables.

xclim.core.cfchecks.check_valid(var, key: str, expected: Union[str, Sequence[str]]): Check that a variable’s attribute has one of the expected values. Raise a ValidationError otherwise.
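
The attribute check described above can be pictured with a minimal, library-free sketch. It works on a plain dictionary of attributes rather than a DataArray, and both `check_valid_sketch` and the local `ValidationError` class are stand-ins invented for illustration. The real function may match values more loosely than the exact comparison used here.

```python
class ValidationError(ValueError):
    """Local stand-in for xclim's ValidationError (assumption for this sketch)."""


def check_valid_sketch(attrs, key, expected):
    """Raise ValidationError unless attrs[key] is one of the expected values."""
    if isinstance(expected, str):
        expected = [expected]
    value = attrs.get(key)
    if value is None:
        raise ValidationError(f"Attribute '{key}' is missing.")
    if value not in expected:
        raise ValidationError(
            f"Attribute '{key}' is '{value}', expected one of {list(expected)}."
        )


# Hypothetical metadata of a precipitation variable.
pr_attrs = {"standard_name": "precipitation_flux", "units": "kg m-2 s-1"}

# Passes silently because the attribute has the expected value.
check_valid_sketch(pr_attrs, "standard_name", "precipitation_flux")
```

Accepting either a single string or a sequence mirrors the `expected` parameter in the signature above.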

Data checks

Utilities designed to check the validity of data inputs.

xclim.core.datachecks.check_daily(var: xarray.DataArray)

Raise an error if the series has a frequency other than daily, or is not monotonically increasing.

Note that this does not check for gaps in the series.

xclim.core.datachecks.check_freq(var: xarray.DataArray, freq: Union[str, Sequence[str]], strict: bool = True)

Raise an error if the series does not have the expected temporal frequency or is not monotonically increasing.

Parameters

var (xr.DataArray) – Input array.
freq (str or sequence of str) – The expected temporal frequencies, using Pandas frequency terminology ({‘A’, ‘M’, ‘D’, ‘H’, ‘T’, ‘S’, ‘L’, ‘U’} and multiples thereof). To test strictly for ‘W’, pass ‘7D’ with strict=True. This ignores the start flag and the anchor (ex: ‘AS-JUL’ will validate against ‘Y’).
strict (bool) – Whether multiples of the frequencies are considered invalid or not. With strict set to False, a ‘3H’ series will not raise an error if freq is set to ‘H’.
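
The interplay between freq and strict can be illustrated with a small sketch that compares frequency strings only. It is not xclim's implementation: `parse_freq` and `freq_is_valid` are hypothetical helpers, the inferred frequency is passed in as a string instead of being derived from a time axis, and only the simplifications described above (dropping the start flag and the anchor, treating 'Y' as 'A') are modelled.

```python
import re


def parse_freq(freq):
    """Split a Pandas-style frequency string into (multiple, base code)."""
    match = re.fullmatch(r"(\d*)([A-Z]+)(-[A-Z]+)?", freq)
    if match is None:
        raise ValueError(f"Cannot parse frequency '{freq}'.")
    multiple = int(match.group(1) or 1)
    base = match.group(2)
    if len(base) > 1 and base.endswith("S"):
        base = base[:-1]  # drop the start flag: 'AS' -> 'A', 'MS' -> 'M'
    if base == "Y":
        base = "A"  # 'Y' and 'A' both mean yearly
    return multiple, base


def freq_is_valid(inferred, expected, strict=True):
    """Return True if the inferred frequency matches one of the expected ones."""
    if isinstance(expected, str):
        expected = [expected]
    multiple, base = parse_freq(inferred)
    for candidate in expected:
        exp_multiple, exp_base = parse_freq(candidate)
        if base != exp_base:
            continue
        if strict and multiple == exp_multiple:
            return True
        if not strict and multiple % exp_multiple == 0:
            return True
    return False


print(freq_is_valid("3H", "H", strict=False))  # multiples are accepted
print(freq_is_valid("3H", "H", strict=True))   # multiples are rejected
print(freq_is_valid("AS-JUL", "Y"))            # anchor and start flag ignored
```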

Missing values identification

Indicators may use different criteria to determine whether a computed indicator value should be considered missing. In some cases, the presence of any missing value in the input time series should result in a missing indicator value for that period. In other cases, a minimum number of valid values or a percentage of missing values should be enforced. The World Meteorological Organisation (WMO) suggests criteria based on the number of consecutive and overall missing values per month.

xclim has a registry of missing value detection algorithms that can be extended by users to customize the behavior of indicators. Once registered, algorithms can be used within indicators by setting the missing attribute of an Indicator subclass. By default, xclim registers the following algorithms:

any: A result is missing if any input value is missing.

at_least_n: A result is missing if less than a given number of valid values are present.

pct: A result is missing if more than a given fraction of values are missing.

wmo: A result is missing if 11 or more days are missing, or 5 or more consecutive values are missing in a month.

skip: Skip missing value detection.

from_context: Look-up the missing value algorithm from options settings. See xclim.set_options().

To define another missing value algorithm, subclass MissingBase and decorate it with xclim.core.options.register_missing_method().

Corresponding stand-alone functions are also exposed to run the same missing value checks independent from indicator calculations.
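
As a rough illustration of how the any, at_least_n and pct criteria differ, the sketch below applies each rule to the values of a single period held in a plain list. The function names are hypothetical and the code does not use xclim. It only mirrors the one-line descriptions above, with None and NaN both treated as missing.

```python
import math


def is_null(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_any_rule(values):
    """True (missing) if any value in the period is missing."""
    return any(is_null(v) for v in values)


def at_least_n_rule(values, n):
    """True (missing) if fewer than n values are valid."""
    n_valid = sum(not is_null(v) for v in values)
    return n_valid < n


def pct_rule(values, tolerance):
    """True (missing) if the fraction of missing values exceeds tolerance."""
    if not values:
        return True  # an empty period is treated as missing in this sketch
    n_missing = sum(is_null(v) for v in values)
    return n_missing / len(values) > tolerance


# One hypothetical 10-day period with two missing observations.
period = [1.2, None, 0.0, 3.4, float("nan"), 2.2, 0.1, 0.0, 5.0, 1.1]

print(missing_any_rule(period))
print(at_least_n_rule(period, n=8))
print(pct_rule(period, tolerance=0.1))
```

The same period can therefore be missing under one criterion and valid under another, which is why the choice of algorithm is left configurable.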

xclim.core.missing.missing_any(da, freq, src_timestep=None, **indexer)

Return whether there are missing days in the array.

Parameters

da (DataArray) – Input array.
freq (str) – Resampling frequency.
src_timestep ({“D”, “H”, “M”}) – Expected input frequency.
indexer ({dim: indexer, }, optional) – Time attribute and values over which to subset the array. For example, use season='DJF' to select winter values, month=1 to select January, or month=[6,7,8] to select summer months. If no indexer is given, all values are considered.

Returns

DataArray – A boolean array set to True if period has missing values.
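
To make the effect of the indexer more concrete, here is a small sketch using only the standard library. It subsets a daily series by month or season before applying the "any" rule per year. The helpers `select_months` and `missing_any_by_year` are hypothetical, the series is a list of (date, value) pairs rather than a DataArray, and yearly grouping stands in for the freq argument.

```python
from collections import defaultdict
from datetime import date, timedelta

SEASONS = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}


def select_months(series, month=None, season=None):
    """Keep only the (date, value) pairs matching the month or season indexer."""
    if season is not None:
        wanted = set(SEASONS[season])
    elif month is not None:
        wanted = {month} if isinstance(month, int) else set(month)
    else:
        return list(series)  # no indexer: all values are considered
    return [(day, value) for day, value in series if day.month in wanted]


def missing_any_by_year(series, **indexer):
    """Map each year to True if any selected value of that year is None."""
    flags = defaultdict(bool)
    for day, value in select_months(series, **indexer):
        flags[day.year] = flags[day.year] or value is None
    return dict(flags)


# Hypothetical daily series for 2001 with a single gap on 15 July.
start = date(2001, 1, 1)
series = [(start + timedelta(days=i), 1.0) for i in range(365)]
series[195] = (series[195][0], None)

print(missing_any_by_year(series))                # whole year
print(missing_any_by_year(series, month=1))       # January only
print(missing_any_by_year(series, season="JJA"))  # summer only
```

With an indexer, a gap outside the selected months no longer marks the period as missing.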

xclim.core.missing.at_least_n_valid(da, freq, n=1, src_timestep=None, **indexer)

Return whether there are at least a given number of valid values.

Parameters

da (DataArray) – Input array.
freq (str) – Resampling frequency.
n (int) – Minimum of valid values required.
src_timestep ({“D”, “H”}) – Expected input frequency.
indexer ({dim: indexer, }, optional) – Time attribute and values over which to subset the array. For example, use season='DJF' to select winter values, month=1 to select January, or month=[6,7,8] to select summer months. If no indexer is given, all values are considered.

Returns

out (DataArray) – A boolean array set to True if period has fewer than n valid values.

xclim.core.missing.missing_pct(da, freq, tolerance, src_timestep=None, **indexer)

Return whether there are more missing days in the array than a given percentage.

Parameters

da (DataArray) – Input array.
freq (str) – Resampling frequency.
tolerance (float) – Fraction of missing values that are tolerated [0,1].
src_timestep ({“D”, “H”}) – Expected input frequency.
indexer ({dim: indexer, }, optional) – Time attribute and values over which to subset the array. For example, use season='DJF' to select winter values, month=1 to select January, or month=[6,7,8] to select summer months. If no indexer is given, all values are considered.

Returns

DataArray – A boolean array set to True if period has missing values.

xclim.core.missing.missing_wmo(da, freq, nm=11, nc=5, src_timestep=None, **indexer)

Return whether a series fails WMO criteria for missing days.

The World Meteorological Organisation recommends that where monthly means are computed from daily values, a monthly mean should be considered missing if either of these two criteria is met:

– observations are missing for 11 or more days during the month;
– observations are missing for a period of 5 or more consecutive days during the month.

Stricter criteria are sometimes used in practice, with a tolerance of 5 missing values or 3 consecutive missing values.

Parameters

da (DataArray) – Input array.
freq (str) – Resampling frequency.
nm (int) – Number of missing values per month that should not be exceeded.
nc (int) – Number of consecutive missing values per month that should not be exceeded.
src_timestep ({“D”}) – Expected input frequency. Only daily values are supported.
indexer ({dim: indexer, }, optional) – Time attribute and values over which to subset the array. For example, use season='DJF' to select winter values, month=1 to select January, or month=[6,7,8] to select summer months. If no indexer is given, all values are considered.

Returns

DataArray – A boolean array set to True if period has missing values.

Notes

If used at frequencies larger than a month, for example on an annual or seasonal basis, the function will return True if any month within a period is missing.
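
The two WMO criteria and the note on longer periods can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library code: months are lists of daily values with None marking a missing day, the helper names are hypothetical, and the comparisons follow the "or more" wording of the criteria above.

```python
from itertools import groupby


def longest_missing_run(values):
    """Length of the longest run of consecutive None values."""
    runs = [
        len(list(group))
        for is_missing, group in groupby(values, key=lambda v: v is None)
        if is_missing
    ]
    return max(runs, default=0)


def month_fails_wmo(values, nm=11, nc=5):
    """True if a month has nm or more missing days, or nc or more in a row."""
    n_missing = sum(v is None for v in values)
    return n_missing >= nm or longest_missing_run(values) >= nc


def period_fails_wmo(months, nm=11, nc=5):
    """A longer period (season, year) fails if any of its months fails."""
    return any(month_fails_wmo(values, nm, nc) for values in months)


# Hypothetical months: ten scattered gaps, and one run of five missing days.
scattered = [None if i % 3 == 0 else 1.0 for i in range(30)]
long_gap = [1.0] * 10 + [None] * 5 + [1.0] * 15

print(month_fails_wmo(scattered))              # passes the default criteria
print(month_fails_wmo(scattered, nm=5, nc=3))  # fails the stricter criteria
print(period_fails_wmo([scattered, long_gap]))
```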

xclim.core.missing.missing_from_context(da, freq, src_timestep=None, **indexer)

Return whether each element of the resampled da should be considered missing according to the currently set options in xclim.set_options.

See xclim.set_options and xclim.core.options.register_missing_method.

Data flags

Pseudo-indicators designed to analyse supplied variables for suspicious/erroneous indicator values.

exception xclim.core.dataflags.DataQualityException(flag_array: xarray.Dataset, message='Data quality flags indicate suspicious values. Flags raised are:\n - ')

Bases: Exception

Raised when any data evaluation checks are flagged as True.

Variables: data_flags – xarray.Dataset of data flags.

xclim.core.dataflags.data_flags(da: xarray.DataArray, ds: Optional[xarray.Dataset] = None, flags: Optional[dict] = None, dims: Union[None, str, Sequence[str]] = 'all', freq: Optional[str] = None, raise_flags: bool = False) → xarray.Dataset

Evaluate the supplied DataArray for a set of data flag checks.

Test triggers depend on variable name and availability of extra variables within Dataset for comparison. If called with raise_flags=True, will raise a DataQualityException with comments for each failed quality check.

Parameters

da (xarray.DataArray) – The variable to check. Must have a name that is a valid CMIP6 variable name and appears in xclim.core.utils.VARIABLES.
ds (xarray.Dataset, optional) – An optional dataset with extra variables needed by some checks.
flags (dict, optional) – A dictionary where the keys are the name of the flags to check and the values are parameter dictionaries. The value can be None if there are no parameters to pass (i.e. default will be used). The default, None, means that the data flags list will be taken from xclim.core.utils.VARIABLES.
dims ({“all”, None} or str or a sequence of strings) – Dimensions upon which aggregation should be performed. Default: “all”.
freq (str, optional) – Resampling frequency to have data_flags aggregated over periods. Defaults to None, which means the “time” axis is treated as any other dimension (see dims).
raise_flags (bool) – Raise exception if any of the quality assessment flags are raised. Default: False.

Returns

xarray.Dataset

Examples

To evaluate all applicable data flags for a given variable:

>>> from xclim.core.dataflags import data_flags
>>> ds = xr.open_dataset(path_to_pr_file)
>>> flagged = data_flags(ds.pr, ds)

The next example evaluates only one data flag, passing specific parameters. It also aggregates the flags yearly over the “time” dimension only, such that a True means there is a bad data point for that year at that location.

>>> flagged = data_flags(
...     ds.pr,
...     ds,
...     flags={'very_large_precipitation_events': {'thresh': '250 mm d-1'}},
...     dims=None,
...     freq='YS'
... )
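
The role of raise_flags can be pictured with the following library-free sketch. It reduces each flag to a single boolean (roughly what aggregating over all dimensions does) and either returns the names of the raised flags or raises an exception listing them. `DataQualityError` and `summarize_flags` are hypothetical names, and the message layout is an assumption based only on the default message prefix shown in the DataQualityException signature.

```python
class DataQualityError(Exception):
    """Hypothetical stand-in for DataQualityException."""


PREFIX = "Data quality flags indicate suspicious values. Flags raised are:\n - "


def summarize_flags(flags, raise_flags=False):
    """Return the names of raised flags, or raise if raise_flags is True.

    `flags` maps a flag name to a list of booleans, one per data point.
    """
    raised = sorted(name for name, values in flags.items() if any(values))
    if raise_flags and raised:
        raise DataQualityError(PREFIX + "\n - ".join(raised))
    return raised


# Hypothetical results of two checks on a three-day precipitation series.
flags = {
    "negative_accumulation_values": [False, False, False],
    "very_large_precipitation_events": [False, True, False],
}

print(summarize_flags(flags))
```

Returning the flags by default and raising only on request lets the same checks serve both exploratory use and strict validation.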

xclim.core.dataflags.ecad_compliant(ds: xarray.Dataset, dims: Union[None, str, Sequence[str]] = 'all', raise_flags: bool = False, append: bool = True) → Optional[Union[xarray.DataArray, xarray.Dataset]]

Run ECAD compliance tests.

Assert file adheres to ECAD-based quality assurance checks.

Parameters

ds (xarray.Dataset) – Dataset containing variables to be examined.
dims ({“all”, None} or str or a sequence of strings) – Dimensions upon which aggregation should be performed. Default: “all”.
raise_flags (bool) – Raise exception if any of the quality assessment flags are raised, otherwise returns None. Default: False.
append (bool) – If True, returns the Dataset with the ecad_qc_flag array appended to data_vars. If False, returns the DataArray of the ecad_qc_flag variable.

Returns

Union[xarray.DataArray, xarray.Dataset]

xclim.core.dataflags.negative_accumulation_values(da: xarray.DataArray) → xarray.DataArray

Check if variable values are negative for any given day.

Parameters: da (xarray.DataArray)
Returns: xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> from xclim.core.dataflags import negative_accumulation_values
>>> ds = xr.open_dataset(path_to_pr_file)
>>> flagged = negative_accumulation_values(ds.pr)

xclim.core.dataflags.outside_n_standard_deviations_of_climatology(da: xarray.DataArray, *, n: int, window: int = 5) → xarray.DataArray

Check if any daily value is outside n standard deviations from the day of year mean.

Parameters

da (xarray.DataArray) – The DataArray being examined.
n (int) – Number of standard deviations.
window (int) – Moving window used to determine the climatological mean. Default: 5.

Returns

xarray.DataArray, [bool]

Notes

A moving window of 5 days is suggested for tas data flag calculations according to ICCLIM data quality standards.

Examples

To gain access to the flag_array:

>>> from xclim.core.dataflags import outside_n_standard_deviations_of_climatology
>>> ds = xr.open_dataset(path_to_tas_file)
>>> std_devs = 5
>>> average_over = 5
>>> flagged = outside_n_standard_deviations_of_climatology(ds.tas, n=std_devs, window=average_over)
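
For intuition, the core idea of this check can be sketched without xclim. The simplified version below groups values by day of year, computes the mean and population standard deviation of each group, and flags values lying more than n standard deviations from their group mean. It deliberately omits the moving window, and `outside_n_std_of_doy_mean` is a hypothetical helper, so results will differ from the library function.

```python
from collections import defaultdict
from statistics import mean, pstdev


def outside_n_std_of_doy_mean(records, n):
    """Flag (day_of_year, value) records far from their day-of-year mean.

    A record is flagged when its value lies more than n standard deviations
    from the mean of all values sharing the same day of year.
    """
    by_doy = defaultdict(list)
    for doy, value in records:
        by_doy[doy].append(value)
    stats = {doy: (mean(vals), pstdev(vals)) for doy, vals in by_doy.items()}
    return [abs(value - stats[doy][0]) > n * stats[doy][1] for doy, value in records]


# Ten hypothetical years of values for 1 January: nine ordinary, one outlier.
records = [(1, 10.0)] * 9 + [(1, 100.0)]

flagged = outside_n_std_of_doy_mean(records, n=2)
print(flagged.count(True))
```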

xclim.core.dataflags.percentage_values_outside_of_bounds(da: xarray.DataArray) → xarray.DataArray

Check if variable values fall below 0% or rise above 100% for any given day.

Parameters: da (xarray.DataArray)
Returns: xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> from xclim.core.dataflags import percentage_values_outside_of_bounds
>>> ds = xr.open_dataset(path_to_huss_file)
>>> flagged = percentage_values_outside_of_bounds(ds.huss)

xclim.core.dataflags.register_methods(func)

xclim.core.dataflags.tas_below_tasmin(tas: xarray.DataArray, tasmin: xarray.DataArray) → xarray.DataArray

Check if tas values are below tasmin values for any given day.

Parameters

tas (xarray.DataArray)
tasmin (xarray.DataArray)

Returns

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> from xclim.core.dataflags import tas_below_tasmin
>>> ds = xr.open_dataset(path_to_tas_file)
>>> flagged = tas_below_tasmin(ds.tas, ds.tasmin)

xclim.core.dataflags.tas_exceeds_tasmax(tas: xarray.DataArray, tasmax: xarray.DataArray) → xarray.DataArray

Check if tas values exceed tasmax values for any given day.

Parameters

tas (xarray.DataArray)
tasmax (xarray.DataArray)

Returns

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> from xclim.core.dataflags import tas_exceeds_tasmax
>>> ds = xr.open_dataset(path_to_tas_file)
>>> flagged = tas_exceeds_tasmax(ds.tas, ds.tasmax)

xclim.core.dataflags.tasmax_below_tasmin(tasmax: xarray.DataArray, tasmin: xarray.DataArray) → xarray.DataArray

Check if tasmax values are below tasmin values for any given day.

Parameters

tasmax (xarray.DataArray)
tasmin (xarray.DataArray)

Returns

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> from xclim.core.dataflags import tasmax_below_tasmin
>>> ds = xr.open_dataset(path_to_tas_file)
>>> flagged = tasmax_below_tasmin(ds.tasmax, ds.tasmin)

xclim.core.dataflags.temperature_extremely_high(da: xarray.DataArray, *, thresh: str = '60 degC') → xarray.DataArray

Check if temperature values exceed the threshold (default: 60 degrees Celsius) for any given day.

Parameters

da (xarray.DataArray)
thresh (str)

Returns

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> from xclim.core.dataflags import temperature_extremely_high
>>> ds = xr.open_dataset(path_to_tas_file)
>>> temperature = "60 degC"
>>> flagged = temperature_extremely_high(ds.tas, thresh=temperature)

xclim.core.dataflags.temperature_extremely_low(da: xarray.DataArray, *, thresh: str = '-90 degC') → xarray.DataArray

Check if temperature values are below the threshold (default: -90 degrees Celsius) for any given day.

Parameters

da (xarray.DataArray)
thresh (str)

Returns

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> from xclim.core.dataflags import temperature_extremely_low
>>> ds = xr.open_dataset(path_to_tas_file)
>>> temperature = "-90 degC"
>>> flagged = temperature_extremely_low(ds.tas, thresh=temperature)

xclim.core.dataflags.values_op_thresh_repeating_for_n_or_more_days(da: xarray.DataArray, *, n: int, thresh: str, op: str = 'eq') → xarray.DataArray

Check if array values repeat at a given threshold for ‘n’ or more days.

Parameters

da (xarray.DataArray) – The DataArray being examined.
n (int) – Number of days needed to trigger flag.
thresh (str) – Repeating values to search for that will trigger flag.
op ({“eq”, “gt”, “lt”, “gteq”, “lteq”}) – Operator used for comparison with thresh.

Returns

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> from xclim.core.dataflags import values_op_thresh_repeating_for_n_or_more_days
>>> ds = xr.open_dataset(path_to_pr_file)
>>> units = "5 mm d-1"
>>> days = 5
>>> comparison = "eq"
>>> flagged = values_op_thresh_repeating_for_n_or_more_days(ds.pr, n=days, thresh=units, op=comparison)
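
The run-detection logic behind this flag can be sketched with the standard library alone. In the illustration below, `op_thresh_repeats` is a hypothetical helper, the threshold is a plain number in the same units as the data (the real function takes a string with units), and the operator names are those listed for the op parameter.

```python
import operator
from itertools import groupby

OPS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "lt": operator.lt,
    "gteq": operator.ge,
    "lteq": operator.le,
}


def op_thresh_repeats(values, n, thresh, op="eq"):
    """True if values satisfy `op thresh` on n or more consecutive days."""
    compare = OPS[op]
    hits = (compare(v, thresh) for v in values)
    # groupby collapses consecutive identical booleans into runs.
    return any(hit and len(list(run)) >= n for hit, run in groupby(hits))


# Hypothetical daily precipitation in mm/day with five identical readings.
pr = [0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 1.0]

print(op_thresh_repeats(pr, n=5, thresh=5.0, op="eq"))
print(op_thresh_repeats(pr, n=6, thresh=5.0, op="eq"))
```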

xclim.core.dataflags.values_repeating_for_n_or_more_days(da: xarray.DataArray, *, n: int) → xarray.DataArray

Check if exact values are found to be repeating for n or more days.

Parameters

da (xarray.DataArray) – The DataArray being examined.
n (int) – Number of days to trigger flag.

Returns

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> from xclim.core.dataflags import values_repeating_for_n_or_more_days
>>> ds = xr.open_dataset(path_to_pr_file)
>>> flagged = values_repeating_for_n_or_more_days(ds.pr, n=5)
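
As a rough picture of what "exact values repeating" means, the sketch below scans a plain list for any run of identical consecutive values of length n or more. `values_repeat` is a hypothetical helper and does not reproduce any special handling the library may apply to particular values.

```python
from itertools import groupby


def values_repeat(values, n):
    """True if any exact value occurs on n or more consecutive days."""
    return any(len(list(run)) >= n for _, run in groupby(values))


# Hypothetical daily series with one value stuck for five days.
series = [1.0, 2.5, 2.5, 2.5, 2.5, 2.5, 0.3]

print(values_repeat(series, n=5))
print(values_repeat(series, n=6))
```

Stuck sensors and padded records often show up as such runs, which is why the flag looks at exact equality rather than a tolerance.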

xclim.core.dataflags.very_large_precipitation_events(da: xarray.DataArray, *, thresh='300 mm d-1') → xarray.DataArray

Check if precipitation values exceed the threshold (default: 300 mm/day) for any given day.

Parameters

da (xarray.DataArray) – The DataArray being examined.
thresh (str) – Threshold to search array for that will trigger flag if any day exceeds value.

Returns

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> from xclim.core.dataflags import very_large_precipitation_events
>>> ds = xr.open_dataset(path_to_pr_file)
>>> rate = "300 mm d-1"
>>> flagged = very_large_precipitation_events(ds.pr, thresh=rate)

xclim.core.dataflags.wind_values_outside_of_bounds(da: xarray.DataArray, *, lower: str = '0 m s-1', upper: str = '46 m s-1') → xarray.DataArray

Check if wind speed values fall below the lower limit (default: 0 m/s) or rise above the upper limit (default: 46 m/s) for any given day.

Parameters

da (xarray.DataArray) – The DataArray being examined.
lower (str) – The lower limit for wind speed.
upper (str) – The upper limit for wind speed.

Returns

xarray.DataArray, [bool]

Examples

To gain access to the flag_array:

>>> from xclim.core.dataflags import wind_values_outside_of_bounds
>>> ds = xr.open_dataset(path_to_tas_file)
>>> ceiling, floor = "46 m s-1", "0 m s-1"
>>> flagged = wind_values_outside_of_bounds(ds.wsgsmax, upper=ceiling, lower=floor)