Health Checks
The Indicator class performs a number of sanity checks on inputs to make sure valid data is fed to index computations (cfchecks for checks on the metadata and datachecks for checks on the coordinates).
Output values are properly masked in case input values are missing or invalid (missing).
Finally, a user can use functions of dataflags to explore potential issues with their data (extreme values, suspicious runs, etc).
CF-Convention Checking
Utilities designed to verify the compliance of metadata with the CF-Convention.
- xclim.core.cfchecks.cfcheck_from_name(varname, vardata, attrs=None)
Perform cfchecks on a DataArray using specifications from xclim’s default variables.
- Parameters:
varname (str) – The name of the variable to check.
vardata (xr.DataArray) – The variable to check.
attrs (list of str, optional) – Attributes to check. Default is [“cell_methods”, “standard_name”].
- Raises:
ValidationError – If the variable does not meet the expected CF-Convention.
- xclim.core.cfchecks.check_valid(var, key, expected)
Check that a variable’s attribute has one of the expected values and raise a ValidationError otherwise.
- Parameters:
var (xr.DataArray) – The variable to check.
key (str) – The attribute to check.
expected (str or sequence of str) – The expected value(s).
- Raises:
ValidationError – If the attribute is not present or does not match the expected value(s).
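The logic of check_valid can be pictured with a plain dictionary of attributes. The following is a minimal sketch, not xclim's implementation: the name check_attribute and the ValueError it raises are stand-ins chosen for illustration, and values are compared by simple equality.

```python
def check_attribute(attrs, key, expected):
    """Raise ValueError if attrs[key] is absent or not among the expected values."""
    if isinstance(expected, str):
        expected = [expected]  # accept a single value or a sequence of values
    if key not in attrs:
        raise ValueError(f"Variable does not have a `{key}` attribute.")
    if attrs[key] not in expected:
        raise ValueError(f"`{key}` is {attrs[key]!r}, expected one of {list(expected)}.")


# Hypothetical metadata for a daily mean temperature variable.
tas_attrs = {"standard_name": "air_temperature", "cell_methods": "time: mean"}
check_attribute(tas_attrs, "standard_name", "air_temperature")  # passes silently
```

A missing attribute and a non-matching value both raise, which mirrors the two cases listed under Raises.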
Data Checks
Utilities designed to check the validity of data inputs.
- xclim.core.datachecks.check_common_time(inputs)
Raise an error if the list of inputs doesn’t have a single common frequency.
- Parameters:
inputs (Sequence of xr.DataArray) – Input arrays.
- Raises:
ValidationError – If the frequency of any input can’t be inferred, if inputs have different frequencies, or if inputs have a daily or hourly frequency but are not given at the same time of day.
- Return type:
None
- xclim.core.datachecks.check_daily(var)
Raise an error if the series has a frequency other than daily, or is not monotonically increasing.
- Parameters:
var (xr.DataArray) – Input array.
- Return type:
None
Notes
This does not check for gaps in series.
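Because gaps are not detected here, a caller who needs a complete series has to look for them separately. A minimal sketch of such a check on plain Python dates follows; find_gaps is a hypothetical helper, not part of xclim.

```python
from datetime import date, timedelta


def find_gaps(days):
    """Return (previous, next) pairs where consecutive dates are more than one day apart."""
    one_day = timedelta(days=1)
    return [(a, b) for a, b in zip(days, days[1:]) if b - a > one_day]


# A monotonically increasing series in which 3 and 4 January are absent.
days = [date(2000, 1, 1), date(2000, 1, 2), date(2000, 1, 5), date(2000, 1, 6)]
gaps = find_gaps(days)  # one gap, between 2 January and 5 January
```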
- xclim.core.datachecks.check_freq(var, freq, strict=True)
Raise an error if the series does not have the expected temporal frequency or is not monotonically increasing.
- Parameters:
var (xr.DataArray) – Input array.
freq (str or sequence of str) – The expected temporal frequencies, using pandas frequency terminology (e.g. {‘Y’, ‘M’, ‘D’, ‘h’, ‘min’, ‘s’, ‘ms’, ‘us’}) and multiples thereof. To test strictly for ‘W’, pass ‘7D’ with strict=True. This ignores the start/end flag and the anchor (e.g. ‘YS-JUL’ will validate against ‘Y’).
strict (bool) – Whether multiples of the frequencies are considered invalid or not. With strict set to False, a ‘3h’ series will not raise an error if freq is set to ‘h’.
- Raises:
ValidationError – If the frequency of var is not inferrable, or if it does not match the requested freq.
- Return type:
None
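The effect of strict and of ignoring the start/end flag and anchor can be illustrated with a small frequency comparison. This is a simplified sketch under stated assumptions: split_freq and freq_matches are hypothetical helpers, the parser only handles the patterns mentioned above, and xclim relies on its own offset parsing rather than this code.

```python
import re


def split_freq(freq):
    """Split a frequency such as '3h' or 'YS-JUL' into (multiple, base unit)."""
    head = freq.split("-")[0]  # drop the anchor, e.g. '-JUL'
    match = re.fullmatch(r"(\d*)([A-Za-z]+)", head)
    if match is None:
        raise ValueError(f"Cannot parse frequency {freq!r}.")
    multiple = int(match.group(1) or 1)
    base = match.group(2)
    # Drop the start/end flag of yearly, quarterly and monthly frequencies.
    if len(base) == 2 and base[0] in "YQM" and base[1] in "SE":
        base = base[0]
    return multiple, base


def freq_matches(actual, expected, strict=True):
    """Return True if `actual` is acceptable for the `expected` frequency."""
    multiple, base = split_freq(actual)
    exp_multiple, exp_base = split_freq(expected)
    if base != exp_base:
        return False
    # In strict mode the multiples must agree; otherwise only the base unit counts.
    return multiple == exp_multiple or not strict


lenient = freq_matches("3h", "h", strict=False)  # True: a multiple of 'h' is tolerated
strict = freq_matches("3h", "h", strict=True)    # False: '3h' is not exactly 'h'
anchored = freq_matches("YS-JUL", "Y")           # True: flag and anchor are ignored
```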
Missing Values Identification
Indicators may use different criteria to determine whether a computed indicator value should be considered missing. In some cases, the presence of any missing value in the input time series should result in a missing indicator value for that period. In other cases, a minimum number of valid values or a percentage of missing values should be enforced. The World Meteorological Organisation (WMO) suggests criteria based on the number of consecutive and overall missing values per month.
xclim has a registry of missing value detection algorithms that can be extended by users to customize the behavior of indicators. Once registered, algorithms can be used by setting the global option as xc.set_options(check_missing="method") or within indicators by setting the missing attribute of an Indicator subclass. By default, xclim registers the following algorithms:
- any: A result is missing if any input value is missing.
- at_least_n: A result is missing if fewer than a given number of valid values are present.
- pct: A result is missing if more than a given fraction of its values are missing.
- wmo: A result is missing if 11 or more days are missing, or 5 or more consecutive values are missing, in a month.
To define another missing value algorithm, subclass MissingBase and decorate it with xclim.core.options.register_missing_method(). See the subclassing guidelines in the MissingBase documentation.
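The registry idea itself is a common Python pattern: a decorator stores a class in a dictionary under a chosen name so it can later be looked up by that name. The sketch below shows the pattern in isolation. Every name in it (MISSING_METHODS, register, MaxTwoMissing, is_missing) is hypothetical and does not reflect the interface of MissingBase, which is described in its own documentation.

```python
# Generic registry pattern; all names here are illustrative, not xclim's API.
MISSING_METHODS = {}


def register(name):
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls):
        MISSING_METHODS[name] = cls
        return cls
    return decorator


@register("max_two")
class MaxTwoMissing:
    """Flag a period as missing when more than two of its values are missing."""

    def is_missing(self, valid):
        # `valid` is a list of booleans, True where a value is present.
        return valid.count(False) > 2


# Look the algorithm up by name, as a global option would.
method = MISSING_METHODS["max_two"]()
result = method.is_missing([True, False, False, False, True])  # three values missing
```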
Note
Corresponding stand-alone functions are also exposed to run the same missing value checks independently of indicator calculations.
- xclim.core.missing.missing_any(da, freq, src_timestep=None, **indexer)
Mask periods as missing if any of its elements is missing or invalid.
- Parameters:
da (xr.DataArray) – Input data, must have a “time” coordinate.
freq (str, optional) – Resampling frequency. If None, a collapse of the temporal dimension is assumed.
src_timestep (str, optional) – The expected source input frequency. If not given, it will be inferred from the input array.
**indexer (Indexer) – Time attribute and values over which to subset the array. For example, use season=’DJF’ to select winter values, month=1 to select January, or month=[6,7,8] to select summer months. If no indexer is given, all values are considered. See xclim.core.calendar.select_time().
- Return type:
DataArray
- Returns:
DataArray – Boolean array at the resampled frequency, True on the periods that should be considered missing or invalid.
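The "any" rule can be sketched on plain Python data by grouping daily values by month and flagging a month as soon as one value is NaN. This is an illustration of the rule only, assuming monthly periods and NaN as the missing marker; missing_any_by_month is a hypothetical helper and not how xclim computes the mask.

```python
import math
from collections import defaultdict
from datetime import date, timedelta


def missing_any_by_month(days, values):
    """Map each (year, month) to True if any of its values is NaN."""
    mask = defaultdict(bool)
    for day, value in zip(days, values):
        mask[(day.year, day.month)] |= math.isnan(value)
    return dict(mask)


# Synthetic daily data for January and February 2001 with a single NaN.
days = [date(2001, 1, 1) + timedelta(days=i) for i in range(59)]
values = [1.0] * 59
values[40] = math.nan  # 10 February 2001
mask = missing_any_by_month(days, values)  # January valid, February missing
```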
- xclim.core.missing.at_least_n_valid(da, freq, src_timestep=None, n=20, subfreq=None, **indexer)
Mask periods as missing if they don’t have at least a given number of valid values.
Ignores the expected count of elements.
- Parameters:
da (xr.DataArray) – Input data, must have a “time” coordinate.
freq (str, optional) – Target resampling frequency. If None, a collapse of the temporal dimension is assumed.
src_timestep (str, optional) – The expected source input frequency. If not given, it will be inferred from the input array.
n (float) – The minimum number of valid values needed.
subfreq (str, optional) – If given, computes a mask at this frequency using this method and then resample at the target frequency using the “any” method on subgroups.
**indexer (Indexer) – Time attribute and values over which to subset the array. For example, use season=’DJF’ to select winter values, month=1 to select January, or month=[6,7,8] to select summer months. If no indexer is given, all values are considered. See xclim.core.calendar.select_time().
- Return type:
DataArray
- Returns:
DataArray – Boolean array at the resampled frequency, True on the periods that should be considered missing or invalid.
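For a single period, the rule reduces to counting valid values. The sketch below assumes NaN marks a missing value; fewer_than_n_valid is a hypothetical helper used only to illustrate the criterion.

```python
import math


def fewer_than_n_valid(values, n=20):
    """Return True (missing) when the period holds fewer than n valid values."""
    n_valid = sum(not math.isnan(v) for v in values)
    return n_valid < n


# A 30-day period with 12 NaNs leaves 18 valid values.
period = [1.0] * 18 + [math.nan] * 12
flag_default = fewer_than_n_valid(period)       # 18 < 20, so the period is missing
flag_relaxed = fewer_than_n_valid(period, n=15)  # 18 >= 15, so the period is kept
```

Since the expected count of elements is ignored, a period that contains only 18 values, all valid, is flagged in the same way with n=20.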
- xclim.core.missing.missing_pct(da, freq, src_timestep=None, tolerance=0.1, subfreq=None, **indexer)
Mask periods as missing when more than a given proportion of their values are missing.
- Parameters:
da (xr.DataArray) – Input data, must have a “time” coordinate.
freq (str, optional) – Target resampling frequency. If None, a collapse of the temporal dimension is assumed.
src_timestep (str, optional) – The expected source input frequency. If not given, it will be inferred from the input array.
tolerance (float) – The maximum tolerated proportion of missing values, given as a number between 0 and 1.
subfreq (str, optional) – If given, computes a mask at this frequency using this method and then resample at the target frequency using the “any” method on subgroups.
**indexer (Indexer) – Time attribute and values over which to subset the array. For example, use season=’DJF’ to select winter values, month=1 to select January, or month=[6,7,8] to select summer months. If no indexer is given, all values are considered. See xclim.core.calendar.select_time().
- Return type:
DataArray
- Returns:
DataArray – Boolean array at the resampled frequency, True on the periods that should be considered missing or invalid.
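The tolerance criterion for one period can be sketched as a ratio of missing values to the number of values in the period. The helper too_many_missing is hypothetical, NaN is assumed to mark missing values, and treating a ratio exactly equal to the tolerance as acceptable is an assumption of this sketch.

```python
import math


def too_many_missing(values, tolerance=0.1):
    """Return True (missing) when the share of NaN values exceeds `tolerance`."""
    n_missing = sum(math.isnan(v) for v in values)
    return n_missing / len(values) > tolerance


period_ok = [1.0] * 28 + [math.nan] * 2   # 2 of 30 values missing
period_bad = [1.0] * 24 + [math.nan] * 6  # 6 of 30 values missing
flag_ok = too_many_missing(period_ok)     # below the 10 % tolerance
flag_bad = too_many_missing(period_bad)   # above the 10 % tolerance
```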
- xclim.core.missing.missing_wmo(da, freq, src_timestep=None, nm=11, nc=5, **indexer)
Mask periods as missing using the WMO criteria for missing days.
The World Meteorological Organisation recommends that a monthly mean computed from daily values be considered missing if either of these two criteria is met:
- observations are missing for 11 or more days during the month;
- observations are missing for a period of 5 or more consecutive days during the month.
Stricter criteria are sometimes used in practice, with a tolerance of 5 missing values or 3 consecutive missing values.
Notes
If used at frequencies larger than a month, for example on an annual or seasonal basis, the function will return True if any month within a period is masked.
- Parameters:
da (xr.DataArray) – Input data, must have a “time” coordinate.
freq (str, optional) – Target resampling frequency. If None, a collapse of the temporal dimension is assumed.
src_timestep (str, optional) – The expected source input frequency. If not given, it will be inferred from the input array.
nm (int) – Minimal number of missing elements for a month to be masked.
nc (int) – Minimal number of consecutive missing elements for a month to be masked.
**indexer (Indexer) – Time attribute and values over which to subset the array. For example, use season=’DJF’ to select winter values, month=1 to select January, or month=[6,7,8] to select summer months. If no indexer is given, all values are considered. See xclim.core.calendar.select_time().
- Return type:
DataArray
- Returns:
DataArray – Boolean array at the resampled frequency, True on the periods that should be considered missing or invalid.
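The two WMO criteria can be sketched for a single month as a total count and a longest-run count of missing values. The helpers longest_run and wmo_missing are hypothetical and only illustrate the rule with the nm and nc defaults shown in the signature; NaN is assumed to mark missing values.

```python
import math


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def wmo_missing(month_values, nm=11, nc=5):
    """Return True if nm or more values are missing, or nc or more in a row."""
    null = [math.isnan(v) for v in month_values]
    return sum(null) >= nm or longest_run(null) >= nc


# 10 scattered NaNs: below both thresholds.
scattered = [math.nan if i % 3 == 0 else 1.0 for i in range(30)]
# 5 consecutive NaNs: the consecutive criterion is met.
block = [1.0] * 10 + [math.nan] * 5 + [1.0] * 15

flag_scattered = wmo_missing(scattered)
flag_block = wmo_missing(block)
flag_strict = wmo_missing(scattered, nm=5, nc=3)  # stricter tolerance of 5 missing values
```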
- xclim.core.missing.missing_from_context(da, freq, src_timestep=None, **indexer)
Mask periods as missing according to the algorithm and options set in xclim’s global options.
The options can be manipulated with xclim.core.options.set_options().
- Parameters:
da (xr.DataArray) – Input data, must have a “time” coordinate.
freq (str, optional) – Resampling frequency. If absent, a collapse of the temporal dimension is assumed.
src_timestep (str, optional) – The expected source input frequency. If not given, it will be inferred from the input array.
**indexer (Indexer) – Time attribute and values over which to subset the array. For example, use season=’DJF’ to select winter values, month=1 to select January, or month=[6,7,8] to select summer months. If no indexer is given, all values are considered. See xclim.core.calendar.select_time().
- Return type:
DataArray
- Returns:
DataArray – Boolean array at the resampled frequency, True on the periods that should be considered missing or invalid.
Data Flags
Pseudo-indicators designed to analyse supplied variables for suspicious/erroneous indicator values.
- exception xclim.core.dataflags.DataQualityException(flag_array, message='Data quality flags indicate suspicious values. Flags raised are:\\n - ')
Bases: Exception
Raised when any data evaluation checks are flagged as True.
- Parameters:
flag_array (xarray.Dataset) – Dataset of data flags.
message (str) – Message prepended to the error messages.
- flag_array: Dataset | None = None
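The behaviour described here, an exception whose message lists the flags that evaluated to True, can be sketched with a plain dictionary in place of a Dataset. QualityError is a hypothetical stand-in written for illustration; it is not the DataQualityException class and makes no claim about how that class builds its message.

```python
class QualityError(Exception):
    """Hypothetical exception that lists the names of the raised flags."""

    def __init__(self, flags, message="Flags raised are:\n - "):
        self.flags = flags
        raised = [name for name, value in flags.items() if value]
        super().__init__(message + "\n - ".join(raised))


# Flag names borrowed from this page; the values are made up.
flags = {"tasmax_below_tasmin": False, "temperature_extremely_high": True}
report = ""
try:
    if any(flags.values()):
        raise QualityError(flags)
except QualityError as err:
    report = str(err)  # mentions only the flag that was raised
```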
- xclim.core.dataflags.data_flags(da, ds=None, flags=None, dims='all', freq=None, raise_flags=False)
Evaluate the supplied DataArray for a set of data flag checks.
Test triggers depend on variable name and availability of extra variables within Dataset for comparison. If called with raise_flags=True, will raise a DataQualityException with comments for each failed quality check.
- Parameters:
da (xarray.DataArray) – The variable to check. Must have a name that is a valid CMIP6 variable name and appears in xclim.core.utils.VARIABLES.
ds (xarray.Dataset, optional) – An optional dataset with extra variables needed by some checks.
flags (dict, optional) – A dictionary where the keys are the names of the flags to check and the values are parameter dictionaries. The value can be None if there are no parameters to pass (i.e. default will be used). The default, None, means that the data flags list will be taken from xclim.core.utils.VARIABLES.
dims ({“all”, None} or str or a sequence of strings) – Dimensions upon which the aggregation should be performed. Default: “all”.
freq (str, optional) – Resampling frequency to have data_flags aggregated over periods. Defaults to None, which means the “time” axis is treated as any other dimension (see dims).
raise_flags (bool) – Raise exception if any of the quality assessment flags are raised. Default: False.
- Return type:
Dataset
- Returns:
xarray.Dataset – The Dataset of boolean flag arrays.
Examples
To evaluate all applicable data flags for a given variable:
>>> import xarray as xr
>>> from xclim.core.dataflags import data_flags
>>> ds = xr.open_dataset(path_to_pr_file)
>>> flagged_multi = data_flags(ds.pr, ds)
>>> # The next example evaluates only one data flag, passing specific parameters. It also aggregates the flags
>>> # yearly over the "time" dimension only, such that a True means there is a bad data point for that year
>>> # at that location.
>>> flagged_single = data_flags(
...     ds.pr,
...     ds,
...     flags={"very_large_precipitation_events": {"thresh": "250 mm d-1"}},
...     dims=None,
...     freq="YS",
... )
- xclim.core.dataflags.ecad_compliant(ds, dims='all', raise_flags=False, append=True)
Run ECAD compliance tests.
Assert that file adheres to ECAD-based quality assurance checks.
- Parameters:
ds (xarray.Dataset) – Variable-containing dataset.
dims ({“all”} or str or a sequence of strings, optional) – Dimensions upon which aggregation should be performed. Default: "all".
raise_flags (bool) – Raise exception if any of the quality assessment flags are raised, otherwise returns None. Default: False.
append (bool) – If True, return the Dataset with the ecad_qc_flag array appended to data_vars. If False, return the DataArray of the ecad_qc_flag variable.
- Return type:
DataArray | Dataset | None
- Returns:
xarray.DataArray or xarray.Dataset or None – Flag array or Dataset with flag array(s) appended.
- xclim.core.dataflags.negative_accumulation_values(da)
Check if variable values are negative for any given day.
- Parameters:
da (xarray.DataArray) – Variable array.
- Return type:
DataArray
- Returns:
xarray.DataArray, [bool] – Boolean array of True where values are negative.
Examples
To gain access to the flag_array:
>>> import xarray as xr
>>> from xclim.core.dataflags import negative_accumulation_values
>>> ds = xr.open_dataset(path_to_pr_file)
>>> flagged = negative_accumulation_values(ds.pr)
- xclim.core.dataflags.outside_n_standard_deviations_of_climatology(da, *, n, window=5)
Check if any daily value is outside n standard deviations from the day of year mean.
- Parameters:
da (xarray.DataArray) – Variable array.
n (int) – Number of standard deviations.
window (int) – Moving window used in determining the climatological mean. Default: 5.
- Return type:
DataArray
- Returns:
xarray.DataArray, [bool] – The boolean array of True where values exceed the bounds.
Notes
A moving window of five (5) days is suggested for tas data flag calculations according to ICCLIM data quality standards.
References
Project team ECA&D and KNMI [2013]
Examples
To gain access to the flag_array:
>>> import xarray as xr
>>> from xclim.core.dataflags import outside_n_standard_deviations_of_climatology
>>> ds = xr.open_dataset(path_to_tas_file)
>>> std_devs = 5
>>> average_over = 5
>>> flagged = outside_n_standard_deviations_of_climatology(ds.tas, n=std_devs, window=average_over)
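The underlying idea, comparing each value with the mean and standard deviation of the same day of year across all years, can be sketched without any array library. This simplified version makes several assumptions that differ from the function documented above: it applies no moving window, uses the population standard deviation, and works on nested lists. The helper outside_n_std is hypothetical.

```python
from statistics import mean, pstdev


def outside_n_std(values_by_year, n):
    """Flag values further than n standard deviations from their day-of-year mean.

    `values_by_year` is a list of equally long lists, one per year, holding one
    value per day of year.
    """
    n_days = len(values_by_year[0])
    flags = [[False] * n_days for _ in values_by_year]
    for d in range(n_days):
        sample = [year[d] for year in values_by_year]
        mu, sigma = mean(sample), pstdev(sample)
        for y, value in enumerate(sample):
            flags[y][d] = abs(value - mu) > n * sigma
    return flags


# Ten synthetic years of two days each; the last year has an outlier on day 0.
values_by_year = [[10.0, 5.0] for _ in range(9)] + [[20.0, 5.0]]
flags = outside_n_std(values_by_year, n=2)  # only the outlier is flagged
```

With few years the largest possible deviation is bounded, which is why this toy example uses n=2 rather than the n=5 of the example above.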
- xclim.core.dataflags.percentage_values_outside_of_bounds(da)
Check if variable values fall below 0% or exceed 100% for any given day.
- Parameters:
da (xarray.DataArray) – Variable array.
- Return type:
DataArray
- Returns:
xarray.DataArray, [bool] – The boolean array of True where values exceed the bounds.
Examples
To gain access to the flag_array:
>>> from xclim.core.dataflags import percentage_values_outside_of_bounds
>>> flagged = percentage_values_outside_of_bounds(huss_dataset)
- xclim.core.dataflags.register_methods(variable_name=None)
Register a data flag as functional.
Argument can be the output variable name template. The template may use any of the string-like input arguments. If not given, the function name is used instead, which may create variable conflicts.
- Parameters:
variable_name (str, optional) – The output variable name template. Default is None.
- Return type:
Callable
- Returns:
callable – The function being registered.
- xclim.core.dataflags.tas_below_tasmin(tas, tasmin)
Check if tas values are below tasmin values for any given day.
- Parameters:
tas (xarray.DataArray) – Mean temperature.
tasmin (xarray.DataArray) – Minimum temperature.
- Return type:
DataArray
- Returns:
xarray.DataArray, [bool] – Boolean array of True where tas is below tasmin.
Examples
To gain access to the flag_array:
>>> import xarray as xr
>>> from xclim.core.dataflags import tas_below_tasmin
>>> ds = xr.open_dataset(path_to_tas_file)
>>> flagged = tas_below_tasmin(ds.tas, ds.tasmin)
- xclim.core.dataflags.tas_exceeds_tasmax(tas, tasmax)
Check if tas values exceed tasmax values for any given day.
- Parameters:
tas (xarray.DataArray) – Mean temperature.
tasmax (xarray.DataArray) – Maximum temperature.
- Return type:
DataArray
- Returns:
xarray.DataArray, [bool] – Boolean array of True where tas is above tasmax.
Examples
To gain access to the flag_array:
>>> import xarray as xr
>>> from xclim.core.dataflags import tas_exceeds_tasmax
>>> ds = xr.open_dataset(path_to_tas_file)
>>> flagged = tas_exceeds_tasmax(ds.tas, ds.tasmax)
- xclim.core.dataflags.tasmax_below_tasmin(tasmax, tasmin)
Check if tasmax values are below tasmin values for any given day.
- Parameters:
tasmax (xarray.DataArray) – Maximum temperature.
tasmin (xarray.DataArray) – Minimum temperature.
- Return type:
DataArray
- Returns:
xarray.DataArray, [bool] – Boolean array of True where tasmax is below tasmin.
Examples
To gain access to the flag_array:
>>> import xarray as xr
>>> from xclim.core.dataflags import tasmax_below_tasmin
>>> ds = xr.open_dataset(path_to_tas_file)
>>> flagged = tasmax_below_tasmin(ds.tasmax, ds.tasmin)
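Taken together, the three temperature checks test that tasmin <= tas <= tasmax holds on every day. The element-wise comparisons can be sketched on plain lists; the values below are made up, in kelvin, and the variable names simply reuse the flag names for readability.

```python
# Three synthetic days; the second and third each break one relation.
tasmin = [270.0, 272.0, 275.0]
tas = [273.0, 271.0, 276.0]
tasmax = [278.0, 277.0, 274.0]

# One boolean per day, True where the expected ordering is violated.
tas_below_tasmin = [t < lo for t, lo in zip(tas, tasmin)]
tas_exceeds_tasmax = [t > hi for t, hi in zip(tas, tasmax)]
tasmax_below_tasmin = [hi < lo for hi, lo in zip(tasmax, tasmin)]
```

Day 2 has a mean below the minimum, and day 3 has a maximum below both the mean and the minimum, so it raises two flags at once.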
- xclim.core.dataflags.temperature_extremely_high(da, *, thresh='60 degC')
Check if temperature values exceed 60 degrees Celsius for any given day.
- Parameters:
da (xarray.DataArray) – Temperature.
thresh (str) – Threshold above which temperatures are considered problematic and a flag is raised. Default is 60 degrees Celsius.
- Return type:
DataArray
- Returns:
xarray.DataArray, [bool] – Boolean array of True where temperatures are above the threshold.
Examples
To gain access to the flag_array:
>>> import xarray as xr
>>> from xclim.core.dataflags import temperature_extremely_high
>>> ds = xr.open_dataset(path_to_tas_file)
>>> temperature = "60 degC"
>>> flagged = temperature_extremely_high(ds.tas, thresh=temperature)
- xclim.core.dataflags.temperature_extremely_low(da, *, thresh='-90 degC')
Check if temperature values are below -90 degrees Celsius for any given day.
- Parameters:
da (xarray.DataArray) – Temperature.
thresh (str) – Threshold below which temperatures are considered problematic and a flag is raised. Default is -90 degrees Celsius.
- Return type:
DataArray
- Returns:
xarray.DataArray, [bool] – Boolean array of True where temperatures are below the threshold.
Examples
To gain access to the flag_array:
>>> import xarray as xr
>>> from xclim.core.dataflags import temperature_extremely_low
>>> ds = xr.open_dataset(path_to_tas_file)
>>> temperature = "-90 degC"
>>> flagged = temperature_extremely_low(ds.tas, thresh=temperature)
- xclim.core.dataflags.values_op_thresh_repeating_for_n_or_more_days(da, *, n, thresh, op='==')
Check if array values repeat at a given threshold for N or more days.
- Parameters:
da (xarray.DataArray) – Variable array.
n (int) – Number of repeating days needed to trigger data flag.
thresh (str) – Repeating values to search for that will trigger data flag.
op ({“>”, “gt”, “<”, “lt”, “>=”, “ge”, “<=”, “le”, “==”, “eq”, “!=”, “ne”}) – Operator used for comparison with thresh.
- Return type:
DataArray
- Returns:
xarray.DataArray, [bool] – Boolean array of True where values repeat at threshold for N or more days.
Examples
To gain access to the flag_array:
>>> import xarray as xr
>>> from xclim.core.dataflags import values_op_thresh_repeating_for_n_or_more_days
>>> ds = xr.open_dataset(path_to_pr_file)
>>> units = "5 mm d-1"
>>> days = 5
>>> comparison = "eq"
>>> flagged = values_op_thresh_repeating_for_n_or_more_days(ds.pr, n=days, thresh=units, op=comparison)
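The run detection behind this kind of flag can be sketched in pure Python: compare every value with the threshold, then mark the runs of consecutive matches that are at least n long. The helper op_thresh_repeating is hypothetical, takes a plain number instead of a string with units, and marks every member of a qualifying run, which is an assumption about the output layout rather than a statement about xclim.

```python
import operator

# The operator names listed for the `op` parameter, mapped to Python functions.
OPS = {
    ">": operator.gt, "gt": operator.gt,
    "<": operator.lt, "lt": operator.lt,
    ">=": operator.ge, "ge": operator.ge,
    "<=": operator.le, "le": operator.le,
    "==": operator.eq, "eq": operator.eq,
    "!=": operator.ne, "ne": operator.ne,
}


def op_thresh_repeating(values, n, thresh, op="=="):
    """Flag values that belong to a run of n or more values satisfying `value op thresh`."""
    hits = [OPS[op](v, thresh) for v in values]
    flags = [False] * len(values)
    start = 0
    while start < len(hits):
        if not hits[start]:
            start += 1
            continue
        end = start
        while end < len(hits) and hits[end]:
            end += 1  # extend the run while the comparison keeps matching
        if end - start >= n:
            flags[start:end] = [True] * (end - start)
        start = end
    return flags


# A run of three 5s (too short) followed by a run of five 5s (flagged).
values = [5, 5, 5, 0, 5, 5, 5, 5, 5, 1]
flags = op_thresh_repeating(values, n=5, thresh=5, op="eq")
```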
- xclim.core.dataflags.values_repeating_for_n_or_more_days(da, *, n)
Check if exact values are found to be repeating for n or more days.
- Parameters:
da (xarray.DataArray) – Variable array.
n (int) – Number of days to trigger flag.
- Return type:
DataArray
- Returns:
xarray.DataArray, [bool] – The boolean array of True where values repeat for n or more days.
Examples
To gain access to the flag_array:
>>> import xarray as xr
>>> from xclim.core.dataflags import values_repeating_for_n_or_more_days
>>> ds = xr.open_dataset(path_to_pr_file)
>>> flagged = values_repeating_for_n_or_more_days(ds.pr, n=5)
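Detecting identical consecutive values is a run-length problem. A minimal sketch with itertools.groupby follows; repeating_runs is a hypothetical helper that marks every member of a run of n or more equal values, and it does not reproduce xclim's implementation.

```python
from itertools import groupby


def repeating_runs(values, n):
    """Flag values that belong to a run of n or more identical consecutive values."""
    flags = []
    for _, group in groupby(values):
        length = len(list(group))  # size of the current run of equal values
        flags.extend([length >= n] * length)
    return flags


# 1.2 repeats five times (flagged); 3.4 repeats only twice (not flagged).
values = [0.0, 1.2, 1.2, 1.2, 1.2, 1.2, 3.4, 3.4]
flags = repeating_runs(values, n=5)
```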
- xclim.core.dataflags.very_large_precipitation_events(da, *, thresh='300 mm d-1')
Check if precipitation values exceed 300 mm/day for any given day.
- Parameters:
da (xarray.DataArray) – Precipitation.
thresh (str) – Threshold to search an array for that will trigger flag if any day exceeds value.
- Return type:
DataArray
- Returns:
xarray.DataArray, [bool] – Boolean array of True where precipitation values exceed the threshold.
Examples
To gain access to the flag_array:
>>> import xarray as xr
>>> from xclim.core.dataflags import very_large_precipitation_events
>>> ds = xr.open_dataset(path_to_pr_file)
>>> rate = "300 mm d-1"
>>> flagged = very_large_precipitation_events(ds.pr, thresh=rate)
- xclim.core.dataflags.wind_values_outside_of_bounds(da, *, lower='0 m s-1', upper='46 m s-1')
Check if wind speed values exceed reasonable bounds for any given day.
- Parameters:
da (xarray.DataArray) – Wind speed.
lower (str) – The lower limit for wind speed. Default is 0 m s-1.
upper (str) – The upper limit for wind speed. Default is 46 m s-1.
- Return type:
DataArray
- Returns:
xarray.DataArray, [bool] – The boolean array of True where values exceed the bounds.
Examples
To gain access to the flag_array:
>>> from xclim.core.dataflags import wind_values_outside_of_bounds
>>> ceiling, floor = "46 m s-1", "0 m s-1"
>>> flagged = wind_values_outside_of_bounds(sfcWind_dataset, upper=ceiling, lower=floor)