Command Line Interface
xclim provides the xclim command line executable to perform basic indicator computation easily, without having to start up a full Python environment. However, not all indicators listed in Climate Indicators are available through this tool.
Its use is simple. Type the following to see the usage message:
[ ]:
!xclim --help
To list all available indicators, use the “indices” subcommand:
[ ]:
!xclim indices
For more information about a specific indicator, you can either use the info sub-command or directly access the --help message of the indicator. The former gives more information about the metadata, while the latter only prints the usage. Note that the module name (atmos, land or seaIce) is mandatory.
[ ]:
!xclim info liquidprcptot
In the usage message, VAR_NAME indicates that the passed argument must match a variable in the input dataset.
[ ]:
from __future__ import annotations

import warnings
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr
from pandas.plotting import register_matplotlib_converters

register_matplotlib_converters()
warnings.filterwarnings("ignore", "implicitly registered datetime converter")
%matplotlib inline
xr.set_options(display_style="html")

# One leap year of daily data: sinusoidal temperatures and periodic precipitation.
time = pd.date_range("2000-01-01", periods=366)
tasmin = xr.DataArray(
    -5 * np.cos(2 * np.pi * time.dayofyear / 365) + 273.15,
    dims="time",
    coords={"time": time},
    attrs={"units": "K"},
)
tasmax = xr.DataArray(
    -5 * np.cos(2 * np.pi * time.dayofyear / 365) + 283.15,
    dims="time",
    coords={"time": time},
    attrs={"units": "K"},
)
pr = xr.DataArray(
    np.clip(10 * np.sin(18 * np.pi * time.dayofyear / 365), 0, None),
    dims="time",
    coords={"time": time},
    attrs={"units": "mm/d"},
)
ds = xr.Dataset({"tasmin": tasmin, "tasmax": tasmax, "pr": pr})

# Assumption: the notebook runs from its own folder, so that the relative
# paths used by the CLI calls (data/example_data.nc) resolve correctly.
notebook_folder = Path.cwd()
data_folder = notebook_folder / "data"
data_folder.mkdir(exist_ok=True)
ds.to_netcdf(data_folder / "example_data.nc")
Computing indicators
Let’s say we have the following toy dataset:
[ ]:
import xarray as xr
ds = xr.open_dataset(data_folder.joinpath("example_data.nc"))
display(ds)
[ ]:
import matplotlib.pyplot as plt
fig1, (ax_tas, ax_pr) = plt.subplots(1, 2, figsize=(10, 5))
ds.tasmin.plot(label="tasmin", ax=ax_tas)
ds.tasmax.plot(label="tasmax", ax=ax_tas)
ds.pr.plot(ax=ax_pr)
ax_tas.legend()
To compute an indicator, say the monthly solid precipitation accumulation, we simply call:
[ ]:
!xclim -i data/example_data.nc -o data/out1.nc solidprcptot --pr pr --tas tasmin --freq MS
In this example, we decided to use tasmin for the tas variable. The --pr parameter could have been omitted, since the precipitation variable in our dataset already has that name.
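To make the meaning of this indicator more concrete, here is a minimal conceptual sketch in plain Python. It assumes that "solid" precipitation means precipitation falling on days whose temperature is below 0 degC, and that daily rates in mm/d are summed per calendar month. The function name monthly_solid_precip is hypothetical, and this is not xclim's implementation, which also handles units, missing values and metadata.

```python
import math
from collections import defaultdict
from datetime import date, timedelta

FREEZING_K = 273.15  # 0 degC expressed in kelvin


def monthly_solid_precip(days, pr_mm_per_day, tas_k):
    """Sum daily precipitation per month, counting only days colder than 0 degC."""
    totals = defaultdict(float)
    for day, pr_value, tas_value in zip(days, pr_mm_per_day, tas_k):
        key = (day.year, day.month)
        # Each daily rate (mm/d) covers one day, so it contributes pr_value mm.
        totals[key] += pr_value if tas_value < FREEZING_K else 0.0
    return dict(totals)


# Rebuild the toy series of this notebook (366 days starting 2000-01-01).
days = [date(2000, 1, 1) + timedelta(days=i) for i in range(366)]
doy = [d.timetuple().tm_yday for d in days]
tasmin = [-5 * math.cos(2 * math.pi * n / 365) + 273.15 for n in doy]
pr = [max(10 * math.sin(18 * math.pi * n / 365), 0.0) for n in doy]

solid = monthly_solid_precip(days, pr, tasmin)
for (year, month), total in sorted(solid.items()):
    print(f"{year}-{month:02d}: {total:.1f} mm")
```

With this toy data, the summer months accumulate no solid precipitation because tasmin stays above freezing, while the winter months do.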
Finally, more than one indicator can be computed and written to the output dataset by simply chaining the calls:
[ ]:
!xclim -i data/example_data.nc -o data/out2.nc liquidprcptot --tas tasmin --freq MS tropical_nights --thresh "2 degC" --freq MS
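When such chained calls are generated from a script, it can help to assemble the argument list programmatically. The sketch below uses only the options that appear in the commands above; the helper name build_xclim_command and its argument layout are hypothetical and are not part of xclim.

```python
import shlex


def build_xclim_command(input_file, output_file, indicators):
    """Assemble an xclim call that chains several indicators.

    `indicators` is a list of (name, options) pairs, where `options`
    maps an option such as "--freq" to its value.
    """
    args = ["xclim", "-i", input_file, "-o", output_file]
    for name, options in indicators:
        args.append(name)
        for flag, value in options.items():
            args.extend([flag, value])
    return args


command = build_xclim_command(
    "data/example_data.nc",
    "data/out2.nc",
    [
        ("liquidprcptot", {"--tas": "tasmin", "--freq": "MS"}),
        ("tropical_nights", {"--thresh": "2 degC", "--freq": "MS"}),
    ],
)

# shlex.join quotes values that contain spaces, such as "2 degC".
print(shlex.join(command))
```

The resulting list can be passed as is to subprocess.run, which avoids shell quoting problems with values that contain spaces.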
Let’s see the outputs:
[ ]:
ds1 = xr.open_dataset(data_folder / "out1.nc")
ds2 = xr.open_dataset(data_folder / "out2.nc", decode_timedelta=False)
fig2, (ax_prcptot, ax_tropical_nights) = plt.subplots(1, 2, figsize=(10, 5))
ds1.solidprcptot.plot(ax=ax_prcptot, label=ds1.solidprcptot.long_name)
ds2.liquidprcptot.plot(ax=ax_prcptot, label=ds2.liquidprcptot.long_name)
ds2.tropical_nights.plot(ax=ax_tropical_nights, marker="o")
ax_prcptot.legend()
[ ]:
ds1.close()
[ ]:
ds2.close()
Data Quality Checks
As of version 0.30.0, xclim also provides a command-line utility for performing data quality control checks on existing NetCDF files.
These checks examine the data variables for suspicious value patterns (e.g. values that repeat for many days) or erroneous values (e.g. humidity percentages outside 0-100, minimum temperatures exceeding maximum temperatures, etc.). The checks (called dataflags) are based on the ECAD ICCLIM quality control checks (https://www.ecad.eu/documents/atbd.pdf).
The full list of checks performed for each variable is given in xclim/core/data/variables.yml.
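To illustrate the kinds of checks described above, here is a minimal sketch in plain Python. The three function names and the sample values are hypothetical, and the logic is a simplification for illustration only; it does not reproduce the xclim dataflags implementation or its thresholds.

```python
def longest_run(values):
    """Length of the longest stretch of consecutive identical values."""
    longest = current = 0
    previous = object()  # sentinel that compares unequal to any value
    for value in values:
        current = current + 1 if value == previous else 1
        longest = max(longest, current)
        previous = value
    return longest


def percentage_out_of_bounds(values):
    """True if any percentage lies outside the 0-100 range."""
    return any(v < 0 or v > 100 for v in values)


def tasmin_exceeds_tasmax(tasmin, tasmax):
    """True if any minimum temperature is higher than the matching maximum."""
    return any(low > high for low, high in zip(tasmin, tasmax))


# Made-up sample values, chosen so that every check finds a problem.
humidity = [55.0, 61.5, 101.2, 48.0]
tasmin = [270.1, 271.4, 285.0]
tasmax = [280.3, 281.0, 279.9]
pr = [0.0, 3.2, 3.2, 3.2, 3.2, 1.1]

print(longest_run(pr))
print(percentage_out_of_bounds(humidity))
print(tasmin_exceeds_tasmax(tasmin, tasmax))
```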
[ ]:
!xclim dataflags --help
When running the dataflags CLI checks, you must either set an output file (-o filename.nc) or set the checks to raise if there are any failed checks (-r).
By default, when setting an output file, the returned file will only contain the flag value (True if no flags were raised, False otherwise). To append the flag to a copy of the dataset, we use the -a option.
The default behaviour is to raise a flag if any element of the array resolves to True (i.e. aggregated across all dimensions), but we can specify the level of aggregation by dimension with the -d or --dims option.
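The effect of the aggregation level can be pictured with a small, purely conceptual sketch. It assumes a hypothetical elementwise check result on a (time, station) grid, represented as nested lists; none of these variable names come from xclim, and the real tool operates on xarray objects.

```python
# Hypothetical elementwise check results: rows are time steps, columns are stations.
failed = [
    [False, False, True],
    [False, False, False],
    [False, False, True],
]

# Aggregated across all dimensions: a single value for the whole array.
flag_all = any(any(row) for row in failed)

# Aggregated along time only: one value per station.
flag_per_station = [any(column) for column in zip(*failed)]

# No aggregation: the elementwise result is kept as is.
flag_none = failed

print(flag_all)
print(flag_per_station)
```

Aggregating over fewer dimensions keeps more detail about where a check failed, at the cost of a larger output.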
[ ]:
# Create an output file with just the flag value and no aggregation (dims=None)
!xclim -i data/example_data.nc -o data/flag_output.nc dataflags -d none
# Need to wait until the file is written
!sleep 2s
[ ]:
import xarray as xr
ds1 = xr.open_dataset(data_folder / "flag_output.nc")
display(ds1.data_vars, ds1.ecad_qc_flag)
ds1.close()
[ ]:
# Create an output file with values appended to the original dataset.
!xclim -i data/example_data.nc -o data/flag_output_appended.nc dataflags -a
# Need to wait until the file is written
!sleep 2s
[ ]:
import xarray as xr
ds2 = xr.open_dataset(data_folder / "flag_output_appended.nc")
display(ds2.data_vars, ds2.ecad_qc_flag)
ds2.close()
[ ]:
# Raise an error if any quality control checks fail. Passing example:
!xclim -i data/example_data.nc dataflags -r
[ ]:
import xarray as xr
# Create some bad data with minimum temperatures exceeding max temperatures
bad_ds = xr.open_dataset(data_folder / "example_data.nc")
# Swap entire variable arrays
bad_ds["tasmin"].values, bad_ds["tasmax"].values = (
    bad_ds.tasmax.values,
    bad_ds.tasmin.values,
)
bad_ds.to_netcdf(data_folder / "suspicious_data.nc")
bad_ds.close()
[ ]:
# Raise an error if any quality control checks fail. Failing example:
!xclim -i data/suspicious_data.nc dataflags -r
These checks can also be set to examine a specific variable within a NetCDF file, with more descriptive information for each check performed.
[ ]:
!xclim -i data/example_data.nc -o data/flag_output_pr.nc dataflags pr
[ ]:
import xarray as xr
ds3 = xr.open_dataset(data_folder / "flag_output_pr.nc")
display(ds3.data_vars)
for dv in ds3.data_vars:
    display(ds3[dv])