Download this notebook from github.
Unit Handling¶
[1]:
from __future__ import annotations
import matplotlib.pyplot as plt
import nc_time_axis # noqa
import xarray as xr
import xclim
from xclim.core import units
from xclim.testing import open_dataset
# Set display to HTML style (optional)
xr.set_options(display_style="html", display_width=50)
%matplotlib inline
plt.rcParams["figure.figsize"] = (11, 5)
A lot of effort has been placed into automatic handling of input data units. xclim will automatically detect the input variable(s) units (e.g. °C versus K or mm/s versus mm/day etc.) and adjust on-the-fly in order to calculate indices in the consistent manner. This comes with the obvious caveat that input data requires a metadata attribute for units : the units attribute is required, and the standard_name can be useful for automatic conversions.
The main unit handling method is `xclim.core.units.convert_units_to <../xclim.core.rst#xclim.core.units.convert_units_to>`__ which can also be useful on its own. xclim relies on pint for unit handling and extends the units registry and formatting functions of cf-xarray.
Simple example: Temperature¶
[2]:
# See the Usage page for details on opening datasets, subsetting and resampling.
ds = xr.tutorial.load_dataset("air_temperature")
tas = ds.air.sel(lat=40, lon=270, method="nearest").resample(time="D").mean(keep_attrs=True)
print(tas.attrs["units"])
degK
Here, we convert our kelvin data to the very useful Fahrenheits:
[3]:
tas_F = units.convert_units_to(tas, "degF")
print(tas_F.attrs["units"])
°F
Smart conversions: Precipitation¶
For precipitation data, xclim usually expects precipitation fluxes, so units of mass / (area * time). However, many indicators will also accept rates (length / time, for example mm/d) by implicitly assuming the data refers to liquid water, and thus that we can simply multiply by 1000 kg/m³ to convert from the latter to the former. This transformation is enabled on indicators that have Indicator.context == 'hydro'.
We can also leverage the CF-conventions to perform some other “smart” conversions. For example, if the CF standard name of an input refers to liquid water, the flux ⇋ rate and amount ⇋ thickness conversions explained above will be automatic in xc.core.units.convert_units_to, whether the “hydro” context is activated or not. Another CF-driven conversion is between amount and flux or thickness and rate. Here again, convert_units_to will see if the standard_name attribute, but it will
also need to infer the frequency of the data. For example, if a daily precipitation series records total daily precipitation and has units of mm (a “thickness”), it should use the lwe_thickness_of_precipitation_amount standard name and have a regular time coordinate, With these two, xclim will understand it and be able to convert it to a precipitation flux (by dividing by 1 day and multiplying by 1000 kg/m³).
These CF conversions are not automatically done in the indicator (in opposition to the “hydro” context ones). convert_units_to should be called beforehand.
Here are some examples:
Going from a precipitation flux to a daily thickness
[4]:
ds = open_dataset("ERA5/daily_surface_cancities_1990-1993.nc")
ds.pr.attrs
[4]:
{'cell_methods': 'time: mean within days',
'description': 'Total precipitation thickness converted to mass flux using a water density of 1000 kg/m³.',
'long_name': 'Mean daily precipitation flux',
'original_long_name': 'Total precipitation',
'original_variable': 'tp',
'standard_name': 'precipitation_flux',
'units': 'kg m-2 s-1'}
[5]:
pr_as_daily_total = units.convert_units_to(ds.pr, "mm")
pr_as_daily_total
[5]:
<xarray.DataArray 'pr' (location: 5, time: 1461)> Size: 29kB
array([[2.99711246e+01, 1.34261340e-01, 2.58628298e-02, ...,
5.25184035e-01, 2.32672749e+01, 1.51715234e-01],
[3.75277948e+00, 7.53812417e-02, 2.25566290e-02, ...,
1.66656479e-01, 2.49618101e+00, 6.33494914e-01],
[4.52306151e-01, 1.26119339e+00, 7.31839001e-01, ...,
1.54514790e-01, 8.30879658e-02, 2.86629438e-01],
[8.89770687e-01, 2.50204110e+00, 9.11779702e-01, ...,
8.16891044e-02, 2.13806495e-01, 1.33507878e-01],
[2.23460245e+00, 1.11153364e-01, 5.75242996e+00, ...,
9.38808501e-01, 1.16742477e+01, 4.87939644e+00]],
shape=(5, 1461), dtype=float32)
Coordinates:
* location (location) <U9 180B 'Halifax' ......
lon (location) float32 20B ...
lat (location) float32 20B ...
* time (time) datetime64[ns] 12kB 1990-0...
Attributes:
cell_methods: time: mean within days
description: Total precipitation th...
long_name: Mean daily precipitati...
original_long_name: Total precipitation
original_variable: tp
standard_name: lwe_thickness_of_preci...
units: mmGoing from the liquid water equivalent thickness of snow (swe) to the snow amount (snw).
The former being common in observations and the latter being the CMIP6 variable. Notice that convert_units_to cannot handle the variable name itself, that change has to be done by hand.
[6]:
ds.swe.attrs
[6]:
{'standard_name': 'lwe_thickness_of_surface_snow_amount',
'long_name': 'Liquid water equivalent of surface snow amount',
'units': 'm',
'cell_methods': 'time: mean within days'}
[7]:
units.convert_units_to(ds.swe, "kg m-2").rename("snw")
[7]:
<xarray.DataArray 'snw' (location: 5, time: 1461)> Size: 29kB
array([[ 0.7883708 , 0.38146973, 0.34332275, ..., 10.440029 ,
13.30123 , 18.70568 ],
[ 7.0449486 , 7.1975365 , 6.86693 , ..., 16.60124 ,
17.841017 , 17.027214 ],
[12.81786 , 12.957732 , 12.945016 , ..., 15.316662 ,
15.361306 , 15.2784 ],
[ 9.982265 , 10.579902 , 12.360096 , ..., 33.32916 ,
33.704273 , 33.716988 ],
[ 0. , 0. , 0. , ..., 0. ,
0. , 0. ]], shape=(5, 1461), dtype=float32)
Coordinates:
* location (location) <U9 180B 'Halifax' ......
lon (location) float32 20B ...
lat (location) float32 20B ...
* time (time) datetime64[ns] 12kB 1990-0...
Attributes:
standard_name: surface_snow_amount
long_name: Liquid water equivalent of ...
units: kg m-2
cell_methods: time: mean within daysThreshold indices¶
xclim unit handling also applies to threshold indicators. Users can provide threshold in units of choice and xclim will adjust automatically. For example, in order to determine the number of days with tasmax > 20 °C, users can define a threshold input of "20 C" or "20 degC" even if input data is in Kelvin. Alternatively, users can even provide a threshold in Kelvin ("293.15 K"), if they really wanted to.
[8]:
tasmaxK = ds.tasmax.sel(location="Halifax")
tasmaxF = units.convert_units_to(ds.tasmax.sel(location="Halifax"), "degF")
with xclim.set_options(cf_compliance="log"):
# Using Kelvin data, threshold in Celsius
out1 = xclim.atmos.tx_days_above(tasmax=tasmaxK, thresh="20 C", freq="YS")
# Using Fahrenheit data, threshold in Celsius
out2 = xclim.atmos.tx_days_above(tasmax=tasmaxF, thresh="20 C", freq="YS")
# Using Fahrenheit data, with threshold in Kelvin
out3 = xclim.atmos.tx_days_above(tasmax=tasmaxF, thresh="293.15 K", freq="YS")
# Plot and see that it's all identical:
plt.figure()
out1.plot(label="K and degC", linestyle="-")
out2.plot(label="degF and degC", marker="s", markersize=10, linestyle="none")
out3.plot(label="degF and K", marker="o", linestyle="none")
plt.legend()
[8]:
<matplotlib.legend.Legend at 0x7606c84810d0>
Sum and count indices¶
Many indices in xclim will either sum values or count events along the time dimension and over a period. As of version 0.24, unit handling dynamically infers what the sampling frequency and its corresponding unit is.
Indicators, on the other hand, do not have this flexibility and often expect input at a given frequency, more often daily than otherwise.
For example, we can run the tx_days_above on the 6-hourly test data, and it should return similar results as on the daily data, but in units of h (the base unit of the sampling frequency).
[9]:
ds = xr.tutorial.load_dataset("air_temperature")
tas_6h = ds.air.sel(lat=40, lon=270, method="nearest") # no resampling, original data is 6-hourly
tas_D = tas_6h.resample(time="D").mean()
out1_h = xclim.indices.tx_days_above(tasmax=tas_6h, thresh="20 C", freq="MS")
out2_D = xclim.indices.tx_days_above(tasmax=tas_D, thresh="20 C", freq="MS")
out1_h
[9]:
<xarray.DataArray 'air' (time: 24)> Size: 192B
array([ 0, 0, 0, 48, 228, 426, 492, 612, 456, 174, 0, 0, 0,
0, 0, 54, 282, 552, 504, 636, 324, 78, 0, 0])
Coordinates:
* time (time) datetime64[ns] 192B 2013-01...
lat float32 4B 40.0
lon float32 4B 270.0
Attributes:
long_name: 4xDaily Air temperature at s...
units: h
precision: 2
GRIB_id: 11
GRIB_name: TMP
var_desc: Air temperature
dataset: NMC Reanalysis
level_desc: Surface
statistic: Individual Obs
parent_stat: Other
actual_range: [185.16 322.1 ][10]:
out1_D = units.convert_units_to(out1_h, "d")
plt.figure()
out2_D.plot(label="From daily input", linestyle="-")
out1_D.plot(label="From 6-hourly input", linestyle="-")
plt.legend()
[10]:
<matplotlib.legend.Legend at 0x7606c844d400>
Temperature differences vs absolute temperature¶
Temperature anomalies and biases as well as degree-days indicators are all differences between temperatures. If we assign those differences units of degrees Celsius, then converting to Kelvins or Fahrenheits will yield nonsensical values. pint solves this using delta units such as delta_degC and delta_degF.
[11]:
# If we have a DataArray storing a temperature anomaly of 2°C,
# converting to Kelvin will yield a nonsensical value 0f 275.15.
# Fortunately, pint has delta units to represent temperature differences.
display(units.convert_units_to(xr.DataArray([2], attrs={"units": "delta_degC"}), "K"))
<xarray.DataArray (dim_0: 1)> Size: 8B
array([2])
Dimensions without coordinates: dim_0
Attributes:
units: K
units_metadata: temperature: unknownThe issue for xclim is that there are no equivalent delta units in the CF convention. To resolve this ambiguity, the CF convention recommends including a units_metadata attribute set to "temperature: difference", and this is supported in xclim as of version 0.52. The function units2pint interprets the units_metadata attribute and returns a pint delta unit as
needed. To convert a pint delta unit to CF attributes, use the function pint2cfattrs, which returns a dictionary with the units and units_metadata attributes (pint2cfunits cannot support the convention because it only returns the unit string).
[12]:
delta = xr.DataArray([2], attrs={"units": "K", "units_metadata": "temperature: difference"})
units.convert_units_to(delta, "delta_degC")
[12]:
<xarray.DataArray (dim_0: 1)> Size: 8B
array([2.])
Dimensions without coordinates: dim_0
Attributes:
units: degC
units_metadata: temperature: differenceOther utilities¶
Many helper functions are defined in xclim.core.units, see Unit handling module.