diff --git a/public/posts/multiple-groupers/repr-time.html b/public/posts/multiple-groupers/repr-time.html
new file mode 100644
index 000000000..5b50edd48
--- /dev/null
+++ b/public/posts/multiple-groupers/repr-time.html
@@ -0,0 +1,441 @@
+<xarray.Dataset> Size: 255kB
+Dimensions:  (lat: 25, lon: 53, year: 2, month: 12)
+Coordinates:
+  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
+  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
+  * year     (year) int64 16B 2013 2014
+  * month    (month) int64 96B 1 2 3 4 5 6 7 8 9 10 11 12
+Data variables:
+    air      (lat, lon, year, month) float64 254kB 244.5 240.7 ... 298.8 297.7
+Attributes:
+    Conventions:  COARDS
+    title:        4x daily NMC reanalysis (1948)
+    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
+    platform:     Model
+    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
diff --git a/public/posts/multiple-groupers/repr1.html b/public/posts/multiple-groupers/repr1.html
new file mode 100644
index 000000000..365cae8e0
--- /dev/null
+++ b/public/posts/multiple-groupers/repr1.html
@@ -0,0 +1,373 @@
+<xarray.DataArray (labels1: 3, labels2: 3)> Size: 72B
+array([[1. , nan, nan],
+       [nan, 2. , nan],
+       [nan, nan, 1.5]])
+Coordinates:
+  * labels1  (labels1) object 24B 'a' 'b' 'c'
+  * labels2  (labels2) object 24B 'x' 'y' 'z'
diff --git a/public/posts/multiple-groupers/repr2.html b/public/posts/multiple-groupers/repr2.html
new file mode 100644
index 000000000..f2f4d7ec0
--- /dev/null
+++ b/public/posts/multiple-groupers/repr2.html
@@ -0,0 +1,379 @@
+<xarray.DataArray 'foo' (x_bins: 2, letters: 2, y: 3)> Size: 96B
+array([[[ 0.,  1.,  2.],
+        [nan, nan, nan]],
+
+       [[nan, nan, nan],
+        [ 3.,  4.,  5.]]])
+Coordinates:
+  * x_bins   (x_bins) object 16B (5, 15] (15, 25]
+  * letters  (letters) object 16B 'a' 'b'
+Dimensions without coordinates: y
diff --git a/public/posts/multiple-groupers/repr3.html b/public/posts/multiple-groupers/repr3.html
new file mode 100644
index 000000000..1fce52cab
--- /dev/null
+++ b/public/posts/multiple-groupers/repr3.html
@@ -0,0 +1,373 @@
+<xarray.DataArray (labels1: 3, labels2: 3)> Size: 72B
+array([[ 1., nan, nan],
+       [nan,  2., nan],
+       [nan, nan,  3.]])
+Coordinates:
+  * labels1  (labels1) object 24B 'a' 'b' 'c'
+  * labels2  (labels2) object 24B 'x' 'y' 'z'
diff --git a/src/posts/multiple-groupers/index.md b/src/posts/multiple-groupers/index.md
new file mode 100644
index 000000000..e93907563
--- /dev/null
+++ b/src/posts/multiple-groupers/index.md
@@ -0,0 +1,113 @@
+---
+title: 'Grouping by multiple arrays with Xarray'
+date: '2024-09-02'
+authors:
+  - name: Deepak Cherian
+    github: dcherian
+
+summary: 'Xarray finally supports grouping by multiple arrays.'
+---
+
+## TLDR
+
+Xarray now supports grouping by multiple variables ([docs](https://docs.xarray.dev/en/latest/user-guide/groupby.html#grouping-by-multiple-variables)). 🎉 😱 🤯 🥳. Try it out!
+
+## How do I use it?
+
+Install `xarray>=2024.09.0` and, optionally, [flox](https://flox.readthedocs.io/en/latest/) for better performance with reductions.
+
+## Simple example
+
+Simple grouping by multiple categorical variables is easy:
+
+```python
+import numpy as np
+import xarray as xr
+from xarray.groupers import UniqueGrouper
+
+da = xr.DataArray(
+    np.array([1, 2, 3, 0, 2, np.nan]),
+    dims="d",
+    coords=dict(
+        labels1=("d", np.array(["a", "b", "c", "c", "b", "a"])),
+        labels2=("d", np.array(["x", "y", "z", "z", "y", "x"])),
+    ),
+)
+
+gb = da.groupby(["labels1", "labels2"])
+gb
+```
+
+```
+
+```
+
+Reductions work as usual:
+
+```python
+gb.mean()
+```
+
+
+
+So does `map`:
+
+```python
+gb.map(lambda x: x[0])
+```
+
+
+
+## More complex time grouping
+
+Grouping by multiple _virtual_ variables like `"time.month"` is also supported:
+
+```python
+import xarray as xr
+
+ds = xr.tutorial.open_dataset("air_temperature")
+ds.groupby(["time.year", "time.month"]).mean()
+```
+
+
+
+## Multiple Grouper types
+
+The syntax above, `da.groupby(["labels1", "labels2"])`, is a shortcut for using [Grouper objects](https://docs.xarray.dev/en/latest/user-guide/groupby.html#grouping-by-multiple-variables):
+
+```python
+da.groupby(labels1=UniqueGrouper(), labels2=UniqueGrouper())
+```
+
+Grouper objects allow you to express more complicated GroupBy problems.
+For example, combining different grouper types is allowed.
+That is, you can combine categorical grouping with [`UniqueGrouper`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.UniqueGrouper.html#xarray.groupers.UniqueGrouper),
+binning with [`BinGrouper`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.BinGrouper.html#xarray.groupers.BinGrouper), and
+resampling with [`TimeResampler`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.TimeResampler.html#xarray.groupers.TimeResampler).
+
+```python
+from xarray.groupers import BinGrouper
+
+ds = xr.Dataset(
+    {"foo": (("x", "y"), np.arange(12).reshape((4, 3)))},
+    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
+)
+gb = ds.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper())
+gb
+```
+
+```
+
+```
+
+Now reduce as usual:
+
+```python
+gb.mean()
+```
+
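To make the combined `BinGrouper` and `UniqueGrouper` reduction concrete, here is a minimal pure-Python sketch of the same computation on the toy `foo` data from the post. It only illustrates the semantics (right-closed bins, one output cell per bin and letter combination, NaN where a combination has no members). It is not how Xarray or flox implement grouping, and `bin_index` is a hypothetical helper written for this illustration.

```python
import math
from collections import defaultdict

# Same toy data as the post: "foo" has shape (x=4, y=3).
foo = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
x = [10, 20, 30, 40]
letters = ["a", "b", "b", "a"]
bin_edges = [5, 15, 25]  # two right-closed bins: (5, 15] and (15, 25]


def bin_index(value, edges):
    """Hypothetical helper: index of the right-closed bin holding value, or None."""
    for i in range(len(edges) - 1):
        if edges[i] < value <= edges[i + 1]:
            return i
    return None


# Collect the rows of foo that belong to each (bin, letter) group.
groups = defaultdict(list)
for row, xi, letter in zip(foo, x, letters):
    b = bin_index(xi, bin_edges)
    if b is not None:  # x = 30 and x = 40 fall outside every bin
        groups[(b, letter)].append(row)

# Dense (x_bins, letters, y) result; combinations with no members stay NaN.
unique_letters = sorted(set(letters))
n_bins = len(bin_edges) - 1
n_y = len(foo[0])
result = [[[math.nan] * n_y for _ in unique_letters] for _ in range(n_bins)]
for (b, letter), rows in groups.items():
    j = unique_letters.index(letter)
    # Mean over the grouped x positions, computed separately for each y.
    result[b][j] = [sum(col) / len(col) for col in zip(*rows)]
```

Only the `((5, 15], "a")` and `((15, 25], "b")` cells receive data here, which is the same pattern of filled and NaN cells shown in `repr2.html`.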