From 7c3e71d735ec4836fb1bd7c2eb17a53d39c73147 Mon Sep 17 00:00:00 2001
From: Deepak Cherian
Date: Tue, 27 Aug 2024 21:58:20 -0600
Subject: [PATCH 1/8] GroupBy(multiple variables)

---
 src/posts/multiple-groupers/index.md | 125 +++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)
 create mode 100644 src/posts/multiple-groupers/index.md

diff --git a/src/posts/multiple-groupers/index.md b/src/posts/multiple-groupers/index.md
new file mode 100644
index 000000000..769ab27bc
--- /dev/null
+++ b/src/posts/multiple-groupers/index.md
@@ -0,0 +1,125 @@
+---
+title: 'Grouping by multiple arrays with Xarray'
+date: '2023-07-18'
+authors:
+  - name: Deepak Cherian
+    github: dcherian
+
+summary: 'Xarray finally supports grouping by multiple arrays. 🎉'
+---
+
+## TLDR
+
+Xarray now supports grouping by multiple variables ([docs](https://docs.xarray.dev/en/latest/user-guide/groupby.html#grouping-by-multiple-variables)). 🎉 😱 🤯 🥳. Try it out!
+
+## How do I use it?
+
+Install `xarray>=2024.08.0` and optionally [flox](https://flox.readthedocs.io/en/latest/) for better performance with reductions.
+
+## Simple example
+
+Set up a multiple variable groupby using [Grouper objects](https://docs.xarray.dev/en/latest/user-guide/groupby.html#grouping-by-multiple-variables).
+
+```python
+import xarray as xr
+from xarray.groupers import UniqueGrouper
+
+da = xr.DataArray(
+    np.array([1, 2, 3, 0, 2, np.nan]),
+    dims="d",
+    coords=dict(
+        labels1=("d", np.array(["a", "b", "c", "c", "b", "a"])),
+        labels2=("d", np.array(["x", "y", "z", "z", "y", "x"])),
+    ),
+)
+
+gb = da.groupby(labels1=UniqueGrouper(), labels2=UniqueGrouper())
+gb
+```
+
+```
+
+```
+
+Reductions work as usual:
+
+```python
+gb.mean()
+```
+
+```
+xarray.DataArray (labels1: 3, labels2: 3)> Size: 72B
+array([[1. , nan, nan],
+       [nan, 2. , nan],
+       [nan, nan, 1.5]])
+Coordinates:
+  * labels1  (labels1) object 24B 'a' 'b' 'c'
+  * labels2  (labels2) object 24B 'x' 'y' 'z'
+```
+
+So does `map`:
+
+```python
+gb.map(lambda x: x[0])
+```
+
+```
+ Size: 72B
+array([[ 1., nan, nan],
+       [nan,  2., nan],
+       [nan, nan,  3.]])
+Coordinates:
+  * labels1  (labels1) object 24B 'a' 'b' 'c'
+  * labels2  (labels2) object 24B 'x' 'y' 'z'
+```
+
+## Multiple Groupers
+
+Combining different grouper types is allowed, that is you can combine
+categorical grouping with` UniqueGrouper`, binning with `BinGrouper`, and
+resampling with `TimeResampler`.
+
+```python
+ds = xr.Dataset(
+    {"foo": (("x", "y"), np.arange(12).reshape((4, 3)))},
+    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
+    )
+gb = ds.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper())
+gb
+```
+
+```
+from xarray.groupers import BinGrouper
+
+ds = xr.Dataset(
+    {"foo": (("x", "y"), np.arange(12).reshape((4, 3)))},
+    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
+    )
+gb = ds.foo.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper())
+gb
+```
+
+```
+
+```
+
+```python
+gb.mean()
+```
+
+```
+ Size: 96B
+array([[[ 0.,  1.,  2.],
+        [nan, nan, nan]],
+
+       [[nan, nan, nan],
+        [ 3.,  4.,  5.]]])
+Coordinates:
+  * x_bins   (x_bins) object 16B (5, 15] (15, 25]
+  * letters  (letters) object 16B 'a' 'b'
+Dimensions without coordinates: y
+```
From 5eabb2d5511d222c2ed7a2e8cdeb961161f76702 Mon Sep 17 00:00:00 2001
From: Deepak Cherian
Date: Tue, 27 Aug 2024 22:08:40 -0600
Subject: [PATCH 2/8] cleanup

---
 src/posts/multiple-groupers/index.md | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/src/posts/multiple-groupers/index.md b/src/posts/multiple-groupers/index.md
index 769ab27bc..2906cd3fd 100644
--- a/src/posts/multiple-groupers/index.md
+++ b/src/posts/multiple-groupers/index.md
@@ -5,7 +5,7 @@ authors:
   - name: Deepak Cherian
     github: dcherian
 
-summary: 'Xarray finally supports grouping by multiple arrays. 🎉'
+summary: 'Xarray finally supports grouping by multiple arrays.'
 ---
 
 ## TLDR
@@ -82,22 +82,13 @@ categorical grouping with` UniqueGrouper`, binning with `BinGrouper`, and
 resampling with `TimeResampler`.
 
 ```python
-ds = xr.Dataset(
-    {"foo": (("x", "y"), np.arange(12).reshape((4, 3)))},
-    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
-    )
-gb = ds.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper())
-gb
-```
-
-```
 from xarray.groupers import BinGrouper
 
 ds = xr.Dataset(
     {"foo": (("x", "y"), np.arange(12).reshape((4, 3)))},
     coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
     )
-gb = ds.foo.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper())
+gb = ds.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper())
 gb
 ```
 
@@ -107,6 +98,8 @@ gb
     'letters': 2 groups with labels 'a', 'b'>
 ```
 
+Now reduce as usual
+
 ```python
 gb.mean()
 ```
From 6ffb021500dce6be95a8ae02f700e49f7a795f47 Mon Sep 17 00:00:00 2001
From: Deepak Cherian
Date: Tue, 27 Aug 2024 22:11:13 -0600
Subject: [PATCH 3/8] more links

---
 src/posts/multiple-groupers/index.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/posts/multiple-groupers/index.md b/src/posts/multiple-groupers/index.md
index 2906cd3fd..d5dc019dc 100644
--- a/src/posts/multiple-groupers/index.md
+++ b/src/posts/multiple-groupers/index.md
@@ -78,8 +78,8 @@ Coordinates:
 ## Multiple Groupers
 
 Combining different grouper types is allowed, that is you can combine
-categorical grouping with` UniqueGrouper`, binning with `BinGrouper`, and
-resampling with `TimeResampler`.
+categorical grouping with [`UniqueGrouper`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.UniqueGrouper.html#xarray.groupers.UniqueGrouper), binning with [`BinGrouper`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.BinGrouper.html#xarray.groupers.BinGrouper), and
+resampling with [`TimeResampler`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.TimeResampler.html#xarray.groupers.TimeResampler).
 
 ```python
 from xarray.groupers import BinGrouper
From 05b07a599e115d0a7e3e1c7f0c9823e66422e0fa Mon Sep 17 00:00:00 2001
From: Deepak Cherian
Date: Tue, 27 Aug 2024 22:17:59 -0600
Subject: [PATCH 4/8] Add reprs

---
 public/posts/multiple-groupers/repr1.html | 373 +++++++++++++++++++++
 public/posts/multiple-groupers/repr2.html | 379 ++++++++++++++++++++++
 public/posts/multiple-groupers/repr3.html | 373 +++++++++++++++++++++
 src/posts/multiple-groupers/index.md      |  33 +-
 4 files changed, 1128 insertions(+), 30 deletions(-)
 create mode 100644 public/posts/multiple-groupers/repr1.html
 create mode 100644 public/posts/multiple-groupers/repr2.html
 create mode 100644 public/posts/multiple-groupers/repr3.html

diff --git a/public/posts/multiple-groupers/repr1.html b/public/posts/multiple-groupers/repr1.html
new file mode 100644
index 000000000..365cae8e0
--- /dev/null
+++ b/public/posts/multiple-groupers/repr1.html
@@ -0,0 +1,373 @@
+<xarray.DataArray (labels1: 3, labels2: 3)> Size: 72B
+array([[1. , nan, nan],
+       [nan, 2. , nan],
+       [nan, nan, 1.5]])
+Coordinates:
+  * labels1  (labels1) object 24B 'a' 'b' 'c'
+  * labels2  (labels2) object 24B 'x' 'y' 'z'
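The 3 × 3 table above is what `gb.mean()` returns for the two-label example. As a minimal, standard-library-only sketch of what that reduction computes (an illustration under stated assumptions, not how xarray implements it; xarray works on arrays and can hand reductions to flox): every pair of unique labels gets a cell, NaNs are skipped (as the `1.` in the `('a', 'x')` cell suggests), and pairs that never occur become NaN. `grouped_nanmean` is a hypothetical helper name.

```python
import math
from collections import defaultdict

# Toy data copied from the post's first example, with numpy arrays replaced by lists.
values = [1, 2, 3, 0, 2, math.nan]
labels1 = ["a", "b", "c", "c", "b", "a"]
labels2 = ["x", "y", "z", "z", "y", "x"]


def grouped_nanmean(values, keys1, keys2):
    """Mean of `values` for every (key1, key2) pair, skipping NaNs.

    Returns the sorted unique keys of each variable and a nested list
    indexed as table[i][j] for (u1[i], u2[j]); pairs with no non-NaN
    members are NaN.
    """
    groups = defaultdict(list)
    for v, k1, k2 in zip(values, keys1, keys2):
        if not math.isnan(v):  # skip missing values
            groups[(k1, k2)].append(v)
    u1, u2 = sorted(set(keys1)), sorted(set(keys2))
    table = []
    for a in u1:
        row = []
        for b in u2:
            members = groups.get((a, b), [])
            row.append(sum(members) / len(members) if members else math.nan)
        table.append(row)
    return u1, u2, table


u1, u2, table = grouped_nanmean(values, labels1, labels2)
```

The output grid is the Cartesian product of the unique labels, so most cells are NaN here simply because `labels1` and `labels2` always change together.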
diff --git a/public/posts/multiple-groupers/repr2.html b/public/posts/multiple-groupers/repr2.html
new file mode 100644
index 000000000..f2f4d7ec0
--- /dev/null
+++ b/public/posts/multiple-groupers/repr2.html
@@ -0,0 +1,379 @@
+<xarray.DataArray 'foo' (x_bins: 2, letters: 2, y: 3)> Size: 96B
+array([[[ 0.,  1.,  2.],
+        [nan, nan, nan]],
+
+       [[nan, nan, nan],
+        [ 3.,  4.,  5.]]])
+Coordinates:
+  * x_bins   (x_bins) object 16B (5, 15] (15, 25]
+  * letters  (letters) object 16B 'a' 'b'
+Dimensions without coordinates: y
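The result above has one cell per (`x_bins`, `letters`) pair and keeps the untouched `y` dimension. A stdlib-only sketch of the same bookkeeping follows, assuming right-closed bins (as the `(5, 15]` labels indicate) and that `x` values outside every bin are dropped (as the absence of the `x = 30` and `x = 40` rows indicates). `right_closed_bin` and `binned_by_letter_mean` are hypothetical helpers, not xarray API.

```python
import bisect
import math

# Same toy layout as the post: foo has dims (x, y) holding the values 0..11.
foo = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
x = [10, 20, 30, 40]
letters = ["a", "b", "b", "a"]
edges = [5, 15, 25]


def right_closed_bin(value, edges):
    """Return i with edges[i] < value <= edges[i + 1], or None if outside all bins."""
    i = bisect.bisect_left(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None


def binned_by_letter_mean(rows, x, letters, edges):
    """Mean over x for each (bin, letter) pair, keeping the trailing y axis."""
    uletters = sorted(set(letters))
    nbins, ny = len(edges) - 1, len(rows[0])
    members = {}  # (bin index, letter index) -> list of rows
    for row, xv, letter in zip(rows, x, letters):
        b = right_closed_bin(xv, edges)
        if b is None:  # values outside every bin (here x = 30 and 40) are dropped
            continue
        members.setdefault((b, uletters.index(letter)), []).append(row)
    result = []
    for b in range(nbins):
        per_letter = []
        for li in range(len(uletters)):
            group = members.get((b, li))
            if group:
                # column-wise mean across the rows in this group
                per_letter.append([sum(col) / len(group) for col in zip(*group)])
            else:
                per_letter.append([math.nan] * ny)
        result.append(per_letter)
    bin_labels = [f"({edges[i]}, {edges[i + 1]}]" for i in range(nbins)]
    return bin_labels, uletters, result


bin_labels, uletters, result = binned_by_letter_mean(foo, x, letters, edges)
```

`bisect_left` is what makes the bins right-closed here: a value equal to an edge, such as 15, lands in the bin that ends at that edge.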
diff --git a/public/posts/multiple-groupers/repr3.html b/public/posts/multiple-groupers/repr3.html
new file mode 100644
index 000000000..1fce52cab
--- /dev/null
+++ b/public/posts/multiple-groupers/repr3.html
@@ -0,0 +1,373 @@
+<xarray.DataArray (labels1: 3, labels2: 3)> Size: 72B
+array([[ 1., nan, nan],
+       [nan,  2., nan],
+       [nan, nan,  3.]])
+Coordinates:
+  * labels1  (labels1) object 24B 'a' 'b' 'c'
+  * labels2  (labels2) object 24B 'x' 'y' 'z'
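Unlike the `mean` table, the `('c', 'z')` cell here is `3.` rather than `1.5`, because `map(lambda x: x[0])` keeps the first member of each group instead of reducing it. A hedged stdlib sketch of that apply-then-combine step, assuming each group preserves the original order along `d`; `grouped_map` is a hypothetical helper, not xarray API.

```python
import math

values = [1, 2, 3, 0, 2, math.nan]
labels1 = ["a", "b", "c", "c", "b", "a"]
labels2 = ["x", "y", "z", "z", "y", "x"]


def grouped_map(func, values, keys1, keys2):
    """Apply `func` to each (key1, key2) group; NaN where the pair never occurs."""
    groups = {}
    for v, k1, k2 in zip(values, keys1, keys2):
        # lists keep insertion order, so members stay in their original order
        groups.setdefault((k1, k2), []).append(v)
    u1, u2 = sorted(set(keys1)), sorted(set(keys2))
    return [
        [func(groups[(a, b)]) if (a, b) in groups else math.nan for b in u2]
        for a in u1
    ]


first = grouped_map(lambda g: g[0], values, labels1, labels2)
```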
diff --git a/src/posts/multiple-groupers/index.md b/src/posts/multiple-groupers/index.md
index d5dc019dc..cede0850a 100644
--- a/src/posts/multiple-groupers/index.md
+++ b/src/posts/multiple-groupers/index.md
@@ -49,15 +49,7 @@ Reductions work as usual:
 gb.mean()
 ```
 
-```
-xarray.DataArray (labels1: 3, labels2: 3)> Size: 72B
-array([[1. , nan, nan],
-       [nan, 2. , nan],
-       [nan, nan, 1.5]])
-Coordinates:
-  * labels1  (labels1) object 24B 'a' 'b' 'c'
-  * labels2  (labels2) object 24B 'x' 'y' 'z'
-```
+
 
 So does `map`:
 
@@ -65,15 +57,7 @@ So does `map`:
 gb.map(lambda x: x[0])
 ```
 
-```
- Size: 72B
-array([[ 1., nan, nan],
-       [nan,  2., nan],
-       [nan, nan,  3.]])
-Coordinates:
-  * labels1  (labels1) object 24B 'a' 'b' 'c'
-  * labels2  (labels2) object 24B 'x' 'y' 'z'
-```
+
 
 ## Multiple Groupers
 
@@ -104,15 +88,4 @@ Now reduce as usual
 gb.mean()
 ```
 
-```
- Size: 96B
-array([[[ 0.,  1.,  2.],
-        [nan, nan, nan]],
-
-       [[nan, nan, nan],
-        [ 3.,  4.,  5.]]])
-Coordinates:
-  * x_bins   (x_bins) object 16B (5, 15] (15, 25]
-  * letters  (letters) object 16B 'a' 'b'
-Dimensions without coordinates: y
-```
+
From 464a301929514fb7295180702a9e3da21bcd1fc8 Mon Sep 17 00:00:00 2001
From: Deepak Cherian
Date: Fri, 30 Aug 2024 17:32:53 -0600
Subject: [PATCH 5/8] Update

---
 src/posts/multiple-groupers/index.md | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/src/posts/multiple-groupers/index.md b/src/posts/multiple-groupers/index.md
index cede0850a..a1266de9d 100644
--- a/src/posts/multiple-groupers/index.md
+++ b/src/posts/multiple-groupers/index.md
@@ -18,7 +18,7 @@ Install `xarray>=2024.08.0` and optionally [flox](https://flox.readthedocs.io/en
 
 ## Simple example
 
-Set up a multiple variable groupby using [Grouper objects](https://docs.xarray.dev/en/latest/user-guide/groupby.html#grouping-by-multiple-variables).
+Simple grouping by multiple categorical variables is easy:
 
 ```python
 import xarray as xr
@@ -33,7 +33,7 @@ da = xr.DataArray(
     ),
 )
 
-gb = da.groupby(labels1=UniqueGrouper(), labels2=UniqueGrouper())
+gb = da.groupby(["labels1", "labels2"])
 gb
 ```
 
@@ -59,10 +59,18 @@ gb.map(lambda x: x[0])
 
 
 
-## Multiple Groupers
+## Multiple Grouper types
 
-Combining different grouper types is allowed, that is you can combine
-categorical grouping with [`UniqueGrouper`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.UniqueGrouper.html#xarray.groupers.UniqueGrouper), binning with [`BinGrouper`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.BinGrouper.html#xarray.groupers.BinGrouper), and
+The above syntax `da.groupby(["labels1", "labels2"])` is a short cut for using [Grouper objects](https://docs.xarray.dev/en/latest/user-guide/groupby.html#grouping-by-multiple-variables).
+
+```python
+da.groupby(labels1=UniqueGrouper(), labels2=UniqueGrouper())
+```
+
+Grouper objects allow you to express more complicated GroupBy problems.
+For example, combining different grouper types is allowed.
+That is, you can combine categorical grouping with [`UniqueGrouper`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.UniqueGrouper.html#xarray.groupers.UniqueGrouper),
+binning with [`BinGrouper`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.BinGrouper.html#xarray.groupers.BinGrouper), and
 resampling with [`TimeResampler`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.TimeResampler.html#xarray.groupers.TimeResampler).
 
 ```python
From 0b59a5708c3e15459c656c257f92c1eb736d162c Mon Sep 17 00:00:00 2001
From: Deepak Cherian
Date: Tue, 3 Sep 2024 13:16:43 -0600
Subject: [PATCH 6/8] Update index.md

---
 src/posts/multiple-groupers/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/posts/multiple-groupers/index.md b/src/posts/multiple-groupers/index.md
index a1266de9d..5a0e6dbb4 100644
--- a/src/posts/multiple-groupers/index.md
+++ b/src/posts/multiple-groupers/index.md
@@ -14,7 +14,7 @@ Xarray now supports grouping by multiple variables ([docs](https://docs.xarray.d
 
 ## How do I use it?
 
-Install `xarray>=2024.08.0` and optionally [flox](https://flox.readthedocs.io/en/latest/) for better performance with reductions.
+Install `xarray>=2024.09.0` and optionally [flox](https://flox.readthedocs.io/en/latest/) for better performance with reductions.
 
 ## Simple example
 
From a7987dce2c71759dcd26010edf2ece1b314cda44 Mon Sep 17 00:00:00 2001
From: Deepak Cherian
Date: Tue, 3 Sep 2024 13:20:37 -0600
Subject: [PATCH 7/8] Update src/posts/multiple-groupers/index.md

---
 src/posts/multiple-groupers/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/posts/multiple-groupers/index.md b/src/posts/multiple-groupers/index.md
index 5a0e6dbb4..423820aae 100644
--- a/src/posts/multiple-groupers/index.md
+++ b/src/posts/multiple-groupers/index.md
@@ -1,6 +1,6 @@
 ---
 title: 'Grouping by multiple arrays with Xarray'
-date: '2023-07-18'
+date: '2024-09-02'
 authors:
   - name: Deepak Cherian
     github: dcherian
From 407b839377bcde6f65c773bc5be9a4fa1f63c7d5 Mon Sep 17 00:00:00 2001
From: Deepak Cherian
Date: Tue, 3 Sep 2024 21:17:29 -0600
Subject: [PATCH 8/8] Add ["time.year", "time.month"]

---
 public/posts/multiple-groupers/repr-time.html | 441 ++++++++++++++++++
 src/posts/multiple-groupers/index.md          |  13 +
 2 files changed, 454 insertions(+)
 create mode 100644 public/posts/multiple-groupers/repr-time.html

diff --git a/public/posts/multiple-groupers/repr-time.html b/public/posts/multiple-groupers/repr-time.html
new file mode 100644
index 000000000..5b50edd48
--- /dev/null
+++ b/public/posts/multiple-groupers/repr-time.html
@@ -0,0 +1,441 @@
+<xarray.Dataset> Size: 255kB
+Dimensions:  (lat: 25, lon: 53, year: 2, month: 12)
+Coordinates:
+  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
+  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
+  * year     (year) int64 16B 2013 2014
+  * month    (month) int64 96B 1 2 3 4 5 6 7 8 9 10 11 12
+Data variables:
+    air      (lat, lon, year, month) float64 254kB 244.5 240.7 ... 298.8 297.7
+Attributes:
+    Conventions:  COARDS
+    title:        4x daily NMC reanalysis (1948)
+    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
+    platform:     Model
+    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
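In the dataset above, `time` has been replaced by two new dimensions, `year` and `month`, because `"time.year"` and `"time.month"` are derived ("virtual") variables computed from each timestamp. The stdlib sketch below mimics that idea on a small made-up daily series (not the `air_temperature` tutorial data); `mean_by_year_month` is a hypothetical name, and the sketch ignores the other dimensions such as `lat` and `lon` that xarray carries along.

```python
import datetime as dt
import math
from collections import defaultdict

# Hypothetical daily series covering 2013-01-01 .. 2014-12-31 (730 days);
# each value is just the month number, so the expected means are easy to check.
start = dt.date(2013, 1, 1)
times = [start + dt.timedelta(days=i) for i in range(730)]
readings = [float(t.month) for t in times]


def mean_by_year_month(times, values):
    """Average values per (year, month), analogous to grouping by "time.year" and "time.month"."""
    acc = defaultdict(lambda: [0.0, 0])  # (year, month) -> [running sum, count]
    for t, v in zip(times, values):
        cell = acc[(t.year, t.month)]
        cell[0] += v
        cell[1] += 1
    years = sorted({y for y, _ in acc})
    months = sorted({m for _, m in acc})
    # rows are years, columns are months; NaN marks pairs with no data
    table = [
        [acc[(y, m)][0] / acc[(y, m)][1] if (y, m) in acc else math.nan for m in months]
        for y in years
    ]
    return years, months, table


years, months, table = mean_by_year_month(times, readings)
```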
diff --git a/src/posts/multiple-groupers/index.md b/src/posts/multiple-groupers/index.md
index 423820aae..e93907563 100644
--- a/src/posts/multiple-groupers/index.md
+++ b/src/posts/multiple-groupers/index.md
@@ -59,6 +59,19 @@ gb.map(lambda x: x[0])
 
 
 
+## More complex time grouping
+
+Grouping by multiple _virtual_ variables like `"time.month"` is also supported:
+
+```python
+import xarray as xr
+
+ds = xr.tutorial.open_dataset("air_temperature")
+ds.groupby(["time.year", "time.month"]).mean()
+```
+
+
+
 ## Multiple Grouper types
 
 The above syntax `da.groupby(["labels1", "labels2"])` is a short cut for using [Grouper objects](https://docs.xarray.dev/en/latest/user-guide/groupby.html#grouping-by-multiple-variables).
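Whatever its type, each grouper ultimately has to tell GroupBy which group every element belongs to. One way to picture how several of them compose (a sketch under that assumption, not a description of xarray's internals) is this: each grouper factorizes its variable into integer codes plus labels, and the per-variable codes are folded into one flat code over the Cartesian product of labels, in the same row-major order as NumPy's `np.ravel_multi_index`. `unique_codes` and `combine_codes` are hypothetical helpers.

```python
import math


def unique_codes(labels):
    """Factorize labels into integer codes plus the sorted unique labels."""
    uniques = sorted(set(labels))
    lookup = {u: i for i, u in enumerate(uniques)}
    return [lookup[v] for v in labels], uniques


def combine_codes(*factorized):
    """Fold per-variable codes into one flat code, row-major like ravel_multi_index."""
    flat = [0] * len(factorized[0][0])
    for codes, uniques in factorized:
        for i, c in enumerate(codes):
            # shift the previous "digits" by this variable's size, then add its code
            flat[i] = flat[i] * len(uniques) + c
    ngroups = math.prod(len(uniques) for _, uniques in factorized)
    return flat, ngroups


labels1 = ["a", "b", "c", "c", "b", "a"]
labels2 = ["x", "y", "z", "z", "y", "x"]
flat, ngroups = combine_codes(unique_codes(labels1), unique_codes(labels2))
# divmod by the number of labels2 values recovers each element's (labels1, labels2) indices
cells = [divmod(code, 3) for code in flat]
```

Here the nine possible (`labels1`, `labels2`) cells map to flat codes 0 to 8, but only codes 0, 4 and 8 occur, which matches the NaN-filled off-diagonal cells in the reductions shown earlier.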