Commit 54658e5

Update 2022-6-12-geospatial-deep-learning-with-torchgeo.md
Feedback updates
1 parent 961c501 commit 54658e5

1 file changed

+14
-20
lines changed

_posts/2022-6-12-geospatial-deep-learning-with-torchgeo.md

Lines changed: 14 additions & 20 deletions
@@ -5,7 +5,7 @@ author: Adam Stewart (University of Illinois at Urbana-Champaign), Caleb Robinso
55
featured-img: ""
66
---
77

8-
TorchGeo is a PyTorch domain library providing datasets, samplers, transforms, and pre-trained models specific to geospatial data
8+
TorchGeo is a PyTorch domain library providing datasets, samplers, transforms, and pre-trained models specific to geospatial data.
99

1010
<p align="center">
1111
<img src="/assets/images/torchgeo-logo.png" width="100%">
@@ -29,16 +29,16 @@ National Oceanic and Atmospheric Administration satellite image of Hurricane Kat
2929

3030
In traditional computer vision datasets, such as ImageNet, the image files themselves tend to be rather simple and easy to work with. Most images have 3 spectral bands (RGB), are stored in common file formats like PNG or JPEG, and can be easily loaded with popular software libraries like [PIL](https://pillow.readthedocs.io/en/stable/) or [OpenCV](https://opencv.org/). Each image in these datasets is usually small enough to pass directly into a neural network. Furthermore, most of these datasets contain a finite number of well-curated images that are assumed to be independent and identically distributed, making train-val-test splits straightforward. As a result of this relative homogeneity, the same pre-trained models (e.g., CNNs pretrained on ImageNet) have shown to be effective across a wide range of vision tasks using transfer learning methods. Existing libraries, such as [torchvision](https://github.com/pytorch/vision), handle these simple cases well, and have been used to make large advances in vision tasks over the past decade.
3131

32-
Remote sensing imagery is not so uniform. Instead of simple RGB images, satellites tend to capture images that are multispectral (Landsat 8 has 11 spectral bands) or even hyperspectral (Hyperion has 242 spectral bands). These images capture information at a wider range of wavelengths (400 nm–15 µm), far outside of the visible spectrum. Different satellites also have very different spatial resolutions—GOES has a resolution of 4 km/px, Maxar imagery is 30 cm/px, and drone imagery resolution can be as high as 7 mm/px. These datasets almost always have a temporal component, with satellite revisits that are daily, weekly, or biweekly. Images often have overlap with other images in the dataset, and need to be stitched together based on geographic metadata. These images tend to be very large (e.g., 10K x 10K pixels), so it isn't possible to pass an entire image through a neural network. This data is distributed in hundreds of different raster and vector file formats like GeoTIFF and ESRI Shapefile, requiring specialty libraries like [GDAL](https://gdal.org/) to load.
32+
Remote sensing imagery is not so uniform. Instead of simple RGB images, satellites tend to capture images that are multispectral ([Landsat 8](https://www.usgs.gov/landsat-missions) has 11 spectral bands) or even hyperspectral ([Hyperion](https://www.usgs.gov/centers/eros/science/usgs-eros-archive-earth-observing-one-eo-1-hyperion) has 242 spectral bands). These images capture information at a wider range of wavelengths (400 nm–15 µm), far outside of the visible spectrum. Different satellites also have very different spatial resolutions—[GOES](https://www.goes.noaa.gov/) has a resolution of 4 km/px, [Maxar](https://www.maxar.com/products/satellite-imagery) imagery is 30 cm/px, and drone imagery resolution can be as high as 7 mm/px. These datasets almost always have a temporal component, with satellite revisits that are daily, weekly, or biweekly. Images often have overlap with other images in the dataset, and need to be stitched together based on geographic metadata. These images tend to be very large (e.g., 10K x 10K pixels), so it isn't possible to pass an entire image through a neural network. This data is distributed in hundreds of different raster and vector file formats like GeoTIFF and ESRI Shapefile, requiring specialty libraries like [GDAL](https://gdal.org/) to load.
3333

3434

3535
<p align="center">
36-
<img src="/assets/images/torchgeo-geospatial-data.png" width="80%">
36+
<img src="/assets/images/torchgeo-map.png" width="80%">
3737
</p>
3838

3939

4040
<p align = "center">
41-
Geospatial data is associated with one of many different types of reference systems that project the 3D Earth onto a 2D representation (<a href="https://scitools.org.uk/cartopy/docs/latest/reference/projections.html">source</a>). Combining data from different sources often involves re-projecting to a common reference system in order to ensure that all layers are aligned.
41+
From left to right: Mercator, Albers Equal Area, and Interrupted Goode Homolosine projections (source). Geospatial data is associated with one of many different types of reference systems that project the 3D Earth onto a 2D representation (<a href="https://scitools.org.uk/cartopy/docs/latest/reference/projections.html">source</a>). Combining data from different sources often involves re-projecting to a common reference system in order to ensure that all layers are aligned.
4242
</p>
4343

4444
Although each image is 2D, the Earth itself is 3D. In order to stitch together images, they first need to be projected onto a 2D representation of the Earth, called a coordinate reference system (CRS). Most people are familiar with equal angle representations like Mercator that distort the size of regions (Greenland looks larger than Africa even though Africa is 15x larger), but there are many other CRSs that are commonly used. Each dataset may use a different CRS, and each image within a single dataset may also be in a unique CRS. In order to use data from multiple layers, they must all share a common CRS, otherwise the data won't be properly aligned. For those who aren't familiar with remote sensing data, this can be a daunting task.
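
To put a rough number on the Mercator distortion mentioned above, here is a minimal sketch that is not part of the original post. It relies on the standard property that Mercator stretches lengths by sec(latitude) in both directions, so areas are inflated by sec²(latitude). The function name `mercator_area_scale` and the representative latitudes are assumptions chosen for illustration only.

```python
import math


def mercator_area_scale(latitude_deg: float) -> float:
    """Area inflation of the Mercator projection relative to the equator.

    Lengths are stretched by sec(latitude) in both the east-west and
    north-south directions, so areas grow by sec(latitude) ** 2.
    """
    if abs(latitude_deg) >= 90:
        raise ValueError("Mercator is undefined at the poles")
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2


# Illustrative latitudes (assumed, not taken from the post).
for name, lat in [("equator", 0.0), ("60 degrees north", 60.0), ("72 degrees north", 72.0)]:
    print(f"{name}: areas drawn x{mercator_area_scale(lat):.1f} too large")
```

Regions at Greenland's latitudes are therefore drawn roughly an order of magnitude too large, while regions straddling the equator are barely inflated at all.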
@@ -60,7 +60,7 @@ At the moment, it can be quite challenging to work with both deep learning model
6060

6161
TorchGeo is not just a research project, but a production-quality library that uses continuous integration to test every commit with a range of Python versions on a range of platforms (Linux, macOS, Windows). It can be easily installed with any of your favorite package managers, including pip, conda, and [spack](https://spack.io):
6262

63-
```c++
63+
```Python
6464
$ pip install torchgeo
6565
```
6666

@@ -101,9 +101,7 @@ This dataset can now be used with a PyTorch data loader. Unlike benchmark datase
101101

102102
```c++
103103
sampler = RandomGeoSampler(dataset, size=256, length=10000)
104-
dataloader = DataLoader(
105-
dataset, batch_size=128, sampler=sampler, collate_fn=stack_samples
106-
)
104+
dataloader = DataLoader(dataset, batch_size=128, sampler=sampler, collate_fn=stack_samples)
107105
```
108106

109107
This data loader can now be used in your normal training/evaluation pipeline.
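
As an analogy for what a random sampler does with imagery too large to load whole, the sketch below draws fixed-size windows from a large image in plain pixel coordinates. It is not TorchGeo's implementation: `RandomGeoSampler` works in geospatial coordinates over an indexed dataset, whereas the hypothetical `random_windows` helper here only assumes a single image of known height and width.

```python
import random
from typing import Iterator, Optional, Tuple

Window = Tuple[int, int, int, int]  # (row_start, col_start, row_stop, col_stop)


def random_windows(
    height: int, width: int, size: int, length: int, seed: Optional[int] = None
) -> Iterator[Window]:
    """Yield `length` random square windows that fit inside the image."""
    if size > height or size > width:
        raise ValueError("window is larger than the image")
    rng = random.Random(seed)
    for _ in range(length):
        # randint is inclusive on both ends, so the window never overruns.
        row = rng.randint(0, height - size)
        col = rng.randint(0, width - size)
        yield row, col, row + size, col + size


# Five 256 x 256 patches from a hypothetical 10,000 x 10,000 pixel scene.
windows = list(random_windows(10_000, 10_000, size=256, length=5, seed=0))
for window in windows:
    print(window)
```

Each window could then be used to slice an array, for example `image[r0:r1, c0:c1]`, so only small patches ever reach the network.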
@@ -121,12 +119,12 @@ Many applications involve intelligently composing datasets based on geospatial m
121119
- Combine datasets for multiple image sources and treat them as equivalent (e.g., Landsat 7 and 8)
122120
- Combine datasets for disparate geospatial locations (e.g., Chesapeake NY and PA)
123121

124-
These combinations require that all queries are present in at least one dataset, and can be created using a UnionDataset. Similarly, users may want to:
122+
These combinations require that all queries are present in *at least one* dataset, and can be created using a UnionDataset. Similarly, users may want to:
125123

126124
- Combine image and target labels and sample from both simultaneously (e.g., Landsat and CDL)
127-
- Combine datasets for multiple images sources for multimodal learning or data fusion (e.g., Landsat and Sentinel)
125+
- Combine datasets for multiple image sources for multimodal learning or data fusion (e.g., Landsat and Sentinel)
128126

129-
These combinations require that all queries are present in both datasets, and can be created using an IntersectionDataset. TorchGeo automatically composes these datasets for you when you use the intersection (&) and union \(\|\) operators.
127+
These combinations require that all queries are present in *both* datasets, and can be created using an IntersectionDataset. TorchGeo automatically composes these datasets for you when you use the intersection (&) and union \(\|\) operators.
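
The difference between the two requirements can be shown with a toy model that is not TorchGeo's implementation. The `Box` class and both helper functions below are hypothetical and consider only spatial extent, whereas real datasets also carry time, CRS, and resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned spatial extent in some shared coordinate system."""

    minx: float
    maxx: float
    miny: float
    maxy: float

    def intersects(self, other: "Box") -> bool:
        return (
            self.minx <= other.maxx
            and other.minx <= self.maxx
            and self.miny <= other.maxy
            and other.miny <= self.maxy
        )


def valid_for_union(query: Box, a: Box, b: Box) -> bool:
    # A union only needs the query to hit at least one dataset.
    return query.intersects(a) or query.intersects(b)


def valid_for_intersection(query: Box, a: Box, b: Box) -> bool:
    # An intersection needs the query to hit both datasets.
    return query.intersects(a) and query.intersects(b)


imagery = Box(0, 10, 0, 10)  # hypothetical extent of an image dataset
labels = Box(5, 15, 5, 15)   # hypothetical extent of a label dataset

only_imagery = Box(1, 2, 1, 2)
overlap = Box(6, 7, 6, 7)

print(valid_for_union(only_imagery, imagery, labels))         # True
print(valid_for_intersection(only_imagery, imagery, labels))  # False
print(valid_for_intersection(overlap, imagery, labels))       # True
```

A query that only touches the imagery is fine when mosaicking sources, but unusable for supervised training, where every sample needs both an image and a label.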
130128

131129
# Multispectral and geospatial transforms
132130

@@ -189,7 +187,7 @@ for batch in dataloader:
189187
# train a model, or make predictions using a pre-trained model
190188
```
191189

192-
All TorchGeo datasets are compatible with PyTorch DataLoaders, making them easy to integrate into existing training workflows. The only difference between a benchmark dataset in TorchGeo and a similar dataset in torchvision is that each dataset returns a dictionary with keys for each PyTorch Tensor.
190+
All TorchGeo datasets are compatible with PyTorch data loaders, making them easy to integrate into existing training workflows. The only difference between a benchmark dataset in TorchGeo and a similar dataset in torchvision is that each dataset returns a dictionary with keys for each PyTorch Tensor.
193191

194192
<p align="center">
195193
<img src="/assets/images/techge-nwpu.png" width="100%">
@@ -210,12 +208,8 @@ from pytorch_lightning import Trainer
210208
from torchgeo.datamodules import InriaAerialImageLabelingDataModule
211209
from torchgeo.trainers import SemanticSegmentationTask
212210

213-
datamodule = InriaAerialImageLabelingDataModule(
214-
root_dir="...", batch_size=64, num_workers=6
215-
)
216-
task = SemanticSegmentationTask(
217-
model="resnet18", pretrained=True, learning_rate=0.1
218-
)
211+
datamodule = InriaAerialImageLabelingDataModule(root_dir="...", batch_size=64, num_workers=6)
212+
task = SemanticSegmentationTask(model="resnet18", pretrained=True, learning_rate=0.1)
219213
trainer = Trainer(gpus=1, default_root_dir="...")
220214

221215
trainer.fit(model=task, datamodule=datamodule)
@@ -226,10 +220,10 @@ trainer.fit(model=task, datamodule=datamodule)
226220
</p>
227221

228222
<p align = "center">
229-
Building segmentations produced by a model trained on the <a href="https://project.inria.fr/aerialimagelabeling/">Inria Aerial Image Labeling dataset</a>. Reproducing these results is as simple as a few imports and four lines of code, making comparison of different models and training techniques simple and easy.
223+
Building segmentations produced by a U-Net model trained on the <a href="https://project.inria.fr/aerialimagelabeling/">Inria Aerial Image Labeling</a> dataset. Reproducing these results is as simple as a few imports and four lines of code, making comparison of different models and training techniques simple and easy.
230224
</p>
231225

232-
In our [preprint](https://arxiv.org/abs/2111.08872) we show a set of results that use the aforementioned datamodules and trainers to benchmark simple modeling approaches for several of the datasets in TorchGeo. For example, we find that a simple ResNet-50 can achieve state-of-the-art performance on the [So2Sat dataset](https://ieeexplore.ieee.org/document/9014553). These types of baseline results are important for evaluating the contribution of different modeling choices when tackling problems with remotely sensed data.
226+
In our [preprint](https://arxiv.org/abs/2111.08872) we show a set of results that use the aforementioned datamodules and trainers to benchmark simple modeling approaches for several of the datasets in TorchGeo. For example, we find that a simple ResNet-50 can achieve state-of-the-art performance on the [So2Sat](https://ieeexplore.ieee.org/document/9014553) dataset. These types of baseline results are important for evaluating the contribution of different modeling choices when tackling problems with remotely sensed data.
233227

234228
# Future work and contributing
235229
