
[Feature Request] Datasets Should Use New torchvision.io Image Loader APIs and Return TVTensor Images by Default #8762

Open · fang-d opened this issue Nov 28, 2024 · 9 comments · Fixed by #8922


@fang-d (Contributor) commented Nov 28, 2024

🚀 The feature

  1. Add a "torchvision" image loader backend based on the new torchvision.io APIs (see: Release Notes v0.20) and enable it by default.
  2. VisionDatasets should return TVTensor images by default instead of PIL.Image.

Motivation, pitch

  1. TorchVision v0.20 introduces new torchvision.io APIs that enhance its encoding/decoding capabilities.
  2. The current VisionDatasets return PIL.Image by default, but the first step of the transforms pipeline is usually transforms.ToImage().
  3. PIL is slow (see: Pillow-SIMD), especially compared with the new torchvision.io APIs.
  4. The current TorchVision image loader backends are based on PIL or accimage and do not include the new torchvision.io APIs.
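
To make points 2-4 concrete, here is a minimal sketch of the two pipelines (assuming torchvision v0.20+, where decode_image also accepts a file path; the ./data root is illustrative):

```python
import torch
from torchvision import datasets
from torchvision.io import decode_image
from torchvision.transforms import v2

# Today: the dataset decodes to a PIL.Image, and the very first
# transform converts it to a tensor anyway.
transforms = v2.Compose(
    [
        v2.ToImage(),  # PIL.Image -> tv_tensors.Image
        v2.ToDtype(torch.float32, scale=True),
    ]
)
pil_dataset = datasets.ImageFolder("./data", transform=transforms)

# With the new API: decode straight to a uint8 CHW tensor and skip
# the PIL round-trip entirely.
tensor_dataset = datasets.ImageFolder("./data", loader=decode_image)
```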

Alternatives

  1. The return type of datasets could be PIL.Image when using the PIL or accimage backends, and TVTensor when using the new APIs (at the cost of consistency).

Additional context

I would like to make a pull request if the community likes this feature.

@NicolasHug (Member)

Hi @fang-d , thank you for the feature request. This is a great idea, and I think the torchvision decoders are in a stable enough state to enable this now.

We already support the loader parameter for some datasets (mostly ImageFolder I think https://pytorch.org/vision/stable/generated/torchvision.datasets.ImageFolder.html#torchvision.datasets.ImageFolder). But we should enable the same for all existing datasets.

I think the way to go would probably be to add that loader parameter to all datasets that currently call Image.open(...).

~/dev/vision (main*) » git grep Image.open torchvision/datasets
torchvision/datasets/_optical_flow.py:        img = Image.open(file_name)
torchvision/datasets/_stereo_matching.py:        img = Image.open(file_path)
torchvision/datasets/_stereo_matching.py:        disparity_map = np.asarray(Image.open(file_path)) / 256.0
torchvision/datasets/_stereo_matching.py:        disparity_map = np.asarray(Image.open(file_path)) / 256.0
torchvision/datasets/_stereo_matching.py:        disparity_map = np.asarray(Image.open(file_path), dtype=np.float32)
torchvision/datasets/_stereo_matching.py:        depth = np.asarray(Image.open(file_path))
torchvision/datasets/_stereo_matching.py:        disparity_map = np.asarray(Image.open(file_path), dtype=np.float32)
torchvision/datasets/_stereo_matching.py:        valid_mask = np.asarray(Image.open(occlued_mask_path)) == 0
torchvision/datasets/_stereo_matching.py:        off_mask = np.asarray(Image.open(out_of_frame_mask_path)) == 0
torchvision/datasets/_stereo_matching.py:        disparity_map = np.asarray(Image.open(file_path), dtype=np.float32)
torchvision/datasets/_stereo_matching.py:        valid_mask = Image.open(mask_path)
torchvision/datasets/caltech.py:        img = Image.open(
torchvision/datasets/caltech.py:        img = Image.open(
torchvision/datasets/celeba.py:        X = PIL.Image.open(os.path.join(self.root, self.base_folder, "img_align_celeba", self.filename[index]))
torchvision/datasets/cityscapes.py:        image = Image.open(self.images[index]).convert("RGB")
torchvision/datasets/cityscapes.py:                target = Image.open(self.targets[index][i])  # type: ignore[assignment]
torchvision/datasets/clevr.py:        image = Image.open(image_file).convert("RGB")
torchvision/datasets/coco.py:        return Image.open(os.path.join(self.root, path)).convert("RGB")
torchvision/datasets/dtd.py:        image = PIL.Image.open(image_file).convert("RGB")
torchvision/datasets/fgvc_aircraft.py:        image = PIL.Image.open(image_file).convert("RGB")
torchvision/datasets/flickr.py:        img = Image.open(img_id).convert("RGB")
torchvision/datasets/flickr.py:        img = Image.open(filename).convert("RGB")
torchvision/datasets/flowers102.py:        image = PIL.Image.open(image_file).convert("RGB")
torchvision/datasets/folder.py:        img = Image.open(f)
torchvision/datasets/food101.py:        image = PIL.Image.open(image_file).convert("RGB")
torchvision/datasets/gtsrb.py:        sample = PIL.Image.open(path).convert("RGB")
torchvision/datasets/imagenette.py:        image = Image.open(path).convert("RGB")
torchvision/datasets/inaturalist.py:        img = Image.open(os.path.join(self.root, self.all_categories[cat_id], fname))
torchvision/datasets/kitti.py:        image = Image.open(self.images[index])
torchvision/datasets/lfw.py:            img = Image.open(f)
torchvision/datasets/lsun.py:        img = Image.open(buf).convert("RGB")
torchvision/datasets/omniglot.py:        image = Image.open(image_path, mode="r").convert("L")
torchvision/datasets/oxford_iiit_pet.py:        image = Image.open(self._images[idx]).convert("RGB")
torchvision/datasets/oxford_iiit_pet.py:                target.append(Image.open(self._segs[idx]))
torchvision/datasets/phototour.py:        img = Image.open(fpath)
torchvision/datasets/rendered_sst2.py:        image = PIL.Image.open(image_file).convert("RGB")
torchvision/datasets/sbd.py:        img = Image.open(self.images[index]).convert("RGB")
torchvision/datasets/sbu.py:        img = Image.open(filename).convert("RGB")
torchvision/datasets/stanford_cars.py:        pil_image = Image.open(image_path).convert("RGB")
torchvision/datasets/sun397.py:        image = PIL.Image.open(image_file).convert("RGB")
torchvision/datasets/voc.py:        img = Image.open(self.images[index]).convert("RGB")
torchvision/datasets/voc.py:        target = Image.open(self.masks[index])
torchvision/datasets/voc.py:        img = Image.open(self.images[index]).convert("RGB")
torchvision/datasets/widerface.py:        img = Image.open(self.img_info[index]["img_path"])  # type: ignore[arg-type]
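
For illustration, a minimal sketch of what that refactor could look like (SomeImageDataset is hypothetical; default_loader is the existing PIL-based helper in torchvision.datasets.folder):

```python
from typing import Any, Callable, List, Optional

from torchvision.datasets import VisionDataset
from torchvision.datasets.folder import default_loader


class SomeImageDataset(VisionDataset):
    """Hypothetical dataset illustrating the loader refactor."""

    def __init__(
        self,
        root: str,
        image_paths: List[str],
        transform: Optional[Callable] = None,
        loader: Callable[[str], Any] = default_loader,  # PIL stays the default
    ) -> None:
        super().__init__(root, transform=transform)
        self.image_paths = image_paths
        self.loader = loader

    def __getitem__(self, index: int) -> Any:
        # Before the refactor this would be a hard-coded
        # Image.open(self.image_paths[index]).convert("RGB").
        img = self.loader(self.image_paths[index])
        if self.transform is not None:
            img = self.transform(img)
        return img

    def __len__(self) -> int:
        return len(self.image_paths)
```

With this shape, passing loader=decode_image makes the same dataset yield tensors, while the default keeps today's PIL behavior.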

@GdoongMathew (Contributor) commented Feb 5, 2025

Hi @NicolasHug, since the ImageNet dataset already supports the loader parameter, for consistency reasons could we also add kwargs to the Country211 and EuroSAT datasets?

@GdoongMathew (Contributor) commented Feb 8, 2025

Hi @NicolasHug, I'd like to tackle this issue, but there are a few concerns I'd like to ask about first.

  1. Would it be OK to move default_loader, pil_loader, and accimage_loader from the original folder.py to vision.py? This would let other classes set their default loader value to default_loader (as ImageNet currently does) without having to import folder, given that some of the dataset classes are not subclasses of ImageFolder. But if this counts as a BC-breaking change, we would need to keep them importable from the original folder module.

  2. We could add torchvision.io.decode_image as one of the supported loaders, but still keep PIL as the default loader?

  3. This change would raise the inconsistency issue mentioned above:

    Alternatives

    1. The return type of datasets could be PIL.Image when using the PIL or accimage backends, and TVTensor when using the new APIs (at the cost of consistency).

    but since the loader and the transform are both chosen by the user, this probably wouldn't be a huge problem?
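
For what it's worth, a sketch of why the inconsistency may be manageable: the v2 transforms accept both PIL images and tensors, so one pipeline covers either loader (the pipeline below is illustrative):

```python
import torch
from torchvision.transforms import v2

# v2 transforms accept both PIL.Image and Tensor inputs, so the same
# pipeline works regardless of which loader the user picked; ToImage()
# wraps either input as a tv_tensors.Image.
pipeline = v2.Compose(
    [
        v2.RandomResizedCrop(224),
        v2.ToImage(),
        v2.ToDtype(torch.float32, scale=True),
    ]
)
```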

@NicolasHug (Member)

Hi @GdoongMathew ,

Since the ImageNet dataset already supports the loader parameter, for consistency reasons could we also add kwargs to the Country211 and EuroSAT datasets?

Yes, good point. We should also add the loader parameter to all datasets that inherit from ImageDataset. We shouldn't need **kwargs for that; a new loader parameter should work.

Would it be OK to move

Unfortunately, no... this would be BC-breaking :/. I'm not sure I understand the concern, though - is there any issue with the datasets importing things from folder.py?

We could add torchvision.io.decode_image as one of the supported loaders, but still keep PIL as the default loader?

I think we'll need to keep PIL as the default loader, yes. But we can encourage users to rely on decode_image in the docstring.

Overall, and to answer your final point, I don't think we should implement all this via a new "image backend". The image backend logic is really old, and I'm not even sure we still support accimage at all. I think this is better done by simply adding a loader parameter to most datasets - it makes it more obvious what's going on. And like you said, when a user explicitly passes loader=decode_image, it's perfectly fine for the dataset to return a Tensor, since that's what the user asked for.

Thanks a lot for working on this!

@GdoongMathew (Contributor) commented Feb 21, 2025

Well, the first point was more about the dependency. From what I understand, most of the datasets are subclassed from vision.VisionDataset, whereas default_loader lives in the folder.py module, so moving it felt like moving it up the import chain. But as you mentioned, that would be a BC-breaking change, which is much less desirable.

@GdoongMathew (Contributor)

Hi @NicolasHug, I think there's another problem, though.
If a dataset stores its labels as masks, should it use the provided loader to load the mask data, or should the dataset take another argument, something like mask_loader?

If we reuse the provided loader, we'll probably need to add an extra argument to default_loader and pil_loader so that the dataset can read images and masks in different modes, but then it has to figure out which argument to pass to which loader function.

Or, for the time being, we could just expose loader and not change anything about how masks are read?
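
To make the concern concrete, here's a minimal sketch with a separate mask_loader argument (HypotheticalSegDataset, pil_rgb_loader, and pil_mask_loader are illustrative names, not existing torchvision helpers):

```python
from typing import Any, Callable, List, Tuple

from PIL import Image


def pil_rgb_loader(path: str) -> Image.Image:
    return Image.open(path).convert("RGB")


def pil_mask_loader(path: str) -> Image.Image:
    # Masks must keep their single-channel / palette mode; converting
    # to RGB would scramble the class indices.
    return Image.open(path)


class HypotheticalSegDataset:
    def __init__(
        self,
        images: List[str],
        masks: List[str],
        loader: Callable[[str], Any] = pil_rgb_loader,
        mask_loader: Callable[[str], Any] = pil_mask_loader,
    ) -> None:
        self.images, self.masks = images, masks
        self.loader, self.mask_loader = loader, mask_loader

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        # Images and masks go through different loaders, so each can
        # use the mode it needs.
        return self.loader(self.images[index]), self.mask_loader(self.masks[index])
```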

@NicolasHug (Member)

Great point about loading masks - let's take it step by step: we can start with the classification datasets first, and look into the detection/segmentation datasets in a second iteration.

@GdoongMathew (Contributor)

Hi @NicolasHug, here are the remaining datasets that still need refactoring:

  1. optical flow
  • FlowDataset
  • Sintel
  • KittiFlow
  • FlyingThings3D
  • HD1K
  2. stereo matching
  • StereoMatchingDataset
  • CarlaStereo
  • Kitti2012Stereo
  • Kitti2015Stereo
  • Middlebury2014Stereo
  • CREStereo
  • FallingThingsStereo
  • SceneFlowStereo
  • SintelStereo
  • InStereo2k
  • ETH3DStereo
  3. datasets with mask annotations
  • Cityscapes
  • OxfordIIITPet
  • SBDataset
  • _VOCBase (VOCSegmentation / VOCDetection)
  4. datasets with keypoint / bbox / contour annotations
  • Caltech101
  • CelebA
  • CocoDetection / CocoCaptions
  • Kitti
  • WIDERFace
  5. classification-only datasets
  • Caltech256
  6. datasets that store images in other formats
  • CIFAR10 / CIFAR100
  • FER2013
  • GTSRB (to load these with the torchvision.io API, the API will also have to support ppm images)
  • FlyingChairs (ppm images)
  • LSUNClass / LSUN
  • MNIST / FashionMNIST / KMNIST / EMNIST / QMNIST
  • PCAM
  • PhotoTour
  • SEMEION
  • STL10
  • SVHN
  • USPS

I think we could keep working on the datasets in groups 1, 2, 4, and 5, and leave the ones in 3 and 6 for further discussion?
