Torchvision transforms do not scale but the docs say that they do #8984
Comments
I will look into the reproducible code that you have shared (maybe tomorrow), but I'm sure we do scale the inputs. See `vision/torchvision/transforms/_presets.py`, line 62 at commit `8ea4772`.
`F.convert_image_dtype` handles the scaling: https://pytorch.org/vision/main/generated/torchvision.transforms.functional.convert_image_dtype.html I will go through your shared code tomorrow; it's pretty late right now.
@abhi-glitchhg Thanks for looking at this! In the code you cite, I do not understand why we are scaling the values to 0..1 based on dtypes. If the input is an integer, I see a normalization based on the maximum value of the dtype. Is this the intended behavior? The docs say the values are rescaled to 0..1, so I took this to mean that if I input my image with values 0-255 in any dtype, they would be rescaled to 0..1. Apologies if I misunderstood the doc. If that is the case, I recommend mentioning this in the documentation to clarify.
Correct.
Correct. Generally the images are of type `uint8`.
No, it depends on the dtype of the input. If the input image is an integer dtype, the values are divided by the maximum value of that dtype:

```python
import torch
import torchvision.transforms.functional as F

int16_input = torch.randint(0, 255, (1, 3, 224, 224)).to(dtype=torch.uint16)
int8_input = int16_input.to(dtype=torch.uint8)
int16_output = F.convert_image_dtype(int16_input)
int8_output = F.convert_image_dtype(int8_input)
print(int16_output)
print("#")
print(int8_output)
```

But if you are providing a float dtype with 0-255 values, then no operation is done.
As I mentioned above, there are many integer types (int8, uint8, uint16, ...), and each one has a different maximum possible value.
This is dangerous: as I mentioned, there are different integer dtypes, so assuming the data lies between 0-255 (or any fixed range) is not safe.

Coming back to the example code you have shared, I think the problem is here: you are dividing the input by `norm`, which converts the numpy array to a float dtype irrespective of your earlier typecasting (`astype`) operation. Maybe that's why there was no scaling done by `convert_image_dtype`. I hope I have addressed the issue; if anything is unclear, feel free to comment.
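The dtype promotion being described can be seen in isolation (a minimal sketch; `norm` here is a hypothetical stand-in for the normalization constant in the shared snippet):

```python
import numpy as np

# True division always promotes an integer array to float64,
# regardless of an earlier astype() call.
img = np.arange(6, dtype=np.uint8).reshape(2, 3)
norm = 255.0  # hypothetical normalization constant
out = img / norm
print(img.dtype, out.dtype)  # uint8 float64
```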
I'm assuming you have input that is between 0-255; make sure that the dtype of the input is `torch.uint8`. We do have this documented as well.
@abhi-glitchhg Thanks for the clarification!
🐛 Describe the bug
The docs for torchvision AlexNet mention that the transforms rescale the values to 0...1 before applying the mean and std scaling. However, this is not the case. Looking at the source code, in `transforms/_presets.py`, the `ImageClassification` class describes itself as doing this rescaling, but I do not see the corresponding code. Presumably the AlexNet transform inherits this directly. Apologies if I am missing something here! Here is a reproducible code example:
Versions
torchvision: 0.21.0
torch: 2.6.0
numpy: 2.2.2