Behavior of reductions on empty arrays should be specified #232
The note on integers is correct, but gets the logic backwards IMO: The reason for the float promotion is that this is a reduce-like operation (not a proper reduction) which includes a true-division in the definition. E.g.
Since One problem is that These reduce-likes cannot have an I believe there was an argument/request for the
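The example cut off above ("E.g.") is lost, but the point about true division can be illustrated with a minimal NumPy sketch (this is an illustration of the argument, not the original example): `mean()` is a sum reduction followed by a true division, so an integer input is promoted to a floating-point result.

```python
import numpy as np

# mean() is reduce-like: a sum reduction followed by a true division.
# The true division is what forces the integer -> float promotion.
a = np.array([1, 2], dtype=np.int64)

result = a.sum() / a.size      # true division yields a float
print(result)                  # 1.5
print(np.mean(a).dtype)        # float64
```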
Assuming all the reduction functions being referred to here are in the "statistical functions" section of the spec, I don't have any tests for them written yet in the test suite, beyond the basic signature tests that apply to every function.

I agree that the reductions should either explicitly state what they do to empty arrays or else indicate that they are only specified for nonempty arrays. Right now, it is underspecified. I would also note that the statistical functions don't mention the allowed data types of their inputs (presumably they should all be either "numerical" or "floating-point"). Also I noticed that NumPy doesn't define
Functions like mean() that perform divisions require floating-point inputs. All other functions require numeric inputs. This was mentioned in data-apis#232 but does not fix the main question of that issue (the behavior of the statistical functions on empty arrays). I can implement that as well if we come to an agreement about what it should say.
I added input dtype restrictions at #234. Please see if they make sense. If we agree on behavior for empty inputs, I can add that to that PR as well.
On identities and reductions: Something that we came across in PyTorch when looking into these things is how annoying this is for
After some discussion, we opted for not supporting reductions along zero dimensions for these operations. The whole discussion starts in this comment: pytorch/pytorch#52565 (comment)
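NumPy makes an analogous choice for reductions without an identity, which can illustrate the rule PyTorch adopted (a hedged sketch; the two libraries' error types and messages differ): reducing over a zero-length dimension with `max` raises, while reducing along a nonzero dimension of the same array simply yields an empty result.

```python
import numpy as np

x = np.ones((0, 3))  # zero rows, three columns

# max has no identity element, so reducing over the zero-length
# dimension raises instead of returning a value.
try:
    np.max(x, axis=0)
except ValueError as e:
    print("error:", e)

# Reducing along the length-3 dimension is fine; each reduction is
# over 3 elements, there are just zero of them, so the result is
# an empty array of shape (0,).
print(np.max(x, axis=1).shape)  # (0,)
```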
I agree with your explanation of why there's the promote-to-float, but that doesn't mean it's backwards. Design should go in this order, I think:
That's a pretty ad-hoc reason. I'd rather have a more consistent set of rules. Part of the problem here seems to be with how the median is defined:

```python
if has_even_length(input):
    return (left_median + right_median) / 2  # float dtype
else:
    return median_val
```

So the output dtype depends on the shape of the input - that seems like a code smell. Although in NumPy the docs say:

Which is nicer - but the price we pay for the midpoint rule is that the definition for odd length is unexpected - Either of those options seems okay, still consistent with the initial design rules I posted above. I guess that should be extended with a higher-level design rule that we've stated elsewhere: output dtype must be predictable from input dtypes.
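For reference, NumPy satisfies that higher-level rule for `median` by always promoting integer input to a floating-point result, even when the array length is odd and the midpoint rule is not needed, so the output dtype depends only on the input dtype, never on the shape:

```python
import numpy as np

# np.median promotes integer input to float regardless of length,
# so the output dtype is predictable from the input dtype alone.
odd = np.median(np.array([1, 2, 3]))      # no midpoint needed
even = np.median(np.array([1, 2, 3, 4]))  # midpoint of 2 and 3

print(odd, odd.dtype)    # 2.0 float64
print(even, even.dtype)  # 2.5 float64
```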
Just to be clear, |
Is it at least not controversial that empty |
Agreed. |
Following up on pytorch/pytorch#61901 (comment). A design rule that NumPy, PyTorch and JAX seem to follow is:
Return the identity if the reduction has one; otherwise return nan for reductions that promote integers to floating-point dtypes, and raise an error for all others.
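That rule can be observed directly in NumPy (a sketch of NumPy's behavior specifically; PyTorch and JAX follow the same pattern but differ in warning and error details):

```python
import warnings
import numpy as np

empty = np.array([], dtype=np.int64)

# sum and prod have identities (0 and 1), so empty input is well defined.
print(np.sum(empty))   # 0
print(np.prod(empty))  # 1

# mean promotes integers to float; on empty input it returns nan.
# (NumPy also emits a RuntimeWarning about the empty slice.)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    print(np.mean(empty))  # nan

# max has no identity, so NumPy raises instead of returning a value.
try:
    np.max(empty)
except ValueError as e:
    print("error:", e)
```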
This is not specified anywhere, and I can't find a previous discussion on it. For element-wise functions there's a "Special Cases" section, but for reductions there isn't. @asmeurer does the test suite cover empty array input to functions at all?