Add API for device support #96
Conversation
The summary of feedback received on this PR so far was:
PR updated. And asked about the link to the …
I will note that it's kind of odd to have a …
What is the behavior in a symbolic execution model, where devices may be set post hoc and may even change across multiple calls to the same function? Also, is there some notion of symbolic devices? As a concrete example, does …
If the default of …
It may be good to mention explicitly that symbolic execution is a thing and that the current design must allow for it. Other than that, I think all the current version of this PR says is that a library will have a device object and that it must implement …

The text says "to a specific physical or logical device". That may mean different things to different people; "logical" and "symbolic" sound similar to me. Happy to add "symbolic" explicitly though.
That co-location is currently a recommendation, not a hard requirement, in the "semantics" section of this PR. I'd expect a symbolic execution model to also do co-location and raise otherwise, but I'm not sure if that's guaranteed or not (TF's placement policies are a little tricky to wrap my head around just from the docs).
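For illustration, a minimal sketch of the kind of co-location check a library could perform if it treated the recommendation strictly (the helper name and `.device` attribute here are assumptions for this sketch, not spec requirements):

```python
# Hypothetical co-location guard. The spec only *recommends* that inputs
# live on the same device, so raising here is a library's own choice.
def _check_same_device(x, y):
    if x.device != y.device:
        raise ValueError(
            f"input arrays are on different devices: {x.device} vs {y.device}"
        )
```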
```diff
 #### Returns

 - **out**: _<array>_

     - an array containing uninitialized data.

 (function-empty_like)=
-### empty_like(x, /, *, dtype=None)
+### empty_like(x, /, *, dtype=None, device=None)
```
This reminds me of an interesting inquiry we had a while ago: should the `_like*()` functions honor the device where the input array `x` is on? (cupy/cupy#3457) Now that I look back, it also seems plausible for the output array to be on the same device as `x`, but my rejection still holds: it is incompatible with any sane device management approach (context manager, explicit function call such as `use_device(N)`, etc.). I suppose the newly added `device` argument will encounter the same challenge. The most radical example: `x` is on device 1, the `device` argument is set to 2, but the default/current device is 0.
Thoughts?
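To make that three-way conflict concrete, a hypothetical sketch (`use_device` and the `xp` namespace are placeholders from this discussion, not a real API):

```python
# Hypothetical set-up: x was created earlier on device 1.
x = xp.asarray([1.0, 2.0], device=1)

with use_device(0):                  # context manager: current device is 0
    y = xp.empty_like(x, device=2)   # explicit device= argument says 2

# Three competing signals: the current device (0), x's own device (1),
# and the explicit device= argument (2). Which one should win?
```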
> but my rejection still holds: it is incompatible with any sane device management approaches

I agree with the argument you made on that issue.

> I suppose having the newly added argument `device` will encounter the same challenge. The most radical example is `x` on device 1, the argument `device` is set to 2, but the default/current device is 0.

I don't see the conflict here. If a library has multiple ways of controlling device placement, the most explicit method should have the highest priority. So:

- If the `device=` keyword is specified, that always takes precedence.
- If `device=None`, then use the setting from a context manager, if set.
- If no context manager was used, then use the global default device/strategy.

Your example seems very similar to the first example in https://pytorch.org/docs/stable/notes/cuda.html#cuda-semantics, which I think explains the desired behaviour here.
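A minimal sketch of that precedence order, assuming a hypothetical library with a context-manager device stack and a global default (none of these names come from the spec):

```python
_default_device = "cpu"   # global default device/strategy
_device_stack = []        # pushed/popped by a device context manager

def resolve_device(device=None):
    # 1. An explicit device= argument always takes precedence.
    if device is not None:
        return device
    # 2. Otherwise, fall back to the innermost active context manager.
    if _device_stack:
        return _device_stack[-1]
    # 3. Otherwise, use the global default.
    return _default_device
```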
The `empty_like` description seems clear enough: the `_like` is about `shape` and `dtype` only.
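In other words (a sketch, with `xp` standing in for any conforming array namespace and `dev` for some device object, both assumptions here):

```python
dev = "cpu"   # stand-in device object for this sketch

x = xp.ones((2, 3), dtype=xp.float32)

y = xp.empty_like(x)              # shape (2, 3) and float32 taken from x;
                                  # device follows the usual default rules
z = xp.empty_like(x, device=dev)  # same shape/dtype, device set explicitly
```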
Thanks @rgommers, I fully agree with everything you said above, but I noticed that in the "Semantics" section you're adding in this PR, the wording is a bit different and (arguably) less clear. Should we incorporate your above reply there? And it'd be nice to add a variant of the "`_like` is about `shape` and `dtype` only" emphasis to all the docstrings where `device` is an optional argument.
Good ideas, done.
I searched TensorFlow more thoroughly, and there's no such thing as a "symbolic device" there. I'm pretty sure it's `LogicalDevice` instead, so the current text in this PR seems clear to me.
Assumes a bit less about implementation. A string like `'cpu'` should meet the requirements, and it doesn't have `__neq__`.
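For example (a quick sketch; the point is only that plain strings already support the equality comparison the spec text relies on):

```python
# A plain string can serve as a minimal device identifier:
# it compares by value, which is all that is required here.
d1 = "cpu"
d2 = "cpu"
assert d1 == d2
assert d1 != "gpu"
```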
No more comments in the last couple of weeks, so I'll go ahead and merge this PR so it's available in the published html docs. If there are any more comments, please add them here or open a new issue.
LGTM. Agreed, if there's any concern we can revisit 🙂
Note that with symbolic execution, there is a "tracing" phase and an execution phase. If … A side effect is that there will be different behavior in Eager vs symbolic execution. For Eager, …
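A rough sketch of that tracing/execution distinction in TensorFlow, using standard `tf.function` and `tf.device` (the specific behavior being debated in this thread is elided above):

```python
import tensorflow as tf

# Eager: the device directive takes effect immediately, at call time.
with tf.device("/CPU:0"):
    x = tf.zeros((2, 2))          # placed on CPU:0 right now

# Symbolic: the Python body runs once during tracing; the device
# directive is recorded into the graph and applied at execution time.
@tf.function
def make_zeros():
    with tf.device("/CPU:0"):
        return tf.zeros((2, 2))

y = make_zeros()                  # graph execution applies the directive
```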
Just checking: are you saying that that won't be done now, if you simply add the … ? It seems desirable to just always put those extra directives in, I'd think, and not let eager/symbolic behavior diverge?
We will likely make this work best-effort in TensorFlow when we try to adhere to these new APIs. Given that this device placement seems to be a recommendation instead of a requirement, it should be OK I think. This is a subtle implementation issue for frameworks implementing symbolic execution.
Based on the discussion in gh-39.