Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add specification for asarray #130

Merged
merged 3 commits into from
Feb 21, 2021
Merged

Conversation

rgommers
Copy link
Member

@rgommers rgommers commented Feb 17, 2021

Closes gh-122

See gh-122 for a discussion on the order= keyword. The majority opinion there is not to add it, but it's not 100% clear cut.

Also, I did add memoryview as a supported input type, which wasn't considered in the discussion in gh-122. (EDIT: removed again, captured by SupportsBufferProtocol).

@@ -76,7 +76,7 @@ Convert the input to an array.

- **copy**: _bool_

- Whether or not to make a copy of the input. If `True`, always copies. If `False`, reuses existing memory buffer if possible. Default: `False`.
- Whether or not to make a copy of the input. If `True`, always copies. If `False`, never copies. If `None`, reuses existing memory buffer if possible. Default: `None`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is probably a good idea to specify the specific exception (i.e., ValueError) that is raised when an otherwise valid input would require a copy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've tried to avoid that in other places, but this particular may be so often needed when writing library code with copy=False that it makes sense to add.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added: "Must raise ValueError if copying of inputs which support DLPack or the buffer protocol is necessary and copy=False."

Data in sequence input is still copied of course.

@rgommers
Copy link
Member Author

One point made for why order= is useful to include (by @kkraus14): a common use case is to create 2-D arrays with order='F' and then turn them into a dataframe - Fortran ordering will ensure columns are contiguous; otherwise copies may need to be made.

order= is always only ever a performance optimization, it does not change interpretation of data like nested lists.

It doesn't make sense to add order= only to asarray though, it'd need to be added to all array creation functions (zeros, ones`, etc.) - and we had previously decided against it.

@shoyer
Copy link
Contributor

shoyer commented Feb 19, 2021 via email

@rgommers
Copy link
Member Author

My personal opinion is that if you need those sorts of optimizations, you should use library specific APIs, perhaps with generic APIs as a fallback. Whether this makes sense depends on lots of details, e.g., the particular way that the dataframe library manages memory.

True indeed.

We don't have order= in other array creation functions - it can be considered separately for all of them, but shouldn't be a blocker for getting asarray in. I'll merge this. Thanks for the review @shoyer

@rgommers rgommers merged commit e1870ff into data-apis:main Feb 21, 2021
@rgommers rgommers deleted the add-asarray branch February 21, 2021 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Explicit array conversion (e.g., array(), asarray())
3 participants