-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Construct 1d array from listlike #18769
Construct 1d array from listlike #18769
Conversation
@jreback Any hint about why all 2.7 fail with " |
e72437c
to
dcf0136
Compare
Codecov Report
@@ Coverage Diff @@
## master #18769 +/- ##
==========================================
- Coverage 91.64% 91.59% -0.05%
==========================================
Files 154 154
Lines 51401 51430 +29
==========================================
+ Hits 47106 47108 +2
- Misses 4295 4322 +27
Continue to review full report at Codecov.
|
pandas/core/common.py
Outdated
# we have a list-of-list | ||
result[:] = [tuple(x) for x in values] | ||
if hasattr(values, '__array__'): | ||
values = [tuple(x) for x in values] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this one (can you add some comments?)
The original comment said "we have a list-of-list", which is actually not possible because it was already checked before if it was a list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that comment was wrong, and confusing. Will add a comment.
Answering myself... it was a circular dependency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some tests for construct_1d_object_array_from_listlike in test_cast.py (where similar things are tested).
asv_bench/benchmarks/ctors.py
Outdated
@@ -33,3 +34,21 @@ def time_dtindex_from_series(self): | |||
|
|||
def time_dtindex_from_index_with_series(self): | |||
Index(self.s) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use params
for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lots of examples in the other asvs
pandas/core/dtypes/cast.py
Outdated
|
||
Returns | ||
------- | ||
1-dimensional numpy array of dtype "dtype" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename this to
construct_1d_object_array_from_listlike
and drop the dtype (always make it object
).
you can also just add another function for this as well (e.g. have 1 that accepts the dtype as a required parameter ). I also don't think we are actually using dtype anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The array we return is of dtype dtype
. So this will raise an error if e.g. dtype=int
and the input contains lists, but this is good because it allows cleaner code (and less checks) in those methods where the actual dtype is unknown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are not using this, so its very confusing. I would just rather return an object
dtyped array here (or have 2 methods).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are not using this, so its very confusing
Non sequitur.
have 2 methods
... with almost perfect duplication of code, one of which perfectly generalizes the other, and without any performance gain?! I thought you liked to reduce code :-)
This method builds a 1d array of type dtype
with given data. If it is impossible, it raises an error. This is pretty straightforward, and whenever np.array
will have an ndmax
parameter, all calls to this method will be replaced with a simple call to np.array(..., ndmax=1)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I will explain again.
We are not using the dtype argument. So just eliminate it. I don't see utility in having it.
dcf0136
to
c88f05f
Compare
Hello @toobaz! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on December 19, 2017 at 13:39 Hours UTC |
c88f05f
to
c7fbfb0
Compare
asv_bench/benchmarks/ctors.py
Outdated
@@ -33,3 +34,21 @@ def time_dtindex_from_series(self): | |||
|
|||
def time_dtindex_from_index_with_series(self): | |||
Index(self.s) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lots of examples in the other asvs
pandas/core/dtypes/cast.py
Outdated
|
||
Returns | ||
------- | ||
1-dimensional numpy array of dtype "dtype" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are not using this, so its very confusing. I would just rather return an object
dtyped array here (or have 2 methods).
pandas/tests/dtypes/test_cast.py
Outdated
try: | ||
problem = (dtype is not object and | ||
np.array(data, dtype=dtype).ndim != 1) and ValueError | ||
except (ValueError, TypeError) as exc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make 2 tests, 1 of which all succeed, 1 of which tests the ones that error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can, but then I wouldn't parametrize (that would be a nightmare, since the cartesian product of parameters cannot be cleanly split). Let me try to add some comments to make it more clear instead.
61ebefb
to
159427a
Compare
asv_bench/benchmarks/ctors.py
Outdated
[False, True]] | ||
|
||
def setup(self, data_fmt, with_index): | ||
N = 10**2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this 10**4, this is too small
asv_bench/benchmarks/ctors.py
Outdated
param_names = ["data_fmt", "with_index"] | ||
params = [[lambda x: x, | ||
list, | ||
lambda arr: dict(zip(range(len(arr)), arr)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add something with tuples here
add strings
|
||
|
||
class FromNDArray(object): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there already list-like constructor asv's here? and from dict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no idea (unrelated to this PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the answer is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no idea, as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well since you are adding this one, should be very easy to .parametrize and add similar to the Series case, at least a few simple ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, some simple ones are present in frame_ctor.py
, above the code I moved from ctor.py
. I can open an issue to remind adding more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
except ValueError: | ||
# we have a list-of-list | ||
result[:] = [tuple(x) for x in values] | ||
# Avoid building an array of arrays: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there an asv that hits this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't even think there is any valid code path that hits this case... which indeed should be suppressed in #18626
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there is not valid code, then let's remove it. or make a new issue. inside a PR doesn't help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the issue number here with a TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(done)
pandas/core/dtypes/cast.py
Outdated
|
||
Returns | ||
------- | ||
1-dimensional numpy array of dtype "dtype" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I will explain again.
We are not using the dtype argument. So just eliminate it. I don't see utility in having it.
pandas/tests/dtypes/test_cast.py
Outdated
def test_cast_1d_array(self, dtype, datum1, datum2): | ||
data = [datum1, datum2] | ||
try: | ||
# Conversion to 1d array is possible if requested dtype is object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please separate into 2 tests. This is so confusing as to what cases are what. If you need to skip a few tests on specific cases in the 'good' case, that is ok, but you need to clearly have a separate tests for the error cases. catching errors IN a test is a big no-no. it is impossible to tell what should succeed here.
Yeah, but this is hardly an argument - you're not even seeing that the change you are firmly pretending in the code is in contrast with the change you are firmly pretending in the test. But I guess I'll have to explain you in some other PR, now I don't have enough time. An updated version of this one is on its way. |
905046a
to
8848344
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments.
list, | ||
lambda arr: list(arr.astype(str)), | ||
lambda arr: dict(zip(range(len(arr)), arr)), | ||
lambda arr: [(i, -i) for i in arr], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some args are duplicated here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, was reading the ( as [
|
||
|
||
class FromNDArray(object): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the answer is?
except ValueError: | ||
# we have a list-of-list | ||
result[:] = [tuple(x) for x in values] | ||
# Avoid building an array of arrays: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there is not valid code, then let's remove it. or make a new issue. inside a PR doesn't help.
|
||
@pytest.mark.parametrize('datum1', [1, 2., "3", (4, 5), [6, 7], None]) | ||
@pytest.mark.parametrize('datum2', [8, 9., "10", (11, 12), [13, 14], None]) | ||
def test_cast_1d_array(self, datum1, datum2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there fail cases? iow where this routine raises?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not once you remove support for non-object
dtypes (which is what you just asked me to do).
(well, as long as data
has a len()
and is iterable...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right so can you add a test that fails for scalars then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(done)
pandas/tests/dtypes/test_cast.py
Outdated
def test_cast_1d_array(self, datum1, datum2): | ||
data = [datum1, datum2] | ||
result = construct_1d_object_array_from_listlike(data) | ||
# Direct comparison fails: https://github.com/numpy/numpy/issues/10218 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line
list, | ||
lambda arr: list(arr.astype(str)), | ||
lambda arr: dict(zip(range(len(arr)), arr)), | ||
lambda arr: [(i, -i) for i in arr], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, was reading the ( as [
|
||
|
||
class FromNDArray(object): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well since you are adding this one, should be very easy to .parametrize and add similar to the Series case, at least a few simple ones.
except ValueError: | ||
# we have a list-of-list | ||
result[:] = [tuple(x) for x in values] | ||
# Avoid building an array of arrays: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the issue number here with a TODO
|
||
@pytest.mark.parametrize('datum1', [1, 2., "3", (4, 5), [6, 7], None]) | ||
@pytest.mark.parametrize('datum2', [8, 9., "10", (11, 12), [13, 14], None]) | ||
def test_cast_1d_array(self, datum1, datum2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right so can you add a test that fails for scalars then
Returns | ||
------- | ||
1-dimensional numpy array of dtype object | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would not object to an
assert is_iterable(values)
with a nice error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is: I can't think of any code path which could be hitting it. Scalar input to a Series()
is (considered valid and) recasted to a 1-d before calling this. Similarly, an operation such as Series([1,2]) + 3
transforms 3 before hitting this. So I don't know what the error message could actually say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not what i am asking
this is a completely internal
routine
it should fail with invalid input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should fail with invalid input
Sure it does, TypeError: object of type 'int' has no len()
. Which is pretty clear, considering the docstring, and precisely in light of the fact that this is an internal routine. That said, feel free to suggest an error message which is worth the cost of the additional assert
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a raises section to the doc-string
8848344
to
e5b3cd5
Compare
Codecov Report
@@ Coverage Diff @@
## master #18769 +/- ##
=========================================
Coverage ? 91.62%
=========================================
Files ? 154
Lines ? 51401
Branches ? 0
=========================================
Hits ? 47097
Misses ? 4304
Partials ? 0
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments. ping on green.
Returns | ||
------- | ||
1-dimensional numpy array of dtype object | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a raises section to the doc-string
e5b3cd5
to
d0a6e48
Compare
this will fail because of a conda issue, that I think is now fixed in master. I will merge after this is resolved (no need to repush). |
thanks! |
git diff upstream/master -u -- "*.py" | flake8 --diff
#18626 isn't converging, will need more work and I won't have much time soon, so I took away the asv and refactoring that should be straighforward to merge.
asv says
... but they are borderline, and don't always come up.