Pandas 0.22.0 does not raise KeyError for misspelled column with .drop_duplicates() #19726
Comments
Thanks for the report. The root cause looks like it's in Interested in submitting a fix?
Whoever fixes this may be able to knock out #12869 at the same time.
@TomAugspurger I might have a crack at it, unless someone beats me to it. I had a quick look with PyCharm but have not managed to locate that variable / class yet; it must be in there somewhere.
Great! I think the offending lines are Lines 3658 to 3660 in 2fdf1e2
Making some bad assumptions about
I've got a working fix. Mind if I submit the pull request? Or did you want to take a shot at it, @aktivkohle?
@NoahTheDuke Yes, do it, thanks for asking. I'm sure I'll get my chance one day, and it would probably take me hours to work it out on the weekend, which is not meant for that kind of thing. I will have a look at how you did it, of course.
So I have tested two versions of Pandas side by side with exactly the same code. 0.19.2 behaves more as expected, but 0.22.0 does what I am about to describe. I will probably switch to 0.19.2 for now. I am using Python 3.6.4.
Problem description
I became aware of the problem while working with a much larger dataframe, when it failed to warn me or raise a KeyError after I misspelled a column name.
Expected Output
Pandas 0.19.2 gives you the following, but Pandas 0.22.0 gives you no KeyError for the first print statement; it just runs.
Output of pd.show_versions()
Below is the output for the Pandas version where the problem occurs.
INSTALLED VERSIONS
commit: None
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.4.0
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
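Until a fix lands, one possible workaround is to check the column names yourself before calling .drop_duplicates(). The sketch below is only an illustration: require_columns is a hypothetical helper, not part of pandas, and it assumes the labels are plain strings.

```python
def require_columns(available, subset):
    """Return `subset` as a list, raising KeyError if any name is missing.

    `available` is any iterable of column labels (for example df.columns).
    This helper is hypothetical and not part of pandas.
    """
    # Accept a single label as well as a list of labels.
    if isinstance(subset, str):
        subset = [subset]
    known = set(available)
    missing = [name for name in subset if name not in known]
    if missing:
        raise KeyError("columns not found: {}".format(missing))
    return list(subset)


# Hypothetical usage with a DataFrame `df`:
#     df.drop_duplicates(subset=require_columns(df.columns, ["name", "adress"]))
# A misspelled label such as "adress" would then raise KeyError up front.
```

The check runs before pandas sees the labels, so it should behave the same on 0.19.2 and 0.22.0.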