BUG: Replacing NaNs with a Series object based on boolean indexing does not replace NaNs #39717

shivendra90 · 2021-02-10T08:39:58Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

diamond_data.isna().sum()

carat        0
cut          0
color        0
clarity      0
depth      691
table        0
x            0
y            0
z            0
price        0
dtype: int64

# Generate random numbers for filling nans

ideal_data = pd.Series(np.random.randint(low=float(diamond_data[diamond_data.cut == "Ideal"]["depth"].min()),
                                         high=float(diamond_data[diamond_data.cut == "Ideal"]["depth"].max()),
                                         size=270)).astype("float")

prem_data = pd.Series(np.random.randint(low=float(diamond_data[diamond_data.cut == "Premium"]["depth"].min()),
                                        high=float(diamond_data[diamond_data.cut == "Premium"]["depth"].max()),
                                        size=192)).astype("float")

vgood_data = pd.Series(np.random.randint(low=float(diamond_data[diamond_data.cut == "Very Good"]["depth"].min()),
                                        high=float(diamond_data[diamond_data.cut == "Very Good"]["depth"].max()),
                                        size=152)).astype("float")

good_data = pd.Series(np.random.randint(low=float(diamond_data[diamond_data.cut == "Good"]["depth"].min()),
                                        high=float(diamond_data[diamond_data.cut == "Good"]["depth"].max()),
                                        size=59)).astype("float")

fair_data = pd.Series(np.random.randint(low=float(diamond_data[diamond_data.cut == "Fair"]["depth"].min()),
                                        high=float(diamond_data[diamond_data.cut == "Fair"]["depth"].max()),
                                        size=24)).astype("float")

# Fill it up
diamond_data.loc[(diamond_data.cut == "Ideal") & (diamond_data.depth.isna()), "depth"] = ideal_data
diamond_data.loc[(diamond_data.cut == "Premium") & (diamond_data.depth.isna()), "depth"] = prem_data
diamond_data.loc[(diamond_data.cut == "Very Good") & (diamond_data.depth.isna()), "depth"] = vgood_data
diamond_data.loc[(diamond_data.cut == "Good") & (diamond_data.depth.isna()), "depth"] = good_data
diamond_data.loc[(diamond_data.cut == "Fair") & (diamond_data.depth.isna()), "depth"] = fair_data

Problem description

When I try to replace the NaN values scattered randomly through the dataset I'm working on, using .loc indexing as shown above, every value in the depth column should be filled. Instead, nothing is filled, and running isna().sum() reports the same number of missing values.

Expected Output

The depth column should have 0 NaN values.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 9d598a5
python : 3.7.9.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-65-generic
Version : #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.2.1
numpy : 1.19.2
pytz : 2021.1
dateutil : 2.8.1
pip : 20.3.3
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.2
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.20.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.3
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.2
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.0
sqlalchemy : None
tables : None
tabulate : 0.8.7
xarray : None
xlrd : 2.0.1
xlwt : None
numba : 0.51.2

attack68 · 2021-02-10T11:44:25Z

You need to provide a minimal example, preferably code-only, that demonstrates the problem.

As it stands there could be a number of problems, some of them not even pandas related, so it is difficult for developers to debug and fix.

attack68 · 2021-02-10T11:55:00Z

Here is an example:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [2, np.nan], [3, np.nan]], columns=['A', 'B'])
s = pd.Series(np.random.randint(3, 7, size=2)).astype(float)
df.loc[df['B'].isna(), 'B'] = s
print(df)

It has only filled in one value, presumably because the index in s has been matched with the index in df.
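To make the index matching visible, here is a small sketch with fixed values instead of random ones. The `reindex` call is only an illustration of roughly what the assignment does with the labels; it is an assumption about the effect, not a description of pandas internals.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [2, np.nan], [3, np.nan]], columns=["A", "B"])
s = pd.Series([10.0, 20.0])  # default index labels: 0 and 1

mask = df["B"].isna()  # True for row labels 1 and 2

# Illustration only: the values of s as seen from the selected row labels.
# Label 1 exists in s, label 2 does not, so it comes out as NaN.
aligned = s.reindex(df.index[mask])

df.loc[mask, "B"] = s  # aligned on labels, not on position

print(aligned)
print(df)
```

Row 1 receives the value stored under label 1 in s, and row 2 stays NaN because s has no label 2. The value under label 0 is never used.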

You have two solutions:

Either use the ndarray version of the Series (so the index is ignored):

df.loc[df['B'].isna(), 'B'] = s.values

Or generate an ndarray directly (which has no index) and assign it to the column:

s = np.random.randint(3,7, size=2)
df.loc[df['B'].isna(), 'B'] = s

I'm fairly sure this is working as intended, i.e. giving the power of index matching where it is useful.
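If you want to keep a Series on the right-hand side, one option is to give it the labels of the rows being filled so that the matching lines up. This is a sketch with made-up fill values; the name `fill` is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [2, np.nan], [3, np.nan]], columns=["A", "B"])
mask = df["B"].isna()

# Build the replacement Series on the labels of the rows to be filled,
# so label-based alignment finds a value for each of them.
fill = pd.Series([10.0, 20.0], index=df.index[mask])

df.loc[mask, "B"] = fill
print(df)
```

Here both missing rows are filled, because every selected label has a matching entry in fill.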

shivendra90 · 2021-02-10T13:34:26Z

@attack68 Thanks, using .values actually worked.
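For reference, the per-cut fill from the original report might be written along these lines. This is only a sketch: the small `diamonds` frame is invented stand-in data, and sizing each random array with `missing.sum()` rather than a hard-coded count is an assumption about what was intended.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Invented stand-in for the real dataset.
diamonds = pd.DataFrame({
    "cut": ["Ideal", "Ideal", "Ideal", "Good", "Good", "Good"],
    "depth": [61.0, np.nan, 63.0, 58.0, 64.0, np.nan],
})

for cut in diamonds["cut"].unique():
    in_cut = diamonds["cut"] == cut
    missing = in_cut & diamonds["depth"].isna()

    # Range of observed depths for this cut (min/max skip NaN by default).
    low = int(diamonds.loc[in_cut, "depth"].min())
    high = int(diamonds.loc[in_cut, "depth"].max())

    # Assign a plain ndarray so values are placed by position, not by label.
    values = rng.integers(low, high, size=missing.sum()).astype(float)
    diamonds.loc[missing, "depth"] = values

print(diamonds["depth"].isna().sum())
```

Because the array length comes from the mask itself, it always matches the number of rows being filled.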

shivendra90 added the Bug and Needs Triage labels Feb 10, 2021

attack68 added the Usage Question and Indexing labels and removed the Bug and Needs Triage labels Feb 10, 2021

shivendra90 closed this as completed Feb 10, 2021
