BUG: stacked bar graphs show invalid label position due to invalid rectangle bottom when data is 0 #59429

Closed
3 tasks done
KinzigFlyer opened this issue Aug 6, 2024 · 11 comments · Fixed by #60211
@KinzigFlyer

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import matplotlib.pyplot as plt
permutations = [(a,b,c) for a in range(2) for b in range(2) for c in range(3)]
data = [
    {'i': i, 'a':a, 'b':b, 'c':c, 't': a+b+c}
    for i, (a,b,c) in enumerate(permutations)
]
df = pd.DataFrame.from_dict(data)
ax = df[['a','b', 'c']].plot.bar(stacked=True)
bl = ax.bar_label(ax.containers[-1], df['t'])
plt.show()

Issue Description

If the top segment of a stacked bar has a data value of 0, the bar label appears at the bottom of the bar instead of on top.
image

Further debugging shows that all bars with data = 0 have their y position set to 0.0. Their bottom (y) should instead be the top of the bar directly below them.
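To make the expected positions concrete, here is a minimal sketch that computes where each segment of the reproducible example should start. It uses plain Python rather than pandas, assumes all values are non-negative, and `expected_bottoms` is a hypothetical helper name introduced only for illustration.

```python
# Rebuild the rows of the reproducible example without pandas.
permutations = [(a, b, c) for a in range(2) for b in range(2) for c in range(3)]


def expected_bottoms(row):
    """Return the y position at which each stacked segment should start.

    Each segment rests on the sum of the segments beneath it, even when
    its own height is 0. Assumes non-negative heights.
    """
    bottoms, running = [], 0
    for height in row:
        bottoms.append(running)
        running += height
    return bottoms


for i, row in enumerate(permutations):
    bottoms = expected_bottoms(row)
    # The label on the top segment 'c' belongs at its bottom plus its height.
    label_y = bottoms[-1] + row[-1]
    print(i, row, bottoms, label_y)
```

For the row (1, 1, 0), the top segment should start at 2, not at 0.0 as observed in the plot.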

Expected Behavior

Bar-Labels should be positioned on top for all stacks.

image

This behaviour can be produced by correcting the y positions of the defective bars:

def correct_stack(container, info=False):
    """Correct the y positions of stacked bars with 0 height.

    This is needed because the y position is calculated wrongly when the data value is 0 on stacked bars created by pandas plot.bar.
    """
    # Iterating over container[1:] starts at the second row, so r indexes the row below the current one - which we need
    for r, row in enumerate(container[1:]):
        for b, bar in enumerate(row):
            (my_x, my_y), my_height = bar.xy, bar.get_height()
            # note that r points to the row below the current bar, and b is the position of the stack
            support = container[r][b]    # this is the bar we are resting on top of
            (s_x, s_y), s_height = support.xy, support.get_height()
            if info:
                print(f"bar at row: {r+1}, col: {b}: ({my_x}, {my_y}) - {my_height} resting on top of ({s_x}, {s_y}) - {s_height}")
            if my_y < s_y + s_height:
                print(f"bar at row: {r+1}, col: {b}: {my_y = } is lower than expected {s_y + s_height}")
                # move the bar up so that it rests on top of its support
                bar.xy = (my_x, s_y + s_height)

ax2 = df[['a','b','c']].plot.bar(stacked=True)
correct_stack(ax2.containers)
bl = ax2.bar_label(ax2.containers[-1], df['t'])
plt.show()

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.11.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252

pandas : 2.2.2
numpy : 2.0.1
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 65.5.0
pip : 24.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.26.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.9.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.14.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

@KinzigFlyer added the Bug and Needs Triage labels on Aug 6, 2024
@KevsterAmp
Contributor

take

@KinzigFlyer
Author

Thanks @KevsterAmp for looking into this

@KevsterAmp
Contributor

All good, just waiting for this issue to be triaged by a maintainer before working on it.

@KevsterAmp
Contributor

@KinzigFlyer - it seems this is a problem in Matplotlib, not in pandas itself.

@KinzigFlyer
Author

I don't think so.
Matplotlib does not provide stacking on its own. You create a stacked bar by passing the "bottom" parameter to the bars.
So I think bottom is calculated inside pandas.
See this page in the official Matplotlib documentation:
Stacked Bar charts

@KevsterAmp
Contributor

Good point, thank you

@KinzigFlyer
Author

I took the official example program, changed one of the "Above" values to 0, and added the bar label. It works correctly.

import matplotlib.pyplot as plt
import numpy as np

# data from https://allisonhorst.github.io/palmerpenguins/

species = (
    "Adelie\n $\\mu=$3700.66g",
    "Chinstrap\n $\\mu=$3733.09g",
    "Gentoo\n $\\mu=5076.02g$",
)
weight_counts = {
    "Below": np.array([70, 31, 58]),
    "Above": np.array([82, 0, 66]),
}
width = 0.5

fig, ax = plt.subplots()
bottom = np.zeros(3)

for boolean, weight_count in weight_counts.items():
    p = ax.bar(species, weight_count, width, label=boolean, bottom=bottom)
    bottom += weight_count

ax.bar_label(p, weight_counts['Above'])
ax.set_title("Number of penguins with above average body mass")
ax.legend(loc="upper right")

plt.show()
image
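As a rough way to confirm this without looking at the picture, the sketch below inspects the rectangles and label anchors directly. It assumes the non-interactive "Agg" backend so that it runs without a display, and it shortens the species names; neither is part of the original example.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the check runs without a display
import matplotlib.pyplot as plt
import numpy as np

species = ["Adelie", "Chinstrap", "Gentoo"]
below = np.array([70, 31, 58])
above = np.array([82, 0, 66])

fig, ax = plt.subplots()
ax.bar(species, below, 0.5, label="Below")
top = ax.bar(species, above, 0.5, label="Above", bottom=below)

# Each Rectangle keeps the bottom it was given, even when its height is 0.
bottoms = [float(rect.get_y()) for rect in top.patches]
heights = [float(rect.get_height()) for rect in top.patches]

# bar_label returns one Annotation per bar; .xy is the point being annotated.
labels = ax.bar_label(top, above)
label_ys = [float(text.xy[1]) for text in labels]

print(bottoms)
print(heights)
print(label_ys)
```

With plain Matplotlib the zero-height "Chinstrap" segment keeps its bottom at 31, so its label is anchored on top of the stack.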

@KinzigFlyer
Author

Converting the official example to a pandas-driven version shows the error:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# data from https://allisonhorst.github.io/palmerpenguins/
penguins = pd.DataFrame.from_dict([
    {'species': "Adelie", 'Below': 70.0, 'Above': 82.0},
    {'species': "Chinstrap", 'Below': 31.0, 'Above': 0.0},
    {'species': "Gentoo", 'Below': 58.0, 'Above': 66.0},
]).set_index('species')
width = 0.5

ax2 = penguins.plot.bar(stacked = True)

ax2.bar_label(ax2.containers[-1], penguins['Above'])
ax2.set_title("Number of penguins with above average body mass")
ax2.legend(loc="upper right")

plt.show()
image
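One plausible explanation for the difference is sketched below. It is a simplified model, not code taken from pandas: it assumes the stacking logic keeps separate running totals for positive and negative values and picks between them with a strict `> 0` test, so that a value of exactly 0 falls onto the (empty) negative stack. The function name and its `zero_counts_as_positive` flag are hypothetical.

```python
def stack_bottoms(columns, zero_counts_as_positive):
    """Compute the bottom of every bar segment, series by series.

    `columns` is a list of equally long lists, one per stacked series.
    Positive and negative values are stacked separately, away from zero.
    """
    n = len(columns[0])
    pos_prior = [0.0] * n  # running total of the positive segments
    neg_prior = [0.0] * n  # running total of the negative segments
    all_bottoms = []
    for heights in columns:
        bottoms = []
        for i, h in enumerate(heights):
            is_positive = h >= 0 if zero_counts_as_positive else h > 0
            bottoms.append(pos_prior[i] if is_positive else neg_prior[i])
            if is_positive:
                pos_prior[i] += h
            else:
                neg_prior[i] += h
        all_bottoms.append(bottoms)
    return all_bottoms


below = [70.0, 31.0, 58.0]
above = [82.0, 0.0, 66.0]
strict = stack_bottoms([below, above], zero_counts_as_positive=False)
inclusive = stack_bottoms([below, above], zero_counts_as_positive=True)
print(strict[-1])     # bottoms of the "Above" segments with the `> 0` test
print(inclusive[-1])  # bottoms of the "Above" segments with the `>= 0` test
```

Under this model, the strict comparison places the zero-height "Chinstrap" segment at 0.0, while the inclusive comparison places it at 31.0, on top of the "Below" segment.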

@rhshadrach added the Visualization label and removed the Needs Triage label on Aug 7, 2024
@rhshadrach
Member

Thanks for the report - PRs to fix are welcome!

@KevsterAmp removed their assignment on Sep 26, 2024
@tev-dixon
Contributor

I will take this.

@tev-dixon
Contributor

Thank you @KinzigFlyer for the in-depth write-up. I have opened a pull request (#60211) with changes that should fix the bug, along with a regression test.

Pictured below is the given Reproducible Example when run on the current main branch of pandas:
plot_b1_example

Here is the given Reproducible Example when run on the fix:
fixed_plot_b1_example
(Notice that the errant number labels are moved to the tops of the bars.)

Here is the regression test case when run on the current main branch of pandas:
test_case

Here is the regression test case when run on the fix:
test_case_fixed
(Notice that the leftmost column's orange bar slice of zero height now has the label '3' which overlaps with the blue label '3', making it appear bold. Also notice the middle column's '0' label becomes less bold, and the '2' label becomes more bold for the same reasons.)

As you can see, the pull request fixes the bug to match the expected behavior. Note, however, that the expected behavior is somewhat ambiguous: should bar slices of zero height still display labels? Doing so causes labels to overlap, which makes them appear bold, as described above. I do not know the intended behavior here, but I am willing to change my pull request to match it if a maintainer suggests one way or the other. My current pull request takes the path of least impact and displays labels even for bar slices of zero height.
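If overlapping labels are a concern, a caller could suppress the labels of zero-height slices on their own side. The sketch below is one possible approach and not part of the pull request; `labels_hiding_zeros` is a hypothetical helper.

```python
def labels_hiding_zeros(values, fmt="{:g}"):
    """Format one label per bar, using an empty string for zero values."""
    return [fmt.format(v) if v != 0 else "" for v in values]


print(labels_hiding_zeros([82.0, 0.0, 66.0]))  # → ['82', '', '66']
```

The resulting list can be passed as the `labels` argument of `ax.bar_label(container, labels=...)`, so that zero-height slices are drawn without text.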

@rhshadrach added this to the 3.0 milestone on Nov 7, 2024