There is a memory leak in pandas.read_msgpack when reading from a string. Calling pandas.read_msgpack(str_data) increases the reference count of str_data if and only if read_msgpack sees the content of str_data for the first time. As a result, memory leaks only when reading different files: when the same file is read over and over again, str_data leaks only once.
The problem does not exist when reading from file handles or BytesIO.
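The reference-count behaviour described above can be checked with a small helper built on sys.getrefcount. The following is only a sketch, not the reporter's script: refcount_delta, leaky_reader, clean_reader and make_payload are hypothetical names, and leaky_reader is a stand-in that imitates the reported symptom (it retains its input the first time it sees new content) rather than actual pandas code.

```python
import sys


def refcount_delta(reader, payload):
    """Return how much the reference count of `payload` changed across reader(payload)."""
    before = sys.getrefcount(payload)
    result = reader(payload)
    del result  # drop the result so it cannot be what keeps payload alive
    after = sys.getrefcount(payload)
    return after - before


# Hypothetical stand-ins for a reader; neither of them is pandas code.
_seen = {}


def leaky_reader(data):
    # Imitates the reported symptom: keeps the input the first time new content shows up.
    _seen.setdefault(hash(data), data)
    return len(data)


def clean_reader(data):
    # Reads the input without retaining it.
    return len(data)


def make_payload(tag):
    # Build the bytes object at run time so each payload is a fresh object.
    return tag * 1000


payload_a = make_payload(b"partition-a")
payload_b = make_payload(b"partition-b")

first_call = refcount_delta(leaky_reader, payload_a)    # new content
second_call = refcount_delta(leaky_reader, payload_a)   # same content again
new_content = refcount_delta(leaky_reader, payload_b)   # different content
clean_call = refcount_delta(clean_reader, payload_a)    # non-retaining reader
print(first_call, second_call, new_content, clean_call)
```

On pandas 0.20.2, the version in this report, passing pandas.read_msgpack as the reader would be the real check. read_msgpack was removed in pandas 1.0, so the sketch uses stand-ins to stay runnable on current installations. A delta that stays at zero when the same helper is given a reader that wraps the data in io.BytesIO first would match the observation that BytesIO input is not affected.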
Output of above example
The output clearly shows the effect of the memory leak when loading data frame partitions sequentially:
[Memory usage] At initialization 39.4 MB
[Memory usage] After data generation 39.9 MB
[Memory usage] After partition 1 185.9 MB
[Memory usage] After partition 2 329.8 MB
[Memory usage] After partition 3 473.7 MB
[Memory usage] After partition 4 617.6 MB
[Memory usage] After partition 5 761.5 MB
[Memory usage] After partition 6 905.4 MB
[Memory usage] After partition 7 1049.3 MB
[Memory usage] After partition 8 1193.2 MB
[Memory usage] After partition 9 1337.1 MB
[Memory usage] After partition 10 1481.0 MB
[Memory usage] After partition 11 1624.9 MB
[Memory usage] After partition 12 1768.8 MB
[Memory usage] After partition 13 1912.7 MB
[Memory usage] After partition 14 2056.6 MB
[Memory usage] After partition 15 2200.4 MB
[Memory usage] After partition 16 2344.3 MB
[Memory usage] After partition 17 2488.2 MB
[Memory usage] After partition 18 2631.7 MB
[Memory usage] After partition 19 2775.6 MB
[Memory usage] After partition 20 2919.5 MB
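In the log above, usage climbs by roughly 144 MB per partition from partition 2 onward and never drops back. A measurement loop of that shape can be sketched as follows. How the original script measured memory is not shown here, so this version uses tracemalloc (Python-level allocations, not process RSS) as an assumption. load_partitions, leaky_read and clean_read are hypothetical names, and leaky_read simply retains its input to simulate a leak. The partitions are 1 MiB each, so the printed numbers will not match the log.

```python
import tracemalloc

MB = 1024 * 1024


def load_partitions(read, n_partitions, partition_bytes):
    """Read freshly generated payloads and record traced memory after each one."""
    usage_mb = []
    for i in range(1, n_partitions + 1):
        payload = bytes([i % 256]) * partition_bytes  # distinct content per partition
        read(payload)
        del payload  # the caller keeps no reference to the raw data
        current, _peak = tracemalloc.get_traced_memory()
        usage_mb.append(current / MB)
        print("[Memory usage] After partition %d %.1f MB" % (i, usage_mb[-1]))
    return usage_mb


retained = []  # hypothetical leak: the reader never lets go of its input


def leaky_read(data):
    retained.append(data)


def clean_read(data):
    pass


tracemalloc.start()
leaky_usage = load_partitions(leaky_read, n_partitions=5, partition_bytes=MB)
retained.clear()  # release the simulated leak before the second run
clean_usage = load_partitions(clean_read, n_partitions=5, partition_bytes=MB)
tracemalloc.stop()
```

With the leaky reader the recorded usage grows by about one partition size per iteration, while with the clean reader it stays flat, which is the same signature as the linear growth in the log.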
jreback changed the title to "Memory leak in pandas.read_msgpack when reading from string" on Jun 9, 2017
Code Sample (copy-pastable)
Output of pd.show_versions()
pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 36.0.1
Cython: None
numpy: 1.13.0
scipy: None
xarray: None
IPython: 5.4.1
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None