GH-126491: Lower heap size limit with faster marking #127519

Merged · 24 commits · Dec 6, 2024

Changes from all commits (24 commits)
3038a78
Faster marking of reachable objects
markshannon Nov 9, 2024
c024484
Handle more classes in fast marking
markshannon Nov 10, 2024
e8497ae
Add support for asyn generators on fast path. Simplify counting
markshannon Nov 11, 2024
4c1a6bc
Check stackref before converting to PyObject *
markshannon Nov 11, 2024
6efb4c0
Rename stuff
markshannon Nov 13, 2024
b1c7ab0
Remove expand_region_transitively_reachable and use move_all_transiti…
markshannon Nov 13, 2024
07f228b
Merge branch 'main' into faster-marking
markshannon Dec 2, 2024
51ff78e
Fix compiler warnings and linkage
markshannon Dec 2, 2024
df907b5
Fix another linkage issue
markshannon Dec 2, 2024
9ca64f5
Try 'extern'
markshannon Dec 2, 2024
bda13f4
Go back to PyAPI_FUNC and move functions together
markshannon Dec 2, 2024
d9d63c8
Use _Py_FALLTHROUGH
markshannon Dec 2, 2024
57b8820
Add necessary #ifndef Py_GIL_DISABLED
markshannon Dec 2, 2024
a607059
Go back to using tp_traverse, but make traversal more efficient
markshannon Dec 3, 2024
1545508
Tidy up
markshannon Dec 3, 2024
a1a38c8
A bit more tidying up
markshannon Dec 3, 2024
68fc90b
Move all work to do calculations to one place
markshannon Dec 3, 2024
8893cf5
Assume that increments are 50% garbage for work done calculation
markshannon Dec 3, 2024
ba20c7c
Elaborate comment
markshannon Dec 4, 2024
8262bf0
More tweaking of thresholds
markshannon Dec 4, 2024
3c2116e
Do some algebra
markshannon Dec 4, 2024
72d0284
Revert to 2M+I from 3M+I
markshannon Dec 4, 2024
0f182e2
Address review comments
markshannon Dec 5, 2024
d3c21bb
Address review comments and clarify code a bit
markshannon Dec 5, 2024
50 changes: 44 additions & 6 deletions InternalDocs/garbage_collector.md
@@ -199,22 +199,22 @@ unreachable:

```pycon
>>> import gc
>>>
>>>
>>> class Link:
... def __init__(self, next_link=None):
... self.next_link = next_link
...
...
>>> link_3 = Link()
>>> link_2 = Link(link_3)
>>> link_1 = Link(link_2)
>>> link_3.next_link = link_1
>>> A = link_1
>>> del link_1, link_2, link_3
>>>
>>>
>>> link_4 = Link()
>>> link_4.next_link = link_4
>>> del link_4
>>>
>>>
>>> # Collect the unreachable Link object (and its .__dict__ dict).
>>> gc.collect()
2
@@ -459,11 +459,11 @@ specifically in a generation by calling `gc.collect(generation=NUM)`.
>>> # Create a reference cycle.
>>> x = MyObj()
>>> x.self = x
>>>
>>>
>>> # Initially the object is in the young generation.
>>> gc.get_objects(generation=0)
[..., <__main__.MyObj object at 0x7fbcc12a3400>, ...]
>>>
>>>
>>> # After a collection of the youngest generation the object
>>> # moves to the old generation.
>>> gc.collect(generation=0)
@@ -515,6 +515,44 @@ increment. All objects directly referred to from those stack frames are
added to the working set.
Then the above algorithm is repeated, starting from step 2.

Determining how much work to do
-------------------------------

We need to do a certain amount of work to ensure that garbage is collected,
but doing too much work slows down execution.

To work out how much work we need to do, consider a heap with `L` live objects
and `G0` garbage objects at the start of a full scavenge, and `G1` garbage objects
at the end of the scavenge. We don't want the amount of garbage to grow, so `G1 ≤ G0`, and
we don't want garbage to exceed about 1/3 of the heap, which means `G0 ≤ L/2`.
Over a full scavenge we must visit every object, so the total work is `T == L + G0 + G1`,
where the `G1` garbage objects are the ones created during the scavenge itself.

The number of new objects created, `N`, must be at least the amount of new garbage created,
`N ≥ G1`, assuming that the number of live objects remains roughly constant.
If we set `T == 4*N`, then `T ≥ 4*G1`, and since `T == L + G0 + G1` it follows that `L + G0 ≥ 3*G1`.
For a steady-state heap (`G0 == G1`) this gives `L ≥ 2*G0`, which is the desired garbage ratio.

In other words, to keep the garbage fraction at 1/3 or less, we need to visit
four times as many objects as are newly created.
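
For example, with a steady-state heap sitting exactly at the 1/3 garbage limit
(`L`, `G`, `N` and `T` are the quantities defined above; the numbers themselves are
purely illustrative, not taken from the collector):

```python
# Steady state at the limit: garbage neither grows nor shrinks (G0 == G1 == G).
L = 1_000_000          # live objects
G = 500_000            # garbage at the start and end of the scavenge (== L/2)
N = 500_000            # new objects created during the scavenge (worst case: all become garbage)
T = 4 * N              # work budget: visit four times as many objects as are created
assert T == L + G + G  # exactly enough work to visit every object once per scavenge
print(G / (L + G))     # garbage fraction: 1/3
```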

We can do better than this, though, because not all new objects will be garbage.
Consider the heap at the end of the scavenge, with `L1` live objects and `G1`
garbage objects. Also note that `T == M + I`, where `M` is the number of objects marked
as reachable and `I` is the number of objects visited in increments.
Everything in `M` is live, so all garbage must be found by the increments: `I ≥ G0`, and
in practice `I` is closer to `G0 + G1`.

If we choose the amount of work done such that `2*M + I == 6*N`, then we can do
less work in most cases, but are still guaranteed to keep up.
Since `I ≳ G0 + G1` (not strictly true, but close enough), in the steady state
(`G0 == G1 == G`, with `N ≥ G`) we get `T == M + I == (6*N + I)/2 ≳ (6*G + 2*G)/2 == 4*G`,
so we can keep up.

The reason that this improves performance is that `M` is usually much larger
than `I`. If `M == 10*I`, then `T ≈ 3*N`.
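
As a concrete illustration of the saving (made-up numbers, assuming `M == 10*I` as above):

```python
N = 700_000      # new objects created during the scavenge
I = 6 * N // 21  # from 2*(10*I) + I == 6*N  =>  21*I == 6*N, so I == 200_000
M = 10 * I       # marked (known reachable) objects: 2_000_000
T = M + I        # total objects visited: 2_200_000
print(T / N)     # ~3.14, i.e. roughly 3*N of work instead of 4*N
```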

Finally, instead of using a fixed multiple of 8, we gradually increase it as the
heap grows. This avoids wasting work for small heaps and during startup.
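
A minimal sketch of that idea follows; the function name `work_multiplier` and its
`base`/`full`/`ramp` parameters are hypothetical and only illustrate ramping the
multiplier with heap size, not the actual heuristic in `Python/gc.c`:

```python
def work_multiplier(heap_size, base=2, full=8, ramp=100_000):
    """Grow the work multiplier linearly from `base` to `full` as the heap
    approaches `ramp` tracked objects, so small heaps (and startup) scan less."""
    return min(full, base + (full - base) * heap_size / ramp)

print(work_multiplier(1_000))      # ~2.06: tiny heap, little extra scanning
print(work_multiplier(1_000_000))  # 8: full multiplier for a large heap
```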


Optimization: reusing fields to save memory
===========================================
14 changes: 3 additions & 11 deletions Lib/test/test_gc.py
@@ -1161,27 +1161,19 @@ def make_ll(depth):
return head

head = make_ll(1000)
count = 1000

# There will be some objects we aren't counting,
# e.g. the gc stats dicts. This test checks
# that the counts don't grow, so we try to
# correct for the uncounted objects
# This is just an estimate.
CORRECTION = 20

enabled = gc.isenabled()
gc.enable()
olds = []
initial_heap_size = _testinternalcapi.get_tracked_heap_size()
for i in range(20_000):
iterations = max(20_000, initial_heap_size)
for i in range(iterations):
newhead = make_ll(20)
count += 20
newhead.surprise = head
olds.append(newhead)
if len(olds) == 20:
new_objects = _testinternalcapi.get_tracked_heap_size() - initial_heap_size
self.assertLess(new_objects, 27_000, f"Heap growing. Reached limit after {i} iterations")
self.assertLess(new_objects, initial_heap_size/2, f"Heap growing. Reached limit after {i} iterations")
del olds[:]
if not enabled:
gc.disable()
4 changes: 1 addition & 3 deletions Objects/dictobject.c
@@ -7067,9 +7067,7 @@ int
PyObject_VisitManagedDict(PyObject *obj, visitproc visit, void *arg)
{
PyTypeObject *tp = Py_TYPE(obj);
if((tp->tp_flags & Py_TPFLAGS_MANAGED_DICT) == 0) {
return 0;
}
assert(tp->tp_flags & Py_TPFLAGS_MANAGED_DICT);
if (tp->tp_flags & Py_TPFLAGS_INLINE_VALUES) {
PyDictValues *values = _PyObject_InlineValues(obj);
if (values->valid) {
69 changes: 3 additions & 66 deletions Objects/genobject.c
@@ -882,25 +882,7 @@ PyTypeObject PyGen_Type = {
gen_methods, /* tp_methods */
gen_memberlist, /* tp_members */
gen_getsetlist, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */

0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
0, /* tp_init */
0, /* tp_alloc */
0, /* tp_new */
0, /* tp_free */
0, /* tp_is_gc */
0, /* tp_bases */
0, /* tp_mro */
0, /* tp_cache */
0, /* tp_subclasses */
0, /* tp_weaklist */
0, /* tp_del */
0, /* tp_version_tag */
_PyGen_Finalize, /* tp_finalize */
.tp_finalize = _PyGen_Finalize,
};

static PyObject *
@@ -1242,24 +1224,7 @@ PyTypeObject PyCoro_Type = {
coro_methods, /* tp_methods */
coro_memberlist, /* tp_members */
coro_getsetlist, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
0, /* tp_init */
0, /* tp_alloc */
0, /* tp_new */
0, /* tp_free */
0, /* tp_is_gc */
0, /* tp_bases */
0, /* tp_mro */
0, /* tp_cache */
0, /* tp_subclasses */
0, /* tp_weaklist */
0, /* tp_del */
0, /* tp_version_tag */
_PyGen_Finalize, /* tp_finalize */
.tp_finalize = _PyGen_Finalize,
};

static void
@@ -1464,7 +1429,6 @@ typedef struct _PyAsyncGenWrappedValue {
(assert(_PyAsyncGenWrappedValue_CheckExact(op)), \
_Py_CAST(_PyAsyncGenWrappedValue*, (op)))


static int
async_gen_traverse(PyObject *self, visitproc visit, void *arg)
{
@@ -1673,24 +1637,7 @@ PyTypeObject PyAsyncGen_Type = {
async_gen_methods, /* tp_methods */
async_gen_memberlist, /* tp_members */
async_gen_getsetlist, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
0, /* tp_init */
0, /* tp_alloc */
0, /* tp_new */
0, /* tp_free */
0, /* tp_is_gc */
0, /* tp_bases */
0, /* tp_mro */
0, /* tp_cache */
0, /* tp_subclasses */
0, /* tp_weaklist */
0, /* tp_del */
0, /* tp_version_tag */
_PyGen_Finalize, /* tp_finalize */
.tp_finalize = _PyGen_Finalize,
};


@@ -1935,16 +1882,6 @@ PyTypeObject _PyAsyncGenASend_Type = {
PyObject_SelfIter, /* tp_iter */
async_gen_asend_iternext, /* tp_iternext */
async_gen_asend_methods, /* tp_methods */
0, /* tp_members */
0, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
0, /* tp_init */
0, /* tp_alloc */
0, /* tp_new */
.tp_finalize = async_gen_asend_finalize,
};

13 changes: 13 additions & 0 deletions Objects/typeobject.c
@@ -2354,6 +2354,16 @@ subtype_traverse(PyObject *self, visitproc visit, void *arg)
return 0;
}


static int
plain_object_traverse(PyObject *self, visitproc visit, void *arg)
{
PyTypeObject *type = Py_TYPE(self);
assert(type->tp_flags & Py_TPFLAGS_MANAGED_DICT);
Py_VISIT(type);
return PyObject_VisitManagedDict(self, visit, arg);
}

static void
clear_slots(PyTypeObject *type, PyObject *self)
{
@@ -4146,6 +4156,9 @@ type_new_descriptors(const type_new_ctx *ctx, PyTypeObject *type)
assert((type->tp_flags & Py_TPFLAGS_MANAGED_DICT) == 0);
type->tp_flags |= Py_TPFLAGS_MANAGED_DICT;
type->tp_dictoffset = -1;
if (type->tp_basicsize == sizeof(PyObject)) {
type->tp_traverse = plain_object_traverse;
}
}

type->tp_basicsize = slotoffset;