[mypyc] Fix reference leak when setting unboxed refcounted attrs #21657
Merged
p-sawicki merged 2 commits on Jun 30, 2026
…descriptor

The generated getset descriptor setter for a native attribute over-increfed refcounted unboxed values. For such a type emit_unbox already produces a new (owned) reference (e.g. CPyTagged_FromObject increfs in the heap-boxed int case), and generate_setter then applied an additional emit_inc_ref, taking two references while the deallocator releases only one. The result was a leaked reference on every set through the setter.

This is hit whenever the attribute is set from interpreted code, most notably the __init__ that the dataclasses module synthesizes for a mypyc-compiled @dataclass: `self.v = v` goes through the descriptor, so every constructed instance with a heap-boxed int (>= 2**62) field leaked one PyLong. Tuple attributes with refcounted items were affected the same way.

Fix: unbox with borrow=True so all three branches of generate_setter produce a borrowed value and the single emit_inc_ref takes exactly one owned reference. borrow=True is a no-op for non-refcounted unboxed types and is propagated correctly through RTuple unboxing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On free-threaded CPython, all code constants are immortal, so the folded literals `1 << 70` / `1 << 71` in the test driver had an unchanging refcount and sys.getrefcount could not observe the leak (the test saw delta 0 instead of 100 on py314t). Compute the values at runtime via a variable shift so they stay mortal and the leak is detectable on both GIL and free-threaded builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
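To see why a literal shift and a runtime shift behave differently, it can help to check where the resulting value lives. The sketch below is illustrative only: the names `folded`, `runtime`, and `is_code_constant` are hypothetical and not taken from the test driver, and it assumes a CPython version whose compiler folds a constant shift of this size.

```python
def folded() -> int:
    # Both operands are literals, so CPython's compiler is expected to fold
    # the expression into a single constant stored on the code object.
    return 1 << 70


def runtime(shift: int) -> int:
    # The shift amount is only known at call time, so the result is an
    # ordinary int object created by the call, not a code constant.
    return 1 << shift


def is_code_constant(func, value: int) -> bool:
    """Return True if `value` is stored among func's code constants."""
    return any(type(c) is int and c == value for c in func.__code__.co_consts)


print(is_code_constant(folded, 1 << 70))
print(is_code_constant(runtime, 1 << 70))
```

A value that only exists as the result of `runtime(70)` is never a code constant, which is the property the second commit relies on to keep the refcount observable.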
I ran into this memory leak, which can be reproduced with:
This PR adds a fix and a test that fails without the fix. I am no expert on mypyc internals, so I asked Claude Code to fix this bug. It is a one-line change so I believe it will be easy to review.
From quick inspection it seems that the rest of the code has fewer comments than what Claude wrote, so if you prefer, I'll remove the verbose comment that this PR adds. Similarly for the test code, it can be shortened if wanted.
Long explanation from Claude Code
Description
A native attribute setter generated by mypyc over-increfs the stored value when
the attribute has a refcounted unboxed type, most importantly int (CPyTagged),
and also tuples with refcounted items.
For an unboxed type, generate_setter emitted an emit_unbox followed by an
emit_inc_ref. CPyTagged_FromObject already increfs in the heap-boxed case, so
the setter takes two references while the deallocator releases only one,
leaking one reference on every set through the setter. The other two branches
of generate_setter (object and the emit_cast path) are correct because they
produce a borrowed value and rely on the single emit_inc_ref to take
ownership; only the unboxed branch was inconsistent.
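The imbalance can be traced with a toy reference-count model. This is a simplified sketch, not mypyc's generated C: `Obj`, `unbox`, and `leaked_after_set` are hypothetical names, and the model only counts the increments and decrements described above.

```python
class Obj:
    """Toy stand-in for a heap object with an explicit reference count."""

    def __init__(self) -> None:
        self.refs = 1  # the caller's own reference


def unbox(obj: Obj, borrow: bool) -> Obj:
    # Models unboxing a heap-boxed int: without borrow it returns a new owned
    # reference (one extra incref); with borrow it returns a borrowed one.
    if not borrow:
        obj.refs += 1
    return obj


def leaked_after_set(borrow: bool) -> int:
    value = Obj()
    stored = unbox(value, borrow)
    stored.refs += 1  # the setter's single incref to take ownership
    stored.refs -= 1  # the deallocator's single decref when the object dies
    return value.refs - 1  # references left over beyond the caller's own


print(leaked_after_set(borrow=False))  # owned unbox plus incref
print(leaked_after_set(borrow=True))   # borrowed unbox plus incref
```

In this model the non-borrowing path leaves one reference behind per set, while the borrowing path balances out.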
Why this shows up with dataclasses
This is reached whenever an attribute is set from interpreted code. The clearest
real-world case is the __init__ that the stdlib dataclasses module
synthesizes for a mypyc-compiled @dataclass: its self.v = v runs as
interpreted code and goes through the generated descriptor setter. So every
constructed instance of a compiled dataclass with a heap-boxed int field
(value ≥ 2**62) leaked one PyLong, causing silent, unbounded growth in
long-lived programs that build many such objects (e.g. 64-bit ids).
It does not depend on slots, frozen, eq, field count, or field
position; a hand-written native __init__ is unaffected because it stores via
SetAttr rather than the descriptor. Small (inline-tagged) ints, floats, and
object-typed fields are also unaffected since they aren't refcounted through
this path.
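As a rough way to tell which int values fall on the affected side, the check below encodes the 2**62 threshold mentioned above. It is a sketch under assumptions: a 64-bit platform with one tag bit, a symmetric lower bound of -(2**62) that the description does not state, and hypothetical names (`TAGGED_MIN`, `TAGGED_MAX`, `is_heap_boxed`) that are not mypyc APIs.

```python
WORD_BITS = 64  # assumption: 64-bit platform
TAG_BITS = 1    # assumption: one bit distinguishes inline from heap-boxed ints

# Assumed range of values that fit inline in a tagged word.
TAGGED_MAX = (1 << (WORD_BITS - TAG_BITS - 1)) - 1  # 2**62 - 1
TAGGED_MIN = -(1 << (WORD_BITS - TAG_BITS - 1))     # -(2**62), an assumption


def is_heap_boxed(n: int) -> bool:
    """Return True if n would not fit inline under the assumptions above."""
    return not (TAGGED_MIN <= n <= TAGGED_MAX)


print(is_heap_boxed(2**62 - 1))  # largest value assumed to fit inline
print(is_heap_boxed(2**62))      # first value on the heap-boxed side
```

Large identifiers such as 64-bit ids can land at or above 2**62, which is why they are the typical trigger.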
Reproducer
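A minimal sketch of a reproducer along these lines, not the author's original script: the names `Holder` and `leaked_refs` are hypothetical, and it assumes the module is compiled with mypyc before being run. Run as plain Python it should report no leaked references; per the description above, a build from an unfixed mypyc would be expected to report roughly one leaked reference per constructed instance.

```python
import sys
from dataclasses import dataclass


@dataclass
class Holder:
    v: int  # unboxed, refcounted when the value is heap-boxed


def leaked_refs(n: int = 100, shift: int = 70) -> int:
    # Compute the big int at runtime so it is an ordinary mortal object.
    big = 1 << shift
    before = sys.getrefcount(big)
    # The dataclass-synthesized __init__ runs `self.v = v` for each instance.
    objs = [Holder(big) for _ in range(n)]
    # Drop every instance; a correct setter leaves no references behind.
    del objs
    return sys.getrefcount(big) - before


print(leaked_refs())
```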
Fix
Unbox with borrow=True in the setter's unboxed branch, so all three branches
of generate_setter produce a borrowed value and the single emit_inc_ref
takes exactly one owned reference. borrow=True is a no-op for non-refcounted
unboxed types (float, fixed-width ints, bool) and is propagated correctly
through RTuple unboxing, which fixes the analogous leak for tuple-typed
attributes too.
Tests
Added testNativeAttrSetterRefcountLeak to mypyc/test-data/run-classes.test,
covering a boxed-int dataclass field, an unboxed Tuple[int, int] field, and
setter re-assignment. It fails before the fix (expected 100 live refs, got 200) and passes after.