[mypyc] Fix reference leak when setting unboxed refcounted attrs #21657

Merged: p-sawicki merged 2 commits into python:master from Tombana:fix/mypyc-dataclass-int-field-leak on Jun 30, 2026
@Tombana Tombana commented Jun 29, 2026

I ran into a memory leak that can be reproduced with:

from dataclasses import dataclass

@dataclass
class MyClass:
    v: int

c = MyClass(1 << 70)

This PR adds a fix and a test that fails without the fix. I am not an expert on mypyc internals, so I asked Claude Code to fix the bug. The fix is a one-line change, so it should be easy to review.
From a quick look, the rest of the code has fewer comments than what Claude wrote, so I can remove the verbose comment this PR adds if you prefer. The test code can likewise be shortened if wanted.

Long explanation from Claude Code

Description

A native attribute setter generated by mypyc over-increfs the stored value when
the attribute has a refcounted unboxed type — most importantly int
(CPyTagged), and also tuples with refcounted items.

For an unboxed type, generate_setter emitted:

tmp = CPyTagged_FromObject(value);  // already creates a NEW (owned) reference
CPyTagged_INCREF(tmp);              // ...and then takes a second one
self->_v = tmp;

CPyTagged_FromObject already increfs in the heap-boxed case, so the setter
takes two references while the deallocator releases only one — leaking one
reference on every set through the setter. The other two branches of
generate_setter (object and the emit_cast path) are correct because they
produce a borrowed value and rely on the single emit_inc_ref to take
ownership; only the unboxed branch was inconsistent.

Why this shows up with dataclasses

This is reached whenever an attribute is set from interpreted code. The clearest
real-world case is the __init__ that the stdlib dataclasses module
synthesizes for a mypyc-compiled @dataclass: its self.v = v runs as
interpreted code and goes through the generated descriptor setter. So every
constructed instance of a compiled dataclass with a heap-boxed int field
(value ≥ 2**62) leaked one PyLong. The result is silent, unbounded growth in
long-lived programs that build many such objects (e.g. 64-bit ids).
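
To illustrate the point about the synthesized __init__, here is a minimal sketch that inspects the bytecode of a dataclass-generated __init__ on an ordinary, interpreted class. It only shows that the generated method is plain Python bytecode performing an attribute store; that this store is routed through the mypyc-generated setter for a compiled class is taken from the description above and is not demonstrated here. The helper name attribute_stores is made up for this example.

```python
import dis
from dataclasses import dataclass


@dataclass
class C:
    v: int


def attribute_stores(func):
    """Return the attribute names that func's bytecode assigns via STORE_ATTR."""
    return [
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname == "STORE_ATTR"
    ]


# The synthesized __init__ is a normal Python function that assigns self.v.
print(attribute_stores(C.__init__))
```

Because the store is a generic attribute assignment, whatever setter the class exposes for v is what ends up running.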

It does not depend on slots, frozen, eq, field count, or field
position; a hand-written native __init__ is unaffected because it stores via
SetAttr rather than the descriptor. Small (inline-tagged) ints, floats, and
object-typed fields are also unaffected: the first two are not refcounted at
all, and object-typed fields go through a branch that was already correct.
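
As a rough aid for deciding which values are affected, the check below models the inline versus heap-boxed split. It assumes a 64-bit platform where one bit of the tagged word is reserved for the tag, leaving a 63-bit signed range for inline values. The upper bound matches the 2**62 threshold quoted above; the symmetric lower bound and the name is_heap_boxed are assumptions made for this sketch, not taken from the PR.

```python
# Assumption: 64-bit platform, one tag bit, so inline values are 63-bit signed.
SHORT_INT_BITS = 63


def is_heap_boxed(n: int) -> bool:
    """Return True if n falls outside the assumed inline (short) tagged range."""
    lo = -(1 << (SHORT_INT_BITS - 1))      # assumed lower bound: -2**62
    hi = (1 << (SHORT_INT_BITS - 1)) - 1   # upper bound: 2**62 - 1
    return not (lo <= n <= hi)


for value in (1, (1 << 62) - 1, 1 << 62, 1 << 70):
    print(value, is_heap_boxed(value))
```

Only values for which this returns True are refcounted objects, so only those could leak through the setter.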

Reproducer

from dataclasses import dataclass
import sys

@dataclass
class C:
    v: int

big = 1 << 70                  # heap-boxed int (>= 2**62), so it is refcounted
base = sys.getrefcount(big)
xs = [C(big) for _ in range(1000)]
del xs
print(sys.getrefcount(big) - base)   # before: 1000   after: 0

Fix

Unbox with borrow=True in the setter's unboxed branch, so all three branches
of generate_setter produce a borrowed value and the single emit_inc_ref
takes exactly one owned reference. borrow=True is a no-op for non-refcounted
unboxed types (float, fixed-width ints, bool) and is propagated correctly
through RTuple unboxing, which fixes the analogous leak for tuple-typed
attributes too.
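
The accounting behind the fix can be sketched with a toy model in pure Python. This is not mypyc code: Obj, unbox, setter, dealloc, and leaked_refs are hypothetical stand-ins that only mimic the reference counting described above (an unbox that either takes or borrows a reference, one incref in the setter, one decref in the deallocator).

```python
class Obj:
    """Stand-in for a heap object with a manual reference count."""

    def __init__(self):
        self.refcount = 1  # the reference held by the caller


def unbox(obj, borrow):
    """Model of unboxing: takes a new reference unless asked to borrow."""
    if not borrow:
        obj.refcount += 1
    return obj


def setter(holder, obj, borrow):
    """Model of the generated setter: unbox, then take one reference."""
    tmp = unbox(obj, borrow)
    tmp.refcount += 1  # models the single emit_inc_ref
    holder["v"] = tmp


def dealloc(holder):
    """Model of the deallocator: releases exactly one reference."""
    holder.pop("v").refcount -= 1


def leaked_refs(borrow):
    """Return how many references remain beyond the caller's own."""
    obj = Obj()
    holder = {}
    setter(holder, obj, borrow)
    dealloc(holder)
    return obj.refcount - 1


print("borrow=False leaks:", leaked_refs(borrow=False))
print("borrow=True leaks:", leaked_refs(borrow=True))
```

With borrow=False the model ends one reference high per set, which is the leak; with borrow=True the incref and decref balance.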

Tests

Added testNativeAttrSetterRefcountLeak to mypyc/test-data/run-classes.test,
covering a boxed-int dataclass field, an unboxed Tuple[int, int] field, and
setter re-assignment. It fails before the fix (expected 100 live refs, got 200) and passes after.

Tombana and others added 2 commits June 29, 2026 17:42
…descriptor

The generated getset descriptor setter for a native attribute over-increfed
refcounted unboxed values. For such a type emit_unbox already produces a new
(owned) reference (e.g. CPyTagged_FromObject increfs the heap-boxed int case),
and generate_setter then applied an additional emit_inc_ref, taking two
references while the deallocator releases only one. The result was a leaked
reference on every set through the setter.

This is hit whenever the attribute is set from interpreted code, most notably
the __init__ that the dataclasses module synthesizes for a mypyc-compiled
@dataclass: `self.v = v` goes through the descriptor, so every constructed
instance with a heap-boxed int (>= 2**62) field leaked one PyLong. Tuple
attributes with refcounted items were affected the same way.

Fix: unbox with borrow=True so all three branches of generate_setter produce a
borrowed value and the single emit_inc_ref takes exactly one owned reference.
borrow=True is a no-op for non-refcounted unboxed types and is propagated
correctly through RTuple unboxing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On free-threaded CPython, all code constants are immortal, so the folded
literals `1 << 70` / `1 << 71` in the test driver had an unchanging refcount and
sys.getrefcount could not observe the leak (the test saw delta 0 instead of 100
on py314t). Compute the values at runtime via a variable shift so they stay
mortal and the leak is detectable on both GIL and free-threaded builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
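
The measurement approach described in the second commit can be sketched as follows. This is not the test added by the PR; refcount_delta and its parameters are hypothetical names. The value is built from a shift passed in at runtime, so it is an ordinary mortal object rather than a folded code constant.

```python
import sys
from dataclasses import dataclass


@dataclass
class C:
    v: int


def refcount_delta(shift: int, count: int = 100) -> int:
    """Build and drop `count` instances; return the refcount change of the value."""
    big = 1 << shift  # computed at runtime, so not a folded constant
    base = sys.getrefcount(big)
    xs = [C(big) for _ in range(count)]
    del xs  # all instances are freed here, releasing their references
    return sys.getrefcount(big) - base


print(refcount_delta(70))
```

Run as plain interpreted Python, the delta should be 0 because nothing leaks. Per the description above, a mypyc-compiled class without the fix would instead report one extra reference per constructed instance.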
@Tombana Tombana marked this pull request as ready for review June 30, 2026 10:32

@p-sawicki p-sawicki left a comment

thanks!

@p-sawicki p-sawicki merged commit 79a769a into python:master Jun 30, 2026
18 checks passed
@Tombana Tombana deleted the fix/mypyc-dataclass-int-field-leak branch June 30, 2026 11:25