Bug report
In a free-threading build (--disable-gil), the garbage collector debug flag
gcstate->debug is read and written without synchronisation:
- Write:
gc_set_debug_impl() (Modules/gcmodule.c) does a plain
gcstate->debug = flags. Unlike gc_set_threshold_impl(), which runs its
free-threading branch under _PyEval_StopTheWorld(), gc.set_debug() takes
no lock and does not stop the world.
- Read:
gc_get_debug_impl() returns gcstate->debug directly, and the
collector in Python/gc_free_threading.c reads gcstate->debug /
interp->gc.debug in several places while walking the graph.
So one thread calling gc.set_debug() concurrently with another calling
gc.get_debug() or triggering a collection is an unsynchronised read/write of
the same int. The flag is only an int, so the effect stays benign at the
Python level, but it is undefined behaviour under C11 and ThreadSanitizer
reports it as a data race.
This is the same class of issue already fixed for sys dlopenflags
(gh-151644) and gc.get_stats() (gh-151646). gc.enable() / gc.disable()
in the same file already access gcstate->enabled atomically; the debug flag
was missed.
ThreadSanitizer output
Built with ./configure --with-thread-sanitizer --disable-gil and stressed
with concurrent gc.set_debug() / gc.get_debug() plus a thread churning
cyclic garbage so the collector runs:
WARNING: ThreadSanitizer: data race
Write of size 4 at 0x...6c by thread T2:
#0 gc_set_debug gcmodule.c.h:186
Previous write of size 4 at 0x...6c by thread T1:
#0 gc_set_debug gcmodule.c.h:186
Location is global '_PyRuntime'
How to reproduce
./configure --with-thread-sanitizer --disable-gil && make
- Run a script that starts a few threads calling
gc.set_debug(...) /
gc.get_debug() in a loop, plus a thread that builds reference cycles and
calls gc.collect().
- TSan reports the write/write race on
gcstate->debug.
Suggested fix
Access the flag with FT_ATOMIC_STORE_INT_RELAXED / FT_ATOMIC_LOAD_INT_RELAXED
in gc_set_debug_impl() / gc_get_debug_impl() and in the collector reads,
matching how gcstate->enabled and dlopenflags are already handled. Relaxed
ordering is correct for an independent int flag. These wrappers compile to a
plain load/store in the default (GIL) build, so there is no change there.
Linked PRs
Bug report
In a free-threading build (
--disable-gil), the garbage collector debug flaggcstate->debugis read and written without synchronisation:gc_set_debug_impl()(Modules/gcmodule.c) does a plaingcstate->debug = flags. Unlikegc_set_threshold_impl(), which runs itsfree-threading branch under
_PyEval_StopTheWorld(),gc.set_debug()takesno lock and does not stop the world.
gc_get_debug_impl()returnsgcstate->debugdirectly, and thecollector in
Python/gc_free_threading.creadsgcstate->debug/interp->gc.debugin several places while walking the graph.So one thread calling
gc.set_debug()concurrently with another callinggc.get_debug()or triggering a collection is an unsynchronised read/write ofthe same
int. The flag is only anint, so the effect stays benign at thePython level, but it is undefined behaviour under C11 and ThreadSanitizer
reports it as a data race.
This is the same class of issue already fixed for
sysdlopenflags(gh-151644) and
gc.get_stats()(gh-151646).gc.enable()/gc.disable()in the same file already access
gcstate->enabledatomically; the debug flagwas missed.
ThreadSanitizer output
Built with
./configure --with-thread-sanitizer --disable-giland stressedwith concurrent
gc.set_debug()/gc.get_debug()plus a thread churningcyclic garbage so the collector runs:
How to reproduce
./configure --with-thread-sanitizer --disable-gil && makegc.set_debug(...)/gc.get_debug()in a loop, plus a thread that builds reference cycles andcalls
gc.collect().gcstate->debug.Suggested fix
Access the flag with
FT_ATOMIC_STORE_INT_RELAXED/FT_ATOMIC_LOAD_INT_RELAXEDin
gc_set_debug_impl()/gc_get_debug_impl()and in the collector reads,matching how
gcstate->enabledand dlopenflags are already handled. Relaxedordering is correct for an independent
intflag. These wrappers compile to aplain load/store in the default (GIL) build, so there is no change there.
Linked PRs