From 28b8fb54521dad6d77d29a981ebed17ce9a79040 Mon Sep 17 00:00:00 2001 From: Savannah Ostrowski Date: Thu, 2 Jul 2026 19:30:03 -0700 Subject: [PATCH 1/8] Add JIT PEP and codeowners --- .github/CODEOWNERS | 1 + peps/pep-0836.rst | 1016 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 1017 insertions(+) create mode 100644 peps/pep-0836.rst diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index ea7a8d4543b..509e23f3e1e 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -710,6 +710,7 @@ peps/pep-0831.rst @pablogsal @Fidget-Spinner @savannahostrowski peps/pep-0832.rst @brettcannon peps/pep-0833.rst @dstufft peps/pep-0835.rst @ilevkivskyi +peps/pep-0836.rst @savannahostrowski @Fidget-Spinner @brandtbucher # ... peps/pep-2026.rst @hugovk # ... diff --git a/peps/pep-0836.rst b/peps/pep-0836.rst new file mode 100644 index 00000000000..bf7c50b6f16 --- /dev/null +++ b/peps/pep-0836.rst @@ -0,0 +1,1016 @@ +PEP: 836 +Title: JIT Go Brrr: The Path to a Supported JIT Compiler for CPython +Author: Savannah Ostrowski , + Ken Jin , + Brandt Bucher +Discussions-To: Pending +Status: Draft +Type: Standards Track +Created: 02-Jul-2026 +Python-Version: 3.16 +Post-History: Pending + + +Abstract +======== + +The experimental Just-in-Time (JIT) compiler has been part of CPython's main +branch since Python 3.13. :pep:`744` described part of its initial design and +explicitly deferred a number of questions about the JIT's long-term status. +Since then, the JIT has been re-architected and matured considerably. In Python +3.15, it delivers a measurable, reproducible speedup over the interpreter +(about 4-12% geometric mean performance improvement across measured Tier 1 +platforms; see :ref:`appendix-jit-speedup-2wk`), emits frames that native +debuggers can unwind through, and reduces the memory footprint of generated +code relative to 3.14. Along the way, we have learned a good deal about what +works for a JIT in CPython. 
+ +This PEP proposes a path for the JIT to become a supported, non-experimental +part of CPython if it meets measurable performance, compatibility, tooling, +platform, distribution, security, and maintenance goals. The initial +performance target is at least a 20% geometric mean improvement on pyperformance +for the JIT + free-threaded build compared to the non-JIT free-threaded build's +interpreter, measured as the mean across the supported Tier 1 platforms, by the +first beta release of Python 3.17. The target is set as a minimum bar for +continued in-tree development of the JIT. + + +Proposal +======== + +This PEP does not propose declaring the JIT as supported immediately. Instead, +it proposes a time-bounded path for keeping JIT development in CPython main +while the project meets explicit performance, compatibility, tooling, +distribution, security, and maintenance goals. + +If these goals are met, the JIT can be promoted to a non-experimental feature +of CPython. If they are not met, the Steering Council and core team should +re-evaluate whether the JIT should remain in CPython main. After promotion, +enabling the JIT by default on supported platforms would require a separate +final approval from the Release Manager. + +At a high level, these are our milestones and goals for the JIT over the next +2.5 years: + +* **Year 1 (ending with Python 3.16's first beta) - Developer experience + improvements** + + * :ref:`Evolve the frontend <jit-infrastructure>` from a trace-recording design to a + method-based one. We believe that a method frontend will put us on a path that + allows for easier maintenance, teachability, debugging, and so on. The + first implementation should be minimal, may initially use more memory or + perform slightly worse, and may be rolled back to the current tracing + frontend if the approach does not meet the project's goals in the first + year. + + * :ref:`Make the JIT compatible with free-threading <free-threading>`.
We + believe that this is important to prioritize early on in the next phase of + the JIT as free-threading adoption is expanding rapidly. + + * :ref:`Add further tooling testing <tooling-support>`, and address any + remaining gaps discovered in coverage, for native and Python profilers and + debuggers. At a minimum, this will include anything that uses frame + pointers to unwind, but should also be expanded to support tools that + symbolize Python frames. Third-party tooling must have documented + remediation paths when existing behavior cannot be preserved exactly. + + * :ref:`A better JIT distribution story <distribution>`. Provide + redistributors with a documented and reproducible way to build or verify + JIT stencils, without requiring long-term dependence on one exact LLVM + version [#community-perspectives]_. + + * **No lower than a 5% uplift for the JIT over the interpreter, averaged across + our supported platforms, after implementing the above changes.** In other + words, we will not significantly regress base performance improvements + while in pursuit of longer-term goals. We will also not discourage other + contributors from contributing performance improvements during this stage. + However, our main focus will be the developer experience improvements. + +* **Year 2 (ending with Python 3.17's first beta) - Improved performance** + + * **Achieve at least a 20% geometric mean performance improvement on + pyperformance for JIT + free-threading compared to the free-threading + interpreter alone.** This is the minimum target for keeping the JIT in + CPython main, with free-threading treated as the primary performance focus. + +* **Year 2.5 (ending with Python 3.17's first release candidate) - Adoption + and compatibility** + + * :ref:`Compatibility review <compatibility>`. Run test suites for selected + popular PyPI packages and representative real-world workloads under the + JIT.
Regressions should be triaged case by case, with fixes or documented + explanations for issues that reflect real compatibility breaks. + + +Motivation +========== + +Improving CPython's performance is essential to Python's future. A JIT compiler +is one of the few performance strategies that can improve CPython while +preserving the runtime that users, extension authors, embedders, distributors, +debuggers and profilers already target. Other dynamic languages, such as Ruby, +PHP and JavaScript, have successfully used JIT compilers to deliver substantial +performance improvements while maintaining compatibility with large existing +ecosystems. CPython has different constraints but the JIT is explicitly aimed +at improving performance within those constraints. + +Alternative Python implementations, such as PyPy and GraalPy, demonstrate that +larger speedups are possible for some Python programs, and we deeply value +those projects. However, many users cannot adopt an alternative runtime even +when it may perform better on their code due to factors such as supported +Python versions, extension compatibility, embedding requirements, deployment +constraints, and tooling support. + +:pep:`744` did valuable work explaining the JIT's copy-and-patch approach, made +the case for keeping the implementation in CPython main branch so that it could +be maintained by a broader group of volunteers, and sketched some criteria +under which the JIT might eventually graduate from an experimental state. +However, the original PEP for the JIT left many questions open about +guarantees, maintenance commitments, success metrics, timelines, tooling +compatibility, impact on redistributors, its relationship to other JITs, and +likely architectural evolution. + +The current CPython JIT has shown some promising results (see +:ref:`appendix-jit-speedup-2wk`), especially in the last 9-12 months. 
However, +as with any good experiment, it's important to evaluate the current approach +and evolve plans based on what we've learned. + + +.. _current-state: + +Current State +============= + +As of CPython 3.15, the current JIT compiler is roughly 4-12% faster (geometric +mean) on the pyperformance benchmark suite compared to the interpreter across +measured Tier 1 platforms (see :ref:`appendix-jit-speedup-2wk`). In order to +achieve this, the JIT and supporting infrastructure have undergone a number of +revisions across the last four major versions of Python: + +* **3.12:** Introduction of the CPython bytecode DSL, and refactoring of + interpreter bytecodes to micro-operations ("uops"). +* **3.13:** JIT trace projector, optimizer and Copy and Patch backend + introduced, and :pep:`744` was written. +* **3.14:** More refactoring of interpreter bytecodes and optimizer work. +* **3.15:** JIT tracer rewritten to use trace recording, JIT optimizer improvements, + community engagement and involvement. + +Today, the JIT is an experimental and opt-in part of CPython. The official +Python binaries for Windows and macOS ship with the JIT built but disabled by +default (end users can enable it using ``PYTHON_JIT=1``). Other distributors and +certain Linux distributions, such as Fedora and Gentoo, are also known to do +the same. As it stands, the JIT has a build-time dependency on LLVM for +stencil generation. + +The JIT has garnered many excellent community contributors, and this has picked +up momentum in recent months. We are extremely grateful to these volunteers. A +sizable and active community now exists, as evidenced by the contributor +list in CPython 3.15's What's New entry for the JIT [#whats-new]_. The JIT team +has learnt important lessons about attracting new contributors, such as creating +approachable work units in the public issue tracker and offering mentorship. + + +.. 
_jit-learnings: + +Learnings +========= + +JIT projects evolve over their lifespan, as seen for example in CRuby which has +seen multiple JIT compilers. CPython's JIT is no different. + +CPython's JIT compiler has areas to improve. To be sustainable in the +long term, meet our performance goals, and continue fostering community +engagement, certain tradeoffs are required. Tooling, compatibility with other +JIT projects, and free-threading must be first-class citizens. + +In our experience, there are several areas which we believe have been +successful: + +* **The bytecode DSL and uops.** This approach lowers maintenance burden even + in the interpreter, as repeated units of code can be shared without + interpretive overhead, and we can reduce error-proneness when modifying the + interpreter through bytecode validation. These should remain in CPython even + if the current JIT is unsuccessful and is removed. The uops + themselves also form the intermediate representation for the JIT + automatically. + +* **Generating a JIT translator automatically using our own tooling.** + CPython's JIT can generate bytecode-to-JIT Intermediate + Representation (IR) translation rules automatically using the bytecode DSL. Again, this + means most JIT translations are correct-by-construction, reducing + error-proneness and maintenance burden. This also means the complexity of + the JIT is self-contained: adding new features to CPython generally does not require JIT + support unless implementers want the JIT to optimize the feature. For + example, the initial lazy imports pull request [#lazy-imports-pr]_ did not + require touching any JIT files apart from adding new headers to include in C. + +* **A JIT optimizer that resembles the CPython interpreter.** The current JIT + optimizer (middle-end) analyzes type information over CPython uops.
The key + maintainability advantage here is that the middle-end is written in a similar + fashion as the normal CPython interpreter: as a bytecode DSL over an + interpreter. However, instead of interpreting objects, we interpret types of + the objects. This means knowledge of the CPython interpreter is transferrable + to the JIT optimizer, and if a contributor knows how to work on the + interpreter's bytecodes, they also know how to work on the JIT's middle-end. + +* **Generating a JIT machine code backend automatically using our own + tooling.** CPython's JIT does not require custom handwritten operations, as + the JIT machine code is generated automatically from the interpreter. This + further reduces the maintenance burden of a JIT and allows a small team to + maintain it for a wide variety of platforms. + +* **Trace recording provides some benefits naturally.** For example, + polymorphism, speculation, dead code elimination, value recording are well + handled in a trace recording JIT. + +* **Traces are an easy starting point.** Traces don't have control-flow within + them, making analysis simpler. + +* **A community-maintained JIT.** Despite partial funding from corporate + sources (which we are grateful for), a sizable portion of JIT work comes from + volunteers. Breaking the JIT into understandable chunks for contributors to + work on is an effective way of compartmentalizing complexity and encouraging + ownership. + +Conversely, we have also learned quite a bit about what has not worked for +CPython and what could be improved upon: + +* **We can continue to improve community outreach and engagement.** We are + taking active steps to onboard new members. However, continuous engagement + with the wider community and their requests/needs is critical for the + project. This includes talking to system distributors for example, or + maintainers of third-party tooling, understanding their concerns, and + accommodating them better. 
+ +* **Reconsidering if our current JIT frontend is the right fit for CPython.** + The current tracing JIT has some great benefits, as discussed in + :ref:`Learnings <jit-learnings>`. However, a successful JIT is much more than + just good performance; we must consider other factors like maintainability, + testability, and teachability too. Furthermore, as the JIT matures, the + cost-benefit proposition of tracing in CPython shifts. To be clear, this is + not a value judgement of tracing as an approach, but rather an assessment of + its state within CPython. These observations are from the authors, some of + whom implemented the current tracing frontend in CPython 3.15: + + * **Tracing's initial ease does not seem to continue in the medium term in + the case of CPython.** As described in :ref:`Learnings <jit-learnings>`, + tracing is easy to start with. However, the simple implementation of tracing + in CPython yielded no speedups initially in 3.13 and 3.14. Only when we + shifted to a more complex tracing runtime and modified the interpreter did + we experience performance gains. We believe the initial ease of tracing + will be eroded as our tracing runtime matures. + + * **A mature tracing runtime's complexity seems to require many + non-conventional clever "tricks", in our experience.** For example, the + current trace recording mechanism relies on such tricks + [#trace-recording-docs]_ to make recording the interpreter's execution + efficient and effective. Additional complexities include managing the trace + graph and its lifetime. We would like to reduce such tricks to make the JIT + easier to maintain and teach. Other frontends such as method-based ones + also implement tricks. However, they seem to be better studied in recent + years and thus better documented due to their prevalence in other + dynamic language runtimes.
+ + * **Tracing's interactions in CPython are nontraditional to teach and + analyze.** Tracing is commonly found in AI/ML compilers, but less + frequently used and taught in traditional compiler literature. We believe + this increases the barrier to entry for a new contributor who knows + compilers but does not know about CPython. Furthermore, when a trace + performs badly, its interactions with the CPython interpreter can be hard + to analyze. There can be myriad reasons for less predictable performance, + and analyzing them requires a deep understanding of the interpreter as + well. For example, the current trace recording runtime has complicated + tracing heuristics which decide whether to continue or terminate the trace. + These heuristics took a contributor and one of this PEP's authors many + attempts to get right (through no fault of their own). We wish to make it + easier to teach and onboard new contributors without requiring them to + deeply analyze the interpreter. + +* **We do not have a strong pulse on whether the JIT currently benefits larger + real-world workloads.** At present, the JIT is primarily measured and + evaluated via pyperformance benchmark suite runs. We would like to spend more + time evaluating its impact on real end-user code (see :ref:`compatibility`). + +* **The distribution story should be improved and codified.** Distributor + feedback so far suggests that LLVM itself is not always the main obstacle as + most can provide a recent LLVM toolchain. The harder problem is depending on + one exact LLVM version for the lifetime of a Python release, which can force + redistributors to carry multiple LLVM versions, rely on unsupported + toolchains, disable the JIT or maintain bespoke stencil generation workflows. + We must codify a solution that is workable for distributors. :pep:`774` is + one such solution but more research needs to be done to prevent each Linux + distribution rolling their own bespoke solution. 
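The pyperformance-based figures quoted throughout this PEP are geometric means of per-benchmark speedups, and the headline target is described as a mean across platforms. The following is a minimal sketch of that arithmetic, not the project's actual measurement tooling. It assumes that "20% improvement" means a speedup ratio of at least 1.20 (interpreter time divided by JIT time); the function names, the threshold constant, and the sample numbers are all hypothetical.

```python
from statistics import fmean, geometric_mean

# Assumption: a "20% improvement" is read as a speedup ratio >= 1.20.
TARGET_SPEEDUP = 1.20


def platform_speedup(ratios):
    """Geometric mean of per-benchmark ratios (interpreter time / JIT time)."""
    return geometric_mean(ratios)


def overall_speedup(per_platform_ratios):
    """Arithmetic mean of the per-platform geometric means."""
    return fmean([platform_speedup(r) for r in per_platform_ratios.values()])


def meets_target(per_platform_ratios, target=TARGET_SPEEDUP):
    """True when the cross-platform mean reaches the target ratio."""
    return overall_speedup(per_platform_ratios) >= target


# Hypothetical sample data, for illustration only (not measured results).
sample = {
    "x86_64-unknown-linux-gnu": [1.10, 1.32, 0.98, 1.45],
    "aarch64-apple-darwin": [1.05, 1.20, 1.02, 1.38],
}
print(f"overall speedup: {overall_speedup(sample):.3f}")
print(f"meets target: {meets_target(sample)}")
```

The geometric mean is used per platform because it treats a 2x speedup and a 2x slowdown as cancelling out, which an arithmetic mean of ratios would not.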
+ + +Rationale +========= + +As described in :ref:`Current State <current-state>`, the JIT is +roughly 4-12% faster (geometric mean) on the pyperformance benchmark suite for +measured Tier 1 platforms (see :ref:`appendix-jit-speedup-2wk`), with some +limitations, challenges and areas of improvement. In this next phase of the +JIT, **we want to set an initial ambitious but attainable target of at least +20% performance improvement over the interpreter on the free-threaded build +achieved within the next 2.5 years (in other words, by Python 3.17).** + +However, we know that performance for performance's sake, at the cost of +tooling incompatibility, is not meaningful or attractive for the project. As +such, we want to enter this next phase intentionally and with a clear plan, +enumerated in detail in the :ref:`Specification <jit-specification>` section. + + +.. _jit-specification: + +Specification +============= + +In order to achieve a sustainable and maintainable 20%+ performance gain with +full tooling compatibility in the next 2.5 years, there are several areas worth +discussing: + +* Key JIT infrastructure, including an evolution of the JIT frontend +* Optimizations +* First-class support for free-threading +* A better distribution story +* Compatibility +* Tooling support + + +.. _jit-infrastructure: + +Key JIT Infrastructure +---------------------- + +Traditionally, compilers are split into a *frontend*, *middle-end*, and +*backend*. They have the following meanings in our context: + +* **Frontend:** Selects what to compile. This can be methods or traces of + CPython specialized bytecode. +* **Middle-end:** Optimizes instructions. Translates specialized bytecode to + uops and optimizes them. +* **Backend:** Generates machine code. + +At present, the frontend uses trace recording. Trace +recording captures the actual flow of execution through the program's bytecodes, +along with live values during execution. We instrumented the interpreter to +achieve this.
This frontend was not the one originally introduced in 3.13, +which seemed to be ineffective at the time due to various reasons +[#jit-on-track]_. + +To ease maintenance burden, disentangle the JIT and the interpreter, and unlock +future optimizations in a sustainable fashion, we propose changing the frontend +by 3.16 to a method one. The method frontend can be rolled back midway to the +trace recording one if it does not meet our goals. + +Changing the frontend is not free. Time spent on this work is time not spent +directly adding optimizations to the current tracing frontend, and some +trace-specific performance wins may need to be recovered after the transition. +We believe this opportunity cost is justified only because the current frontend +appears likely to impose increasing maintenance, teaching, debugging, and +optimization costs as it matures that will outpace the initial implementation +cost of the method frontend. + +To elaborate on the difference, trace recording records straight-line sequences +through the code, while methods generally select one or more Python functions +to compile. + +The middle-end and backend will not require major changes. Nearly all of the +current code can be reused for the method frontend. The current backend which +uses Copy and Patch compilation already supports branches and jumps in the +control-flow. The middle-end which analyzes types over uops just needs to +support merging type information at control-flow merge points. + +Motivated by our learnings over the past several years, our goals for the +method frontend are as follows: + +* To make optimization as simple and as *traditional* as possible so as to + avoid unnecessary experimentation on CPython's main branch. +* To make the JIT easier to maintain. We don't mean this in lines of code, but + rather in conceptual burden. 
A single maintainer should be able to "fit" the + entire system in their head and accurately predict/understand its behavior, + even in reasonably complex programs. The current tracing frontend can produce + head-scratching results even for very simple benchmarking programs. +* To enable higher-level optimizations more easily, without requiring a higher + JIT tier (which requires another JIT bolted on top) or inter-trace knowledge. + +The following is the reference design. It is subject to change as the code +evolves: + +* **Uop IR.** The benefits of this are explained in + :ref:`Learnings <jit-learnings>`. +* **Some Static Single Assignment (SSA) form properties over the stack.** This + does not mean we need to rewrite our IR to SSA form, but rather, the + optimizer should have some SSA properties. We believe this aligns more + closely with other compilers (e.g. Cinder, PyPy, Chrome's V8, CRuby's + YJIT/ZJIT), and makes understanding *how* to optimize in the JIT easier and + more powerful. The current JIT optimizer already nearly supports this, and + only requires minimal changes to have SSA properties. An IR with proper stack + discipline already has many useful properties that are analogous to SSA form. + SSA form will basically come for free for stack variables. +* **A simple way to represent high-level constructs.** We have an + implementation that forms *regions* (groups of basic blocks), inspired by the + similarly named concept in MLIR (an LLVM project). Rather than reducing + programs to single basic blocks pointing to each other, we opt to keep the + high-level construct information around. In MLIR, this was motivated by + better loop analysis and optimization. In CPython's JIT, this is motivated by + better generator/coroutine/loop/etc. (high-level construct) analysis and + optimizations. + +With the design described in this section, most optimizations in the JIT can be +implemented as local rewrites.
This is, again, inspired by certain properties of +other runtimes' intermediate representations. Our goal is to make the JIT more +traditional and teachable, without sacrificing what we can optimize. We do +acknowledge that a method JIT requires joining control-flow. However, we +believe this is not a large conceptual overhead, as a tracing JIT already +requires teaching the concept of joining control-flow once anything other than +the most basic optimizations is implemented. + +In terms of what code we need to achieve this frontend, most of the +infrastructure required is already present. The main code modifications +required are the data structures to represent a control-flow graph, and +worklist algorithms to drive the pre-existing optimizer/analysis pass. We can +then remove most of the current tracing frontend from the +interpreter, which will simplify the interpreter's core dispatch mechanism and +simplify the main interpreter loop. Control-flow graphs and worklist algorithms are not foreign concepts +to CPython; the current bytecode compiler in CPython already represents +control-flow graphs and has worklist algorithms. + +Finally, both method and tracing JITs gain complexity and have various +tradeoffs to achieve great performance. Where tracing has greater simplicity in +value recording and profiling, method JITs need more advanced polymorphic inline +caches. Where tracing needs inter-trace optimization to get higher-level +optimizations, method JITs have it simpler by seeing more code. Both of these +need tight coupling with the interpreter to achieve great performance. We +understand both technologies come with tradeoffs, and we are once again not +making a judgement of which is ultimately better. Our claim is just that for +the optimizations that CPython requires, and for the ease of teaching, +debugging and analyzing, and for finding solutions in similar language runtimes +to our problems, a method-based JIT in this case is ultimately our choice.
To +be upfront about the potential additional complexity: +the proposed method JIT may also require certain additional features described in the +literature, such as recording extra type profiling data in a side table. +However, the complexity can be greatly mitigated by the current bytecode DSL +and automatically generating the profiling operations, similar to how the +current tracing JIT already does things. We also propose mitigation strategies +in :ref:`Optimizations <jit-optimizations>`. We thus believe the conceptual and +maintenance leap is not a huge one. + + +.. _jit-optimizations: + +Optimizations +------------- + +The method JIT builds on the optimizations already present in the +current trace recording JIT. Namely, it will come with the following +optimizations by virtue of the pre-existing JIT middle-end: + +* Type speculation (via the specializing adaptive interpreter's typed bytecode) +* Useless check/guard removal +* Redundant reference counting removal +* Constant folding + +Knowledge of the current middle-end is transferrable, and contributors who have +worked on the current middle-end need not relearn much, as the JIT middle-end +can work with the method frontend with minimal changes. + +Switching frontends has a short-term performance opportunity cost. The trace +recording frontend has benefited from nearly a year of focused work, and some +of its gains, especially those tied to trace-specific behavior or +free-threading-unsafe optimizations, may need to be recovered after the +transition. The reason to accept this cost is that a method frontend should +make the next set of larger optimizations easier to implement, reason about, +test, and maintain. + +We also plan to optimize generators/coroutines better and +improve the efficiency of calls. These high-level optimizations motivated our move +towards a method-based JIT.
These optimizations require seeing more of the +user's code to be effective, and the current tracing JIT in CPython cannot +achieve this without inter-trace optimization or trace stitching, which +increases the complexity and coupling with the runtime. + +Further optimizations are possible. However, they do not differ much whether a trace +recording or method frontend is used: + +* Lock removal on free-threading +* Detecting deferred reclamation to reduce escaping sites in the JIT on + free-threading +* Conservative unboxing of integers, floats, and small strings + +To recover the optimizations tracing gives for free, we plan to explore: + +* Recording extra type profiling information from the interpreter's specializer +* Path splitting (duplicating the control-flow graph) +* Cold code elimination +* Respecializing instructions in the middle-end + +One may argue that this introduces a lot of complexity in the method JIT. +However, recording extra type profiling information involves no changes to the interpreter and only minimal +changes to the specializer. Path splitting can be found in standard compiler +textbooks. Cold code elimination is trivial to implement in current CPython because the +interpreter already tracks branch information (we can just choose not +to compile branches/blocks that are never taken). Respecialization can be done by +leveraging the existing specializer's decisions. This is consistent with our view that +method JITs are less entangled with the interpreter in the case of +CPython. + + +.. _free-threading: + +First-Class Support for Free-Threading +-------------------------------------- + +Free-threading is already a part of Python's future, and the current JIT must +be made free-threading-safe as soon as possible to be a viable option for +improved performance. This involves making the frontend and middle-end's +optimizations free-threading-safe (the backend should already be safe).
We do +not anticipate that a method frontend will make free-threading support more +difficult than a tracing one. Furthermore, all major optimizations for the +pre-existing JIT implemented in the past year have already been designed with +free-threading in mind. However, a slight performance penalty may initially be +encountered as we remove free-threading-unsafe optimizations. For example, we +anticipate that the major optimizations that need addressing will be +globals/builtins dictionary and type watchers. Resolving attribute/global +lookup at JIT compile time should still be feasible, but removing their guards +altogether may be unsafe in free-threading. We expect a naive fix to produce a +slight (1-2% geomean) performance hit initially. + +Once JIT development resumes, all future optimizations will be reviewed for +free-threading compatibility and performance impact before merge. +Optimizations that work only on the GIL build and break on the free-threaded +build will be rejected. + +Additionally, the JIT may eventually deliver a larger speedup on the +free-threaded build than it does on the current GIL build. Early experiments in the JIT +suggest free-threaded optimization may gain a few more percentage points on +pyperformance. For example: + +* Reference counting on the free-threaded build is more expensive than on the + GIL build, and the JIT can eliminate much of that reference counting. +* The JIT has more leeway with the lifetimes of certain objects, due to + deferred reference counting (see :pep:`703`) and Quiescent State-Based + Reclamation (QSBR). This unlocks even more optimization opportunities that + are not possible with immediate reclamation. +* The JIT can remove locks and atomics in the specializing adaptive interpreter + when it detects that objects are uniquely referenced. Such locks and atomics are a source of + slowdown on architectures where atomics are more expensive.
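To make the first bullet above more concrete, here is a toy sketch of redundant reference-count elimination over a straight-line sequence of operations. It is purely illustrative and is not the JIT's actual algorithm or IR: the opcode names (``INCREF``, ``DECREF``, ``ESCAPE``, ``LOAD_ATTR``) and the helper function are hypothetical, and the sketch assumes that each value is kept alive by some other reference for the whole span between a cancelled pair.

```python
from typing import List, Optional, Tuple

# (opcode, operand) pairs. The opcode names are hypothetical, not CPython uops.
Op = Tuple[str, str]


def remove_redundant_refcounts(ops: List[Op]) -> List[Op]:
    """Drop INCREF/DECREF pairs on the same operand when no ESCAPE lies between.

    Assumption: the operand is kept alive elsewhere while the pair is in
    flight, so removing both operations does not change object lifetimes.
    """
    result: List[Optional[Op]] = []
    pending = {}  # operand -> index in `result` of its unmatched INCREF
    for opcode, operand in ops:
        if opcode == "INCREF":
            pending[operand] = len(result)
            result.append((opcode, operand))
        elif opcode == "DECREF" and operand in pending:
            # Cancel the pair: erase the earlier INCREF and skip this DECREF.
            result[pending.pop(operand)] = None
        else:
            if opcode == "ESCAPE":
                # Arbitrary code may run here, so stay conservative and
                # forget every INCREF that is still unmatched.
                pending.clear()
            result.append((opcode, operand))
    return [op for op in result if op is not None]


trace = [
    ("INCREF", "x"), ("LOAD_ATTR", "x"), ("DECREF", "x"),
    ("INCREF", "y"), ("ESCAPE", "call"), ("DECREF", "y"),
]
optimized = remove_redundant_refcounts(trace)
```

In this sketch the pair around ``x`` is removed, while the pair around ``y`` is kept because an escaping operation sits between the two halves. A real optimizer would need a far more precise notion of which operations escape.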
+ +**We believe that the right framing here is not the JIT or free-threading, but +rather, the JIT and free-threading.** We understand the JIT may initially lose +some performance opportunities from free threading's semantics. However, both +the JIT and free threading have much to gain. The JIT can recover all of free +threading's single-threaded performance losses and maybe even more. + + +.. _distribution: + +A Better JIT Distribution Story +------------------------------- + +We can choose to adopt the solution proposed in :pep:`774` or allow a range of +LLVM versions to build the JIT. This is up for more discussion and +experimentation. At minimum, feedback from popular Linux distributors must be +collected, deliberated on, and incorporated into a holistic solution. Our +current understanding of the situation is that supporting multiple LLVM +versions is required. This should not be much additional complexity. However, +it may require more CI resources for testing. For the JIT to be successful, it +must not unduly burden third-party distributors. + + +.. _compatibility: + +Compatibility Review +-------------------- + +As part of our roadmap, we plan to run the test suites of the top PyPI packages +and detect if the JIT breaks them, similar to the initial nogil repository's +approach where Sam Gross ran popular PyPI packages to detect bugs and +incompatibilities (see the labeille project [#la-beille]_ for additional prior +art). + +Failing a package's test suite does not mean the goal is automatically not met. +Certain test suites may rely on CPython internal details that are not +guaranteed. Therefore, this requires a case-by-case examination. The bottom +line is that we must have at least made a concerted effort to assess JIT +compatibility with existing Python code out there, and made a best-effort +attempt at correcting any "real" bugs. + + +.. 
_tooling-support: + +Tooling Support +--------------- + +The JIT will not regress on the current native unwinding support. To recap, the +JIT currently supports all frame pointer-based unwinders and eh_frame-based +ones as well (such as GNU backtrace). + +The JIT will continue supporting out-of-process profilers/debuggers that +require Python frames. We understand that frame elision (inlining) is a +promising optimization. However, completely eliding frames in the JIT would +break third-party tools. We will take care to negotiate and provide alternative +methods that give Python frame unwinders the information required to recover the +elided frame, such as storing metadata for the elided frame. Furthermore, tools +that inspect the Python stack may need to symbolize the JIT C shim frame (i.e., +relate it to a Python function call). In this case, all necessary information +to support these tools will be provided in the CPython runtime, either through +executor objects or elsewhere, and also in the debug offsets, so that these +tools can make sense of a call stack containing JIT frames. For this, we may consult +with maintainers of popular Python frame unwinding applications. As a general +rule: if something works with the JIT off, we should do everything we can to +make sure it also works (or has usable alternatives) with the JIT on, and not +break genuinely useful observability and debugging features in the name of raw +performance. + +The JIT will continue supporting in-process tools. This means it will not break +``sys._getframe``, ``pdb`` or ``sys.monitoring``. + + +Relationship to Other JITs and Compiler Tools +--------------------------------------------- + +CPython's JIT is not intended to replace third-party specialist JITs or +compiler projects, such as CinderX, Numba, PyTorch Compile or other +domain-specific compilers. Those projects often optimize different workloads, +use different assumptions or operate at different layers of the stack. 
The JIT +is intended to be a "backstop" for the execution of any code that ends up being +the responsibility of the interpreter itself, just as it is today. + + +Platform Support +---------------- + +The JIT will support all Tier 1 platforms, as specified in :pep:`11`, at time +of writing: + ++---------------------------+------------+ +| Target Triple | Notes | ++===========================+============+ +| aarch64-apple-darwin | clang | ++---------------------------+------------+ +| aarch64-unknown-linux-gnu | glibc, gcc | ++---------------------------+------------+ +| i686-pc-windows-msvc | | ++---------------------------+------------+ +| x86_64-pc-windows-msvc | | ++---------------------------+------------+ +| x86_64-unknown-linux-gnu | glibc, gcc | ++---------------------------+------------+ + +However, we do not plan to concentrate dedicated cycles to improving 32-bit +Windows performance and would like to exclude the platform from our goals as +PyPI statistics suggest 32-bit Windows builds are a vanishingly small number of +downloads. Furthermore, if other conventionally non-JIT platforms eventually +get promoted to Tier 1 (such as WASI), we do not expect to support those +either. + +Thanks to the Copy and Patch backend, the JIT supports the platforms of +interest with minimal additional work required from us. The key idea is to not +handwrite machine code equivalents of our IR, as that causes too much churn and +is unsustainable with CPython's rapid bytecode changes. + + +.. _maintenance-model: + +Maintenance Model +----------------- + +The JIT is maintained by a group of CPython core developers and contributors +working across its three stages: the frontend, the optimizer and the +code-generation backend. A central goal of the JIT has been to keep more than +one active maintainer familiar with each stage, so that no part of the JIT +depends on a single person. The contributor base has grown deliberately rather +than by chance. 
During the 3.15 cycle, optimization work was decomposed into +small, individually actionable tasks, which lowered the barrier to contributing +and drew roughly a dozen people into the trace-recording conversion effort +while increasing the number of recurring optimizer contributors +[#jit-on-track]_. This task decomposition is an ongoing mechanism for bringing +in and retaining contributors, and it is how the project intends to sustain and +widen its maintainer pool over time. + +At present, the project does not depend on any single sponsor. It has continued +as a community-led effort after its initial principal corporate sponsor wound +down its dedicated funding, and it currently combines volunteer work with some +ongoing corporate contributions, primarily from Arm, FastAPI Labs, and OpenAI. +Sustaining the JIT also depends on shared and key infrastructure: the +continuous integration and build configurations that exercise JIT builds +(currently part of regular CI on main [#jit-ci]_), and the self-hosted +benchmarking machines and infrastructure that publish nightly results +[#does-jit-go-brrr]_ (currently maintained by Savannah; machines contributed by +Savannah and Arm). + +Finally, and perhaps most importantly, the JIT must remain accessible for +contributors who do not work on it. This means committing to keeping the +interpreter approachable or decoupled from the JIT, to documenting the workflow +for regenerating generated code and contributing changes, and to keeping the +internals documentation current. The simplification of the optimizer's +operations shipped in 3.15, discussed in :ref:`Learnings `, is +an example of this maintenance investment in practice. Obligations on +redistributors who build and ship the JIT are described in +:ref:`A Better JIT Distribution Story `. 
+ + +Backwards Compatibility +======================= + +Since the JIT is an optimization and not a change to the language, its central +compatibility guarantee is that a JIT-enabled build must produce behavior +indistinguishable from a non-JIT build, just faster: same results, same +exceptions and tracebacks, and same supported introspectable state. + +As described in :ref:`Compatibility Review `, we will conduct a +compatibility analysis on the top PyPI packages' test suite as a requirement to +regard the JIT as supported in CPython. + + +Security Implications +===================== + +As stated in :pep:`744`, CPython's JIT, like all JITs, produces large amounts +of executable data at runtime. This is an attack vector for all JIT compilers: a +malicious actor capable of influencing the contents of this data is therefore +capable of executing arbitrary code. + +In order to mitigate this risk, the JIT has been written with best practices in +mind. In particular, the data in question is not exposed by the JIT compiler to +other parts of the program while it remains writable, and at no point is the +data both writable and executable. + +The nature of template-based JITs also seriously limits the kinds of code that +can be generated, further reducing the likelihood of a successful exploit. As +an additional precaution, the templates themselves are stored in static, +read-only memory. + +However, it would be naive to assume that no possible vulnerabilities exist in +the JIT, especially at this early stage. The authors are not security experts, +but will work closely with the Python Security Response Team to triage and fix +security issues as they arise. + +Supporting CET/BTI has also been requested by Fedora maintainers +[#issue-149697]_. We believe supporting this option in the generated stencils +is required for meeting our goals for security. + +Finally, since the inception of :pep:`744`, multiple fuzzing projects have been +initiated to fuzz the JIT. 
For example, Lafleur [#la-fleur]_ has found +numerous JIT bugs that lead to crashes or wrong optimizations (mostly in the +JIT middle-end, not the backend). We will continue using these projects to fuzz +the JIT. + + +How to Teach This +================= + +For the vast majority of Python users, the most important thing to teach about +the JIT is that there is nothing they need to do and nothing they need to watch +out for. No code should need to be rewritten to benefit from the JIT, and none +should need to be changed to remain correct. + +For users who want a mental model, a short and accurate one is enough: the JIT +is an optimization layer that sits above the interpreter and compiles +frequently-executed code to machine code on the fly. It changes how fast a +program runs, not what it does. This framing is sufficient for most educational +contexts and does not require teaching the internals (for example, uops, +optimizer, code generation). + +Two audiences that need more specific guidance are redistributors and +packagers. These users will need to understand the build-time requirements and +the path toward distributable artifacts (see +:ref:`A Better JIT Distribution Story `). Maintainers of +debuggers, profilers, and other native tooling need to know that JIT frames are +unwindable on supported platforms and what they can rely on when inspecting a +running process. Python stack unwinders will need to understand the JIT frame +layout and recover information during symbolization (see +:ref:`Tooling Support `). + +Finally, core developers only need to care about the JIT if they want their +feature to be optimized by it. Otherwise, the current JIT architecture means +that core developers working on the interpreter or other parts of the runtime +do not need to care that a JIT exists, apart from the occasional CI breakage. +Once the JIT is regarded as supported, it should not be broken catastrophically +by any new changes. 
However, we expect that in almost all cases, introducing a +new feature to Python will not be obstructed by a JIT, unless the contributor +explicitly wants the JIT to support their feature or optimize for it. Once +again, see for example the lazy imports initial implementation which modified +bytecode, but did not need to modify the JIT other than ``#include`` the new +headers introduced [#lazy-imports-pr]_. + + +Reference Implementation +======================== + +The current implementation for the JIT can be found in CPython's main branch, +largely in: + +* ``Tools/jit/README.md``: Instructions for how to build the JIT. +* ``Python/jit.c``: The entire backend portion of the JIT compiler. +* ``Python/optimizer.c``: Part of the frontend of the JIT compiler (partially + shared from ``Python/ceval.c``). +* ``Python/optimizer_analysis.c``: The middle-end of the JIT compiler. +* ``Python/optimizer_bytecodes.c``: The middle-end of the JIT compiler's + optimization rules. +* ``jit_stencils.h``: An example of the JIT's build-time generated templates + (not currently checked into CPython repository). +* ``Tools/jit/template.c``: The code which is compiled to produce the JIT's + templates. +* ``Tools/jit/_targets.py``: The code to compile and parse the templates at + build time. + +While this PEP does propose and outline an evolution for the JIT (transitioning +from tracing to method-based, with heavy reuse of existing code), it does not +prescribe a particular implementation of that design. With that said, a working +proof-of-concept implementation against main exists, and will be shared soon. + +Despite the fact that it is currently under development and incomplete (it does +not yet handle generators and coroutines, for example, and has no support for +polymorphism, both of which are supported partially by the existing tracing +frontend), it is still 4-5% faster on the pyperformance and Pyston +macrobenchmark suites vs. JIT off, on a GIL-enabled build. 
This demonstrates +that the new design developed in just a few weeks can be competitive with the +existing tracing design (which is 7-8% faster on the same x86-64 Linux +configuration after 3 years of work evolving it). + +Excluding tests, the size of the current method-JIT implementation vs. main is +approximately as follows: + ++---------------+---------------+-------------+---------------+ +| File Type | Files Changed | Lines Added | Lines Removed | ++===============+===============+=============+===============+ +| Generated | 13 | 5300 | 4200 | ++---------------+---------------+-------------+---------------+ +| Non-Generated | 38 | 2700 | 5800 | ++---------------+---------------+-------------+---------------+ +| Total | 51 | 8000 | 10000 | ++---------------+---------------+-------------+---------------+ + +Broken down by file extension: + ++-----------+---------------+-------------+---------------+ +| Extension | Files Changed | Lines Added | Lines Removed | ++===========+===============+=============+===============+ +| .c | 20 | 2300 | 5400 | ++-----------+---------------+-------------+---------------+ +| .c.h | 6 | 3300 | 2400 | ++-----------+---------------+-------------+---------------+ +| .h | 20 | 2300 | 2100 | ++-----------+---------------+-------------+---------------+ +| .py | 5 | 100 | 100 | ++-----------+---------------+-------------+---------------+ +| Total | 51 | 8000 | 10000 | ++-----------+---------------+-------------+---------------+ + + +Rejected Ideas +============== + +Maintain the JIT Outside of CPython Main +---------------------------------------- + +It has been suggested, both during the JIT's history and in recent discussion, +that a compiler of this complexity might be better developed and maintained out +of tree or as a separate project, rather than in CPython's main branch. 
+However, keeping the JIT in main is a deliberate and hugely beneficial choice, +originally articulated in :pep:`744`: it allows the JIT to be co-developed with +the interpreter and maintained by the broader group of core developers and +contributors rather than a small set of specialists working on a fork. The uops +the JIT consumes are also co-designed with and regenerated from the +interpreter. An out-of-tree JIT would have to track those definitions across a +branch boundary, which raises the maintenance cost and the risk of drift +precisely in the area where correctness matters most. The growth of the +contributor base during the 3.15 cycle (see +:ref:`Maintenance Model `) is itself evidence that in-tree +development lowers, rather than raises, the barrier to participation. Keeping +the JIT in the main branch of CPython also allows us to have a better pulse on +the needs of distributors, and means that it's easier for end users to try out +the JIT and let us know what behavior they observe and what issues they find. + + +Pluggable JIT Infrastructure +---------------------------- + +Another recurring idea is for CPython to expose a stable, general-purpose +interface for plugging in arbitrary third-party JIT compilers, rather than +maintaining one in tree. This PEP rejects this idea for the same reasons as +maintaining the JIT outside of CPython main. Introducing a pluggable JIT risks +diverting contributor effort and increases maintenance overhead. For example, +an earlier version of the JIT in 3.13 had a semi-public experimental API. +However, it leaked internal details to users and made internal JIT development +more difficult. Thus, we removed it. Furthermore, most language runtime JITs +are deeply integrated with their respective runtimes to the extent that a +pluggable JIT infrastructure may not be feasible. + +We agree, however, that efforts to maintain a JIT outside of CPython using +:pep:`523`, such as CinderX and TorchDynamo, are commendable. 
We believe the +discussion of how to improve the pre-existing interfaces is best left to +a separate PEP, and consider it out of scope for this one. + + +Dropping the Build-Time LLVM Requirement +---------------------------------------- + +This PEP does not propose changing the JIT's reliance on the LLVM toolchain at +build time. We treat reducing build-time friction as important, but not as a +precondition for agreeing on the path outlined here. :pep:`774` proposes a +solution for removing the LLVM prerequisite, but at the time of its submission the +sitting Steering Council decided to defer making a decision on it until the JIT +had delivered more substantial performance gains. We would also like to keep +exploring options in this space and, as such, would like to save this for a +separate PEP. + + +A Higher-Tier JIT +----------------- + +We believe that multi-tiered JITs produce great performance and compelling +warmup times. However, we also believe that for the time being, CPython's +complexity and maintenance budget may not support such an endeavour. We are not +saying this should never happen. Rather, our goal is to produce the best JIT we +can for the current state of CPython, given the constraints we can work with. +For that reason, we reject building yet another JIT on top of the current one for peak +performance. + + +Enable/Support the Current JIT As-Is +------------------------------------ + +The current JIT is undoubtedly the product of much attention and care; we thank +everyone who contributed to it. However, we understand the community as a whole +has concerns that are still unaddressed and therefore need remedying. We also +acknowledge that Python, and indeed CPython, is so widely used that a change of +this scale must be properly examined and considered before it can be a part of +the project proper. As such, the current JIT cannot be enabled without more +scrutiny and evolution. + + +Open Issues +=========== + +None at this time. 
+ + +Appendix +======== + +.. _appendix-jit-speedup-2wk: + +Average JIT Speedup by Machine +------------------------------ + +Calculated from 2026-06-16 to 2026-06-27. + ++-------------------------------------+--------------+-------------+--------------+------+-------------+ +| Machine | Config | Avg speedup | Result | Days | Range | ++=====================================+==============+=============+==============+======+=============+ +| jones (M3 Pro, macOS) | JIT+TAILCALL | 1.126x | 12.6% faster | 9 | 1.050-1.180 | ++-------------------------------------+--------------+-------------+--------------+------+-------------+ +| sulaco (AmpereOne, Linux aarch64) | JIT | 1.073x | 7.3% faster | 7 | 1.060-1.080 | ++-------------------------------------+--------------+-------------+--------------+------+-------------+ +| ripley (i5-8400, Linux x86_64) | JIT | 1.069x | 6.9% faster | 9 | 1.060-1.070 | ++-------------------------------------+--------------+-------------+--------------+------+-------------+ +| prometheus (Ryzen 5 3600X, Windows) | JIT+TAILCALL | 1.047x | 4.7% faster | 9 | 1.040-1.050 | ++-------------------------------------+--------------+-------------+--------------+------+-------------+ + +.. note:: + + Note that JIT+TAILCALL is used on Windows and Mac run as regular CPython + builds ship with tailcalling enabled. All data used for this calculation can + be found on Does JIT Go Brrr? [#does-jit-go-brrr]_. + + +Footnotes +========= + +.. [#whats-new] `3.15 What's New + `__ +.. [#jit-on-track] `Python 3.15's JIT is now back on track + `__ +.. [#jit-ci] `jit.yml in CPython + `__ +.. [#does-jit-go-brrr] `Does JIT Go Brrr? + `__ +.. [#la-beille] `labeille - Hunt for CPython JIT bugs by running real-world + test suites `__ +.. [#la-fleur] `lafleur - A feedback-driven, evolutionary fuzzer for the + CPython JIT compiler `__ +.. [#issue-149697] `JIT shim object drops GNU property notes (CET/BTI/PAC) from + output binaries `__ +.. 
[#trace-recording-docs] `The Trace Recorder and Executors + `__ +.. [#lazy-imports-pr] `Initial lazy imports implementation + `__ +.. [#community-perspectives] `Community perspectives on the JIT: experiences, + expectations, and concerns - post 15 + `__ + + +Change History +============== + +None at this time. + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. From 57c6d4c50df459ebf96bb998b280bd27adf4c36b Mon Sep 17 00:00:00 2001 From: Savannah Ostrowski Date: Thu, 2 Jul 2026 20:04:27 -0700 Subject: [PATCH 2/8] A bit of cleanup --- peps/pep-0836.rst | 163 +++++++++++++++++++++++----------------------- 1 file changed, 81 insertions(+), 82 deletions(-) diff --git a/peps/pep-0836.rst b/peps/pep-0836.rst index bf7c50b6f16..acfb6f68156 100644 --- a/peps/pep-0836.rst +++ b/peps/pep-0836.rst @@ -20,10 +20,10 @@ explicitly deferred a number of questions about the JIT's long-term status. Since then, the JIT has been re-architected and matured considerably. In Python 3.15, it delivers a measurable, reproducible speedup over the interpreter (about 4-12% geometric mean performance improvement across measured Tier 1 -platforms; see :ref:`appendix-jit-speedup-2wk`), emits frames that native -debuggers can unwind through, and reduces the memory footprint of generated -code relative to 3.14. Along the way, we have learned a good deal about what -works for a JIT in CPython. +platforms (see :ref:`Appendix `), +emits frames that native debuggers can unwind through, and reduces the memory +footprint of generated code relative to 3.14. Along the way, we have learned a +good deal about what works for a JIT in CPython. 
This PEP proposes a path for the JIT to become a supported, non-experimental part of CPython if it meets measurable performance, compatibility, tooling, @@ -55,9 +55,9 @@ At a high level, these are our milestones and goals for the JIT over the next * **Year 1 (ending with Python 3.16's first beta) - Developer experience improvements** - * :ref:`Evolve the frontend ` from trace recording to - method-based. We believe that a method frontend will put us on a path that - allows for easier maintenance, teachability, debugging, and so on. The + * :ref:`Evolve the frontend from trace recording to method-based + `. We believe that a method frontend will put us on a + path that allows for easier maintenance, teachability, debugging, etc. The first implementation should be minimal, may initially use more memory or perform slightly worse, and may be rolled back to the current tracing frontend if the approach does not meet the project's goals in the first @@ -67,12 +67,12 @@ At a high level, these are our milestones and goals for the JIT over the next believe that this is important to prioritize early on in the next phase of the JIT as free-threading adoption is expanding rapidly. - * :ref:`Add further tooling testing `, and address any - discovered remaining gaps in coverage, for native and Python profilers and - debuggers. At a minimum, this will include anything that uses frame - pointers to unwind, but should also be expanded to support tools that - symbolize Python frames. Third-party tooling must have documented - remediation paths when existing behavior cannot be preserved exactly. + * :ref:`Add further testing (and address any discovered remaining gaps in + coverage) for native and Python profilers and debuggers `. + At a minimum, this will include anything that uses frame pointers to + unwind, but should also be expanded to support tools that symbolize Python + frames. 
Third-party tooling must have documented remediation paths when + existing behavior cannot be preserved exactly. * :ref:`A better JIT distribution story `. Provide redistributors with a documented and reproducible way to build or verify @@ -130,10 +130,10 @@ guarantees, maintenance commitments, success metrics, timelines, tooling compatibility, impact on redistributors, its relationship to other JITs, and likely architectural evolution. -The current CPython JIT has shown some promising results (see -:ref:`appendix-jit-speedup-2wk`), especially in the last 9-12 months. However, -as with any good experiment, it's important to evaluate the current approach -and evolve plans based on what we've learned. +The current CPython JIT has shown some promising results +(see :ref:`Appendix `), especially in +the last 9-12 months. However, as with any good experiment, it's important to +evaluate the current approach and evolve plans based on what we've learned. .. _current-state: @@ -143,14 +143,15 @@ Current State As of CPython 3.15, the current JIT compiler is roughly 4-12% faster geometric mean on the pyperformance benchmark suite compared to the interpreter across -measured Tier 1 platforms (see :ref:`appendix-jit-speedup-2wk`). In order to +measured Tier 1 platforms +(see :ref:`Appendix `). In order to achieve this, the JIT and supporting infrastructure has undergone a number of revisions across the last four major versions of Python: * **3.12:** Introduction of the CPython bytecode DSL, and refactoring of interpreter bytecodes to micro-operations ("uops"). * **3.13:** JIT trace projector, optimizer and Copy and Patch backend - introduced, and :pep:`744` was written. + introduced, :pep:`744` written. * **3.14:** More refactoring of interpreter bytecodes and optimizer work. * **3.15:** JIT tracer rewritten to recording, JIT optimizer improvements, community engagement and involvement. 
@@ -202,13 +203,13 @@ successful: the JIT is self-contained: adding new features to CPython generally do not need JIT support unless implementers want the JIT to optimize the feature. For - example, the initial lazy imports pull request [#lazy-imports-pr]_ did not + example, the initial lazy imports pull request[#lazy-imports-pr]_ did not require touching any JIT files apart from adding new headers to include in C. * **A JIT optimizer that resembles the CPython interpreter.** The current JIT optimizer (middle-end), analyzes type information over CPython uops. The key maintainability advantage here is that the middle-end is written in a similar - fashion as the normal CPython interpreter: as a bytecode DSL over an + fashion as the normal CPython interpreter -- as a bytecode DSL over an interpreter. However, instead of interpreting objects, we interpret types of the objects. This means knowledge of the CPython interpreter is transferrable to the JIT optimizer, and if a contributor knows how to work on the @@ -241,12 +242,12 @@ CPython and what could be improved upon: with the wider community and their requests/needs is critical for the project. This includes talking to system distributors for example, or maintainers of third-party tooling, understanding their concerns, and - accommodating them better. + accommodating them better * **Reconsidering if our current JIT frontend is the right fit for CPython.** - The current tracing JIT has some great benefits, as discussed in - :ref:`Learnings `. However, a successful JIT is much more than - just good performance; we must consider other factors like maintainability, + The current tracing JIT has some great benefits (see + :ref:`Learnings `). However, a successful JIT is much more than + just good performance, we must consider other factors like maintainability, testability, and teachability too. Furthermore, as the JIT matures, the cost-benefit proposition of tracing in CPython shifts. 
To be clear, this is not a value judgement of tracing as an approach, but rather an assessment of @@ -254,8 +255,8 @@ CPython and what could be improved upon: whom implemented the current tracing frontend in CPython 3.15: * **Tracing's initial ease does not seem to continue in the medium term in - the case of CPython.** As described in :ref:`Learnings `, - tracing is easy to start with. However the simple implementation of tracing + the case of CPython.** As mentioned in :ref:`Learnings `, tracing + is easy to start with. However the simple implementation of tracing in CPython yielded no speedups initially in 3.13 and 3.14. Only when we shifted to a more complex tracing runtime and modified the interpreter did we experience performance gains. We believe the initial ease of tracing @@ -290,7 +291,8 @@ CPython and what could be improved upon: * **We do not have a strong pulse on whether the JIT currently benefits larger real-world workloads.** At present, the JIT is primarily measured and evaluated via pyperformance benchmark suite runs. We would like to spend more - time evaluating its impact on real end-user code (see :ref:`compatibility`). + time evaluating its impact on real end-user code + (see :ref:`Compatibility Review `). * **The distribution story should be improved and codified.** Distributor feedback so far suggests that LLVM itself is not always the main obstacle as @@ -306,18 +308,19 @@ CPython and what could be improved upon: Rationale ========= -As described in :ref:`Current State `, the JIT has achieved -roughly 4-12% faster geometric mean on the pyperformance benchmark suite for -measured Tier 1 platforms (see :ref:`appendix-jit-speedup-2wk`), with some -limitations, challenges and areas of improvement. 
In this next phase of the -JIT, **we want to set an initial ambitious but attainable target of at least -20% performance improvement over the interpreter on the free-threaded build -achieved within the next 2.5 years (in other words, by Python 3.17).** +As noted :ref:`above `, the JIT has achieved roughly 4-12% +faster geometric mean on the pyperformance benchmark suite for measured Tier 1 +platforms (see :ref:`Appendix `), with +some limitations, challenges and areas of improvement. In this next phase of +the JIT, **we want to set an initial ambitious but attainable target of at +least 20% performance improvement over the interpreter on the free-threaded +build achieved within the next 2.5 years (in other words, by Python 3.17).** However, we know that performance for performance sake and at the cost of tooling incompatibility is not meaningful or attractive for the project. As such, we want to enter this next phase intentionally and with a clear plan, -enumerated in detail in the :ref:`Specification ` section. +enumerated in detail in the :ref:`specification section below +`. .. _jit-specification: @@ -397,8 +400,7 @@ method frontend are as follows: The following is the reference design. It is subject to change as the code evolves: -* **Uop IR.** The benefits for this are explained in - :ref:`Learnings `. +* **Uop IR.** The benefits for this are explained in previous sections. * **Some Single Static Assignment (SSA) form properties over the stack.** This does not mean we need to rewrite our IR to SSA form, but rather, the optimizer should have some SSA properties. We believe this aligns more @@ -417,9 +419,9 @@ evolves: better generator/coroutine/loop/etc. (high-level construct) analysis and optimizations. -With the design described in this section, most optimizations in the JIT can be -implemented as local rewrites. This is again, inspired by certain properties of -other runtime's intermediate representations. 
Our goal is to make the JIT more +With all of the above, most optimizations in the JIT can be implemented as +local rewrites. This is again, inspired by certain properties of other +runtime's intermediate representations. Our goal is to make the JIT more traditional and teachable, without sacrificing what we can optimize. We do acknowledge that a method JIT requires joining control-flow. However, we believe this is not a large conceptual overhead, as a tracing JIT already @@ -433,7 +435,7 @@ worklist algorithms to drive the pre-existing optimizer/analysis pass. We can proceed to remove most of the current tracing frontend from the JIT from the interpreter, which will simplify the interpreter's core dispatch mechanism and simplify the main interpreter loop. We believe these are not foreign concepts -to CPython; the current bytecode compiler in CPython already represents +to CPython -- the current bytecode compiler in CPython already represents control-flow graphs and has worklist algorithms. Finally, both method and tracing JITs gain complexity and have various @@ -452,9 +454,9 @@ needed: a proposed method JIT may also require certain additional features (in literature) like recording extra type profiling data in an extra side table. However, the complexity can be greatly mitigated by the current bytecode DSL and automatically generating the profiling operations, similar to how the -current tracing JIT already does things. We also propose mitigation strategies -in :ref:`Optimizations `. We thus believe the conceptual and -maintenance leap is not a huge one. +current tracing JIT already does things. We also propose solutions to +mitigate them in :ref:`Optimizations `. We thus believe +the conceptual and maintenance leap is not huge. .. 
_jit-optimizations: @@ -496,7 +498,7 @@ recording or method frontend is used: * Lock removal on free-threading * Detecting deferred reclamation to reduce escaping sites in the JIT on free-threading -* Conservating unboxing of integers, floats, and small strings. +* Conservative unboxing of integers, floats, and small strings To recover the optimizations tracing gives for free, we plan to explore: @@ -556,11 +558,11 @@ pyperformance. For example: when it detects that objects are uniquely referenced. This is a source of slowdown on architectures where atomics are more expensive. -**We believe that the right framing here is not the JIT or free-threading, but -rather, the JIT and free-threading.** We understand the JIT may initially lose -some performance opportunities from free threading's semantics. However, both -the JIT and free threading have much to gain. The JIT can recover all of free -threading's single-threaded performance losses and maybe even more. +**We believe that the right framing here is not the JIT *or* free-threading, +but rather, the JIT *and* free-threading**. We understand the JIT may initially +lose some performance opportunities from free threading's semantics. However, +both the JIT and free threading have much to gain. The JIT can recover all of +free threading's single-threaded performance losses and maybe even more. .. _distribution: @@ -568,8 +570,8 @@ threading's single-threaded performance losses and maybe even more. A Better JIT Distribution Story ------------------------------- -We can choose to adopt the solution proposed in :pep:`774` or allow a range of -LLVM versions to build the JIT. This is up for more discussion and +We can choose to adopt either :pep:`774`'s solution or allow a range of LLVM +versions to build the JIT. This is up for more discussion and experimentation. At minimum, feedback from popular Linux distributors must be collected, deliberated on, and incorporated into a holistic solution. 
Our current understanding of the situation is that supporting multiple LLVM @@ -705,10 +707,10 @@ contributors who do not work on it. This means committing to keeping the interpreter approachable or decoupled from the JIT, to documenting the workflow for regenerating generated code and contributing changes, and to keeping the internals documentation current. The simplification of the optimizer's -operations shipped in 3.15, discussed in :ref:`Learnings `, is -an example of this maintenance investment in practice. Obligations on -redistributors who build and ship the JIT are described in -:ref:`A Better JIT Distribution Story `. +operations shipped in 3.15 (see :ref:`Learnings `) +is an example of this maintenance investment in practice. Obligations on +redistributors who build and ship the JIT are described in the +:ref:`A Better JIT Distribution Story ` section. Backwards Compatibility @@ -719,9 +721,9 @@ compatibility guarantee is that a JIT-enabled build must produce behavior indistinguishable from a non-JIT build, just faster: same results, same exceptions and tracebacks, and same supported introspectable state. -As described in :ref:`Compatibility Review `, we will conduct a -compatibility analysis on the top PyPI packages' test suite as a requirement to -regard the JIT as supported in CPython. +As :ref:`covered above `, we will conduct a compatibility +analysis on the top PyPI packages' test suite as a requirement to regard the +JIT as supported in CPython. Security Implications @@ -751,7 +753,7 @@ Supporting CET/BTI has also been requested by Fedora maintainers [#issue-149697]_. We believe supporting this option in the generated stencils is required for meeting our goals for security. -Finally, since the inception of :pep:`744`, multiple fuzzing projects have been +Finally, since :pep:`744`'s inception, multiple fuzzing projects have been initiated to fuzz the JIT. 
For example, Lafleur [#la-fleur]_ has found numerous JIT bugs that lead to crashes or wrong optimizations (mostly in the JIT middle-end, not the backend). We will continue using these projects to fuzz @@ -773,15 +775,14 @@ program runs, not what it does. This framing is sufficient for most educational contexts and does not require teaching the internals (for example, uops, optimizer, code generation). -Two audiences that need more specific guidance are redistributors and -packagers. These users will need to understand the build-time requirements and -the path toward distributable artifacts (see -:ref:`A Better JIT Distribution Story `). Maintainers of -debuggers, profilers, and other native tooling need to know that JIT frames are -unwindable on supported platforms and what they can rely on when inspecting a -running process. Python stack unwinders will need to understand the JIT frame -layout and recover information during symbolization (see -:ref:`Tooling Support `). +Two audiences that need more specific guidance - redistributors and packagers. +These users will need to understand the build-time requirements and the path +toward distributable artifacts (see :ref:`"A Better JIT Distribution Story" +`). Maintainers of debuggers, profilers, and other native +tooling need to know that JIT frames are unwindable on supported platforms and +what they can rely on when inspecting a running process. Python stack unwinders +will need to understand the JIT frame layout and recover information during +symbolization (see :ref:`"Tooling Support" `). Finally, core developers only need to care about the JIT if they want their feature to be optimized by it. 
Otherwise, the current JIT architecture means @@ -863,7 +864,7 @@ Broken down by file extension: Rejected Ideas ============== -Maintain the JIT Outside of CPython Main +Maintain the JIT Outside of CPython main ---------------------------------------- It has been suggested, both during the JIT's history and in recent discussion, @@ -933,13 +934,13 @@ performance. Enable/Support the Current JIT As-Is ------------------------------------ -The current JIT is undoubtedly the product of much attention and care; we thank -everyone who contributed to it. However, we understand the community as a whole -have concerns that are still unaddressed and therefore need remedying. We also -acknowledge that Python, and indeed CPython, is so widely-used that a change of -this scale must be properly examined and considered before it can be a part of -the project proper. As such, the current JIT cannot be enabled without more -scrutiny and evolution. +The current JIT is undoubtedly the product of much attention and care -- we +thank everyone who contributed to it. However, we understand the community as a +whole have concerns that are still unaddressed and therefore need remedying. We +also acknowledge that Python, and indeed CPython, is so widely-used that a +change of this scale must be properly examined and considered before it can be +a part of the project proper. As such, the current JIT cannot be enabled +without more scrutiny and evolution. Open Issues @@ -953,10 +954,8 @@ Appendix .. _appendix-jit-speedup-2wk: -Average JIT Speedup by Machine ------------------------------- - -Calculated from 2026-06-16 to 2026-06-27. 
+Average JIT Speedup by Machine (calculated from 2026-06-16 to 2026-06-27) +------------------------------------------------------------------------- +-------------------------------------+--------------+-------------+--------------+------+-------------+ | Machine | Config | Avg speedup | Result | Days | Range | From e9908776be427813da19acdd91635bf90e23ec2b Mon Sep 17 00:00:00 2001 From: Savannah Ostrowski Date: Thu, 2 Jul 2026 20:20:21 -0700 Subject: [PATCH 3/8] A couple nits --- peps/pep-0836.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/peps/pep-0836.rst b/peps/pep-0836.rst index acfb6f68156..1ee13af89f2 100644 --- a/peps/pep-0836.rst +++ b/peps/pep-0836.rst @@ -827,7 +827,7 @@ not yet handle generators and coroutines, for example, and has no support for polymorphism, both of which are supported partially by the existing tracing frontend), it is still 4-5% faster on the pyperformance and Pyston macrobenchmark suites vs. JIT off, on a GIL-enabled build. This demonstrates -that the new design developed in just a few weeks can be competitive with the +that the new design developed in just a couple of months can be competitive with the existing tracing design (which is 7-8% faster on the same x86-64 Linux configuration after 3 years of work evolving it). @@ -895,10 +895,10 @@ maintaining one in tree. This PEP rejects this idea for the same reasons as maintaining the JIT outside of CPython main. Introducing a pluggable JIT risks diverting contributor effort and increases maintenance overhead. For example, an earlier version of the JIT in 3.13 had a semi-public experimental API. -However, it leaked internal details to users and made internal JIT development -more difficult. Thus, we removed it. Furthermore, most language runtime JITs -are deeply integrated with their respective runtimes to the extent that a -pluggable JIT infrastructure may not be feasible. 
+However, it leaked internal details to "users" (there were none) and made +internal JIT development more difficult. Thus, we removed it. Furthermore, +most language runtime JITs are deeply integrated with their respective +runtimes to the extent that a pluggable JIT infrastructure may not be feasible. We agree however, that efforts that maintain a JIT outside of CPython using :pep:`523`, such as CinderX and TorchDynamo are commendable. We believe the From bbb4face898b756f417fb411fd64d456e2705a72 Mon Sep 17 00:00:00 2001 From: Savannah Ostrowski Date: Thu, 2 Jul 2026 20:58:16 -0700 Subject: [PATCH 4/8] Apply suggestions from code review Co-authored-by: Wulian233 <1055917385@qq.com> --- peps/pep-0836.rst | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/peps/pep-0836.rst b/peps/pep-0836.rst index 1ee13af89f2..7845feeb625 100644 --- a/peps/pep-0836.rst +++ b/peps/pep-0836.rst @@ -242,7 +242,7 @@ CPython and what could be improved upon: with the wider community and their requests/needs is critical for the project. This includes talking to system distributors for example, or maintainers of third-party tooling, understanding their concerns, and - accommodating them better + accommodating them better. * **Reconsidering if our current JIT frontend is the right fit for CPython.** The current tracing JIT has some great benefits (see @@ -316,7 +316,7 @@ the JIT, **we want to set an initial ambitious but attainable target of at least 20% performance improvement over the interpreter on the free-threaded build achieved within the next 2.5 years (in other words, by Python 3.17).** -However, we know that performance for performance sake and at the cost of +However, we know that performance for performance's sake and at the cost of tooling incompatibility is not meaningful or attractive for the project. 
As such, we want to enter this next phase intentionally and with a clear plan, enumerated in detail in the :ref:`specification section below @@ -575,7 +575,7 @@ versions to build the JIT. This is up for more discussion and experimentation. At minimum, feedback from popular Linux distributors must be collected, deliberated on, and incorporated into a holistic solution. Our current understanding of the situation is that supporting multiple LLVM -versions is required. This should not be much additional complexity. However, +versions is required. This should not add much additional complexity. However, it may require more CI resources for testing. For the JIT to be successful, it must not unduly burden third-party distributors. @@ -730,7 +730,7 @@ Security Implications ===================== As stated in :pep:`744`, CPython's JIT, like all JITs, produces large amounts -of executable data at runtime. This is an attack vector of all JIT compilers, a +of executable data at runtime. This is an attack vector of all JIT compilers: a malicious actor capable of influencing the contents of this data is therefore capable of executing arbitrary code. @@ -811,7 +811,7 @@ largely in: * ``Python/optimizer_bytecodes.c``: The middle-end of the JIT compiler's optimization rules. * ``jit_stencils.h``: An example of the JIT's build-time generated templates - (not currently checked into CPython repository). + (not currently checked into the CPython repository). * ``Tools/jit/template.c``: The code which is compiled to produce the JIT's templates. * ``Tools/jit/_targets.py``: The code to compile and parse the templates at @@ -914,7 +914,7 @@ build-time. We treat reducing build-time friction as important, but not as a precondition for agreeing on the path outlined here. :pep:`774` proposes a solution for removing the LLVM prerequisite but at time of submission, the sitting Steering Council decided to defer making a decision on it until the JIT -had paid more substantial performance gains. 
We would also like to keep +had achieved more substantial performance gains. We would also like to keep exploring options in this space and as such, would like to save this for a separate PEP. @@ -971,7 +971,7 @@ Average JIT Speedup by Machine (calculated from 2026-06-16 to 2026-06-27) .. note:: - Note that JIT+TAILCALL is used on Windows and Mac run as regular CPython + Note that JIT+TAILCALL is used on Windows and macOS runs, as regular CPython builds ship with tailcalling enabled. All data used for this calculation can be found on Does JIT Go Brrr? [#does-jit-go-brrr]_. From aca1158c063e0450ff3c1a85bdc1db8870be0e46 Mon Sep 17 00:00:00 2001 From: Ken Jin Date: Fri, 3 Jul 2026 09:58:28 +0100 Subject: [PATCH 5/8] Address review comments Co-Authored-By: Jelle Zijlstra --- peps/pep-0836.rst | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/peps/pep-0836.rst b/peps/pep-0836.rst index 7845feeb625..4452041337c 100644 --- a/peps/pep-0836.rst +++ b/peps/pep-0836.rst @@ -197,17 +197,17 @@ successful: * **Generating a JIT translator automatically using our own tooling.** CPython's JIT can automatically generate bytecode to JIT Intermediate - Representation (IR) rules automatically using the bytecodes DSL. Again, this + Representation (IR) rules using the bytecodes DSL. Again, this means most JIT translations are correct-by-construction, reducing error-proneness and maintenance burden. This also means the complexity of the JIT is - self-contained: adding new features to CPython generally do not need JIT + self-contained: new features added to CPython generally do not need JIT support unless implementers want the JIT to optimize the feature. For - example, the initial lazy imports pull request[#lazy-imports-pr]_ did not + example, the initial lazy imports pull request [#lazy-imports-pr]_ did not require touching any JIT files apart from adding new headers to include in C. 
* **A JIT optimizer that resembles the CPython interpreter.** The current JIT - optimizer (middle-end), analyzes type information over CPython uops. The key + optimizer (middle-end) analyzes type information over CPython uops. The key maintainability advantage here is that the middle-end is written in a similar fashion as the normal CPython interpreter -- as a bytecode DSL over an interpreter. However, instead of interpreting objects, we interpret types of @@ -508,11 +508,11 @@ To recover the optimizations tracing gives for free, we plan to explore: * Respecializing instructions in the middle-end. One may argue that this introduces a lot of complexity in the method JIT. -However, optimization 1 involves no changes to the interpreter and only minimal -changes to the specializer. Optimization 2 can be found in standard compiler -textbooks. Optimization 3 is trivial to implement in current CPython due to +However, type profiling involves no changes to the interpreter and only minimal +changes to the specializer. Path splitting can be found in standard compiler +textbooks. Cold code elimination is trivial to implement in current CPython due to branch information tracking already in the interpreter (we can just choose not -to compile branches/blocks that are never taken). Optimization 4 can be done by +to compile branches/blocks that are never taken). Respecialization can be done by leveraging the existing specializer's decisions. This continues the trend that we feel method JITs are less entangled with the interpreter in the case of CPython. @@ -775,7 +775,7 @@ program runs, not what it does. This framing is sufficient for most educational contexts and does not require teaching the internals (for example, uops, optimizer, code generation). -Two audiences that need more specific guidance - redistributors and packagers. +Two audiences need more specific guidance: redistributors and packagers. 
These users will need to understand the build-time requirements and the path toward distributable artifacts (see :ref:`"A Better JIT Distribution Story" `). Maintainers of debuggers, profilers, and other native @@ -901,8 +901,8 @@ most language runtime JITs are deeply integrated with their respective runtimes to the extent that a pluggable JIT infrastructure may not be feasible. We agree however, that efforts that maintain a JIT outside of CPython using -:pep:`523`, such as CinderX and TorchDynamo are commendable. We believe the -discussion to be had for improving the pre-existing interfaces are best left to +:pep:`523`, such as CinderX and TorchDynamo, are commendable. We believe the +discussions to be had for improving the pre-existing interfaces are best left to a separate PEP, and consider them out of scope for this one. From 7b111cc260f92bd2530d2773b422c697a2137d07 Mon Sep 17 00:00:00 2001 From: Ken Jin Date: Fri, 3 Jul 2026 10:11:08 +0100 Subject: [PATCH 6/8] Clarify perf lower bound for FT/GIL --- peps/pep-0836.rst | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/peps/pep-0836.rst b/peps/pep-0836.rst index 4452041337c..507e318e82c 100644 --- a/peps/pep-0836.rst +++ b/peps/pep-0836.rst @@ -79,8 +79,10 @@ At a high level, these are our milestones and goals for the JIT over the next JIT stencils, without requiring long-term dependence on one exact LLVM version [#community-perspectives]_. - * **No lower than 5% uplift on the JIT versus the interpreter on the mean of - our supported platforms after implementing the above changes.** In other + * **No lower than 4% uplift on the JIT + free-threading versus + the free-threading interpreter on the mean of our supported platforms + after implementing the above changes.** Additionally, no lower than 5% + uplift on the JIT + GIL build versus the GIL interpreter. In other words, we will not significantly regress base performance improvements while in pursuit of longer-term goals. 
We will also not discourage other contributors from contributing performance improvements during this stage. @@ -89,7 +91,7 @@ At a high level, these are our milestones and goals for the JIT over the next * **Year 2 (ending with Python 3.17's first beta) - Improved performance** * **Achieve at least 20% performance geometric mean improvement on - pyperformance for JIT + free-threading compared to free-threading + pyperformance for JIT + free-threading compared to the free-threading interpreter alone.** This is the minimum target for keeping the JIT in CPython main, with free-threading treated as the primary performance focus. From e258477aad0082e1320e6575b304d3f10dba2808 Mon Sep 17 00:00:00 2001 From: Ken Jin Date: Fri, 3 Jul 2026 10:14:42 +0100 Subject: [PATCH 7/8] Clarify again --- peps/pep-0836.rst | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/peps/pep-0836.rst b/peps/pep-0836.rst index 507e318e82c..9310013fb41 100644 --- a/peps/pep-0836.rst +++ b/peps/pep-0836.rst @@ -79,14 +79,12 @@ At a high level, these are our milestones and goals for the JIT over the next JIT stencils, without requiring long-term dependence on one exact LLVM version [#community-perspectives]_. - * **No lower than 4% uplift on the JIT + free-threading versus - the free-threading interpreter on the mean of our supported platforms - after implementing the above changes.** Additionally, no lower than 5% - uplift on the JIT + GIL build versus the GIL interpreter. In other - words, we will not significantly regress base performance improvements - while in pursuit of longer-term goals. We will also not discourage other - contributors from contributing performance improvements during this stage. - However, our main focus will be the developer experience improvements. 
+ * **No lower than 5% uplift for JIT + GIL versus the GIL + interpreter alone.** In other words, we will not significantly regress + existing performance improvements while in pursuit of longer-term goals. We + will also not discourage other contributors from contributing performance + improvements during this stage. However, our main focus will be the + developer experience improvements. * **Year 2 (ending with Python 3.17's first beta) - Improved performance** From d1f454afb34447ed33bc6468373a16a26731a9b7 Mon Sep 17 00:00:00 2001 From: Ken Jin Date: Fri, 3 Jul 2026 10:20:11 +0100 Subject: [PATCH 8/8] standardize on free-threading, can change later if needed --- peps/pep-0836.rst | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/peps/pep-0836.rst b/peps/pep-0836.rst index 9310013fb41..4417d0e67da 100644 --- a/peps/pep-0836.rst +++ b/peps/pep-0836.rst @@ -523,32 +523,32 @@ CPython. First-Class Support for Free-Threading -------------------------------------- -Free threading is already a part of Python's future, and the current JIT must -be made free threading safe as soon as possible to be a viable option for +Free-threading is already a part of Python's future, and the current JIT must +be made free-threading safe as soon as possible to be a viable option for improved performance. This involves making the frontend and middle-end's -optimizations free threading safe (the backend should already be safe). We do -not anticipate that a method frontend will make free threading support more +optimizations free-threading safe (the backend should already be safe). We do +not anticipate that a method frontend will make free-threading support more difficult over a tracing one. Furthermore, all major optimizations for the pre-existing JIT implemented in the past year have already been designed with -free threading in mind.
However, a slight performance penalty may initially be -encountered as we remove free threading unsafe optimizations. For example, we +free-threading in mind. However, a slight performance penalty may initially be +encountered as we remove free-threading unsafe optimizations. For example, we anticipate that the major optimizations that need addressing will be globals/builtins dictionary and type watchers. Resolving attribute/global lookup at JIT compile time should still be feasible, but removing their guards -altogether may be unsafe in free threading. We expect a naive fix to produce a +altogether may be unsafe in free-threading. We expect a naive fix to produce a slight (1-2% geomean) performance hit initially. All future optimizations upon resuming JIT development will be reviewed with -free threading compatibility and performance impact required before merge. +free-threading compatibility and performance impact required before merge. Optimizations that rely solely on the GIL build and break on the free threaded build will be rejected. Additionally, the JIT may eventually even produce better performance versus the -free threading build than the current GIL build. Early experiments in the JIT +free-threading build than the current GIL build. Early experiments in the JIT suggest free threaded optimization may gain a few more percentage points on pyperformance. For example: -* Reference counting on the free threading build is more expensive than on the +* Reference counting on the free-threading build is more expensive than on the GIL build, and the JIT can eliminate much of reference counting. * The JIT has more leeway with the lifetimes of certain objects, due to deferred reference counting (see :pep:`703`) and Quiescent State-Based @@ -560,9 +560,9 @@ pyperformance. For example: **We believe that the right framing here is not the JIT *or* free-threading, but rather, the JIT *and* free-threading**. 
We understand the JIT may initially -lose some performance opportunities from free threading's semantics. However, -both the JIT and free threading have much to gain. The JIT can recover all of -free threading's single-threaded performance losses and maybe even more. +lose some performance opportunities from free-threading's semantics. However, +both the JIT and free-threading have much to gain. The JIT can recover all of +free-threading's single-threaded performance losses and maybe even more. .. _distribution: