diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
index ea7a8d4543b..509e23f3e1e 100644
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -710,6 +710,7 @@ peps/pep-0831.rst @pablogsal @Fidget-Spinner @savannahostrowski
 peps/pep-0832.rst @brettcannon
 peps/pep-0833.rst @dstufft
 peps/pep-0835.rst @ilevkivskyi
+peps/pep-0836.rst @savannahostrowski @Fidget-Spinner @brandtbucher
 # ...
 peps/pep-2026.rst @hugovk
 # ...
diff --git a/peps/pep-0836.rst b/peps/pep-0836.rst
new file mode 100644
index 00000000000..4417d0e67da
--- /dev/null
+++ b/peps/pep-0836.rst
@@ -0,0 +1,1015 @@
+PEP: 836
+Title: JIT Go Brrr: The Path to a Supported JIT Compiler for CPython
+Author: Savannah Ostrowski ,
+        Ken Jin ,
+        Brandt Bucher
+Discussions-To: Pending
+Status: Draft
+Type: Standards Track
+Created: 02-Jul-2026
+Python-Version: 3.16
+Post-History: Pending


Abstract
========

The experimental Just-in-Time (JIT) compiler has been part of CPython's main
branch since Python 3.13. :pep:`744` described part of its initial design and
explicitly deferred a number of questions about the JIT's long-term status.
Since then, the JIT has been re-architected and has matured considerably. In
Python 3.15, it delivers a measurable, reproducible speedup over the
interpreter (about 4-12% geometric mean performance improvement across
measured Tier 1 platforms; see :ref:`Appendix `), emits frames that native
debuggers can unwind through, and reduces the memory footprint of generated
code relative to 3.14. Along the way, we have learned a good deal about what
works for a JIT in CPython.

This PEP proposes a path for the JIT to become a supported, non-experimental
part of CPython if it meets measurable performance, compatibility, tooling,
platform, distribution, security, and maintenance goals.
The initial performance target is at least a 20% geometric mean improvement
on pyperformance for the JIT + free-threaded build compared to the non-JIT
free-threaded build's interpreter, measured as the mean across the supported
Tier 1 platforms, by the first beta release of Python 3.17. This target is a
minimum bar for continued in-tree development of the JIT.


Proposal
========

This PEP does not propose declaring the JIT supported immediately. Instead,
it proposes a time-bounded path for keeping JIT development in CPython main
while the project works to meet explicit performance, compatibility, tooling,
distribution, security, and maintenance goals.

If these goals are met, the JIT can be promoted to a non-experimental feature
of CPython. If they are not met, the Steering Council and core team should
re-evaluate whether the JIT should remain in CPython main. After promotion,
enabling the JIT by default on supported platforms would require separate
final approval from the Release Manager.

At a high level, these are our milestones and goals for the JIT over the next
2.5 years:

* **Year 1 (ending with Python 3.16's first beta) - Developer experience
  improvements**

  * :ref:`Evolve the frontend from trace recording to method-based
    `. We believe that a method frontend will put us on a
    path toward easier maintenance, teachability, and debugging. The first
    implementation should be minimal, may initially use more memory or
    perform slightly worse, and may be rolled back to the current tracing
    frontend if the approach does not meet the project's goals in the first
    year.

  * :ref:`Make the JIT compatible with free-threading `. We
    believe that this is important to prioritize early in the next phase of
    the JIT, as free-threading adoption is expanding rapidly.

  * :ref:`Add further testing (and address any discovered remaining gaps in
    coverage) for native and Python profilers and debuggers `.
    At a minimum, this will include anything that uses frame pointers to
    unwind, but should also be expanded to support tools that symbolize Python
    frames. Third-party tooling must have documented remediation paths when
    existing behavior cannot be preserved exactly.

  * :ref:`A better JIT distribution story `. Provide
    redistributors with a documented and reproducible way to build or verify
    JIT stencils, without requiring long-term dependence on one exact LLVM
    version [#community-perspectives]_.

  * **At least a 5% uplift for JIT + GIL versus the GIL interpreter alone.**
    In other words, we will not significantly regress existing performance
    improvements while pursuing longer-term goals. We will also not
    discourage other contributors from contributing performance improvements
    during this stage. However, our main focus will be the developer
    experience improvements.

* **Year 2 (ending with Python 3.17's first beta) - Improved performance**

  * **Achieve a geometric mean performance improvement of at least 20% on
    pyperformance for JIT + free-threading compared to the free-threading
    interpreter alone.** This is the minimum target for keeping the JIT in
    CPython main, with free-threading treated as the primary performance focus.

* **Year 2.5 (ending with Python 3.17's first release candidate) - Adoption
  and compatibility**

  * :ref:`Compatibility review `. Run test suites for selected
    popular PyPI packages and representative real-world workloads under the
    JIT. Regressions should be triaged case by case, with fixes or documented
    explanations for issues that reflect real compatibility breaks.


Motivation
==========

Improving CPython's performance is essential to Python's future. A JIT compiler
is one of the few performance strategies that can improve CPython while
preserving the runtime that users, extension authors, embedders, distributors,
debuggers and profilers already target.
Other dynamic languages, such as Ruby,
PHP and JavaScript, have successfully used JIT compilers to deliver substantial
performance improvements while maintaining compatibility with large existing
ecosystems. CPython has different constraints, but the JIT is explicitly aimed
at improving performance within those constraints.

Alternative Python implementations, such as PyPy and GraalPy, demonstrate that
larger speedups are possible for some Python programs, and we deeply value
those projects. However, many users cannot adopt an alternative runtime, even
when it may perform better on their code, due to factors such as supported
Python versions, extension compatibility, embedding requirements, deployment
constraints, and tooling support.

:pep:`744` did valuable work explaining the JIT's copy-and-patch approach, made
the case for keeping the implementation in CPython's main branch so that it
could be maintained by a broader group of volunteers, and sketched some
criteria under which the JIT might eventually graduate from an experimental
state. However, it left many questions open about guarantees, maintenance
commitments, success metrics, timelines, tooling compatibility, impact on
redistributors, the JIT's relationship to other JITs, and likely architectural
evolution.

The current CPython JIT has shown some promising results
(see :ref:`Appendix `), especially in
the last 9-12 months. However, as with any good experiment, it's important to
evaluate the current approach and evolve plans based on what we've learned.


.. _current-state:

Current State
=============

As of CPython 3.15, the current JIT compiler is roughly 4-12% faster
(geometric mean) than the interpreter on the pyperformance benchmark suite
across measured Tier 1 platforms
(see :ref:`Appendix `).
To achieve this, the JIT and its supporting infrastructure have undergone a
number of revisions across the last four major versions of Python:

* **3.12:** Introduction of the CPython bytecode DSL, and refactoring of
  interpreter bytecodes into micro-operations ("uops").
* **3.13:** JIT trace projector, optimizer and Copy and Patch backend
  introduced; :pep:`744` written.
* **3.14:** More refactoring of interpreter bytecodes and optimizer work.
* **3.15:** JIT tracer rewritten to use trace recording, JIT optimizer
  improvements, and increased community engagement and involvement.

Today, the JIT is an experimental, opt-in part of CPython. The official
Python binaries for Windows and macOS ship with the JIT built but disabled by
default (end users can enable it by setting ``PYTHON_JIT=1``). Other
distributors, including certain Linux distributions such as Fedora and
Gentoo, are also known to do the same. As it stands, the JIT has a build-time
dependency on LLVM for stencil generation.

The JIT has attracted many excellent community contributors, and this has
picked up momentum in recent months. We are extremely grateful to these
volunteers. A sizable and active community now exists, as evidenced by the
contributor list in CPython 3.15's What's New entry for the JIT [#whats-new]_.
The JIT team has learned important lessons about attracting new contributors,
such as creating approachable units of work in the public issue tracker and
offering mentorship.


.. _jit-learnings:

Learnings
=========

JIT projects evolve over their lifespan, as seen, for example, in CRuby, which
has had multiple JIT compilers. CPython's JIT is no different.

CPython's JIT compiler has room to improve. To be sustainable in the long
term, meet our performance goals, and continue fostering community
engagement, it must make certain tradeoffs. Tooling, compatibility with other
JIT projects, and free-threading must be first-class citizens.
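
The opt-in model described under Current State (a JIT that is built in but
disabled unless ``PYTHON_JIT=1`` is set) can be observed from Python itself.
The following is a minimal sketch, not part of this proposal. It assumes the
private, provisional ``sys._jit`` helpers (``is_available()`` and
``is_enabled()``) added in Python 3.14; on older versions the module is absent
and the probe simply reports the JIT as unavailable. The helper name
``jit_status`` is hypothetical.

```python
import os
import subprocess
import sys

# Probe run in a child interpreter. sys._jit is private and provisional
# (Python 3.14+); where it is missing, both answers are reported as False.
_PROBE = (
    "import sys\n"
    "jit = getattr(sys, '_jit', None)\n"
    "print(bool(jit and jit.is_available()), bool(jit and jit.is_enabled()))\n"
)


def jit_status(python_jit=None):
    """Return (available, enabled) as seen by a fresh interpreter.

    `python_jit` is the value given to PYTHON_JIT in the child process,
    or None to leave the variable unset.
    """
    env = dict(os.environ)
    env.pop("PYTHON_JIT", None)
    if python_jit is not None:
        env["PYTHON_JIT"] = python_jit
    result = subprocess.run(
        [sys.executable, "-c", _PROBE],
        env=env, capture_output=True, text=True, check=True,
    )
    available, enabled = (word == "True" for word in result.stdout.split())
    return available, enabled


if __name__ == "__main__":
    print("PYTHON_JIT unset:", jit_status())
    print("PYTHON_JIT=1:    ", jit_status("1"))
```

A fresh child process is used because ``PYTHON_JIT`` is read when the
interpreter starts, so changing ``os.environ`` inside a running process does
not toggle the JIT for that process.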

In our experience, there are several areas which we believe have been
successful:

* **The bytecode DSL and uops.** This approach lowers the maintenance burden
  even in the interpreter, as repeated units of code can be shared without
  interpretive overhead, and bytecode validation makes modifying the
  interpreter less error-prone. These should remain in CPython even if the
  current JIT is unsuccessful and is removed. The uops also serve directly as
  the JIT's intermediate representation.

* **Generating a JIT translator automatically using our own tooling.**
  CPython's JIT automatically generates the rules that translate bytecode to
  the JIT Intermediate Representation (IR) from the bytecode DSL. Again, this
  means most JIT translations are correct by construction, reducing
  error-proneness and maintenance burden. It also means the JIT's complexity
  is self-contained: new features added to CPython generally do not need JIT
  support unless implementers want the JIT to optimize the feature. For
  example, the initial lazy imports pull request [#lazy-imports-pr]_ did not
  require touching any JIT files apart from adding new headers to include in C.

* **A JIT optimizer that resembles the CPython interpreter.** The current JIT
  optimizer (middle-end) analyzes type information over CPython uops. The key
  maintainability advantage here is that the middle-end is written in a similar
  fashion to the normal CPython interpreter: as a bytecode DSL over an
  interpreter. However, instead of interpreting objects, it interprets the
  types of objects. This means knowledge of the CPython interpreter is
  transferable to the JIT optimizer: a contributor who knows how to work on the
  interpreter's bytecodes also knows how to work on the JIT's middle-end.

* **Generating a JIT machine code backend automatically using our own
  tooling.** CPython's JIT does not require handwritten machine code for each
  operation, as the JIT machine code is generated automatically from the
  interpreter. This further reduces the maintenance burden of a JIT and allows
  a small team to maintain it for a wide variety of platforms.

* **Trace recording provides some benefits naturally.** For example,
  polymorphism, speculation, dead code elimination, and value recording are
  well handled in a trace recording JIT.

* **Traces are an easy starting point.** Traces have no control-flow within
  them, making analysis simpler.

* **A community-maintained JIT.** Despite partial funding from corporate
  sources (which we are grateful for), a sizable portion of JIT work comes from
  volunteers. Breaking the JIT into understandable chunks for contributors to
  work on is an effective way of compartmentalizing complexity and encouraging
  ownership.

Conversely, we have also learned quite a bit about what has not worked for
CPython and what could be improved:

* **We can continue to improve community outreach and engagement.** We are
  taking active steps to onboard new members. However, continuous engagement
  with the wider community and its requests and needs is critical for the
  project. This includes, for example, talking to system distributors and
  maintainers of third-party tooling, understanding their concerns, and
  accommodating them better.

* **Reconsidering whether our current JIT frontend is the right fit for
  CPython.** The current tracing JIT has some great benefits (see
  :ref:`Learnings `). However, a successful JIT is much more than
  just good performance; we must also consider factors like maintainability,
  testability, and teachability. Furthermore, as the JIT matures, the
  cost-benefit proposition of tracing in CPython shifts.
  To be clear, this is not a value judgement of tracing as an approach, but
  rather an assessment of its state within CPython. These observations are
  from the authors, some of whom implemented the current tracing frontend in
  CPython 3.15:

  * **Tracing's initial ease does not seem to carry over into the medium term
    for CPython.** As mentioned in :ref:`Learnings `, tracing
    is easy to start with. However, the simple implementation of tracing
    in CPython yielded no speedups initially in 3.13 and 3.14. Only when we
    shifted to a more complex tracing runtime and modified the interpreter did
    we see performance gains. We believe the initial ease of tracing
    will be eroded as our tracing runtime matures.

  * **In our experience, a mature tracing runtime's complexity seems to
    require many unconventional, clever "tricks".** For example, the
    current trace recording mechanism relies on such tricks
    [#trace-recording-docs]_ to make recording the interpreter's execution
    efficient and effective. Additional complexities include managing the trace
    graph and its lifetime. We would like to reduce such tricks to make the JIT
    easier to maintain and teach. Other frontends, such as method-based ones,
    also use tricks. However, these seem to have been better studied in recent
    years, and are thus better documented, owing to their prevalence in other
    dynamic language runtimes.

  * **Tracing's interactions in CPython are nontraditional to teach and
    analyze.** Tracing is commonly found in AI/ML compilers, but is less
    frequently used and taught in traditional compiler literature. We believe
    this raises the barrier to entry for a new contributor who knows
    compilers but does not know CPython. Furthermore, when a trace
    performs badly, its interactions with the CPython interpreter can be hard
    to analyze.
    There can be myriad reasons for less predictable performance,
    and analyzing them requires a deep understanding of the interpreter as
    well. For example, the current trace recording runtime has complicated
    heuristics which decide whether to continue or terminate a trace.
    These heuristics took a contributor and one of this PEP's authors many
    attempts to get right (through no fault of their own). We wish to make it
    easier to teach and onboard new contributors without requiring them to
    deeply analyze the interpreter.

* **We do not have a clear picture of whether the JIT currently benefits
  larger real-world workloads.** At present, the JIT is primarily measured and
  evaluated via runs of the pyperformance benchmark suite. We would like to
  spend more time evaluating its impact on real end-user code
  (see :ref:`Compatibility Review `).

* **The distribution story should be improved and codified.** Distributor
  feedback so far suggests that LLVM itself is not always the main obstacle, as
  most can provide a recent LLVM toolchain. The harder problem is depending on
  one exact LLVM version for the lifetime of a Python release, which can force
  redistributors to carry multiple LLVM versions, rely on unsupported
  toolchains, disable the JIT, or maintain bespoke stencil generation
  workflows. We must codify a solution that is workable for distributors.
  :pep:`774` is one such solution, but more research needs to be done to
  prevent each Linux distribution from rolling its own bespoke solution.


Rationale
=========

As noted :ref:`above `, the JIT is roughly 4-12% faster
(geometric mean) on the pyperformance benchmark suite across measured Tier 1
platforms (see :ref:`Appendix `), albeit with
some limitations, challenges, and areas for improvement.
In this next phase of
the JIT, **we want to set an initial ambitious but attainable target of at
least a 20% performance improvement over the interpreter on the free-threaded
build, achieved by the first beta release of Python 3.17.**

However, we know that performance for performance's sake, at the cost of
tooling incompatibility, is not meaningful or attractive for the project. As
such, we want to enter this next phase intentionally and with a clear plan,
enumerated in detail in the :ref:`specification section below
`.


.. _jit-specification:

Specification
=============

To achieve a sustainable and maintainable 20%+ performance gain with
full tooling compatibility in the next 2.5 years, there are several areas worth
discussing:

* Key JIT infrastructure, including an evolution of the JIT frontend
* Optimizations
* First-class support for free-threading
* A better distribution story
* Compatibility
* Tooling support


.. _jit-infrastructure:

Key JIT Infrastructure
----------------------

Traditionally, compilers are split into a *frontend*, *middle-end*, and
*backend*. They have the following meaning in our context:

* **Frontend:** Selects what to compile. This can be methods or traces of
  CPython specialized bytecode.
* **Middle-end:** Optimizes instructions. Translates specialized bytecode to
  uops and optimizes them.
* **Backend:** Generates machine code.

At present, the frontend uses trace recording: it records the actual flow of
execution through the program's bytecodes, along with the live values seen
during execution. We instrumented the interpreter to achieve this. This is not
the frontend originally introduced in 3.13, which seemed to be ineffective at
the time for various reasons [#jit-on-track]_.
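
To make the frontend/middle-end split above more concrete, here is a toy
sketch in Python. It is not CPython's implementation: the op names
(``LOAD_CONST``, ``LOAD_NAME``, ``BINARY_ADD``), the ``GUARD_INT`` guard, and
the functions ``record_trace`` and ``remove_redundant_guards`` are all
hypothetical. The "frontend" runs a straight-line program once and records the
ops it executes, inserting a type guard before each addition whose operands
were observed to be ``int``. The "middle-end" then interprets *types* rather
than values over the recorded trace and drops guards whose outcome is already
known, in the spirit of the guard removal listed under Optimizations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str           # hypothetical op name, e.g. "LOAD_CONST"
    arg: object = None  # constant, variable name, or guard stack depth


def record_trace(program, env):
    """Toy frontend: execute `program` once, recording the ops that ran.

    Before each BINARY_ADD whose operands were both exactly int, emit
    GUARD_INT ops (depth 1 = left operand, depth 0 = right operand), i.e.
    speculate on the types of the live values seen while recording.
    """
    stack, trace = [], []
    for op in program:
        if op.name == "LOAD_CONST":
            stack.append(op.arg)
        elif op.name == "LOAD_NAME":
            stack.append(env[op.arg])
        elif op.name == "BINARY_ADD":
            rhs, lhs = stack.pop(), stack.pop()
            if type(lhs) is int and type(rhs) is int:
                trace += [Op("GUARD_INT", 1), Op("GUARD_INT", 0)]
            stack.append(lhs + rhs)
        else:
            raise ValueError(f"unknown op: {op.name}")
        trace.append(op)
    return trace


def remove_redundant_guards(trace):
    """Toy middle-end: abstractly interpret types instead of values.

    Each stack slot holds a known type, or None when unknown. A GUARD_INT
    on a slot already known to be int is dropped; otherwise the guard is
    kept and the slot is known to be int from then on.
    """
    types, out = [], []
    for op in trace:
        if op.name == "GUARD_INT":
            slot = len(types) - 1 - op.arg
            if types[slot] is int:
                continue  # this guard can never fail: remove it
            types[slot] = int
        elif op.name == "LOAD_CONST":
            types.append(type(op.arg))
        elif op.name == "LOAD_NAME":
            types.append(None)  # value comes from the environment: unknown
        elif op.name == "BINARY_ADD":
            rhs, lhs = types.pop(), types.pop()
            types.append(int if lhs is int and rhs is int else None)
        out.append(op)
    return out


program = [Op("LOAD_NAME", "x"), Op("LOAD_CONST", 1), Op("BINARY_ADD"),
           Op("LOAD_CONST", 2), Op("BINARY_ADD")]
trace = record_trace(program, {"x": 41})
optimized = remove_redundant_guards(trace)
print(len(trace), [op.name for op in optimized])
# 9 ['LOAD_NAME', 'LOAD_CONST', 'GUARD_INT', 'BINARY_ADD', 'LOAD_CONST', 'BINARY_ADD']
```

Only the guard on ``x`` survives: every other operand is either a constant or
the result of an earlier ``int + int``. Because the toy program has no
branches, its trace is simply the executed op sequence plus guards; a real
trace recorder also follows the branches actually taken and leaves the trace
when a guard fails.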

To ease maintenance burden, disentangle the JIT and the interpreter, and unlock
future optimizations in a sustainable fashion, we propose replacing the
frontend with a method-based one by 3.16. The method frontend can be rolled
back midway to the trace recording one if it does not meet our goals.

Changing the frontend is not free. Time spent on this work is time not spent
directly adding optimizations to the current tracing frontend, and some
trace-specific performance wins may need to be recovered after the transition.
We believe this opportunity cost is justified only because the current
frontend appears likely to impose maintenance, teaching, debugging, and
optimization costs that grow as it matures and will outpace the initial
implementation cost of the method frontend.

To elaborate on the difference, trace recording records straight-line sequences
through the code, while a method frontend generally selects one or more Python
functions to compile.

The middle-end and backend will not require major changes. Nearly all of the
current code can be reused for the method frontend. The current backend, which
uses Copy and Patch compilation, already supports branches and jumps in the
control-flow. The middle-end, which analyzes types over uops, just needs to
support merging type information at control-flow merge points.

Motivated by our learnings over the past several years, our goals for the
method frontend are as follows:

* To make optimization as simple and as *traditional* as possible so as to
  avoid unnecessary experimentation on CPython's main branch.
* To make the JIT easier to maintain. We don't mean this in lines of code, but
  rather in conceptual burden. A single maintainer should be able to "fit" the
  entire system in their head and accurately predict and understand its
  behavior, even in reasonably complex programs. The current tracing frontend
  can produce head-scratching results even for very simple benchmarking
  programs.
* To enable higher-level optimizations more easily, without requiring a higher
  JIT tier (another JIT bolted on top) or inter-trace knowledge.

The following is the reference design. It is subject to change as the code
evolves:

* **Uop IR.** The benefits of this are explained in previous sections.
* **Some Static Single Assignment (SSA) form properties over the stack.** This
  does not mean we need to rewrite our IR in SSA form; rather, the
  optimizer should have some SSA properties. We believe this aligns more
  closely with other compilers (e.g. Cinder, PyPy, Chrome's V8, CRuby's
  YJIT/ZJIT), and makes understanding *how* to optimize in the JIT easier and
  more powerful. The current JIT optimizer already nearly supports this, and
  only requires minimal changes to have SSA properties. An IR with proper stack
  discipline already has many useful properties that are analogous to SSA form,
  so SSA form essentially comes for free for stack variables.
* **A simple way to represent high-level constructs.** We have an
  implementation that forms *regions* (groups of basic blocks), inspired by the
  similarly named concept in MLIR (an LLVM project). Rather than flattening
  programs into basic blocks that point to each other, we opt to keep the
  high-level construct information around. In MLIR, this was motivated by
  better loop analysis and optimization. In CPython's JIT, this is motivated by
  better analysis and optimization of high-level constructs such as
  generators, coroutines, and loops.

With all of the above, most optimizations in the JIT can be implemented as
local rewrites. This, again, is inspired by certain properties of other
runtimes' intermediate representations. Our goal is to make the JIT more
traditional and teachable, without sacrificing what we can optimize. We do
acknowledge that a method JIT requires joining control-flow.
However, we
believe this is not a large conceptual overhead, as a tracing JIT already
requires teaching the concept of joining control-flow once anything other than
the most basic optimizations are implemented.

In terms of the code needed to build this frontend, most of the required
infrastructure is already present. The main code modifications required are
the data structures to represent a control-flow graph, and worklist
algorithms to drive the pre-existing optimizer/analysis pass. We can then
remove most of the current tracing frontend from the interpreter, which will
simplify the interpreter's core dispatch mechanism and the main interpreter
loop. We believe these are not foreign concepts to CPython: the current
bytecode compiler in CPython already represents control-flow graphs and has
worklist algorithms.

Finally, both method and tracing JITs gain complexity and have various
tradeoffs to achieve great performance. Where tracing has greater simplicity in
value recording and profiling, method JITs need more advanced polymorphic
inline caches. Where tracing needs inter-trace optimization to get
higher-level optimizations, method JITs have it simpler by seeing more code.
Both need tight coupling with the interpreter to achieve great performance. We
understand both technologies come with tradeoffs, and we are once again not
making a judgement about which is ultimately better. Our claim is just that
for the optimizations that CPython requires, for ease of teaching, debugging
and analysis, and for finding solutions to our problems in similar language
runtimes, a method-based JIT is ultimately our choice in this case. To be
upfront about the potential additional complexity: a method JIT may also
require certain additional features described in the literature, such as
recording extra type profiling data in a side table.
However, this complexity can be greatly mitigated by the current bytecode DSL
and by automatically generating the profiling operations, similar to how the
current tracing JIT already does things. We also propose solutions to
mitigate it in :ref:`Optimizations `. We thus believe
the conceptual and maintenance leap is not huge.


.. _jit-optimizations:

Optimizations
-------------

The method JIT builds on the optimizations already present in the current
trace recording JIT. Namely, it will come with the following optimizations by
virtue of the pre-existing JIT middle-end:

* Type speculation (via the specializing adaptive interpreter's typed bytecode)
* Useless check/guard removal
* Redundant reference counting removal
* Constant folding

Knowledge of the current middle-end is transferable, and contributors who have
worked on it need not relearn much, as the JIT middle-end can work with the
method frontend with minimal changes.

Switching frontends has a short-term performance opportunity cost. The trace
recording frontend has benefited from nearly a year of focused work, and some
of its gains, especially those tied to trace-specific behavior or
free-threading-unsafe optimizations, may need to be recovered after the
transition. The reason to accept this cost is that a method frontend should
make the next set of larger optimizations easier to implement, reason about,
test, and maintain.

As part of our plans, we intend to optimize generators and coroutines better
and improve the efficiency of calls. These high-level optimizations motivated
us towards a method-based JIT. They require seeing more of the user's code to
be effective, and the current tracing JIT in CPython cannot achieve this
without inter-trace optimization or trace stitching, which increases the
complexity and coupling with the runtime.

Further optimizations are possible.
However, they do not differ much whether a trace recording or a method
frontend is used:

* Lock removal on free-threading
* Detecting deferred reclamation to reduce escaping sites in the JIT on
  free-threading
* Conservative unboxing of integers, floats, and small strings

To recover the optimizations tracing gives for free, we plan to explore:

* Recording extra type profiling information from the interpreter's specializer
* Path splitting (duplicating parts of the control-flow graph)
* Cold code elimination
* Respecializing instructions in the middle-end

One may argue that this introduces a lot of complexity in the method JIT.
However, type profiling involves no changes to the interpreter and only minimal
changes to the specializer. Path splitting can be found in standard compiler
textbooks. Cold code elimination is trivial to implement in current CPython
because the interpreter already tracks branch information (we can simply
choose not to compile branches/blocks that are never taken). Respecialization
can be done by leveraging the existing specializer's decisions. This continues
the trend that, in the case of CPython, method JITs seem less entangled with
the interpreter.


.. _free-threading:

First-Class Support for Free-Threading
--------------------------------------

Free-threading is already a part of Python's future, and the current JIT must
be made free-threading-safe as soon as possible to be a viable option for
improved performance. This involves making the frontend's and middle-end's
optimizations free-threading-safe (the backend should already be safe). We do
not anticipate that a method frontend will make free-threading support more
difficult than a tracing one. Furthermore, all major optimizations for the
pre-existing JIT implemented in the past year have already been designed with
free-threading in mind.
However, a slight performance penalty may initially be
incurred as we remove free-threading-unsafe optimizations. For example, we
anticipate that the main optimizations needing attention will be those built
on globals/builtins dictionary watchers and type watchers. Resolving
attribute/global lookups at JIT compile time should still be feasible, but
removing their guards altogether may be unsafe in free-threading. We expect a
naive fix to cause a slight (1-2% geomean) performance hit initially.

Once JIT development resumes, all future optimizations must be compatible
with free-threading and have their performance impact on it reviewed before
they are merged. Optimizations that work only on the GIL build and break on
the free-threaded build will be rejected.

Additionally, the JIT may eventually deliver an even larger speedup on the
free-threaded build than on the GIL build. Early experiments in the JIT
suggest that free-threading-specific optimizations may gain a few more
percentage points on pyperformance. For example:

* Reference counting on the free-threaded build is more expensive than on the
  GIL build, and the JIT can eliminate much of it.
* The JIT has more leeway with the lifetimes of certain objects, due to
  deferred reference counting (see :pep:`703`) and Quiescent State-Based
  Reclamation (QSBR). This unlocks even more optimization opportunities that
  are not possible with immediate reclamation.
* The JIT can remove the locks and atomic operations used by the specializing
  adaptive interpreter when it detects that objects are uniquely referenced.
  These are a source of slowdown on architectures where atomics are more
  expensive.

**We believe that the right framing here is not "the JIT or free-threading",
but rather "the JIT and free-threading".** We understand the JIT may initially
lose some performance opportunities due to free-threading's semantics.
However, both the JIT and free-threading have much to gain.
The JIT can recover all of
free-threading's single-threaded performance losses, and maybe even more.


.. _distribution:

A Better JIT Distribution Story
-------------------------------

We can choose either to adopt :pep:`774`'s solution or to allow a range of
LLVM versions to build the JIT. This is open to further discussion and
experimentation. At minimum, feedback from popular Linux distributors must be
collected, deliberated on, and incorporated into a holistic solution. Our
current understanding of the situation is that supporting multiple LLVM
versions is required. This should not add much complexity, although it may
require more CI resources for testing. For the JIT to be successful, it
must not unduly burden third-party distributors.


.. _compatibility:

Compatibility Review
--------------------

As part of our roadmap, we plan to run the test suites of the top PyPI packages
and detect whether the JIT breaks them, similar to the approach of the original
nogil repository, where Sam Gross ran the test suites of popular PyPI packages
to detect bugs and incompatibilities (see the labeille project [#la-beille]_
for additional prior art).

Failing a package's test suite does not automatically mean the goal is not met.
Certain test suites may rely on CPython internal details that are not
guaranteed, so each failure requires case-by-case examination. The bottom
line is that we must make a concerted effort to assess JIT compatibility with
existing Python code, and a best-effort attempt at correcting any "real" bugs.


.. _tooling-support:

Tooling Support
---------------

The JIT will not regress the current native unwinding support. To recap, the
JIT currently supports all frame pointer-based unwinders, as well as
eh_frame-based ones (such as GNU backtrace).

The JIT will continue supporting out-of-process profilers/debuggers that
require Python frames.
We understand that frame elision (inlining) is a +promising optimization. However, completely eliding frames in the JIT would +break third-party tools. We will work with tool maintainers to give Python +frame unwinders alternative ways to obtain the information required to recover +an elided frame, such as metadata stored for each elided frame. Furthermore, +tools that inspect the Python stack may need to symbolize the JIT C shim frame +(i.e., relate it to a Python function call). In this case, CPython will provide +all the information these tools need, either through executor objects or +elsewhere in the runtime, and will also expose it in the debug offsets so that +these tools can make sense of a call stack containing JIT frames. To that end, +we may consult with maintainers of popular Python stack unwinding tools. As a +general rule: if something works with the JIT off, we should do everything we +can to make sure it also works (or has usable alternatives) with the JIT on, +and not break genuinely useful observability and debugging features in the +name of raw performance. + +The JIT will continue supporting in-process tools. This means it will not break +``sys._getframe``, ``pdb``, or ``sys.monitoring``. + + +Relationship to Other JITs and Compiler Tools +--------------------------------------------- + +CPython's JIT is not intended to replace third-party specialist JITs or +compiler projects, such as CinderX, Numba, PyTorch's ``torch.compile``, or +other domain-specific compilers. Those projects often optimize different +workloads, make different assumptions, or operate at different layers of the +stack. The JIT is intended to be a "backstop" for executing any code that ends +up being the responsibility of the interpreter itself, just as it is today.
+ + +Platform Support +---------------- + +The JIT will support all Tier 1 platforms listed in :pep:`11` at the time +of writing: + ++---------------------------+------------+ +| Target Triple             | Notes      | ++===========================+============+ +| aarch64-apple-darwin      | clang      | ++---------------------------+------------+ +| aarch64-unknown-linux-gnu | glibc, gcc | ++---------------------------+------------+ +| i686-pc-windows-msvc      |            | ++---------------------------+------------+ +| x86_64-pc-windows-msvc    |            | ++---------------------------+------------+ +| x86_64-unknown-linux-gnu  | glibc, gcc | ++---------------------------+------------+ + +However, we do not plan to dedicate cycles to improving 32-bit Windows +performance, and we would like to exclude that platform from our goals, as +PyPI statistics suggest that 32-bit Windows builds account for a vanishingly +small share of downloads. Likewise, if platforms that conventionally do not +support JITs (such as WASI) are eventually promoted to Tier 1, we do not expect +to support them either. + +Thanks to the copy-and-patch backend, the JIT supports the platforms of +interest with minimal additional work from us. The key idea is to avoid +handwriting machine code equivalents of our IR, as doing so causes too much +churn and is unsustainable given CPython's rapid bytecode changes. Instead, the +templates are produced at build time by compiling the interpreter's own uop +implementations with LLVM. + + +.. _maintenance-model: + +Maintenance Model +----------------- + +The JIT is maintained by a group of CPython core developers and contributors +working across its three stages: the frontend, the optimizer, and the +code-generation backend. A central goal of the project has been to keep more +than one active maintainer familiar with each stage, so that no part of the JIT +depends on a single person. The contributor base has grown deliberately rather +than by chance.
During the 3.15 cycle, optimization work was decomposed into +small, individually actionable tasks, which lowered the barrier to contributing +and drew roughly a dozen people into the trace-recording conversion effort +while increasing the number of recurring optimizer contributors +[#jit-on-track]_. This task decomposition is an ongoing mechanism for bringing +in and retaining contributors, and it is how the project intends to sustain and +widen its maintainer pool over time. + +At present, the project does not depend on any single sponsor. It has continued +as a community-led effort after its initial principal corporate sponsor wound +down its dedicated funding, and it currently combines volunteer work with some +ongoing corporate contributions, primarily from Arm, FastAPI Labs, and OpenAI. +Sustaining the JIT also depends on key shared infrastructure: the +continuous integration and build configurations that exercise JIT builds +(currently part of regular CI on main [#jit-ci]_), and the self-hosted +benchmarking machines and infrastructure that publish nightly results +[#does-jit-go-brrr]_ (currently maintained by Savannah, with machines +contributed by Savannah and Arm). + +Finally, and perhaps most importantly, the JIT must not become a barrier for +contributors who do not work on it. This means committing to keeping the +interpreter approachable and decoupled from the JIT, documenting the workflow +for regenerating generated files and contributing changes, and keeping the +internals documentation current. The simplification of the optimizer's +operations shipped in 3.15 (see :ref:`Learnings `) +is an example of this maintenance investment in practice. Obligations on +redistributors who build and ship the JIT are described in the +:ref:`A Better JIT Distribution Story ` section.
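The following is a minimal sketch of the kind of lightweight check the JIT CI could run. It is not part of CPython's test suite. It runs the same snippet in fresh interpreters with the JIT disabled and then enabled, and compares the output. It assumes that the ``PYTHON_JIT`` environment variable toggles the JIT at startup, as it does for builds configured with ``--enable-experimental-jit``. On builds without the JIT the variable has no effect, so both runs simply use the interpreter. ``SNIPPET``, ``run_snippet``, and ``outputs_match`` are hypothetical names chosen for this illustration.

```python
"""Sketch: the same program must behave identically with the JIT off and on."""
import os
import subprocess
import sys

# Hypothetical workload: a hot loop (so the JIT has a chance to compile it),
# frame introspection via sys._getframe, and a raised exception. None of the
# printed output should depend on whether the JIT is enabled.
SNIPPET = """
import sys

def hot(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def caller_name():
    return sys._getframe(1).f_code.co_name

def outer():
    return caller_name()

print(hot(200_000))
print(sorted({outer() for _ in range(10_000)}))
try:
    hot(None)
except TypeError as exc:
    print(type(exc).__name__)
"""


def run_snippet(jit: str) -> str:
    """Run SNIPPET in a fresh interpreter with PYTHON_JIT set to `jit`."""
    env = dict(os.environ, PYTHON_JIT=jit)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # a crash in either configuration fails the check
    )
    return result.stdout


def outputs_match() -> bool:
    """True if the JIT-off and JIT-on runs print exactly the same output."""
    return run_snippet("0") == run_snippet("1")


if __name__ == "__main__":
    print("identical" if outputs_match() else "MISMATCH")
```

A real compatibility run would use the test suites of PyPI packages rather than a toy snippet. The sketch only illustrates that results, exceptions, and frame introspection should not depend on whether the JIT is enabled.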
+ + +Backwards Compatibility +======================= + +Since the JIT is an optimization and not a change to the language, its central +compatibility guarantee is that a JIT-enabled build must produce behavior +indistinguishable from a non-JIT build, just faster: the same results, the same +exceptions and tracebacks, and the same supported introspectable state. + +As :ref:`covered above `, we will run a compatibility +analysis against the test suites of the top PyPI packages as a requirement for +regarding the JIT as supported in CPython. + + +Security Implications +===================== + +As stated in :pep:`744`, CPython's JIT, like all JITs, produces large amounts +of executable data at runtime. This is an attack vector common to all JIT +compilers: a malicious actor capable of influencing the contents of this data +is capable of executing arbitrary code. + +To mitigate this risk, the JIT has been written with best practices in +mind. In particular, the data in question is not exposed by the JIT compiler to +other parts of the program while it remains writable, and at no point is the +data both writable and executable. + +The nature of template-based JITs also seriously limits the kinds of code that +can be generated, further reducing the likelihood of a successful exploit. As +an additional precaution, the templates themselves are stored in static, +read-only memory. + +However, it would be naive to assume that no vulnerabilities exist in +the JIT, especially at this early stage. The authors are not security experts, +but will work closely with the Python Security Response Team to triage and fix +security issues as they arise. + +Fedora maintainers have also requested support for CET/BTI (Intel's +Control-flow Enforcement Technology and Arm's Branch Target Identification) +[#issue-149697]_. We consider supporting these protections in the generated +stencils a requirement for meeting our security goals. + +Finally, since :pep:`744`'s inception, several projects have been started to +fuzz the JIT.
For example, lafleur [#la-fleur]_ has found +numerous JIT bugs that lead to crashes or incorrect optimizations (mostly in +the JIT middle-end, not the backend). We will continue using these projects to +fuzz the JIT. + + +How to Teach This +================= + +For the vast majority of Python users, the most important thing to teach about +the JIT is that there is nothing they need to do and nothing they need to watch +out for. No code should need to be rewritten to benefit from the JIT, and none +should need to be changed to remain correct. + +For users who want a mental model, a short and accurate one is enough: the JIT +is an optimization layer that sits above the interpreter and compiles +frequently executed code to machine code on the fly. It changes how fast a +program runs, not what it does. This framing is sufficient for most educational +contexts and does not require teaching the internals (for example, uops, the +optimizer, or code generation). + +Two audiences need more specific guidance: redistributors and packagers, and +tooling maintainers. Redistributors and packagers will need to understand the +build-time requirements and the path toward distributable artifacts (see +:ref:`"A Better JIT Distribution Story" +`). Maintainers of debuggers, profilers, and other native +tooling need to know that JIT frames are unwindable on supported platforms and +what they can rely on when inspecting a running process. Python stack unwinders +will need to understand the JIT frame layout and recover information during +symbolization (see :ref:`"Tooling Support" `). + +Finally, core developers only need to care about the JIT if they want their +feature to be optimized by it. Otherwise, the current JIT architecture means +that core developers working on the interpreter or other parts of the runtime +do not need to care that a JIT exists, apart from the occasional CI breakage. +Once the JIT is regarded as supported, it should not be catastrophically broken +by any new change.
However, we expect that in almost all cases, introducing a +new feature to Python will not be obstructed by the JIT, unless the contributor +explicitly wants the JIT to support or optimize their feature. Once again, +consider the initial lazy imports implementation, which modified bytecode but +did not need to change the JIT beyond adding ``#include`` directives for the +newly introduced headers [#lazy-imports-pr]_. + + +Reference Implementation +======================== + +The current implementation of the JIT can be found in CPython's main branch, +largely in: + +* ``Tools/jit/README.md``: Instructions for how to build the JIT. +* ``Python/jit.c``: The entire backend portion of the JIT compiler. +* ``Python/optimizer.c``: Part of the frontend of the JIT compiler (with parts +  shared with ``Python/ceval.c``). +* ``Python/optimizer_analysis.c``: The middle-end of the JIT compiler. +* ``Python/optimizer_bytecodes.c``: The optimization rules used by the JIT +  compiler's middle-end. +* ``jit_stencils.h``: An example of the JIT's templates, generated at build +  time (not currently checked into the CPython repository). +* ``Tools/jit/template.c``: The code that is compiled to produce the JIT's +  templates. +* ``Tools/jit/_targets.py``: The code that compiles and parses the templates at +  build time. + +While this PEP proposes and outlines an evolution of the JIT (transitioning +from a tracing frontend to a method-based one, with heavy reuse of existing +code), it does not prescribe a particular implementation of that design. That +said, a working proof-of-concept implementation against main exists and will be +shared soon. + +Although it is still under development and incomplete (for example, it does +not yet handle generators and coroutines and has no support for polymorphism, +both of which the existing tracing frontend partially supports), it is already +4-5% faster than the interpreter with the JIT off on the pyperformance and +Pyston macrobenchmark suites, on a GIL-enabled build.
This suggests +that the new design, developed in just a couple of months, can be competitive +with the existing tracing design (which is 7-8% faster on the same x86-64 Linux +configuration after three years of development). + +Excluding tests, the size of the current method-JIT implementation's diff +against main is approximately as follows: + ++---------------+---------------+-------------+---------------+ +| File Type     | Files Changed | Lines Added | Lines Removed | ++===============+===============+=============+===============+ +| Generated     | 13            | 5300        | 4200          | ++---------------+---------------+-------------+---------------+ +| Non-Generated | 38            | 2700        | 5800          | ++---------------+---------------+-------------+---------------+ +| Total         | 51            | 8000        | 10000         | ++---------------+---------------+-------------+---------------+ + +Broken down by file extension: + ++-----------+---------------+-------------+---------------+ +| Extension | Files Changed | Lines Added | Lines Removed | ++===========+===============+=============+===============+ +| .c        | 20            | 2300        | 5400          | ++-----------+---------------+-------------+---------------+ +| .c.h      | 6             | 3300        | 2400          | ++-----------+---------------+-------------+---------------+ +| .h        | 20            | 2300        | 2100          | ++-----------+---------------+-------------+---------------+ +| .py       | 5             | 100         | 100           | ++-----------+---------------+-------------+---------------+ +| Total     | 51            | 8000        | 10000         | ++-----------+---------------+-------------+---------------+ + + +Rejected Ideas +============== + +Maintain the JIT Outside of CPython main +---------------------------------------- + +It has been suggested, both earlier in the JIT's history and in recent +discussion, that a compiler of this complexity might be better developed and +maintained out of tree or as a separate project, rather than in CPython's main +branch.
+However, keeping the JIT in main is a deliberate and highly beneficial choice, +originally articulated in :pep:`744`: it allows the JIT to be co-developed with +the interpreter and maintained by the broader group of core developers and +contributors rather than a small set of specialists working on a fork. The uops +the JIT consumes are also co-designed with, and regenerated from, the +interpreter. An out-of-tree JIT would have to track those definitions across a +branch boundary, which raises the maintenance cost and the risk of drift +precisely in the area where correctness matters most. The growth of the +contributor base during the 3.15 cycle (see +:ref:`Maintenance Model `) is itself evidence that in-tree +development lowers, rather than raises, the barrier to participation. Keeping +the JIT in the main branch of CPython also lets us keep a closer pulse on the +needs of distributors, and makes it easier for end users to try out the JIT and +report the behavior they observe and the issues they find. + + +Pluggable JIT Infrastructure +---------------------------- + +Another recurring idea is for CPython to expose a stable, general-purpose +interface for plugging in arbitrary third-party JIT compilers, rather than +maintaining one in tree. This PEP rejects this idea for the same reasons it +rejects maintaining the JIT outside of CPython main. A pluggable JIT interface +would risk diverting contributor effort and would increase maintenance +overhead. For example, an earlier version of the JIT in 3.13 had a semi-public +experimental API. However, it leaked internal details to "users" (of which +there were none) and made internal JIT development more difficult, so we +removed it. Furthermore, most language-runtime JITs are so deeply integrated +with their respective runtimes that a pluggable JIT infrastructure may not be +feasible.
+ +We do agree, however, that efforts to maintain a JIT outside of CPython using +:pep:`523`, such as CinderX and TorchDynamo, are commendable. We believe that +discussions about improving these existing interfaces are best left to a +separate PEP, and consider them out of scope for this one. + + +Dropping the Build-Time LLVM Requirement +---------------------------------------- + +This PEP does not propose changing the JIT's reliance on the LLVM toolchain at +build time. We treat reducing build-time friction as important, but not as a +precondition for agreeing on the path outlined here. :pep:`774` proposes a +way to remove the LLVM prerequisite, but when it was submitted, the sitting +Steering Council deferred a decision on it until the JIT had achieved more +substantial performance gains. We would also like to keep exploring options in +this space, and so would prefer to leave this to a separate PEP. + + +A Higher-Tier JIT +----------------- + +We believe that multi-tiered JITs can deliver excellent performance and +compelling warmup times. However, we also believe that for the time being, +CPython's complexity and maintenance budget may not support such an endeavour. +We are not saying this should never happen. Rather, our goal is to produce the +best JIT we can for the current state of CPython, given the constraints we work +within. For that reason, we reject building an additional, higher-tier JIT on +top of the current one for peak performance. + + +Enable/Support the Current JIT As-Is +------------------------------------ + +The current JIT is undoubtedly the product of much attention and care, and we +thank everyone who contributed to it. However, we understand that the community +as a whole has concerns that remain unaddressed and need remedying. We also +acknowledge that Python, and indeed CPython, is so widely used that a change of +this scale must be properly examined and considered before it can become part +of the project proper.
As such, the current JIT cannot be enabled +without more scrutiny and evolution. + + +Open Issues +=========== + +None at this time. + + +Appendix +======== + +.. _appendix-jit-speedup-2wk: + +Average JIT Speedup by Machine (calculated from 2026-06-16 to 2026-06-27) +------------------------------------------------------------------------- + ++-------------------------------------+--------------+-------------+--------------+------+-------------+ +| Machine                             | Config       | Avg speedup | Result       | Days | Range       | ++=====================================+==============+=============+==============+======+=============+ +| jones (M3 Pro, macOS)               | JIT+TAILCALL | 1.126x      | 12.6% faster | 9    | 1.050-1.180 | ++-------------------------------------+--------------+-------------+--------------+------+-------------+ +| sulaco (AmpereOne, Linux aarch64)   | JIT          | 1.073x      | 7.3% faster  | 7    | 1.060-1.080 | ++-------------------------------------+--------------+-------------+--------------+------+-------------+ +| ripley (i5-8400, Linux x86_64)      | JIT          | 1.069x      | 6.9% faster  | 9    | 1.060-1.070 | ++-------------------------------------+--------------+-------------+--------------+------+-------------+ +| prometheus (Ryzen 5 3600X, Windows) | JIT+TAILCALL | 1.047x      | 4.7% faster  | 9    | 1.040-1.050 | ++-------------------------------------+--------------+-------------+--------------+------+-------------+ + +.. note:: + +   JIT+TAILCALL is used for the Windows and macOS runs, as regular CPython +   builds ship with tail-calling enabled. All data used for this calculation +   is available on Does JIT Go Brrr? [#does-jit-go-brrr]_. + + +Footnotes +========= + +.. [#whats-new] `3.15 What's New +   `__ +.. [#jit-on-track] `Python 3.15's JIT is now back on track +   `__ +.. [#jit-ci] `jit.yml in CPython +   `__ +.. [#does-jit-go-brrr] `Does JIT Go Brrr? +   `__ +.. 
[#la-beille] `labeille - Hunt for CPython JIT bugs by running real-world +   test suites `__ +.. 
[#la-fleur] `lafleur - A feedback-driven, evolutionary fuzzer for the + CPython JIT compiler `__ +.. [#issue-149697] `JIT shim object drops GNU property notes (CET/BTI/PAC) from + output binaries `__ +.. [#trace-recording-docs] `The Trace Recorder and Executors + `__ +.. [#lazy-imports-pr] `Initial lazy imports implementation + `__ +.. [#community-perspectives] `Community perspectives on the JIT: experiences, + expectations, and concerns - post 15 + `__ + + +Change History +============== + +None at this time. + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive.