
Global interpreter lock

The Global Interpreter Lock (GIL) is a mutex in the CPython implementation of Python that protects access to Python objects and internal data structures, such as reference counts, by ensuring that only one native thread can execute Python bytecode at a time. This mechanism simplifies memory management in CPython's reference-counting system, which lacks inherent protections against concurrent modifications that could lead to memory corruption or crashes. Introduced early in CPython's design to avoid the complexity of fine-grained locking across its codebase, the GIL has been a defining feature since Python's initial threading support in version 1.5.

While the GIL enables a straightforward implementation of Python's threading model, it imposes significant limitations on multi-threaded performance, particularly for CPU-bound tasks, because it serializes execution even on multi-core processors and prevents true parallelism within a single process. Threading in Python thus excels in I/O-bound applications, such as network or disk operations, where threads can release the GIL during blocking calls, allowing context switching without contention. For CPU-intensive workloads, developers often resort to the multiprocessing module, which spawns separate processes to bypass the GIL and utilize multiple cores.

The GIL's impact has long been a point of contention in the Python community, hindering scalability in fields like scientific computing, machine learning, and data processing, where multi-threading could otherwise leverage modern hardware. Efforts to address this include PEP 703, accepted in 2023, which introduced experimental "free-threaded" builds of CPython starting in Python 3.13, configurable via a --disable-gil build option. These builds incorporate thread-safe alternatives, such as biased reference counting and per-object locks, to eliminate the GIL while maintaining backward compatibility, though they incur a single-threaded performance overhead of approximately 5-10% due to the added synchronization. In Python 3.14 (released October 2025), free-threading became a stable, opt-in feature, with continued refinements for extension modules and the garbage collector.

Fundamentals

Definition and Purpose

The Global Interpreter Lock (GIL) is a mutex-like mechanism in Python interpreters such as CPython that restricts bytecode execution to a single native thread at any given time, ensuring that only one thread can execute Python bytecode simultaneously within the same interpreter instance. This lock serializes access to the interpreter's core components, preventing multiple threads from concurrently manipulating shared resources like Python objects. By design, the GIL applies specifically to thread-level parallelism in multi-threaded programs; it does not affect process-level concurrency (e.g., via the multiprocessing module) or asynchronous programming models like asyncio, which run coroutines cooperatively on a single thread.

The primary purpose of the GIL is to safeguard against race conditions in CPython's memory management system, particularly its reference counting mechanism for tracking object lifetimes and enabling automatic garbage collection. Without the GIL, concurrent threads could interfere with reference count increments or decrements; for instance, two threads attempting to increment an object's reference count simultaneously might record only one increment, leading to premature deallocation and potential crashes. This protection is essential because CPython's reference counting is inherently non-thread-safe, relying on the GIL to maintain consistency across threads without requiring complex per-object locking.

Additionally, the GIL was introduced to simplify the development of C extensions and the interpreter's internals in multi-threaded environments, allowing extensions to interact with Python objects without implementing their own synchronization primitives in most cases. By providing a centralized lock, it reduces the burden on extension authors to handle thread safety manually, promoting stability while enabling basic multi-threading for I/O-bound tasks.

Historical Development

The Global Interpreter Lock (GIL) was introduced in CPython with the release of Python 1.5 in early 1998, primarily developed by Guido van Rossum to enable multithreading support while addressing implementation challenges in the interpreter. At the time, Python's growing popularity prompted interest in leveraging operating system threading capabilities, but the reference-counting-based memory management system posed risks of race conditions if multiple threads accessed Python objects simultaneously.

Early motivations for the GIL centered on simplifying garbage collection and ensuring compatibility with the C API for extensions. By serializing access to the interpreter, the GIL prevented concurrent modifications to reference counts, avoiding the need for complex thread-safe garbage collection mechanisms that could complicate the interpreter's design. This approach also streamlined C API interactions for extensions, allowing developers to focus on functionality without extensive synchronization primitives, amid the late 1990s shift toward multithreaded applications in scripting languages.

Throughout the 2000s and 2010s, the GIL persisted in CPython despite ongoing community debates about its limitations, contrasting with alternative implementations like Jython, which runs on the Java Virtual Machine and lacks a GIL because it relies on the JVM's own threading and garbage collection model. Key events included repeated discussions on the python-dev mailing list, where proposals to remove the GIL were rejected to preserve single-threaded performance. In 2023, PEP 703 proposed making the GIL optional through a new build configuration (--disable-gil), marking a significant shift after years of experimentation.

The GIL's evolution continued into the 2020s, transitioning from a mandatory feature to an optional one in Python 3.13 (released 2024), where free-threaded builds became experimentally available without the lock. By Python 3.14 (released 2025), the implementation of PEP 703 was fully integrated, allowing users to run Python without the GIL while maintaining backward compatibility for standard builds, though the lock remains the default to avoid performance regressions in legacy code.

Technical Implementation

Mechanism in CPython

In CPython, the Global Interpreter Lock (GIL) serves as a mutex that integrates deeply with the Python Virtual Machine (PVM) to safeguard bytecode execution. The PVM, responsible for executing bytecode compiled from Python source code, relies on the GIL to ensure that only one native thread can execute Python bytecode at any given moment, thereby maintaining the atomicity of operations on shared Python objects. This protection is crucial during the evaluation loop, where bytecode instructions are processed sequentially to prevent concurrent modifications that could corrupt object states.

A key aspect of this integration is the GIL's role in protecting CPython's reference counting mechanism for memory management. Python objects maintain an internal reference count to track active references; increments via Py_INCREF and decrements via Py_DECREF must occur atomically to avoid race conditions in multi-threaded scenarios, where simultaneous updates could lead to premature deallocation or memory leaks. By serializing these operations under the GIL, CPython ensures thread-safe garbage collection without requiring finer-grained locks on every object, simplifying the interpreter's design while preventing crashes from inconsistent reference counts.

CPython manages multi-threading through thread state structures defined by PyThreadState, which encapsulate each thread's execution context within the interpreter. These structures track GIL ownership and coordinate access during sensitive operations like garbage collection, allowing the interpreter to switch between threads efficiently while upholding the GIL's guarantees.

At the platform level, the GIL is realized as a pthread_mutex_t (paired with a condition variable) on POSIX-compliant systems, providing robust synchronization with low overhead in low-contention scenarios. On Windows, it utilizes a critical section object, a lightweight synchronization primitive optimized for single-process use, ensuring compatibility and performance across operating systems. These implementations are abstracted through CPython's internal threading layer, which handles the underlying primitives transparently.
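The per-thread state the interpreter maintains is visible from Python through each thread's native identifier. A small sketch (the barrier keeps all four threads alive simultaneously so their identifiers are guaranteed distinct):

```python
import threading

# Each native thread gets its own interpreter thread state (the
# PyThreadState structure at the C level); threading.get_ident()
# exposes the corresponding native thread identifier.
barrier = threading.Barrier(4)
idents = []
lock = threading.Lock()

def record():
    barrier.wait()                # wait until all four threads exist at once
    with lock:
        idents.append(threading.get_ident())

threads = [threading.Thread(target=record) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Four concurrently-alive threads have four distinct identifiers,
# even though the GIL let only one execute bytecode at a time.
assert len(set(idents)) == 4
```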

Acquisition and Release

In CPython, threads acquire the Global Interpreter Lock (GIL) prior to executing Python bytecode, ensuring thread-safe access to the interpreter's shared resources. This process involves internal calls, such as those akin to the now-removed C API function PyEval_AcquireLock() (deprecated in Python 3.2 and removed in Python 3.13), which atomically attach the thread state and seize the lock; waiting threads employ a timed condition-variable wait to avoid indefinite blocking and mitigate starvation risks.

The GIL is relinquished through defined triggers to enable context switching among threads. Primary among these is the expiration of a tunable time slice during bytecode execution, defaulting to 5 milliseconds and adjustable via sys.setswitchinterval() to balance responsiveness and overhead. Release also occurs automatically during blocking I/O operations or sleeps, allowing other threads to proceed without explicit intervention.

Context switching is orchestrated by the interpreter's evaluation loop, where the GIL-holding thread periodically checks a drop request flag after the switch interval elapses. If the flag is set, typically due to a timeout from waiting threads, the current thread finishes its ongoing bytecode instruction before yielding the lock via an internal release mechanism, such as drop_gil(), permitting the operating system scheduler to select the next thread. This voluntary or forced yielding promotes fair access in multithreaded environments.

Starting with Python 3.2, the GIL underwent a significant rewrite, shifting from opcode-count-based to time-based switching for more precise control. This enables finer-grained releases during prolonged computations, enhancing overall thread responsiveness by minimizing extended lock holds in CPU-intensive scenarios without altering single-threaded performance.
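The switch interval described above is exposed through the sys module; a quick sketch of inspecting and tuning it:

```python
import sys

# The default GIL switch interval is 5 ms (0.005 s).
default = sys.getswitchinterval()
print(f"default switch interval: {default} s")

# A shorter slice means more responsive thread switching at the cost
# of more frequent lock handoffs.
sys.setswitchinterval(0.001)
assert abs(sys.getswitchinterval() - 0.001) < 1e-9

# Restore the original value so later code is unaffected.
sys.setswitchinterval(default)
```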

Advantages

Thread Safety and Simplicity

The Global Interpreter Lock (GIL) in CPython provides automatic thread safety for pure Python code by serializing the execution of bytecode across threads, thereby eliminating the need for developers to implement manual locks or synchronization primitives in many cases. This mechanism ensures that only one thread can access and modify Python objects at a time, inherently preventing race conditions that could arise from concurrent manipulation of shared data structures like reference counts or built-in types such as dictionaries. As a result, multithreaded applications achieve memory safety without the overhead of explicit locking, making concurrent programming more straightforward for tasks like I/O-bound operations.

For C extensions that interface with Python objects, the GIL further simplifies development by offering a predictable single-threaded environment, reducing the complexity required to ensure thread safety when calling into or from Python code. Extensions can leverage GIL-aware APIs, such as PyGILState_Ensure, to safely acquire the lock and interact with the interpreter, avoiding the intricate per-object locking mechanisms that would otherwise be necessary in a fully concurrent system. This design choice lowers the barrier for creating performant, thread-compatible modules while maintaining compatibility with Python's core object model.

The GIL's serialization also benefits debugging and maintenance of multithreaded code by minimizing the occurrence of race conditions, which leads to more deterministic behavior and easier testing. With fewer nondeterministic errors, developers can focus on logical issues rather than elusive concurrency bugs, enhancing overall code reliability. Moreover, the GIL preserves the consistency of Python's dynamic typing and object model across threads by enforcing exclusive access, ensuring that operations like attribute access or method calls behave predictably without interference.
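One practical consequence: operations that execute as a single interpreter-level step on built-in types, such as list.append, stay consistent across threads without explicit locks. A small demonstration:

```python
import threading

shared = []

def worker(n):
    # list.append is atomic with respect to other threads under the
    # GIL (and protected by per-object locks in free-threaded builds),
    # so no explicit lock is needed for the append itself.
    for i in range(n):
        shared.append(i)

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All 40,000 appends are preserved; none are lost to races.
assert len(shared) == 40_000
```

Note the limit of this guarantee: compound operations like `x += 1` on a shared variable span multiple bytecodes and can still interleave, so they do require a lock; the GIL only serializes individual interpreter operations.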

Performance in Single-Threaded Scenarios

In single-threaded applications, the Global Interpreter Lock (GIL) in CPython introduces negligible overhead, as there is no contention for the lock among multiple threads. The interpreter can execute bytecode without invoking synchronization primitives, allowing for efficient, uninterrupted operation in non-concurrent code paths.

The GIL enables key optimizations in CPython that enhance single-threaded efficiency, such as non-atomic reference counting. Under the GIL, reference counts can be updated with standard integer operations rather than the slower atomic instructions that thread safety would otherwise require, minimizing computational costs during object allocation and deallocation. Additionally, common operations on built-in types like lists and dictionaries avoid per-object locking, further reducing runtime expenses.

With the GIL acquired at the start of bytecode evaluation and held throughout the interpreter loop, CPython's dispatch mechanism operates without repeated lock checks or releases at each instruction. This streamlined design accelerates the execution of CPU-bound tasks by eliminating synchronization barriers that would otherwise interrupt the flow, contributing to faster overall performance in single-threaded environments.

Benchmarks confirm the GIL's benefits for single-threaded workloads, where standard CPython consistently outperforms experimental no-GIL builds. For example, in Python 3.13's free-threaded mode, single-threaded execution on the pyperformance suite showed about 40% overhead compared to the GIL-enabled build, largely due to disabled adaptive interpreter specializations; this gap narrowed to 5-10% in Python 3.14 with those optimizations reinstated.
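Single-threaded throughput is straightforward to measure with the standard timeit module; a hedged sketch (absolute numbers depend entirely on hardware, so the code only checks that a positive time was recorded):

```python
import timeit

# Time a small arithmetic loop. Under the GIL this runs with plain,
# non-atomic reference counting and no per-operation locking, which is
# part of why single-threaded CPython stays fast.
elapsed = timeit.timeit("sum(i * i for i in range(1000))", number=1000)
print(f"1000 runs took {elapsed:.4f} s")
assert elapsed > 0
```

The same measurement run on a GIL-enabled build and a free-threaded build of the same Python version is a simple way to observe the single-threaded overhead discussed above.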

Disadvantages

Multithreading Limitations

The Global Interpreter Lock (GIL) in CPython enforces serialization of Python bytecode execution, permitting only one thread to run Python code at any given time, even on multi-core processors. This mechanism ensures consistency by preventing concurrent access to shared interpreter state, but it results in other threads idling while waiting for the lock, effectively negating the benefits of multithreading for parallel computation.

In CPU-bound tasks, such as numerical computations implemented in pure Python, the GIL prevents any speedup from additional threads, as the workload cannot be distributed across cores. For instance, a pure-Python matrix multiplication using multiple threads shows virtually no performance improvement over a single-threaded run, with runtimes remaining constant at approximately 22 seconds regardless of thread count on an i7-10700K system with 8 cores. This limitation arises because the GIL serializes bytecode instructions like loops and arithmetic operations, forcing threads to contend for the lock rather than executing in parallel.

A direct consequence is uneven CPU utilization: a multi-threaded CPU-bound loop, such as a simple counter, pegs one core at 100% usage while leaving others idle, as observed in profiling tools on systems with four or more cores. Measurements using timing utilities like timeit further highlight this issue: a 50 million-iteration counting loop takes about 6.2 seconds single-threaded but 6.9 seconds with multiple threads, revealing overhead from lock acquisition and release without any parallelism gains. This behavior underscores how the GIL limits Python's ability to leverage modern hardware for concurrent CPU-intensive workloads, as predicted by principles like Amdahl's law, where the serialized portion dominates execution time.

Scalability Challenges

The Global Interpreter Lock (GIL) in CPython significantly hinders multi-core utilization, as it serializes access to the interpreter, preventing multiple threads from executing Python bytecode simultaneously even on systems with numerous cores. This limitation is particularly pronounced in high-performance computing (HPC) environments, where workflows often resort to multiprocessing to bypass the GIL, resulting in uncontrolled process proliferation and potential underutilization of available cores due to orphaned processes and inefficient scheduling. For instance, in HPC clusters, the GIL forces the creation of separate processes for parallel tasks, which can lead to excessive memory usage and difficulties in adhering to job scheduling constraints, thereby complicating Python's use for compute-intensive simulations on many-core architectures.

In server applications, the GIL poses substantial challenges for scaling web servers and backend systems, as it bottlenecks CPU-bound operations and necessitates forking or horizontal scaling across multiple instances to achieve concurrency, rather than leveraging threads within a single process. This approach increases overhead from inter-process communication and memory duplication, making it harder to efficiently handle high-throughput workloads like data ingestion or model serving on multi-core servers without significant infrastructure costs.

As of 2025, the GIL continues to pose challenges in machine learning training pipelines, where CPU-intensive tasks such as data preprocessing and loading are often offloaded to NumPy or C++ backends to circumvent the lock's restrictions on multi-threading. In these pipelines, the GIL causes bottlenecks in data loading stages, leading to underutilized cores during training on large datasets and prompting reliance on libraries that explicitly release the GIL during computations.

The GIL's persistence has shaped Python's ecosystem toward hybrid approaches, where large-scale deployments integrate C extensions or external libraries to release the lock and enable parallelism, adding layers of complexity in development, maintenance, and integration for distributed systems. With the proliferation of multi-core processors, this reliance on hybrid strategies amplifies deployment challenges in resource-constrained environments, as developers must balance Python's ease of use with performance optimizations via lower-level integrations.

Workarounds and Alternatives

Multiprocessing and Async IO

The multiprocessing module in Python provides a standard library approach to achieving true parallelism by leveraging operating system processes rather than threads, thereby circumventing the limitations imposed by the Global Interpreter Lock (GIL). Each spawned process runs an independent instance of the Python interpreter, complete with its own memory space and GIL, which enables multiple processes to execute code simultaneously across multiple cores. To facilitate coordination between these isolated processes, the module provides inter-process communication (IPC) mechanisms such as queues, pipes, and shared memory, allowing data exchange without shared-state conflicts.

In contrast, the asyncio module implements cooperative multitasking through an event loop with coroutines, operating within a single thread to handle concurrent operations without invoking the threading model. This design is particularly effective for I/O-bound workloads, where tasks frequently await external resources like network responses or file operations; during these waits, the event loop suspends the waiting coroutine and schedules others, so a single thread makes progress on many tasks without GIL contention. As a result, asyncio avoids the overhead of thread creation and context switching while scaling well for scenarios involving numerous non-blocking I/O calls, such as serving multiple client connections.

A key distinction in their application lies in task suitability: asyncio excels in I/O-intensive activities like web scraping or API polling, where concurrency rather than parallelism is paramount, whereas multiprocessing is better suited for compute-heavy simulations or data processing that benefit from parallel CPU utilization.
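The asyncio pattern can be sketched as follows: three coroutines that each wait 0.2 s finish in roughly 0.2 s total rather than 0.6 s, because the waits overlap on a single thread (the delays and labels here are illustrative):

```python
import asyncio
import time

async def fetch(label: str, delay: float) -> str:
    # Simulate a non-blocking I/O wait (e.g., a network round-trip).
    await asyncio.sleep(delay)
    return f"{label} done"

async def main() -> float:
    start = time.perf_counter()
    # gather schedules all three coroutines on one event loop;
    # their sleeps overlap instead of running back to back.
    results = await asyncio.gather(
        fetch("a", 0.2), fetch("b", 0.2), fetch("c", 0.2)
    )
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f} s")
    return elapsed

elapsed = asyncio.run(main())
assert elapsed < 0.5   # overlapped waits, not 0.6 s sequential
```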

Subinterpreters and Extensions

Subinterpreters provide a mechanism for running multiple isolated Python interpreters within a single process, enabling better concurrency by associating each with its own independent Global Interpreter Lock (GIL). This feature, enhanced by PEP 684 (accepted for Python 3.12), relocates the GIL to a per-interpreter scope, allowing subinterpreters to execute Python bytecode simultaneously without contending for a shared lock. Prior to this, subinterpreters shared a single GIL, limiting their utility for parallelism; the per-interpreter GIL isolates runtime state and enables true multi-threaded execution across interpreters. This approach mitigates GIL bottlenecks in multi-threaded applications by confining thread contention to within each subinterpreter, while maintaining isolation to prevent cross-interpreter data races.

C extensions offer another workaround by explicitly releasing the GIL during compute-intensive or I/O-bound operations, permitting other threads to proceed. Developers use the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros in C code to yield the GIL temporarily, as documented in the CPython C API. For instance, NumPy employs this technique in its low-level array operations, releasing the GIL to allow parallel execution of numerical computations across multiple threads. This selective release is particularly effective for extensions performing long-running tasks, balancing Python's thread safety with improved multi-core utilization without altering the core interpreter.

Third-party projects have explored advanced modifications to address GIL limitations through experimental locking strategies. The nogil project, initiated by Sam Gross as a proof-of-concept fork of CPython, investigated removing the GIL entirely by implementing fine-grained, per-object locks to enable multithreading without a global mutex. These efforts, which demonstrated viable parallelism in benchmarks, have since been folded into official no-GIL development, influencing proposals like PEP 703 for optional GIL builds.

Hybrid approaches combine Python with lower-level languages like C and C++ to bypass GIL constraints in parallel sections. Tools such as Cython facilitate this by compiling Python-like code to C, where developers can declare functions with the nogil qualifier to release the GIL during execution, enabling multi-threaded integration via constructs like prange from cython.parallel. Similarly, the ctypes module allows calling C and C++ libraries from Python, and if those libraries manage their own threading without touching the Python C API, they avoid GIL acquisition altogether. These methods are ideal for performance-critical code, such as scientific computing, where Python handles high-level logic and C++ manages parallel workloads.

Recent Developments

Experimental No-GIL Builds

Python 3.13, released in October 2024, introduced experimental support for free-threaded builds through the --disable-gil configuration flag, as outlined in PEP 703. This option allows compiling the interpreter without the Global Interpreter Lock (GIL), enabling true parallelism in multithreaded code, with a distinct ABI tagged with "t" so that extension modules for the two build variants remain separate. The feature represents an early milestone in efforts to make the GIL optional, building on years of community debate about its removal to better leverage multi-core processors.

A key challenge in implementing no-GIL builds was ensuring thread-safe memory management, particularly reference counting, which traditionally relies on non-atomic operations protected by the GIL. PEP 703 addresses this through biased reference counting, which lets the owning thread update an object's count without atomic instructions while other threads use a separate atomic shared count, combined with per-object locks for contested cases; this approach draws on prior advancements like immortal objects (PEP 683), which fix the reference counts of common immutable objects to reduce overhead. These mechanisms minimize the need for expensive atomic operations in most scenarios, though they introduce additional complexity for object lifecycle management.

Early no-GIL builds in Python 3.13 exhibited a notable single-threaded overhead of 20-50% compared to standard GIL-enabled builds, primarily due to the added reference-counting costs and disabled interpreter optimizations. For instance, in a prime-counting benchmark, the no-GIL variant took approximately 33% longer for single-threaded execution. However, this comes with significant benefits for multithreaded workloads, where the absence of the GIL allows threads to run concurrently across cores.

Community-driven testing has highlighted these trade-offs through published benchmarks. In CPU-bound tasks, such as parallel numerical computations, no-GIL builds demonstrated speedups of up to 3.4x on four threads relative to their single-threaded baseline, approaching near-linear scaling on multi-core systems and contrasting sharply with GIL-limited threading, which offers little to no parallelism. Repositories like faster-cpython/benchmarking-public provide ongoing comparative data, showing that while single-threaded slowdowns persist, multithreaded gains make no-GIL builds viable for parallel-intensive applications like scientific computing.
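Whether a given interpreter is a free-threaded build can be checked at runtime. The sketch below guards with hasattr because sys._is_gil_enabled() exists only on 3.13+; on older versions the GIL is always present:

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 when the interpreter was compiled with
# --disable-gil (the free-threaded build); 0 or None otherwise.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# On 3.13+, sys._is_gil_enabled() reports whether the GIL is actually
# active right now (free-threaded builds can re-enable it at runtime).
if hasattr(sys, "_is_gil_enabled"):
    gil_active = sys._is_gil_enabled()
else:
    gil_active = True  # pre-3.13 builds always run with the GIL

print(f"free-threaded build: {free_threaded_build}, GIL active: {gil_active}")
```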

Python 3.14 and Beyond

Python 3.14, released on October 7, 2025, marks a significant milestone in addressing the Global Interpreter Lock (GIL) by officially supporting free-threaded builds as a standard variant, no longer designating them as experimental. These builds can be compiled from source using the ./configure --disable-gil option or obtained via official macOS and Windows installers, enabling true multi-core parallelism without the GIL by default. In free-threaded mode, the GIL can optionally be re-enabled at runtime using the PYTHON_GIL=1 environment variable or the -X gil=1 command-line flag for compatibility testing.

Performance in free-threaded 3.14 has been optimized, with single-threaded workloads incurring only a 5-10% overhead compared to GIL-enabled builds, achieved through enhancements like the thread-safe specializing adaptive interpreter and deferred reference counting. For multi-threaded, CPU-bound applications, this mode delivers substantial speedups by allowing concurrent execution across cores, with gains scaling based on workload and hardware, typically 2-3x or more on multi-core systems for parallelizable tasks. As of November 2025, default 3.14 distributions retain the GIL for backward compatibility, but free-threaded builds are recommended for new multi-threaded applications to leverage these parallelism benefits.

Looking ahead, PEP 779 establishes criteria for advancing free-threaded support, including performance thresholds met in 3.14, and outlines a phased approach toward making it the default build in future releases, potentially Python 3.16 or later, pending community adoption and further optimizations. Ecosystem adaptations are underway, with tools like Cython and pybind11 updating for compatibility, while major libraries such as NumPy are progressively supporting free-threaded mode through ongoing compatibility efforts and ABI stability. This progression aims to balance innovation with the stability of Python's vast extension ecosystem.

Examples

Python Threading Demonstration

To demonstrate the impact of the Global Interpreter Lock (GIL) on threading, consider a simple CPU-bound task that performs intensive computations, such as summing large ranges of numbers in loops. In a single-threaded execution, the code runs straightforwardly without concurrency overhead.
```python
import time

def cpu_bound_task(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def single_threaded():
    start = time.perf_counter()
    result1 = cpu_bound_task(10**7)
    result2 = cpu_bound_task(10**7)
    end = time.perf_counter()
    print(f"Single-threaded time: {end - start:.2f} seconds")
    print(f"Results: {result1}, {result2}")

single_threaded()
```
This single-threaded version executes two identical tasks sequentially, measuring the total time using time.perf_counter() for high-resolution timing. On a typical modern CPU, this might take around 1-2 seconds, depending on hardware, as the computations fully utilize a single core without interruption. Now, contrast this with a multi-threaded version using threading.Thread to run the same two tasks concurrently:
```python
import threading
import time

def cpu_bound_task(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def threaded_task():
    result = cpu_bound_task(10**7)
    print(f"Thread result: {result}")

def multi_threaded():
    start = time.perf_counter()
    thread1 = threading.Thread(target=threaded_task)
    thread2 = threading.Thread(target=threaded_task)
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
    end = time.perf_counter()
    print(f"Multi-threaded time: {end - start:.2f} seconds")

multi_threaded()
```
When executed, the multi-threaded version shows execution times similar to the single-threaded case, often within 10-20% variance, but without the expected halving for two threads on a multi-core processor. This lack of speedup occurs because the GIL serializes bytecode execution, preventing true parallelization of CPU-bound work; threads contend for the GIL, effectively running one at a time at the interpreter level. In contrast, for I/O-bound tasks where the GIL is temporarily released (e.g., during system calls like network requests), threading can provide benefits through concurrency. A brief example simulates this with time.sleep() to mimic blocking I/O:
```python
import threading
import time

def io_bound_task(duration):
    print(f"Task sleeping for {duration} seconds")
    time.sleep(duration)
    print(f"Task {duration} completed")

def multi_threaded_io():
    start = time.perf_counter()
    thread1 = threading.Thread(target=io_bound_task, args=(2,))
    thread2 = threading.Thread(target=io_bound_task, args=(2,))
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
    end = time.perf_counter()
    print(f"Multi-threaded I/O time: {end - start:.2f} seconds")

multi_threaded_io()
```
Here, the total time approximates 2 seconds rather than 4, as threads overlap during sleep periods when the GIL is not held, allowing efficient handling of multiple I/O operations.
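The same I/O-overlap pattern is often written more idiomatically with concurrent.futures.ThreadPoolExecutor, which manages thread creation and joining; shorter illustrative delays are used here:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_bound_task(duration: float) -> float:
    # time.sleep releases the GIL, so these calls overlap across threads.
    time.sleep(duration)
    return duration

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(io_bound_task, [0.2, 0.2]))
elapsed = time.perf_counter() - start

print(f"results={results}, elapsed={elapsed:.2f} s")
assert elapsed < 0.39   # ~0.2 s overlapped, not 0.4 s sequential
```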

Comparative Code Analysis

To illustrate the impact of the Global Interpreter Lock (GIL) on parallelism, consider a CPU-bound task: computing the sum of a large range of integers (e.g., 0 to 100,000,000) divided across four workers. In standard CPython with the GIL enabled, multithreading fails to achieve true parallelism because only one thread executes Python bytecode at a time, leading to performance equivalent to a single-threaded run. The following code uses the threading module to attempt parallel summation, but due to the GIL, it does not utilize multiple cores effectively:
```python
import threading
import time

def sum_range(start, end):
    total = 0
    for i in range(start, end):
        total += i
    return total

if __name__ == "__main__":
    n = 100_000_000
    chunk_size = n // 4
    partial_sums = [None] * 4
    threads = []
    start_time = time.time()

    def worker(start, end, idx):
        partial_sums[idx] = sum_range(start, end)

    for i in range(4):
        start = i * chunk_size
        end = start + chunk_size if i < 3 else n
        t = threading.Thread(target=worker, args=(start, end, i))
        threads.append(t)
        t.start()

    for t in threads:
        t.join()

    total = sum(partial_sums)
    end_time = time.time()
    print(f"Total: {total}, Time: {end_time - start_time:.2f}s")
```
On a 4-core CPU, this typically takes around 10-12 seconds, matching single-threaded performance since the GIL serializes execution. In contrast, the multiprocessing module bypasses the GIL by spawning separate processes, each with its own interpreter and memory space, enabling true parallelism across cores. The code below uses a process pool for the same summation task:
```python
from multiprocessing import Pool
import time

def sum_range(start, end):
    total = 0
    for i in range(start, end):
        total += i
    return total

if __name__ == "__main__":
    n = 100_000_000
    chunk_size = n // 4
    segments = [(i * chunk_size, (i + 1) * chunk_size if i < 3 else n) for i in range(4)]

    start_time = time.time()
    with Pool(4) as p:
        partial_sums = p.starmap(sum_range, segments)
    total = sum(partial_sums)
    end_time = time.time()
    print(f"Total: {total}, Time: {end_time - start_time:.2f}s")
```
This scales nearly linearly with the number of cores, achieving approximately 3.5-4x speedup on a 4-core CPU (e.g., ~2.5-3 seconds), though with some overhead from process creation and data serialization. For GIL-free parallelism within a single interpreter, Python 3.14's free-threaded build, officially supported per PEP 779 since the October 2025 release, allows the same threading code to execute in parallel without lock contention. The code remains identical to the multithreading example above but requires a free-threaded environment:
  • Build from source: ./configure --disable-gil && make && make install.
  • Or download pre-built binaries from python.org for supported platforms.
In this mode, threads can run Python code simultaneously on multiple cores, matching or exceeding multiprocessing performance for CPU-bound tasks thanks to shared memory and lower overhead. On a 4-core CPU, the threading code in free-threaded Python 3.14 typically completes in ~2-2.5 seconds, approaching linear scaling while avoiding process-creation and serialization costs.
| Approach | Runtime on 4-Core CPU (approx.) | Scalability Notes |
| --- | --- | --- |
| Single-threaded | 10-12 seconds | Baseline; uses 1 core. |
| Threading (GIL-enabled) | 10-12 seconds | No parallelism; GIL serializes execution. |
| Multiprocessing (4 processes) | 2.5-3 seconds | Near-linear scaling with cores; process overhead ~20-30%. |
| Threading (no-GIL, Python 3.14) | 2-2.5 seconds | Matches multiprocessing; shared memory avoids serialization, with ~5-10% single-threaded overhead. |
These results demonstrate that while multiprocessing provides reliable parallelism today, the free-threaded builds of Python 3.14 offer a lighter-weight path to multi-core utilization for thread-based code, at the cost of roughly 5-10% single-threaded overhead compared to GIL-enabled builds.
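A practical consequence is that code written against concurrent.futures can switch between the two models with a one-line change, since ThreadPoolExecutor and ProcessPoolExecutor share the same Executor interface. The sketch below (parallel_sum and its parameters are illustrative, not part of the original example) parallelizes the same summation either way:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def sum_range(start, end):
    # Worker: sum the half-open range [start, end).
    return sum(range(start, end))

def parallel_sum(n, workers=4, executor_cls=ProcessPoolExecutor):
    # Split [0, n) into `workers` contiguous segments.
    chunk = n // workers
    starts = [i * chunk for i in range(workers)]
    ends = starts[1:] + [n]
    # Both executor classes expose the same map() interface,
    # so the executor class is the only thing that changes.
    with executor_cls(max_workers=workers) as ex:
        return sum(ex.map(sum_range, starts, ends))

if __name__ == "__main__":
    n = 10_000_000
    # Processes: parallel on any build.
    print(parallel_sum(n, executor_cls=ProcessPoolExecutor))
    # Threads: parallel only on a free-threaded (no-GIL) build.
    print(parallel_sum(n, executor_cls=ThreadPoolExecutor))
```

On a GIL-enabled interpreter only the process-based variant runs this CPU-bound work in parallel; on a free-threaded build both do, and the thread-based variant additionally avoids pickling arguments and results across process boundaries.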

References

  1. [1]
    threading — Thread-based parallelism — Python 3.14.0 ...
    CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance ...multiprocessing.Process · Concurrent Execution · Thread
  2. [2]
    PEP 703 – Making the Global Interpreter Lock Optional in CPython
    Jan 9, 2023 · CPython's global interpreter lock (“GIL”) prevents multiple threads from executing Python code at the same time. The GIL is an obstacle to using ...<|control11|><|separator|>
  3. [3]
    Initialization, Finalization, and Threads — Python 3.14.0 ...
    They are intended to replace reliance on the global interpreter lock, and are no-ops in versions of Python with the global interpreter lock. Critical ...Process-Wide Parameters · Thread State And The Global... · Non-Python Created Threads
  4. [4]
    GlobalInterpreterLock - Python Wiki
    Dec 22, 2020 · The global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once.
  5. [5]
    None
    ### Summary of GIL in Python from http://www.dabeaz.com/python/GIL.pdf
  6. [6]
    Chapter 19: Concurrency - The Definitive Guide to Jython
    No Global Interpreter Lock¶. Jython lacks the global interpreter lock (GIL), which is an implementation detail of CPython. For CPython, the GIL means that only ...
  7. [7]
    Python Release Python 3.13.0
    Oct 7, 2024 · Python 3.13.0 is the newest major release of the Python programming language, and it contains many new features and optimizations compared to Python 3.12.<|separator|>
  8. [8]
    What's new in Python 3.14 — Python 3.14.0 documentation
    This article explains the new features in Python 3.14, compared to 3.13. Python 3.14 was released on 7 October 2025. For full details, see the changelog. See ...
  9. [9]
    cpython/Python/ceval.c at main · python/cpython
    Insufficient relevant content. The provided text is a GitHub page header and navigation menu, not the actual content of `ceval.c`. It lacks details about GIL acquisition or any code-specific information.
  10. [10]
    cpython/Include/pystate.h at main · python/cpython
    Insufficient relevant content. The provided text is a GitHub page header and navigation menu, not the actual content of `pystate.h`. It does not contain thread state definitions like THREAD_LOCKED or THREAD_RUNNING.
  11. [11]
    None
    ### Summary of New GIL Acquisition and Release Mechanisms in Python 3.2
  12. [12]
  13. [13]
  14. [14]
    Python support for free threading — Python 3.14.0 documentation
    Starting with the 3.13 release, CPython has support for a build of Python called free threading where the global interpreter lock(GIL) is disabled.Python Support For Free... · Thread Safety · Known Limitations
  15. [15]
    What Is the Python Global Interpreter Lock (GIL)?
    The Python Global Interpreter Lock or GIL, in simple words, is a mutex (or a lock) that allows only one thread to hold the control of the Python interpreter.What Problem Did the GIL... · The Impact on Multi-Threaded...
  16. [16]
    [PDF] Impact of GIL-less Cpython on Performance and Compatibility
    Apr 29, 2025 · The Global Interpreter Lock (GIL) in CPython has long been a performance bottleneck for multi-threaded CPU-bound tasks, limiting Python's ...<|control11|><|separator|>
  17. [17]
    Python GIL prevent CPU usage to exceed 100% in multiple core ...
    Jan 29, 2016 · After some simple experiments, I found that although GIL lower down the performance, the total CPU usage may exceed 100% in multiple core machine.Python threads all executing on a single core - Stack OverflowLimit total CPU usage in python multiprocessing - Stack OverflowMore results from stackoverflow.com
  18. [18]
    [PDF] Python Workflows on HPC Systems - arXiv
    Dec 1, 2020 · In this paper, we analyze the key problems induced by the usage of Python on HPC clusters and sketch appropriate workarounds for efficiently ...
  19. [19]
    [PDF] Scalable and Performant Data Loading - arXiv
    Apr 23, 2025 · On top of that, Python's GIL (Global Interpreter Lock) makes it difficult to gain performance improvement from multi-threading. We found that ...
  20. [20]
    Hiding Latencies in Network-Based Image Loading for Deep Learning
    Mar 28, 2025 · Due to Python's Global Interpreter Lock (GIL), which limits parallel multi-threading, data loading suffers from considerable serialization and ...
  21. [21]
    multiprocessing — Process-based parallelism — Python 3.14.0 ...
    The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of ...
  22. [22]
    asyncio — Asynchronous I/O
    ### Summary of `asyncio` and GIL Relation
  23. [23]
    PEP 684 – A Per-Interpreter GIL | peps.python.org
    Mar 8, 2022 · Since Python 1.5 (1997), CPython users can run multiple interpreters in the same process. However, interpreters in the same process have ...
  24. [24]
    A per-interpreter GIL - LWN.net
    Aug 15, 2023 · "Subinterpreters", which are separate Python interpreters running in the same process that can be created using the C API, have been a part ...
  25. [25]
    colesbury/nogil: Multithreaded Python without the GIL - GitHub
    May 20, 2025 · A proof-of-concept implementation of CPython that supports multithreading without the global interpreter lock (GIL).
  26. [26]
    Cython and the GIL
    The sort of Cython code that can run without the GIL (no calls to Python, purely C-level numeric operations) is often the sort of code that runs efficiently.
  27. [27]
    PEP 703 (Making the Global Interpreter Lock Optional in CPython ...
    Oct 24, 2023 · The changes necessary to remove the GIL are substantive enough, and require so much coordination with other CPython development happening at the ...
  28. [28]
    PEP 683 – Immortal Objects, Using a Fixed Refcount | peps.python.org
    Feb 10, 2022 · This proposal mandates that, internally, CPython will support marking an object as one for which that runtime state will no longer change.
  29. [29]
    Faster Python: Unlocking the Python Global Interpreter Lock
    Jul 29, 2025 · The GIL is a mutex that prevents true multithreading. Removing it, as in Python 3.13, can enable true multithreading, leading to significant ...<|control11|><|separator|>
  30. [30]
    faster-cpython/benchmarking-public - GitHub
    (Exception to this rule are the weekly benchmarks of upstream main, there Tier 2, JIT, NOGIL and CLANG configurations are compared against default ...Faster Cpython Benchmark... · Documentation · Details About How Results...
  31. [31]
    PEP 779 – Criteria for supported status for free-threaded Python
    Mar 13, 2025 · This PEP establishes clear expectations and requirements for moving to Phase II, making the free-threaded Python build officially supported.
  32. [32]
    Parallelising Python with Threading and Multiprocessing - QuantStart
    Python uses Threading and Multiprocessing for parallelism. Threading is limited by GIL, while Multiprocessing spawns processes to utilize multiple cores.
  33. [33]
    How to parallel sum a loop using multiprocessing in Python
    Apr 22, 2015 · My question is how to use multiprocessing to compute a function over many segments and join-sum the results. I added some code above.Parallel multiprocessing in python easy example - Stack OverflowParallel CPU sum in Python - multithreading - Stack OverflowMore results from stackoverflow.com
  34. [34]
    Python 3.14 Is Here. How Fast Is It? - miguelgrinberg.com
    Oct 8, 2025 · The free-threading interpreter disables the global interpreter lock (GIL), a change that promises to unlock great speed gains in multi-threaded ...