Double-checked locking
Double-checked locking is a software design pattern used in multithreaded programming to implement thread-safe lazy initialization of a shared resource, such as a singleton object, by checking the initialization condition twice: once without acquiring a lock, to avoid unnecessary synchronization overhead, and a second time while holding the lock, to ensure the initialization runs exactly once.[1] The pattern's motivation stems from the performance cost of full synchronization in scenarios where the resource is accessed frequently after initial creation: once the initialized state is visible across threads, subsequent accesses can proceed without locking.[2] However, under early Java memory models (pre-Java 5), double-checked locking was unreliable, because compiler and processor reorderings could cause threads to observe partially constructed objects or stale data.[3] In 2001, a declaration signed by prominent computer science experts, including Joshua Bloch and Doug Lea, highlighted these flaws, deeming the idiom "broken" without additional safeguards.[1] The issue was resolved in Java 5 through the JSR-133 memory model revision, which enables safe use of the pattern when the shared field is declared volatile, enforcing the necessary happens-before relationships and visibility guarantees.[2] While primarily associated with Java, the pattern applies to other languages such as C++ and Go, though implementations must account for language-specific memory models to avoid similar pitfalls.[1] Alternatives, such as the initialization-on-demand holder idiom, offer lock-free lazy initialization without these risks in compatible environments.[2]
Example in Java (Post-Java 5)
The following code illustrates a basic volatile-based implementation for a singleton:
```java
class Singleton {
    private static volatile Singleton instance;

    private Singleton() {}

    public static Singleton getInstance() {
        if (instance == null) {
            synchronized (Singleton.class) {
                if (instance == null) {
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}
```
This ensures thread safety while minimizing synchronization after the first access.[2] Despite fixes, the pattern remains debated for its complexity, with recommendations to prefer simpler synchronization strategies unless performance profiling justifies its use.[3]
Overview and Motivation
Definition and Core Concept
Double-checked locking is a synchronization optimization pattern used in concurrent programming to efficiently initialize shared resources, such as singletons or lazily loaded objects, in multithreaded environments. The pattern operates by first testing a locking predicate—typically a condition like whether an object reference is null—outside of any lock to determine if initialization is required. If the predicate suggests that the resource is uninitialized, the thread acquires a mutex or lock, re-evaluates the predicate within the protected critical section, and executes the initialization block only if the condition still holds before releasing the lock. This double verification ensures that the initialization occurs exactly once while avoiding unnecessary synchronization for subsequent accesses.[4]
The primary advantage of double-checked locking is its reduction of lock acquisition overhead, particularly in high-contention scenarios where multiple threads frequently attempt to access the same resource. By permitting lock-free reads after the initial setup, the pattern improves throughput and scalability compared to always-locked alternatives, all while maintaining thread safety to prevent race conditions that could result in duplicate initializations or corrupted states. This efficiency makes it suitable for performance-critical applications requiring deferred resource creation.[4]
Central terms in the pattern include the locking predicate, the boolean condition (e.g., object nullity) checked to assess the need for synchronization; the critical section, the mutually exclusive code segment guarded by the lock where the predicate is re-tested and initialization may occur; and the initialization block, the atomic sequence of operations that sets up the shared resource, such as allocating and configuring an object instance.[4]
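In Java-like code, these three terms map onto the pattern as follows. This is a minimal illustrative sketch (class and field names are hypothetical), with the volatile qualifier anticipating the safety requirements discussed in later sections:

```java
class LazyResource {
    // Shared field; its nullity serves as the locking predicate.
    // volatile is required in Java for safe publication (see later sections).
    private static volatile Object resource;

    static Object get() {
        if (resource == null) {                  // locking predicate tested without the lock
            synchronized (LazyResource.class) {  // entry to the critical section
                if (resource == null) {          // predicate re-tested under the lock
                    resource = new Object();     // initialization block runs at most once
                }
            }
        }
        return resource;
    }
}
```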
Role in Multithreaded Lazy Initialization
Lazy initialization is a technique in software design where the creation of an object or resource is deferred until it is first accessed, thereby conserving memory and computation resources that might otherwise be wasted on unused components.[1] In multithreaded environments, this approach introduces race conditions, as multiple threads may simultaneously detect the need for initialization and attempt to create the object, potentially leading to duplicate instances, redundant computations, or inconsistent states.[4]
A naive solution to ensure thread safety involves applying full synchronization—such as a mutex or lock—on every access to the lazily initialized resource, guaranteeing that only one thread performs the initialization.[1] However, this incurs significant performance overhead due to the cost of acquiring and releasing locks on each subsequent access, even after the resource is fully initialized, which can degrade throughput in high-concurrency scenarios.[5]
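For contrast, a minimal sketch of this naive approach in Java (names hypothetical): correctness is easy to see, but every call pays the locking cost.

```java
class NaiveLazyResource {
    private static Object resource;

    // Thread-safe but slow: the intrinsic lock on the class is
    // acquired on every call, even after initialization completes.
    static synchronized Object get() {
        if (resource == null) {
            resource = new Object();
        }
        return resource;
    }
}
```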
Double-checked locking addresses these bottlenecks by performing an initial unsynchronized check to see if the resource needs initialization; if so, it acquires the lock, performs a second check under synchronization, and initializes only if necessary.[4] This optimization eliminates locking overhead for all accesses after the first successful initialization, yielding substantial performance gains—such as over 15 times faster access times compared to fully synchronized alternatives in benchmarked scenarios—while maintaining thread safety for deferred creation.[4]
Common use cases for double-checked locking include implementing the singleton pattern, where a single instance of a class must be shared across threads without eager creation; managing resource pools, such as thread pools or connection pools in server applications to avoid premature allocation; and deferring expensive computations, like loading large datasets or initializing complex algorithms, until demanded by concurrent requests.[4] These applications are prevalent in performance-critical systems, such as web servers and distributed computing frameworks, where minimizing initialization latency and synchronization costs directly impacts scalability.[1]
The Double-Checked Locking Pattern
Original Pattern Mechanics
The double-checked locking pattern, also known as the double-checked locking optimization, is a concurrency idiom designed to reduce the overhead of synchronization in multithreaded environments by performing lazy initialization of shared resources, such as singleton objects, while ensuring thread safety. In its classic form, the pattern operates through a sequence of steps that minimize lock acquisitions: first, a thread performs an unlocked check on a shared variable—typically a pointer or flag indicating whether the resource is initialized (e.g., if the variable is null). If the variable indicates that initialization is needed, the thread acquires a mutex or lock to enter a critical section. Inside this locked section, the thread re-examines the shared variable to confirm it remains uninitialized, as another thread might have completed the work in the interim. If the variable is still uninitialized, the thread proceeds to create and fully initialize the resource, such as by allocating memory and constructing the object, before releasing the lock. This process allows subsequent accesses by any thread to bypass locking entirely once initialization is complete, as the unlocked initial check will detect the initialized state.[6]
The pattern's efficiency stems from its emphasis on memory visibility guarantees: after the lock is released following successful initialization, all threads must observe the fully constructed resource when performing the unlocked check, preventing partial or inconsistent views that could lead to errors like accessing uninitialized data. This visibility is crucial for the pattern's correctness, as it ensures that the initialization steps—such as writes to the shared variable and any dependent stores—are propagated across threads without requiring ongoing synchronization for reads. In practice, this optimization can significantly lower contention in high-throughput scenarios, where frequent access to the shared resource occurs after the initial setup.[6][7]
Theoretically, the original pattern assumes a memory model providing sequential consistency, where operations appear to execute in the order specified by the program across all threads, ensuring that the re-check inside the lock sees the latest state and that post-initialization reads reflect the committed writes. Under this model, the lock acquisition and release act as synchronization points that order memory operations, making the pattern reliable for one-time initialization. However, the idiom breaks down in relaxed memory models common in modern multiprocessors and compilers, where instruction reordering or caching effects can cause writes to become visible out of order—such as a pointer being set before the object is fully constructed—leading to other threads observing invalid states despite the double check.[7][6]
Pseudocode and Basic Implementation
The double-checked locking pattern can be illustrated through a generic pseudocode representation for implementing thread-safe lazy initialization of a singleton instance. This approach employs an initial null check outside the lock to bypass synchronization for threads accessing an already-initialized object, followed by locking and a repeated check within the synchronized block to ensure atomic creation if necessary.[4]
The following pseudocode depicts a typical singleton getter function:
```
function getSingleton():
    if instance == null:
        mutex.lock()
        if instance == null:
            instance = new Singleton()
        mutex.unlock()
    return instance
```
In this structure, the outer if instance == null condition serves as the first check, allowing most subsequent calls—after initialization—to avoid acquiring the lock entirely and thus minimizing contention overhead.[4] Upon entering the synchronized section, mutex.lock() serializes access among competing threads to prevent concurrent initialization attempts.[4] The inner if instance == null performs the second verification, executing the instantiation instance = new Singleton() only if no other thread has already completed it, thereby ensuring the object is created exactly once.[4] Finally, mutex.unlock() releases the lock, permitting other threads to proceed.[4] The return instance outside the lock provides efficient access to the shared resource post-initialization.[4]
This pseudocode assumes an idealized sequential execution model where memory operations occur in the order specified by the program, without compiler optimizations, hardware reordering, or caching effects that could violate visibility across threads; such assumptions hold in a single-threaded context but require additional safeguards in real multithreaded environments.[4]
Historical Challenges and Fixes
Reordering Issues from Compilers and Hardware
Compiler optimizations play a critical role in breaking the double-checked locking pattern by reordering instructions to improve performance, often moving writes before reads in ways that violate expected happens-before relationships between threads.[7] For instance, a compiler might reorder the allocation of memory for an object before its constructor invocation, allowing another thread to observe a non-null reference to an uninitialized instance.[1] This reordering occurs because compilers assume a single-threaded execution model and lack built-in mechanisms to enforce cross-thread ordering without explicit synchronization primitives.[7]
Hardware factors exacerbate these issues through mechanisms designed for efficiency in modern processors. Out-of-order execution allows CPUs to reorder instructions dynamically, potentially completing stores to the reference before the object's full initialization becomes visible across cores.[7] Store buffers delay the propagation of writes to the main memory, while cache coherence protocols in multiprocessor systems can lead to temporary inconsistencies, where one thread sees an updated pointer but not the associated constructor side effects.[7] These hardware optimizations prioritize speed over strict ordering, resulting in partial visibility of shared state during lazy initialization.[1]
At the foundation of these problems lie differing memory consistency models, which define the guarantees for how operations appear to threads. Sequential consistency, a strict model, requires that all memory accesses occur in a total order consistent with each thread's program order, ensuring predictable visibility. However, most modern systems employ relaxed memory models, such as those using acquire-release semantics, which permit reorderings to boost performance by allowing non-dependent operations to overlap across threads. The double-checked locking pattern implicitly assumes sequential consistency, failing under relaxed models where initialization writes may not be immediately observable, thus breaking the intended synchronization.[7]
A concrete example illustrates this failure: consider two threads accessing a lazily initialized singleton. Thread A allocates memory, sets the reference to point to it, and then invokes the constructor to initialize fields like a counter to 42. Due to reordering, Thread B might read the non-null reference before the constructor completes, accessing an object with default field values (e.g., counter at 0), leading to incorrect behavior or crashes from partially constructed state.[1] This scenario, observed in early Java implementations, highlights the universal risk across languages without proper memory barriers.[5]
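The following Java-style sketch (hypothetical names, shown in its classic broken form without volatile) makes the hazard concrete: under pre-Java-5 reorderings, the store to instance could become visible before the constructor's store to counter.

```java
class BrokenSingleton {
    private static BrokenSingleton instance; // deliberately NOT volatile
    private int counter;

    private BrokenSingleton() {
        counter = 42; // may become visible to other threads after 'instance' does
    }

    static BrokenSingleton getInstance() {
        if (instance == null) {                       // Thread B can see non-null here...
            synchronized (BrokenSingleton.class) {
                if (instance == null) {
                    instance = new BrokenSingleton(); // ...if this publication is reordered
                }
            }
        }
        return instance; // Thread B may then observe counter == 0 instead of 42
    }
}
```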
Early Failures in Java Pre-5.0
In Java versions 1.4 and earlier, the memory model permitted unrestricted reordering of operations by compilers and processors, which rendered the double-checked locking pattern unreliable for multithreaded lazy initialization.[8] This reordering could cause one thread to observe a non-null reference to an object before its constructor had fully executed, leading to partially constructed or uninitialized states visible to other threads.[1] For instance, the assignment of a reference to a new object instance could be moved ahead of the constructor calls, exposing raw memory to concurrent readers without synchronization guarantees.[9]
Documented failures of the pattern were extensively analyzed in the 2001 "Double-Checked Locking is Broken" declaration, signed by David Bacon, Joshua Bloch, Doug Lea, Paul Haahr, and others, which demonstrated the idiom's unreliability across multiple platforms.[1] Real-world examples included the Symantec JIT compiler reordering operations in a way that allowed threads to access uninitialized singleton fields, as shown in test cases where shared resources appeared initialized but contained garbage values.[1] These issues were particularly pronounced on architectures like the Alpha processor, where hardware-level reorderings exacerbated visibility problems without explicit memory barriers.[1]
The double-checked locking pattern gained popularity in the 1990s as an optimization technique in early Java concurrency literature, but its flaws were prominently highlighted starting in 2001 through works like Doug Lea's analyses and the aforementioned declaration.[1] Joshua Bloch's Effective Java (first edition, 2001) issued explicit warnings against its use for implementing thread-safe singletons, citing the potential for subtle concurrency bugs due to the weak memory model.[10] These revelations prompted the Java Community Process to address the shortcomings via JSR-133, which revised the memory model in Java 5 (released in 2004) to restore reliability through enhanced volatile semantics, though implementation details of the fix lie beyond this historical context.[8]
Implementations in Modern Languages
In Java (Post-5.0 with Volatile)
In Java 5.0 and later, the double-checked locking (DCL) pattern was rendered safe through revisions to the Java Memory Model (JMM) under JSR-133, which introduced stricter happens-before guarantees for synchronization and volatile variables. The critical fix involves declaring the shared reference field as volatile, which prevents instruction reordering by the compiler or processor and ensures that writes to the field are immediately visible to other threads. This addresses the pre-5.0 issues where partial object construction could become visible prematurely due to caching and reordering, potentially leading to undefined behavior in multithreaded environments.[11][12]
The updated DCL idiom typically implements lazy initialization for a singleton, where the initial null check avoids synchronization overhead for subsequent accesses, while the synchronized block protects the actual instantiation. Here's a representative example for a thread-safe singleton:
```java
public class Singleton {
    private static volatile Singleton instance;

    private Singleton() {}

    public static Singleton getInstance() {
        if (instance == null) {
            synchronized (Singleton.class) {
                if (instance == null) {
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}
```
In this implementation, the volatile keyword on instance establishes a happens-before relationship: the writes in the constructor are ordered before the volatile store of the reference, so any thread that subsequently reads a non-null instance is guaranteed to see the fully constructed object.
With the enhanced JMM, DCL became an acceptable optimization for reducing synchronization costs in high-contention scenarios, as it limits locking to the rare initialization path while maintaining thread safety. It still requires careful use: the guarantees apply only to publication through the volatile field, so allowing the object's reference to escape during construction can still expose partially initialized state. Performance benchmarks show DCL can reduce latency by avoiding full synchronization after initialization, though its benefits diminish on modern hardware and JVMs, where uncontended locking is cheap.[8]
As of Java 25 (released in September 2025), the preview Stable Values API (JEP 502) offers a safer alternative to volatile-based DCL by enabling deferred immutability without manual synchronization, though the volatile approach remains the standard fix for DCL in earlier post-5.0 versions.[13]
In C++11 and Later with Atomics
In C++11 and later standards, double-checked locking is rendered safe and portable through the introduction of the std::atomic template and a well-defined memory model, which prevent reordering issues that plagued earlier implementations. This allows developers to implement the pattern for lazy initialization of shared resources, such as singletons, by using atomic operations on a pointer with explicit memory ordering constraints to establish happens-before relationships between threads. The core mechanism relies on an initial relaxed or acquire load to check the pointer, followed by a locked critical section for initialization if needed, and a release store to publish the result atomically.[14][15]
The requirements for a correct implementation center on using std::atomic<T*> (where T is the type being lazily initialized) for the shared pointer variable. The first check employs an acquire load (std::memory_order_acquire) to ensure visibility of prior writes if the pointer is non-null, avoiding the need for a lock in the common case. Within the mutex-protected section, a second check uses a relaxed load (std::memory_order_relaxed) for efficiency, and if initialization is required, the object is created and stored via a release store (std::memory_order_release) to synchronize with subsequent acquire loads in other threads. This pairing of acquire and release semantics guarantees that all writes to the object (e.g., constructor side effects) become visible to readers after the release, without requiring sequential consistency overhead.[16][14]
A representative C++11 implementation for a thread-safe singleton might appear as follows:
```cpp
#include <atomic>
#include <mutex>

class Singleton {
private:
    static std::atomic<Singleton*> instance;
    static std::mutex mtx;

public:
    static Singleton* getInstance() {
        Singleton* tmp = instance.load(std::memory_order_acquire);
        if (tmp == nullptr) {
            std::lock_guard<std::mutex> lock(mtx);
            tmp = instance.load(std::memory_order_relaxed);
            if (tmp == nullptr) {
                tmp = new Singleton();
                instance.store(tmp, std::memory_order_release);
            }
        }
        return tmp;
    }
};

std::atomic<Singleton*> Singleton::instance{nullptr};
std::mutex Singleton::mtx;
```
This code keeps the common-case read lock-free, but it relies on manual memory management for the raw pointer: the constructed instance is never deleted unless the program arranges for it.[15]
Prior to C++11, double-checked locking in C++ was unreliable due to the absence of a standardized memory model, forcing reliance on compiler-specific volatile qualifiers (which do not guarantee inter-thread visibility) or platform-dependent memory barriers, often leading to subtle bugs from instruction reordering. The C++11 standard addresses these by formalizing the memory model in section [intro.races] and providing std::atomic with configurable ordering, enabling portable synchronization without vendor extensions.[7]
For best practices, especially to enhance exception safety and automate lifetime management, use std::atomic<std::shared_ptr<T>> instead of raw pointers; this specialization, available since C++20, atomically manages reference counts during load and store operations, preventing leaks if initialization throws. In C++11 through C++17, a similar effect can be achieved with the free functions std::atomic_load and std::atomic_store on a std::shared_ptr (deprecated in C++20 in favor of the specialization). Avoid manual double-checked locking when possible, favoring C++11's magic statics (static T instance; inside a function) for simple cases, as they internally employ similar optimizations.[17][18]
In C# with Volatile Fields
In C#, the double-checked locking pattern leverages the volatile keyword applied to the shared field to inhibit just-in-time (JIT) compiler optimizations that could reorder instructions across threads, ensuring proper memory visibility and ordering as per the Common Language Infrastructure (CLI) memory model defined in ECMA-335. This is combined with the lock statement, which provides mutual exclusion and implicit full memory barriers (acquire on entry and release on exit) to synchronize access during the critical initialization phase. The pattern is thread-safe in .NET Framework 2.0 and later implementations, where the runtime's memory model guarantees that volatile reads and writes, along with lock semantics, prevent the reordering issues that plagued earlier systems.[19][20]
A typical implementation for a thread-safe singleton uses a volatile static reference to the instance and a private static object for locking, avoiding broader synchronization scopes. The outer check skips the lock if the instance is already initialized, while the inner check within the lock ensures only one thread performs the creation. Here's an example:
```csharp
public sealed class Singleton
{
    private static volatile Singleton instance;
    private static readonly object padlock = new object();

    private Singleton() { }

    public static Singleton Instance
    {
        get
        {
            if (instance == null)
            {
                lock (padlock)
                {
                    if (instance == null)
                    {
                        instance = new Singleton();
                    }
                }
            }
            return instance;
        }
    }
}
```
This code ensures lazy initialization with minimal contention, as subsequent accesses bypass the lock entirely after the first creation. While the pattern is reliable under the .NET runtime's guarantees, alternatives like Interlocked.CompareExchange can offer finer-grained atomic control for similar scenarios, though the lock-based approach remains prevalent for its simplicity.[19]
Common pitfalls include locking on this or on the type object (typeof(Singleton)): both are reachable from unrelated code, which can take the same lock and cause contention or deadlocks, and Type objects may even be shared across application domains. Instead, a dedicated private static object like padlock confines the lock scope and mitigates such risks. Additionally, omitting the volatile modifier risks visibility issues on platforms with weak memory models, potentially causing threads to observe stale null values despite successful initialization elsewhere.[21]
In Go with Sync Package
In Go, double-checked locking leverages the sync package's synchronization primitives to implement thread-safe lazy initialization while minimizing lock contention, adhering to the language's memory model which ensures sequential consistency for data-race-free executions through happens-before relationships.[22] The model permits compiler and hardware reordering but requires explicit synchronization—such as mutex operations or atomic accesses—to establish visibility guarantees, preventing scenarios where a goroutine observes an initialization flag without seeing the associated state updates.[22]
A standard approach combines sync.Mutex for exclusive access during initialization with sync/atomic operations for the double check, ensuring atomic pointer loads and stores provide release-acquire ordering. This avoids the pitfalls of naive implementations, where unsynchronized reads might yield stale or partial values.[22] For pure double-checked locking, atomic.Pointer is preferred over plain pointers to guarantee that once a value is stored, it is immediately visible across goroutines without additional barriers.
The following example illustrates a goroutine-safe lazy initialization for a singleton using atomic.Pointer and sync.Mutex, where the fast path checks the pointer without locking, and the slow path initializes only if necessary:
```go
package example

import (
    "sync"
    "sync/atomic"
)

type Singleton struct {
    // Application-specific fields
    data string
}

var instance atomic.Pointer[Singleton]
var mu sync.Mutex

// GetInstance performs double-checked locking for lazy initialization.
func GetInstance() *Singleton {
    if p := instance.Load(); p != nil {
        return p
    }
    mu.Lock()
    defer mu.Unlock()
    if p := instance.Load(); p != nil {
        return p
    }
    s := &Singleton{data: "initialized"}
    instance.Store(s)
    return s
}
```
This pattern reduces overhead for repeated calls, as subsequent invocations bypass the mutex entirely after initialization.[22] The atomic operations ensure that the store in the slow path synchronizes before any subsequent load, making the instance visible consistently.
In read-heavy workloads, sync.RWMutex enhances performance by permitting concurrent reads via RLock while reserving exclusive Lock for writes, aligning with the memory model's synchronization rules where an Unlock precedes and orders subsequent RLock or Lock calls. The double check occurs first under RLock to quickly return an existing instance, falling back to write mode only if needed; this setup ensures initialized values are visible to readers due to the happens-before from the writer's Unlock.[22]
Despite the viability of these manual double-checked locking variants, Go's idiomatic preference is sync.Once, which internally employs a mutex-based double-check mechanism to execute an initialization function exactly once, guaranteeing its completion and visibility without exposing the underlying synchronization details. This abstraction minimizes bugs from improper ordering and is recommended for most cases over custom implementations.[22]
For instance:
```go
package example

import "sync"

type Singleton struct {
    // Application-specific fields
    data string
}

var once sync.Once
var instance *Singleton

func initInstance() {
    instance = &Singleton{data: "initialized"}
}

// GetInstance uses sync.Once for safe lazy initialization.
func GetInstance() *Singleton {
    once.Do(initInstance)
    return instance
}
```
sync.Once's internal logic handles races transparently, making it more reliable for concurrent environments than manual double-checked locking.
POSIX and System-Level Usage
Thread Synchronization in Unix-Like Systems
In Unix-like systems, double-checked locking can be implemented using POSIX threads (pthreads), a standard API for multithreading that provides portable synchronization primitives across compliant operating systems. The pattern leverages pthread mutexes to protect the critical section where lazy initialization occurs, ensuring thread safety without requiring a lock on every access. Specifically, the first check for the condition (e.g., whether an object is initialized) is performed without locking, and if it fails, the thread acquires the mutex using pthread_mutex_lock() before performing the second check and potential initialization, followed by pthread_mutex_unlock() to release it. This approach minimizes contention by avoiding unnecessary locking when the object is already initialized.[23]
The semantics of pthread mutexes are defined in POSIX.1-2001, which specifies that locking and unlocking operations are atomic and provide mutual exclusion, preventing multiple threads from entering the critical section simultaneously. For double-checked locking to function correctly, these mutex operations serve as full memory barriers, ensuring that all memory writes prior to the unlock are visible to subsequent reads after the lock in other threads, thus addressing potential reordering issues from compilers or hardware. This synchronization guarantee is crucial, as without it, partially constructed objects could be observed by other threads, leading to undefined behavior. POSIX mutexes impose restrictions on instruction reordering, acting as hard sequence points that maintain the intended order of operations across threads.[7]
This POSIX-based implementation of double-checked locking is highly portable, applicable to Unix-like operating systems such as Linux and macOS, where pthreads form the foundation for threading support. Higher-level languages and libraries often wrap or mirror these low-level primitives for ease of use; Go's sync package, for example, exposes mutex semantics comparable to POSIX mutexes while managing the underlying synchronization in its runtime.
Mutex-Based Double-Checking
Mutex-based double-checked locking in POSIX environments utilizes pthread_mutex_t to protect shared resource initialization, ensuring thread safety while minimizing lock contention for subsequent accesses. The pattern begins with initializing the mutex using pthread_mutex_init with default attributes or the PTHREAD_MUTEX_INITIALIZER static initializer to establish a fast mutex suitable for most cases. A non-locked preliminary check verifies if the resource (e.g., a shared flag or pointer) is already initialized, avoiding unnecessary locking in the common case where initialization has completed. If uninitialized, the thread acquires the lock via pthread_mutex_lock, performs a second check under protection to confirm the state, initializes the resource if needed, and then releases the lock with pthread_mutex_unlock. This manual approach serves as an alternative to pthread_once for one-time initialization, providing fine-grained control over the process.[4][24]
The following C pseudocode illustrates a thread-safe lazy initialization of a shared resource using double-checked locking with pthreads:
```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;
static volatile int initialized = 0;
static void *shared_resource = NULL;

void *get_shared_resource(void) {
    if (!initialized) {
        int ret = pthread_mutex_lock(&init_mutex);
        if (ret != 0) {
            // Handle error, e.g., log and return NULL
            return NULL;
        }
        if (!initialized) {
            shared_resource = malloc(sizeof(/* resource type */)); // Or other init
            if (shared_resource == NULL) {
                pthread_mutex_unlock(&init_mutex);
                return NULL;
            }
            initialized = 1;
        }
        ret = pthread_mutex_unlock(&init_mutex);
        if (ret != 0) {
            // Handle error, e.g., abort or log
            return NULL;
        }
    }
    return shared_resource;
}
```
This example employs a volatile qualifier on the initialized flag to discourage the compiler from caching or eliding reads; note, however, that volatile alone does not provide inter-thread ordering in C, so the unlocked first read is formally a data race under the C11 memory model—the mutex operations provide the actual synchronization, and a fully conforming fast path would use C11 atomics instead.[24][4]
Error handling is crucial to avoid undefined behavior or resource leaks; for instance, pthread_mutex_init may return EINVAL if the mutex attributes are invalid, while pthread_mutex_lock can yield EDEADLK if a deadlock is detected with error-checking mutex types (e.g., PTHREAD_MUTEX_ERRORCHECK). To mitigate deadlock risks from recursive locking, select a recursive mutex type (PTHREAD_MUTEX_RECURSIVE) during initialization, which tracks a lock count and allows the owning thread to relock without blocking. Always verify return values from pthread functions, as failures like ENOMEM during allocation can necessitate cleanup and unlock attempts to prevent hangs. For optimization in high-contention scenarios, pthread_mutex_trylock can replace the initial lock attempt, returning EBUSY immediately if contended, allowing the caller to fall back to full locking or retry; however, this adds complexity and is less common in standard double-checked patterns.[24][25]
In terms of performance, mutex acquisition and release in POSIX threads act as full memory barriers, serializing memory operations to ensure visibility of the initialization across threads without explicit fence or barrier instructions. The lock-unlock pair guarantees that writes before the unlock are observable after subsequent locks, eliminating reordering issues from compilers or hardware. For example, in a 1997 benchmark on an UltraSPARC-II system with two 70 MHz processors, double-checked locking reduced average access time by over 15 times compared to always-locked alternatives (from 4.43 µs to 0.30 µs per call), because the unlocked fast path dominates after initialization.[24][26][4]
Contemporary Alternatives and Deprecations
Stable Values in Java 25 (Preview)
In Java 25, released on September 16, 2025, JEP 502 introduced the Stable Values API as a preview feature to enable thread-safe, lock-free lazy initialization of immutable objects treated as constants by the JVM.[13] This addresses longstanding challenges in deferred initialization by allowing values to be computed at most once, on demand, while guaranteeing immutability thereafter and permitting JVM optimizations such as constant folding.[13] Unlike traditional approaches requiring volatile fields or synchronization, stable values decouple initialization timing from class loading, improving application startup performance in multi-threaded environments.[27]
The mechanism relies on the StableValue<T> class, which acts as a container for a single value of type T. Initialization occurs via methods like orElseSet(Supplier<T> supplier), ensuring atomic, one-time computation without explicit locks, as the JVM enforces stability guarantees that prevent reordering across threads.[13] The StableValue class signals to the compiler and runtime that the value, once set, behaves like a final constant, enabling safe publication and eliminating race conditions inherent in manual lazy patterns.[28] For instance, stable values support double-checked locking semantics for arrays and collections, but in a more declarative and performant manner.[13]
A representative example for singleton creation using stable values is as follows:
```java
public class Resource {
    private static final StableValue<Resource> INSTANCE = StableValue.of();

    private Resource() {} // Private constructor for singleton enforcement

    public static Resource getInstance() {
        return INSTANCE.orElseSet(Resource::new);
    }

    // Immutable resource logic here
}
```
This pattern initializes the resource only upon first access to getInstance(), without volatile qualifiers or synchronized blocks, leveraging the API's built-in thread safety.[13] The preview feature can be enabled in Java 25 using the --enable-preview flag, and developers are encouraged to adopt it for new code to reduce reliance on manual double-checked locking constructs.[29]
As of November 2025, shortly after its release in Java 25, the Stable Values API has seen growing adoption in open-source libraries and frameworks focused on performance-sensitive applications, such as those optimizing startup times in microservices and containerized environments.[30]
Other Lazy Initialization Patterns
In addition to double-checked locking (DCL), several alternative patterns provide thread-safe lazy initialization with reduced complexity and overhead, often leveraging language-specific guarantees for static or local variable initialization. The initialization-on-demand holder idiom in Java uses a static inner class to defer object creation until the first access, ensuring thread safety through the JVM's class loading mechanism without explicit synchronization. This approach, recommended for static fields, incurs negligible runtime cost and avoids the visibility issues that plagued early DCL implementations. Similarly, in C++, the Meyers' singleton employs a local static variable within a function, which is initialized on first call and guaranteed thread-safe under C++11 and later standards due to the magic statics rule that serializes initialization. For Go, the sync.Once type from the standard library executes a provided function exactly once across goroutines, handling synchronization internally and simplifying one-time initialization for shared resources like database connections. In C#, static constructors combined with the Lazy class enable deferred execution of expensive operations, where the constructor runs only once to initialize static data, and Lazy ensures thread-safe value computation without locks in most scenarios.
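As an illustration of the first of these alternatives, here is a minimal sketch of the initialization-on-demand holder idiom in Java (class names hypothetical):

```java
class Service {
    private Service() {}

    // The JVM initializes Holder lazily, on first access, and class
    // initialization is guaranteed to be thread-safe, so no explicit
    // locking or volatile is needed.
    private static class Holder {
        static final Service INSTANCE = new Service();
    }

    static Service getInstance() {
        return Holder.INSTANCE; // first call triggers Holder's initialization
    }
}
```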
These patterns are generally preferred over DCL in non-performance-critical paths because they eliminate the need for manual lock management and volatile qualifiers, reducing the risk of subtle bugs from memory model interactions. DCL should be reserved for hot code paths where lock contention significantly impacts throughput, such as in high-frequency trading systems, but even then, profiling is essential to confirm benefits outweigh the added complexity. Simpler alternatives like atomic flags (e.g., std::atomic in C++ or Interlocked in C#) or fully lock-free techniques using compare-and-swap operations offer further reductions in overhead for boolean checks or pointer assignments, making them suitable for lightweight lazy loading.
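A minimal Java sketch of the lock-free, compare-and-swap style (names hypothetical, using java.util.concurrent.atomic): exactly one instance is published, though racing threads may construct and discard extra instances, so this is only appropriate when redundant construction is cheap and side-effect-free.

```java
import java.util.concurrent.atomic.AtomicReference;

class CasLazyResource {
    private static final AtomicReference<Object> REF = new AtomicReference<>();

    static Object get() {
        Object existing = REF.get();
        if (existing != null) {
            return existing;
        }
        Object created = new Object();           // several threads may reach this point
        if (REF.compareAndSet(null, created)) {  // but only one publication succeeds
            return created;
        }
        return REF.get();                        // losers use the winner's instance
    }
}
```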
Cross-language trends reflect a shift toward built-in language support for safe lazy initialization, minimizing reliance on custom patterns like DCL. Rust's OnceCell (stabilized as OnceLock in the standard library since version 1.70) provides a thread-safe cell for single-assignment values, with initialization deferred until first get() call and protected by internal synchronization. In .NET 9, enhancements to LazyInitializer and atomic operations in System.Threading further streamline lock-free lazy setup for references, allowing efficient one-time computations without allocating full Lazy instances in performance-sensitive scenarios.
Despite fixes in modern languages, DCL retains inherent limitations, including its sensitivity to compiler optimizations and platform-specific memory barriers, which can lead to partial initialization visibility across threads even with volatiles. This ongoing complexity has prompted recommendations to profile applications before adopting DCL, as simpler alternatives often suffice without measurable performance loss in practice.