Two-phase commit protocol
The two-phase commit (2PC) protocol is a distributed algorithm that coordinates multiple nodes in a transaction processing system to ensure atomicity, meaning all participants either commit their local changes or abort them entirely, preventing partial updates across distributed resources.[1] Introduced by Jim Gray in his 1978 paper "Notes on Data Base Operating Systems," the protocol formalizes a mechanism for reliable transaction commitment in environments like distributed databases, where a transaction spans multiple autonomous sites.[1] It operates under a coordinator-participant model, where one node acts as the coordinator and the others as participants (or cohorts), each managing local resources such as database locks and logs.[2] The protocol's core strength lies in its use of durable logging to support recovery: participants force log records to stable storage during preparation, enabling idempotent operations for redo or undo in case of failures.[1] In operation, 2PC proceeds in two distinct phases. During the first phase (prepare or voting phase), the coordinator sends a "prepare" message to all participants, prompting each to lock resources, perform local work, and write prepare log entries; participants then respond with a "yes" vote if ready to commit or "no" if unable to proceed.[2] If the coordinator receives unanimous "yes" votes, it enters the second phase (commit phase) by logging the decision and broadcasting a "commit" message, after which participants acknowledge completion and release locks; any "no" vote or timeout triggers an "abort" broadcast instead.[3] This process typically involves up to 4n messages for n participants, ensuring consensus while tolerating certain failures through timeouts and recovery protocols.[2] Widely adopted in enterprise systems, 2PC underpins standards like XA for transaction managers in databases (e.g., IBM IMS, Oracle) and middleware, facilitating applications in banking, e-commerce, and cloud services where data consistency across sites is critical.[3] However, it is a blocking protocol: if the coordinator fails after the prepare phase but before broadcasting the decision, participants may remain locked until recovery, potentially causing delays or deadlocks in high-availability environments.[2] Optimizations such as presumed commit or abort variants, along with three-phase commit extensions, address these issues by reducing logging overhead and blocking risks, though the basic 2PC remains foundational for its simplicity and guarantees.[2]Prerequisites and Assumptions
System Model
The two-phase commit protocol operates within a distributed transaction system comprising multiple nodes, each managing local resources such as databases or files, that must collectively execute a transaction atomically to maintain data consistency across the network.[4] These nodes communicate via message passing, and the system assumes processes run at arbitrary speeds but progress eventually, with faults being rare and detectable.[4] The protocol ensures that updates to shared resources are either all applied or all discarded, preventing partial states that could lead to inconsistencies in distributed environments.[5] In this model, the system designates distinct roles: a coordinator, typically a central node or process that initiates and orchestrates the commitment decision, and participants, which are resource managers at each node responsible for preparing local transaction effects and executing the final decision.[4] The coordinator drives the protocol by soliciting votes from participants on whether they can commit their portion of the transaction.[5] Participants, in turn, lock resources during preparation and transition states only upon receiving the coordinator's directive, relying on stable storage to log decisions for recovery.[4] The underlying transaction model emphasizes atomicity as a core ACID property, requiring that the entire distributed transaction commits only if all participants agree to do so, or aborts otherwise, thereby ensuring isolation, consistency, and durability across nodes despite potential failures.[5] This all-or-nothing guarantee extends the single-node transaction abstraction to distributed settings, where partial failures could otherwise violate system invariants.[5] For instance, in a banking system performing an electronic funds transfer between two accounts on separate databases, the protocol coordinates the debit from one account and credit to the other, ensuring the transfer completes fully or reverts entirely to avoid lost or duplicated funds.[5]Reliability Assumptions
The two-phase commit protocol relies on a set of reliability assumptions to guarantee the atomicity of distributed transactions across multiple nodes. It operates in an asynchronous communication model, where message delivery times are unbounded, but assumes reliable and ordered transmission without losses or undetected duplicates, often enforced through sequence numbering and acknowledgment protocols in communication sessions. This model detects and recovers from message omissions via logging and retries, ensuring that all nodes eventually receive consistent decisions despite delays.[1][6] A core requirement is the availability of stable storage at each participant node, where transaction decisions and logs are written durably before any commit messages are sent. This non-volatile storage survives node crashes independently of volatile memory, allowing nodes to persist critical state information such as prepare votes or final outcomes. Without stable storage, the protocol could not ensure durability, as transient failures might lead to inconsistent states across the system.[1][6] The protocol assumes a crash-recovery fault model, where nodes may fail by stopping (fail-stop semantics) and later recover, but they do not exhibit Byzantine behavior such as sending conflicting or malicious messages. Upon recovery, nodes replay logs to redo committed actions or undo uncommitted ones, restoring consistency without violating the global transaction outcome. This model excludes network partitions that prevent message delivery, relying instead on recoverable sessions to maintain coordination.[1][6] These assumptions were formalized in early distributed database research, notably by Jim Gray in 1978, who introduced the protocol for transaction management in systems like System R, emphasizing honest nodes and robust logging to handle crashes in multi-node environments.[1]Core Protocol Mechanics
Voting Phase
In the voting phase of the two-phase commit protocol, the coordinator initiates the process by sending a "prepare" message to all participants after completing its local transaction validation and ensuring that the transaction can proceed on its end.[1] This message prompts each participant to assess whether it can commit the transaction locally. Upon receiving the prepare message, each participant performs necessary local operations, including acquiring locks on relevant resources, verifying constraints such as data integrity and availability, and writing a prepare record to its stable log to indicate readiness. If these checks succeed, the participant transitions to a prepared state, logs the necessary redo and undo information for recovery, and responds to the coordinator with a "yes" vote signifying it is ready to commit; otherwise, it responds with a "no" vote, indicating an abort is required due to failure in local execution.[1] This phase exhibits a blocking characteristic, as participants retain locks on resources from the moment they receive the prepare message and enter the prepared state until they receive the final decision from the coordinator, potentially stalling concurrent transactions that require those resources. To mitigate indefinite waits, participants implement timeout mechanisms; if no decision message arrives from the coordinator within a predefined period after sending their vote, the participant assumes an abort and rolls back the transaction to release resources. The coordinator's logic in this phase can be outlined as follows:[1]Coordinator Voting Phase: 1. Send prepare message to all participants. 2. Wait for responses from all participants (with timeout handling). 3. If all responses are "yes": Decide to commit (proceed to decision phase). 4. If any response is "no": Decide to abort (proceed to decision phase). 5. Log the decision in stable storage.Coordinator Voting Phase: 1. Send prepare message to all participants. 2. Wait for responses from all participants (with timeout handling). 3. If all responses are "yes": Decide to commit (proceed to decision phase). 4. If any response is "no": Decide to abort (proceed to decision phase). 5. Log the decision in stable storage.
Decision Phase
In the decision phase of the two-phase commit protocol, the coordinator aggregates the responses received from all participants during the preceding voting phase. If every participant has indicated readiness to commit by responding affirmatively, the coordinator decides to commit the transaction; otherwise, if any participant has voted to abort, the coordinator decides to abort. This binary decision mechanism ensures that the outcome is unanimous across the distributed system.[1] Upon reaching its decision, the coordinator records the global outcome—either commit or abort—in its stable storage log to guarantee durability and support recovery in case of failures. It then broadcasts the corresponding message ("commit" or "abort") to all participants over the network. Participants, upon receiving the message, execute the instructed action: for a commit, they make the transaction's updates permanent by writing them to stable storage and release any held locks or resources; for an abort, they roll back the changes and release resources. To confirm completion, each participant acknowledges the message back to the coordinator only after logging the outcome in its own stable storage, ensuring the decision is persisted locally before proceeding.[1][5] This phase enforces the all-or-nothing property of atomic transactions by centralizing the final decision at the coordinator and requiring explicit propagation and acknowledgment from all participants. No partial commits are possible, as the protocol blocks until consensus is achieved or failure is detected, thereby maintaining consistency across all involved sites even under partial network partitions or site failures. The coordinator retains its log of the global decision to resolve any uncertainties during subsequent recovery protocols.[1]Participant States and Transitions
In the two-phase commit protocol, participants (resource managers) operate according to a finite state machine that ensures atomicity by coordinating state changes across distributed sites. The key states for a participant are Active, Prepared, Committed, and Aborted, each representing distinct stages of transaction readiness and durability.[1] The Active state is the initial condition, where the participant processes the transaction locally but has not yet received a prepare request from the coordinator; resources may be temporarily held, but the participant retains the ability to unilaterally abort.[1] Upon receiving the prepare message, the participant evaluates local commit feasibility, locks resources if possible, and logs a prepare record; a successful evaluation triggers a transition to the Prepared state, where the participant votes "yes" and waits for the coordinator's decision, with resources now locked against further changes.[1] The Prepared state is semi-committed, as the participant has abdicated unilateral abort rights but requires external coordination to finalize.[7] From the Prepared state, the participant transitions to Committed upon receiving a commit decision, at which point local changes are made durable by writing to stable storage and releasing locks; this state is terminal and irreversible.[1] Alternatively, an abort decision leads to the Aborted state, where partial effects are rolled back, locks released, and the transaction is undone durably.[1] The Aborted state can also be entered directly from Active if the participant votes "no" during preparation or encounters a local failure.[1] The coordinator (transaction manager) maintains its own simplified state machine, starting in an Idle state before initiating the protocol, transitioning to Inquire (or Preparing) upon sending prepare messages and awaiting votes, then to Wait (or Decided) after collecting responses, and finally to Commit or Abort to broadcast the outcome.[1] State transitions are designed for idempotency, allowing safe retries of messages (e.g., duplicate prepare or commit requests) without altering an already-finalized state, which supports recovery from communication failures.[7] The logical flow of the finite state machine can be summarized as follows:| From State | Trigger | To State | Action Performed |
|---|---|---|---|
| Active | Prepare message received | Prepared | Lock resources, log prepare, vote yes |
| Active | Local failure or no vote | Aborted | Rollback changes, release locks |
| Prepared | Commit message received | Committed | Make changes durable, release locks |
| Prepared | Abort message received | Aborted | Rollback changes, release locks |