Database transaction schedule
In database management systems, a transaction schedule, also known as a history, is a sequence that specifies the chronological order in which operations from multiple concurrent transactions are executed, preserving the relative order of operations within each individual transaction.[1] This abstraction models the interleaving of actions such as reads and writes across transactions to manage concurrency while aiming to maintain database consistency.[2]
Schedules are classified into serial and non-serial types; a serial schedule executes transactions one after another without interleaving, ensuring equivalence to a specific execution order, whereas non-serial schedules allow operations from different transactions to intermingle, potentially introducing conflicts that must be resolved.[3] For correctness, schedules are evaluated for serializability, meaning their outcome must be equivalent to some serial schedule in terms of the final database state, which can be assessed through conflict serializability (based on the order of conflicting operations like reads and writes on the same data item) or view serializability (preserving reads and final writes).[4] Conflict serializability, a stricter and more commonly enforced criterion, uses a precedence graph to detect cycles that would indicate non-serializability.[1]
Beyond serializability, schedules must also satisfy recoverability properties to handle failures; a recoverable schedule ensures that if one transaction reads data written by another, the writing transaction commits before the reading one, preventing cascading aborts and maintaining the ACID (Atomicity, Consistency, Isolation, Durability) guarantees essential for reliable database operations.[2] Concurrency control mechanisms, such as locking and timestamp ordering, enforce these properties in schedules to balance performance gains from parallelism with the prevention of anomalies like dirty reads or lost updates.[4]
Basic Concepts
Definition and Motivation
In database systems, a transaction schedule, often referred to as a history, serves as an abstract model that captures the interleaved sequence of operations from multiple transactions, including reads (r), writes (w), commits (c), and aborts (a), as they execute on shared data. This representation abstracts the real-time execution order imposed by the system's scheduler, enabling analysis of how concurrent activities affect database state without simulating the full runtime environment.[5][6]
Transactions themselves are defined as logical units of work that transform the database from one consistent state to another, adhering to the ACID properties—Atomicity (all operations succeed or none do), Consistency (data integrity rules are preserved), Isolation (concurrent transactions appear to execute independently), and Durability (committed changes persist despite failures).[5] The motivation for modeling schedules stems from the demands of modern database management systems (DBMS), which permit concurrent transaction execution to maximize throughput and resource utilization in multi-user environments. Without controlled interleaving, however, such concurrency risks anomalies like dirty reads or lost updates, potentially violating consistency and isolation; schedules thus underpin concurrency control strategies to mitigate these issues while preserving ACID guarantees.[5]
The formal study of transaction schedules emerged in the 1970s amid early DBMS research focused on concurrency control, driven by the need to support shared data access in emerging relational systems. Pioneering efforts, such as the IBM System R project in 1976, introduced practical locking mechanisms to manage multi-user operations, laying the groundwork for schedule-based correctness criteria like serializability. This historical development addressed the limitations of serial execution—where transactions run sequentially without overlap, ensuring correctness but at the cost of low performance—in favor of interleaved schedules that balance efficiency and reliability.[5]
Notation and Representation
In database transaction schedules, the standard notation for individual operations is as follows: r_i(X) denotes that transaction T_i reads data item X, while w_i(X) indicates that T_i writes to X.[1] Additionally, c_i represents the commit operation of T_i, which finalizes its changes, and a_i signifies the abort operation, which rolls back any uncommitted changes.[7] This symbolic representation, rooted in foundational concurrency control theory, allows for precise modeling of transaction behaviors without specifying implementation details.
A schedule S is represented as an ordered sequence of these operations from one or more transactions, where the relative order of operations within each individual transaction is preserved, but operations from different transactions may interleave.[1] For instance, a simple schedule might appear as r_1(A) w_1(A) r_2(A) w_2(A), illustrating the interleaving of read and write actions on shared data item A by transactions T_1 and T_2.[8] This linear sequence captures the chronological execution order in a concurrent environment, facilitating analysis of properties like serializability.
Schedules are typically assumed to be complete, encompassing all operations from participating transactions up to their commit or abort points, ensuring a full view of execution outcomes.[7] In contrast, partial schedules may terminate before all transactions complete, often used in theoretical analysis or recovery scenarios to examine intermediate states.[1] These assumptions align with the definition of transactions as atomic units of work, enabling consistent evaluation of concurrency effects.
While the operation-level notation forms the core representation, variations exist in specialized contexts; for example, timestamp-based approaches augment operations with timestamps like r_i(X, ts_i) to enforce ordering, and lock-based methods incorporate acquire (l_i(X)) and release (u_i(X)) symbols for synchronization. However, the primary focus remains on the basic read-write-commit-abort primitives for general schedule analysis.[9]
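To ground this notation, the following minimal Python sketch represents a schedule as an ordered list of operations and prints it in the r_i(X) notation; the tuple encoding and the fmt helper are illustrative conventions, not part of any standard.

```python
# Encode each operation as a tuple: (action, transaction_id, data_item),
# where commits and aborts carry no data item (item is None).
schedule = [
    ("r", 1, "A"), ("w", 1, "A"),
    ("r", 2, "A"), ("w", 2, "A"),
    ("c", 1, None), ("c", 2, None),
]

def fmt(op):
    """Render an operation in the standard r_i(X) / c_i notation."""
    action, tx, item = op
    return f"{action}{tx}({item})" if item is not None else f"{action}{tx}"

print(" ".join(fmt(op) for op in schedule))
# r1(A) w1(A) r2(A) w2(A) c1 c2
```

The same list-of-tuples convention is reused in later sketches for conflict detection and serializability testing.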
Illustrative Example
Consider two simple transactions, T1 and T2, that access disjoint data items to illustrate a basic concurrent schedule. Transaction T1 reads and then writes to data item A, followed by its commit, expressed as: T1: r1(A) w1(A) c1. Similarly, T2 reads and writes to data item B, then commits: T2: r2(B) w2(B) c2. Here, r_i(X) denotes transaction T_i reading item X, w_i(X) denotes writing to X, and c_i denotes the commit operation.[1]
An example interleaved schedule S combining these transactions is:
S: r1(A) r2(B) w1(A) w2(B) c1 c2

This sequence proceeds step by step as follows: T1 begins by reading A, after which T2 reads B; T1 then writes its updated value to A, followed by T2 writing to B; finally, T1 commits, and T2 commits. The notation aligns with standard conventions for representing transaction actions in database systems.[1]
This example serves to demonstrate how operations from multiple transactions can interleave along a shared execution timeline, allowing concurrent processing while each transaction maintains its internal order.[1]
Schedule Types
Serial Schedules
A serial schedule is one in which the operations from a set of transactions are executed sequentially, with all actions of one transaction completing before any action of the next transaction begins, resulting in no interleaving between transactions. This execution order corresponds to a total ordering of the transactions, such as T1 followed by T2 followed by T3, denoted simply as T1 T2 T3.
Serial schedules inherently preserve database consistency, as each transaction operates on a consistent database state produced by the previous transaction and maintains consistency through its own execution, assuming individual transactions are correct. They are serializable by definition, since they are themselves serial executions, and recoverable, because commits occur only after all prior transactions have fully committed, eliminating the possibility of dirty reads.[10] However, serial schedules offer no concurrency benefits, as transactions do not overlap in time, thereby avoiding anomalies from interference but also forgoing opportunities for improved throughput.[11]
One key advantage of serial schedules is their simplicity in implementation and verification; they require no complex concurrency control mechanisms and serve as the foundational baseline for assessing the correctness of more efficient concurrent schedules.[3]
For illustration, consider two transactions where T1 reads and writes data item A (r1(A) w1(A) c1) and T2 does the same (r2(A) w2(A) c2). The serial schedule T1 → T2 executes as r1(A) w1(A) c1 r2(A) w2(A) c2, allowing T2 to read the value written by T1. In contrast, the serial schedule T2 → T1 executes as r2(A) w2(A) c2 r1(A) w1(A) c1, where T1 reads T2's value instead; both orders yield consistent results but different final database states depending on the serialization order.
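The dependence of the final state on serialization order can be checked directly. In this sketch the transaction bodies are hypothetical (T1 adds 10 to A, T2 doubles A), chosen only so that the two serial outcomes differ:

```python
# Hypothetical transaction bodies operating on a dict standing in for the database.
def t1(db):
    a = db["A"]        # r1(A)
    db["A"] = a + 10   # w1(A)

def t2(db):
    a = db["A"]        # r2(A)
    db["A"] = a * 2    # w2(A)

db = {"A": 5}
t1(db); t2(db)         # serial order T1 -> T2
print(db["A"])         # (5 + 10) * 2 = 30

db = {"A": 5}
t2(db); t1(db)         # serial order T2 -> T1
print(db["A"])         # (5 * 2) + 10 = 20
```

Both executions are consistent, but the final states differ, illustrating that "serializable" means equivalent to *some* serial order, not a unique one.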
Concurrent Schedules
In database management systems, concurrent schedules, also referred to as non-serial schedules, permit the interleaving of operations from multiple transactions, enabling parallel execution to exploit system resources more effectively than sequential processing. This interleaving specifies the chronological order of actions such as reads and writes across transactions, allowing, for instance, a read operation from one transaction to occur between the read and write operations of another, as in the sequence r1(A) r2(B) w1(A) w2(B).[1][12]
The primary benefit of concurrent schedules lies in their ability to enhance system throughput and resource utilization in multi-user environments, where multiple transactions can proceed simultaneously, reducing average response times by minimizing idle periods for processors and disks.[1] For example, short transactions can complete without being blocked by longer ones, leading to higher overall efficiency in database systems handling diverse workloads.[12] However, without appropriate management mechanisms like concurrency control protocols, concurrent schedules carry the risk of producing inconsistent database states due to overlapping accesses.[1]
All possible schedules in a database system fall into one of two categories: serial or concurrent, with concurrent schedules encompassing any execution that deviates from strict sequential transaction ordering.[12] In contrast to serial schedules, which process transactions one at a time to inherently avoid interference, concurrent schedules prioritize parallelism but require validation to ensure equivalence to some serial execution for correctness.[1]
Schedule Structure
Action Ordering and Duration
In database transaction schedules, actions are sequenced according to a total order that encompasses all operations from multiple transactions, ensuring a complete and linear execution sequence across the system. This total order interleaves the read (r) and write (w) operations from different transactions while adhering to the predefined partial order within each individual transaction, where operations must execute in their specified sequence to maintain logical consistency. For instance, if a transaction T_i includes r_i(X) followed by w_i(Y), this intra-transaction order is preserved in the schedule, preventing any reversal that could violate the transaction's semantics.[13]
The duration of individual operations, such as reads and writes, is modeled as instantaneous in theoretical analyses of schedules, simplifying the evaluation of concurrency effects by assuming atomic execution at a single point in time. In practice, operations may take non-negligible time due to I/O delays or processing overhead, but schedule models abstract away these durations to focus on logical ordering rather than physical timing. Transaction boundaries are demarcated by commit (c_i) or abort (a_i) operations, which serve as endpoints: all read and write actions of a transaction must precede its commit or abort to enforce atomicity, with commits making changes permanently visible and aborts triggering rollbacks.[13]
These ordering constraints allow for inter-transaction interleaving, enabling concurrent execution to enhance system throughput, but they impose that no operation from one transaction violates the preserved order of another. The resulting schedule's structure influences the visibility of transaction effects, as uncommitted changes remain isolated until a commit occurs, thereby controlling when data modifications become accessible to other transactions.[13]
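The intra-transaction ordering constraint is easy to verify programmatically. This hypothetical sketch (the tuple encoding and function name are illustrative, not standard) checks a schedule against each transaction's specified operation sequence:

```python
def preserves_program_order(sched, programs):
    """Check that the schedule lists each transaction's own operations in
    that transaction's specified order (the intra-transaction partial order).
    Operations are hypothetical (action, transaction_id, data_item) tuples."""
    seen = {tx: [] for tx in programs}
    for op in sched:
        seen[op[1]].append(op)   # group operations by transaction, in schedule order
    return all(seen[tx] == programs[tx] for tx in programs)

programs = {
    1: [("r", 1, "X"), ("w", 1, "Y")],
    2: [("r", 2, "Y"), ("w", 2, "X")],
}
ok  = [("r", 1, "X"), ("r", 2, "Y"), ("w", 1, "Y"), ("w", 2, "X")]
bad = [("w", 1, "Y"), ("r", 1, "X"), ("r", 2, "Y"), ("w", 2, "X")]  # T1 reversed
print(preserves_program_order(ok, programs),
      preserves_program_order(bad, programs))  # True False
```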
Conflicting Operations
In database transaction schedules, two operations are said to conflict if they belong to different transactions, operate on the same data item, and at least one of them is a write operation. This definition ensures that the relative order of such operations can influence either the final database state or the values returned to the transactions, distinguishing conflicts from non-conflicting pairs like two reads on the same item.[14] Read-read operations do not conflict, as swapping their order produces identical results regardless of execution sequence.[3]
Conflicting operations are categorized into three types based on the nature of the reads and writes involved: read-write (RW), write-read (WR), and write-write (WW).[15] An RW conflict occurs between a read by one transaction (r_i(X)) and a write by another (w_j(X)) on data item X, where swapping them could cause the reading transaction to observe an outdated value.[3] A WR conflict arises from a write followed by a read on the same item (w_i(X) and r_j(X)), potentially allowing the reading transaction to see an uncommitted or intermediate value from the writing transaction.[15] Finally, a WW conflict involves two writes on the same item (w_i(X) and w_j(X)), where the order determines which update is overwritten, leading to potential loss of one transaction's changes.[3]
The presence of conflicting operations in a schedule restricts permissible reorderings, as altering their order may yield non-equivalent outcomes compared to a serial execution. For instance, consider a schedule S consisting of w_1(A) followed by r_2(A): this WR conflict ensures that transaction T_2 reads the value written by T_1, and reversing the order would instead have T_2 read the prior value of A, altering the result.[14] Such conflicts thus form the basis for analyzing schedule correctness by preserving necessary operation orders to mimic serial behavior.[3]
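The conflict test itself is mechanical and can be sketched as a small predicate over the hypothetical (action, transaction, item) tuple encoding used for illustration throughout:

```python
def conflicts(op1, op2):
    """Two operations conflict iff they come from different transactions,
    access the same data item, and at least one is a write."""
    a1, t1, x1 = op1  # each op is (action, transaction_id, data_item)
    a2, t2, x2 = op2
    return (t1 != t2 and x1 is not None and x1 == x2
            and "w" in (a1, a2))

# Schedule S = w1(A) r2(A): a single WR conflict on A.
S = [("w", 1, "A"), ("r", 2, "A")]
pairs = [(i, j) for i in range(len(S)) for j in range(i + 1, len(S))
         if conflicts(S[i], S[j])]
print(pairs)  # [(0, 1)]
```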
Serializability
Conflict Equivalence
Conflict equivalence is a fundamental concept in database concurrency control that determines when two schedules of transactions produce the same effect on the database state despite potentially different interleavings of operations. Two schedules S and S' are conflict equivalent if they involve the same set of operations from the same transactions and preserve the relative order of all pairs of conflicting operations.[16] Conflicting operations are those from different transactions that access the same data item, with at least one being a write operation.[16]
Formally, S and S' are conflict equivalent, denoted S ≈_c S', if:
- The sets of operations in S and S' are identical (op(S) = op(S')).
- For every pair of conflicting operations o_i and o_j (where o_i precedes o_j in S), o_i also precedes o_j in S', and vice versa. This condition ensures that the precedence graph of conflicts remains unchanged between the schedules.[16]
The key implication of conflict equivalence is that both schedules, when executed on the same initial database state, yield identical final states and intermediate reads for transactions, preserving the logical consistency of the database without requiring identical execution sequences.[16] This equivalence class allows for optimizations in scheduling while maintaining correctness, as non-conflicting operations can be reordered freely without altering the outcome.
For example, consider two transactions T_1 and T_2 where T_1 reads data item A and writes B, and T_2 reads B and writes C. A schedule S: r_1(A) w_1(B) r_2(B) w_2(C) and S': r_1(A) r_2(B) w_1(B) w_2(C) are not conflict equivalent because the conflicting pair w_1(B) and r_2(B) changes order. However, swapping non-conflicting operations like r_1(A) and w_2(C) in a schedule does not affect equivalence, as they do not conflict.[16]
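Under the same hypothetical tuple encoding, conflict equivalence can be checked by comparing the ordered conflicting pairs of two schedules. This sketch assumes each operation appears at most once per schedule (so operations can be identified by value) and that both inputs are valid schedules of the same transactions:

```python
def conflicts(op1, op2):
    """Different transactions, same data item, at least one write."""
    a1, t1, x1 = op1
    a2, t2, x2 = op2
    return t1 != t2 and x1 is not None and x1 == x2 and "w" in (a1, a2)

def conflict_pairs(sched):
    """Ordered pairs of conflicting operations, identified by value;
    assumes every (action, tx, item) tuple is unique in the schedule."""
    return {(sched[i], sched[j])
            for i in range(len(sched))
            for j in range(i + 1, len(sched))
            if conflicts(sched[i], sched[j])}

def conflict_equivalent(s1, s2):
    """Same operations and same relative order of every conflicting pair."""
    return sorted(s1) == sorted(s2) and conflict_pairs(s1) == conflict_pairs(s2)

# The example from the text: the conflicting pair w1(B)/r2(B) changes order.
S  = [("r", 1, "A"), ("w", 1, "B"), ("r", 2, "B"), ("w", 2, "C")]
S2 = [("r", 1, "A"), ("r", 2, "B"), ("w", 1, "B"), ("w", 2, "C")]
print(conflict_equivalent(S, S2))  # False
```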
Conflict-Serializable Schedules
A schedule S is defined as conflict-serializable if it is conflict-equivalent to some serial schedule, meaning that the relative order of all conflicting operations in S matches that in a serial execution of the transactions.[16] Conflict equivalence requires that two schedules involve the same individual actions on the same data items and preserve the precedence of every pair of conflicting actions, where conflicting actions are those from different transactions that access the same data item and at least one of which is a write operation.
This equivalence allows a conflict-serializable schedule to be transformed into an equivalent serial schedule by repeatedly swapping adjacent non-conflicting operations, thereby interleaving concurrent transactions without altering the outcome regarding data consistency.[16] Such schedules ensure that the database remains in a consistent state after execution, as the final database state matches that of a serial execution where transactions run one after another without overlap.
For instance, consider a schedule where transactions T_1 and T_2 execute operations such that T_1's write on item A precedes T_2's read on A, and no conflicting operations create cycles in their precedence relations; this schedule is conflict-serializable, equivalent to executing T_1 fully before T_2.[16]
However, conflict-serializable schedules impose restrictions that may prevent some concurrent executions that preserve final consistency, particularly those involving blind writes—writes to a data item without a prior read by the same transaction—which can lead to unnecessary serialization even when the outcomes are equivalent.
View Equivalence
View equivalence provides a semantic criterion for comparing database schedules based on the observable effects they produce, specifically the values read by transactions and the final database state, rather than the precise order of operations. This notion was formalized to capture a broader class of correct concurrent executions than those defined by syntactic conflict-based criteria. Two schedules S and S' over the same set of transactions are view equivalent if they satisfy the following three conditions:
- Same reads-from relationship: For each read operation r_i(X) in S, the write operation w_j(X) that provides the value for that read is issued by the same transaction T_j in S' as in S. This includes reads of the initial database state before any transaction writes.
- Same initial writes: For each data item X, if a transaction performs the initial write w_k(X) in S (i.e., the first write to X), then the same transaction performs it in S'.
- Same final writes: For each data item X, the transaction that performs the final write w_m(X) in S (i.e., the last write to X) is the same in S'.[17]
These conditions ensure that S and S' preserve the same "view" of the database state as seen by the transactions, meaning each transaction reads the same values and the database ends in the same state, regardless of the initial database contents or transaction semantics. The reads-from relationship formally models data flow dependencies, focusing on which writes influence subsequent reads.[17][18]
Unlike conflict equivalence, which enforces orderings based on all pairs of conflicting operations (such as read-write, write-read, or write-write on the same item by different transactions), view equivalence permits reordering of non-conflicting writes that do not alter the reads-from relations or final state—for instance, blind writes, where a transaction writes a value that is overwritten before being read by any other transaction. This makes view equivalence a looser criterion that allows more schedules while still guaranteeing equivalent observable behavior.[17]
For example, consider the interleaved schedule S: r_2(B) w_2(A) r_1(A) w_1(B) w_2(B) r_3(A) w_3(B), composed from three transactions:
- T_1: r_1(A) w_1(B)
- T_2: r_2(B) w_2(A) w_2(B)
- T_3: r_3(A) w_3(B)
Here, r_2(B) reads the initial value of B (conventionally attributed to an initial transaction T_0), r_1(A) and r_3(A) read from T_2's write to A, the final write to A is by T_2, and the final write to B is by T_3.
Now consider the serial schedule S': T_2 then T_1 then T_3. In S', T_2 writes A and B first; T_1 reads A from T_2 and writes B (overwritten later); T_3 reads A from T_2 and writes B. The reads-from relations, initial writes, and final writes match those in S, so S and S' are view equivalent, even though S involves interleaving that might introduce conflicts not present in S'.[17]
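The three view-equivalence conditions can be checked mechanically. The sketch below uses a hypothetical tuple encoding and one interleaving of the example's transactions that is consistent with the reads-from relations described above; it collects reads in a set, which assumes each (reader, item) read is unique:

```python
def reads_from(sched):
    """Set of (reader_tx, item, writer_tx) triples; a writer of None means
    the read observes the initial database value.
    Ops are hypothetical (action, tx, item) tuples."""
    last_writer, rf = {}, set()
    for action, tx, item in sched:
        if action == "r":
            rf.add((tx, item, last_writer.get(item)))
        elif action == "w":
            last_writer[item] = tx
    return rf

def final_writes(sched):
    """Map each data item to the transaction that wrote it last."""
    last_writer = {}
    for action, tx, item in sched:
        if action == "w":
            last_writer[item] = tx
    return last_writer

def view_equivalent(s1, s2):
    return (sorted(s1) == sorted(s2)
            and reads_from(s1) == reads_from(s2)
            and final_writes(s1) == final_writes(s2))

# Interleaved S vs the serial schedule S' = T2, T1, T3.
S  = [("r", 2, "B"), ("w", 2, "A"), ("r", 1, "A"), ("w", 1, "B"),
      ("w", 2, "B"), ("r", 3, "A"), ("w", 3, "B")]
Sp = [("r", 2, "B"), ("w", 2, "A"), ("w", 2, "B"),
      ("r", 1, "A"), ("w", 1, "B"), ("r", 3, "A"), ("w", 3, "B")]
print(view_equivalent(S, Sp))  # True
```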
View-Serializable Schedules
A schedule S is view-serializable if it is view-equivalent to some serial schedule, meaning that the reads in S obtain the same values from the same writes as in the serial schedule, and the final writes in S match those in the serial schedule.[16]
View-serializable schedules preserve database consistency by ensuring an outcome equivalent to some sequential execution of transactions, but testing for view serializability is NP-complete in general.[16] All conflict-serializable schedules are view-serializable, but the converse does not hold; view serializability permits additional schedules, particularly those involving blind writes where a transaction updates a data item without first reading its value.[19]
For example, consider transactions T_1: r_1(A) w_1(B); T_2: r_2(B) w_2(A) w_2(B); T_3: r_3(A) w_3(B), executed in the interleaved schedule S: r_2(B) w_2(A) r_1(A) w_1(B) w_2(B) r_3(A) w_3(B). This schedule is view-equivalent to the serial schedule T_2 then T_1 then T_3, as r_2(B) and r_1(A) read the initial value and T_2's write respectively, r_3(A) reads from T_2, and the final writes are by T_2 for A and T_3 for B. However, it is not conflict-serializable, as the precedence graph contains a cycle: T_2 → T_1 (from w_2(A) before r_1(A)) and T_1 → T_2 (from w_1(B) before w_2(B)).[17]
In practice, most database management systems target conflict serializability as a sufficient and efficiently testable approximation to view serializability.[20]
Recoverability
Recoverable Schedules
A recoverable schedule ensures that if transaction T_j reads a value written by transaction T_i, then the commit of T_i (denoted c_i) precedes the commit of T_j (denoted c_j). This condition, known as the commit dependency, guarantees that no transaction commits until all transactions whose writes it depends on have also committed. The concept was formalized in foundational work on concurrency control and recovery.[21][22]
The primary purpose of recoverable schedules is to maintain database consistency in the presence of failures, such as transaction aborts, by preventing the propagation of uncommitted changes into the committed state. This avoids situations where a committed transaction must be retroactively undone due to a failure in a preceding transaction it depended on, thereby limiting cascading aborts and ensuring that the database can always be restored to a consistent state reflecting only committed transactions. Without recoverability, failures could lead to irrecoverable inconsistencies, violating the durability and isolation properties of ACID transactions.[21][23]
For example, consider the schedule S: w_1(A) c_1 r_2(A) c_2, where T_1 writes to data item A and commits before T_2 reads from A and commits. This schedule is recoverable because T_2 reads only committed data from T_1. In contrast, the schedule S': w_1(A) r_2(A) c_2 a_1, where T_2 reads from T_1 before T_1 commits and T_1 later aborts, is not recoverable: T_2's commit incorporates uncommitted (and ultimately discarded) changes from T_1, potentially leaving the database in an inconsistent state that cannot be undone without rolling back T_2.[22][23]
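The commit-ordering condition can be checked directly over a schedule. The sketch below uses the hypothetical tuple encoding and simplified bookkeeping: it flags a schedule as non-recoverable whenever a reader commits before, or without, the transaction it read from:

```python
def is_recoverable(sched):
    """Recoverable: whenever Tj reads an item last written by Ti, the commit
    ci must precede cj.  Ops are hypothetical (action, tx, item) tuples."""
    last_writer = {}
    deps = []            # (reader, writer) reads-from dependencies
    commit_pos = {}      # transaction -> position of its commit
    for pos, (action, tx, item) in enumerate(sched):
        if action == "w":
            last_writer[item] = tx
        elif action == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != tx:
                deps.append((tx, writer))
        elif action == "c":
            commit_pos[tx] = pos
    for reader, writer in deps:
        if reader in commit_pos:
            # The writer must have committed, and committed earlier.
            if writer not in commit_pos or commit_pos[writer] > commit_pos[reader]:
                return False
    return True

S  = [("w", 1, "A"), ("c", 1, None), ("r", 2, "A"), ("c", 2, None)]
S2 = [("w", 1, "A"), ("r", 2, "A"), ("c", 2, None), ("a", 1, None)]
print(is_recoverable(S), is_recoverable(S2))  # True False
```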
Recoverability addresses failure handling and is orthogonal to serializability, which concerns logical equivalence to serial executions; a schedule may be serializable yet non-recoverable if commit order violates dependencies, or recoverable but non-serializable if it permits certain conflicts.[21][23]
Cascadeless Schedules
A cascadeless schedule, also known as an ACA (avoid cascading aborts) schedule, is defined as a transaction schedule in which every read operation accesses only data values written by committed transactions, thereby prohibiting dirty reads where uncommitted changes are visible to other transactions.[24] This property ensures that if a transaction Ti writes a data item X and another transaction Tj reads X, the commit (or abort) of Ti precedes the read by Tj.[24] As a result, cascadeless schedules form a strict subset of recoverable schedules, inheriting recoverability while imposing additional constraints to prevent propagation of failures.[24]
The mechanism underlying cascadeless schedules restricts read access to data items until the transaction responsible for the most recent write has committed, effectively isolating uncommitted updates from concurrent readers.[24] This read-from rule eliminates the risk of a single transaction abort triggering a chain of aborts in dependent transactions, as no transaction can depend on uncommitted state.[24] By enforcing this discipline at the schedule level, database systems can maintain consistency without requiring complex rollback propagation during recovery.[24]
For illustration, consider a simple schedule involving transactions T1 and T2 operating on data item A. In a non-cascadeless schedule, the sequence might be w1(A) followed immediately by r2(A), with T1 aborting afterward; this forces T2 to abort as well to undo the dirty read, causing a cascading abort.[25] In contrast, a cascadeless schedule delays r2(A) until after T1's commit (c1), so if T1 aborts, T2 simply reads the prior committed value of A without needing to roll back, avoiding any cascade.[25] This example highlights how the absence of intermediate reads from uncommitted writes preserves independence between transactions.
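A cascadelessness check only needs to detect dirty reads. This simplified sketch (same hypothetical tuple encoding) does not model rolling back the last-writer bookkeeping after an abort:

```python
def is_cascadeless(sched):
    """Every read of an item written by another transaction must come after
    that writer has finished (committed or aborted).  Simplified sketch:
    aborted writes are not undone in the last_writer bookkeeping."""
    last_writer, finished = {}, set()
    for action, tx, item in sched:
        if action == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != tx and writer not in finished:
                return False  # dirty read from an unfinished transaction
        elif action == "w":
            last_writer[item] = tx
        elif action in ("c", "a"):
            finished.add(tx)
    return True

cascading = [("w", 1, "A"), ("r", 2, "A"), ("a", 1, None), ("a", 2, None)]
safe      = [("w", 1, "A"), ("c", 1, None), ("r", 2, "A"), ("c", 2, None)]
print(is_cascadeless(cascading), is_cascadeless(safe))  # False True
```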
The primary benefits of cascadeless schedules include reduced recovery overhead, as aborts are localized to the failing transaction without propagating to readers, and enhanced system reliability by minimizing the scope of failure impacts.[24] These schedules simplify failure handling in high-concurrency environments, where cascading aborts could otherwise amplify downtime and resource consumption.[24] Overall, they provide a practical balance between concurrency and recoverability, supporting efficient transaction processing in database systems.[24]
Strict Schedules
A strict schedule is a type of database transaction schedule that ensures the highest level of recoverability by prohibiting any transaction from reading or writing a data item until the transaction that last wrote that item has either committed or aborted. This restriction applies to both read and write operations on the affected data item, preventing any form of interference with uncommitted changes. Formally, in a schedule S involving transactions T_i and T_j, if T_i performs a write operation w_i(X) on data item X, then no subsequent operation r_j(X) or w_j(X) from T_j can appear in S until after the commit c_i or abort a_i of T_i.[21]
Strict schedules possess key properties that make them desirable for database systems. They inherently imply both cascadeless and recoverable schedules, as the delay in accessing written values eliminates the possibility of cascading aborts and ensures that committed transactions reflect only finalized states. Additionally, strict schedules are straightforward to achieve using locking mechanisms, such as strict two-phase locking, where exclusive locks on written items are held until the transaction commits or aborts, thereby guaranteeing the required ordering without complex additional checks.[21]
To illustrate, consider two transactions T_1 and T_2 where T_1 writes to data item A via w_1(A). In a strict schedule, any subsequent read r_2(A) or write w_2(A) by T_2 may appear only after T_1's commit c_1 or abort a_1 has occurred; otherwise, the schedule violates strictness. For example, the sequence w_1(A) r_2(A) c_1 is non-strict because T_2 reads the uncommitted value from T_1.[21]
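Strictness extends the dirty-read test to writes as well. A minimal sketch under the same hypothetical tuple encoding:

```python
def is_strict(sched):
    """Strict: no read *or* write of an item until the transaction that last
    wrote it has committed or aborted.  Ops are (action, tx, item) tuples."""
    last_writer, finished = {}, set()
    for action, tx, item in sched:
        if action in ("r", "w"):
            writer = last_writer.get(item)
            if writer is not None and writer != tx and writer not in finished:
                return False  # touches a value from an unfinished transaction
        if action == "w":
            last_writer[item] = tx
        elif action in ("c", "a"):
            finished.add(tx)
    return True

non_strict = [("w", 1, "A"), ("r", 2, "A"), ("c", 1, None), ("c", 2, None)]
strict     = [("w", 1, "A"), ("c", 1, None), ("r", 2, "A"), ("c", 2, None)]
print(is_strict(non_strict), is_strict(strict))  # False True
```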
In practice, strict schedules are widely adopted in database management systems due to their simplicity in supporting recovery processes, such as deferred update or simple rollback, where uncommitted writes can be safely ignored or undone without affecting other transactions. This approach minimizes recovery complexity in failure scenarios, making it a standard choice for ensuring data integrity in concurrent environments.[21]
Relationships and Analysis
Hierarchy of Schedule Classes
In database transaction scheduling, the classes of schedules form a hierarchy based on their guarantees for consistency and recovery, allowing designers to balance concurrency with correctness. Within serializability, serial schedules form the strictest subset, where transactions execute non-concurrently in some order, ensuring equivalence to a sequential execution.[24] Conflict-serializable schedules broaden this set, encompassing those whose precedence graphs are acyclic and thus equivalent to some serial order via conflict-preserving rearrangements, with serial schedules as a proper subset.[24] View-serializable schedules extend further, including any schedule equivalent to a serial one in terms of read-from relationships and final writes, such that serial ⊆ conflict-serializable ⊆ view-serializable.[24]
For recoverability, the hierarchy reverses in strictness: recoverable schedules are the broadest class, requiring that if a transaction reads from another, the latter commits before the former does, preventing dirty reads from causing inconsistencies upon abort.[26] Cascadeless schedules form a stricter subset, avoiding cascading aborts by ensuring reads occur only from committed transactions, so recoverable ⊇ cascadeless.[26] Strict schedules impose the tightest constraints, prohibiting reads or writes to data modified by an uncommitted transaction, yielding recoverable ⊇ cascadeless ⊇ strict.[26] Notably, all serial schedules are strict, since each transaction commits or aborts before the next begins, so no transaction ever accesses another's uncommitted writes.[26]
These hierarchies intersect such that full correctness typically demands a schedule that is both serializable (often conflict-serializable for practicality) and recoverable (frequently strict to simplify recovery), as the serializability classes do not inherently ensure recoverability properties. For instance, conflict-serializable + recoverable serves as a common target in systems, balancing assurance with implementation feasibility. Stricter classes like strict schedules reduce concurrency by delaying operations but ease implementation through simple locking rules, whereas view-serializable schedules permit higher throughput yet complicate verification due to intricate equivalence checks.[24][26]
Conceptually, these classes exhibit Venn-like overlaps without full nesting across categories: every strict schedule is recoverable and potentially serializable if conflicts align, but not all view-serializable schedules are recoverable, as some may involve reads of uncommitted data that can cascade failures. This structure highlights trade-offs, where relaxing to view-serializability or recoverable schedules increases permissible interleavings but risks anomalies unless complemented by additional mechanisms.
Testing Serializability
Testing serializability of a database schedule involves determining whether the interleaved operations of concurrent transactions produce the same effect as some serial execution, ensuring consistency. Practical methods focus on subclasses like conflict and view serializability, as general serializability testing is computationally intensive. These tests are crucial for schedulers in database management systems to validate histories without simulating all possible serial orders.
The standard test for conflict serializability constructs a precedence graph, also known as a serialization graph or conflict graph. In this directed graph, each node represents a transaction, and a directed edge is added from transaction T_i to T_j (denoted T_i → T_j) if there is a conflicting pair of operations where an operation in T_i precedes a conflicting operation in T_j on the same data item; conflicting operations are read-write, write-read, or write-write pairs. A schedule is conflict serializable if and only if this graph is acyclic, as cycles indicate unavoidable conflicts that prevent reordering into a serial form.[6][27]
To identify a valid serial order when the graph is acyclic, a topological sort can be performed on the precedence graph. This algorithm processes nodes in an order respecting all edges, yielding a serialization of transactions. The topological sort runs in O(V + E) time, where V is the number of transactions (nodes) and E is the number of edges; building the graph additionally requires examining pairs of conflicting operations. The overall procedure is polynomial, making it efficient for typical database workloads.[27]
For example, consider the schedule S: r_1(A) w_2(A) r_2(B) w_1(B). The conflicting operations are r_1(A) and w_2(A) (read-write conflict on A, edge T_1 → T_2) and r_2(B) and w_1(B) (read-write conflict on B, edge T_2 → T_1). The resulting precedence graph has a cycle T_1 → T_2 → T_1, so S is not conflict serializable.[27]
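The precedence-graph test can be sketched end to end: build the graph from conflicting pairs, then attempt a topological sort, returning None when a cycle makes the schedule non-conflict-serializable. The encoding is the illustrative tuple convention, not a standard API:

```python
from collections import defaultdict

def conflicts(op1, op2):
    """(action, tx, item) tuples; same item, different txs, at least one write."""
    a1, t1, x1 = op1
    a2, t2, x2 = op2
    return t1 != t2 and x1 is not None and x1 == x2 and "w" in (a1, a2)

def precedence_graph(sched):
    """Edge Ti -> Tj for every conflicting pair where Ti's operation comes first."""
    edges = defaultdict(set)
    for i in range(len(sched)):
        for j in range(i + 1, len(sched)):
            if conflicts(sched[i], sched[j]):
                edges[sched[i][1]].add(sched[j][1])
    return edges

def serial_order(sched):
    """Topological sort of the precedence graph: returns an equivalent serial
    order of transactions, or None if the graph is cyclic (i.e. the schedule
    is not conflict serializable)."""
    edges = precedence_graph(sched)
    txs = {tx for _, tx, _ in sched}
    indegree = {t: 0 for t in txs}
    for u in edges:
        for v in edges[u]:
            indegree[v] += 1
    ready = [t for t in txs if indegree[t] == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in edges.get(u, ()):
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return order if len(order) == len(txs) else None

# The cyclic example from the text: r1(A) w2(A) r2(B) w1(B).
S = [("r", 1, "A"), ("w", 2, "A"), ("r", 2, "B"), ("w", 1, "B")]
print(serial_order(S))  # None: the graph contains the cycle T1 -> T2 -> T1
```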
Testing for view serializability is more involved, as it requires verifying that the schedule's reads-from relations (which transaction's write each read observes) and final writes on each data item match those of some serial schedule. This involves checking for an equivalent serial order that preserves the initial reads, final database state, and all intermediate reads-from dependencies, without regard to non-conflicting operation orders. Due to the need to enumerate potential serial permutations to match these relations, view serializability testing is NP-complete, even for simple transaction models.[6]
In practice, the complexity of view serializability testing leads database systems to favor conflict serializability, which is a sufficient (though stricter) condition for view serializability. One approximation or avoidance strategy involves detecting and preventing blind writes—writes to a data item without a prior read by the same transaction—as schedules without blind writes that are view serializable are also conflict serializable, simplifying validation. However, full view equivalence checks remain rare outside theoretical analysis due to their exponential worst-case time.[6][27]