Interval scheduling
Interval scheduling is a fundamental optimization problem in computer science and operations research, involving the selection of the maximum number of non-overlapping time intervals from a given set of intervals, each defined by a start time s_j and finish time f_j.[1] The problem arises in scenarios such as assigning jobs to a single machine or scheduling meetings in a conference room, where compatibility requires that no two selected intervals overlap (i.e., for any two selected intervals j and k, either f_j \leq s_k or f_k \leq s_j).[2] The unweighted version, which maximizes the number of selected intervals, is efficiently solved using a greedy algorithm that sorts the intervals in non-decreasing order of their finish times and iteratively selects the interval with the earliest finish time that does not conflict with previously chosen ones.[1] This algorithm achieves optimality, as proven by showing that any optimal solution can be transformed into the greedy solution without reducing the number of intervals, and it runs in O(n \log n) time due to the sorting step followed by a linear scan.[1][2]
A notable variant is weighted interval scheduling, where each interval has an associated weight w_j, and the goal shifts to maximizing the total weight of the selected non-overlapping intervals rather than their count.[3] This version requires dynamic programming for an optimal solution, computing the maximum value for each interval by considering whether to include it or follow the optimal subproblem ending before its start time, also in O(n \log n) time after sorting.[3] Interval scheduling problems find broad applications in resource allocation, such as hotel reservations, factory machine scheduling, and timetabling, where fixed time windows must be respected to minimize conflicts or machines used.[4] Extensions include multi-machine scheduling and constraints on flexible availabilities, which introduce additional complexity but build on the core greedy and dynamic programming techniques.[5]
Fundamentals
Problem Definition
Interval scheduling is a fundamental optimization problem in algorithm design and combinatorial optimization, involving the selection of tasks or jobs to be performed on a single resource without conflicts. Each job is represented as an interval defined by a start time s_i and a finish time f_i, where s_i < f_i, typically assuming real-valued or integer times. The core objective is to select the largest possible subset of these intervals such that no two selected intervals overlap, thereby maximizing the utilization of the resource.[1]
Non-overlap is ensured by the constraint that for any two selected intervals i and j, either f_i \leq s_j or f_j \leq s_i. This setup models scenarios such as scheduling meetings in a single room or allocating time slots to non-conflicting activities on one machine.[1]
The problem has two primary variants. In the unweighted case, the goal is to maximize the cardinality, or number, of selected intervals. In the weighted case, each interval i is associated with a positive weight w_i (representing value, profit, or priority), and the objective shifts to maximizing the total sum of weights \sum w_i over the selected non-overlapping intervals.[3]
The roots of interval scheduling trace back to early scheduling problems in operations research, with seminal formulations emerging in the 1950s through works like Dantzig and Fulkerson's tanker scheduling, and further developed in algorithms literature by the 1970s; it was formalized in complexity-theoretic terms by Garey and Johnson in their 1979 guide to NP-completeness.[6]
Key Terminology and Assumptions
In interval scheduling, each job or activity is modeled as an interval on the real line, typically denoted as [s_i, f_i) where s_i is the start time and f_i is the finish time, with s_i < f_i. Two intervals are compatible if they do not overlap, i.e., their intersection is empty (they are disjoint), and thus can be scheduled without conflict on the same resource. A feasible schedule consists of a set of mutually compatible intervals that can be selected together without any overlaps.[6][1]
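As a small illustration of this convention (a Python sketch; the pair encoding is an assumption for illustration, not part of the formal definition), two half-open intervals are compatible exactly when one finishes no later than the other starts, so touching intervals do not conflict:

```python
def compatible(a, b):
    """Return True if half-open intervals a = (s_a, f_a) and b = (s_b, f_b)
    are disjoint, i.e., f_a <= s_b or f_b <= s_a."""
    (s_a, f_a), (s_b, f_b) = a, b
    return f_a <= s_b or f_b <= s_a

assert compatible((0, 2), (2, 5))       # touching endpoints are compatible
assert not compatible((0, 3), (2, 5))   # proper overlap is not
```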
Key assumptions underpin the problem formulation: intervals are often presorted by non-decreasing finish times (f_1 \leq f_2 \leq \cdots \leq f_n) to facilitate algorithmic processing, though this is not inherent to the input. Time is treated as continuous, allowing start and finish times to be any real numbers, and there is no preemption—each interval must run contiguously from start to finish or not at all if not selected. Let n denote the number of intervals, and for each interval i, the processing time is p_i = f_i - s_i, though this duration is not always central to the objective.[1][6]
Several edge cases arise in practice. If the input set of intervals is empty, the optimal feasible schedule is also empty, yielding zero selected intervals. When all intervals overlap with one another, at most one can be included in any feasible schedule; in the weighted variant, this would be the interval with the maximum weight. For identical intervals sharing the exact same start and finish times, they fully overlap and thus at most one can be selected, as compatibility requires disjointness.[1][6]
Single-Resource Maximization
Unweighted Case
In the unweighted case of single-resource interval scheduling, the objective is to select the maximum number of non-overlapping intervals from a given set to maximize cardinality.[1]
The standard approach employs a greedy algorithm that sorts the intervals by their finish times and iteratively selects the one with the earliest finish time that does not overlap with previously selected intervals.[1] Specifically, the algorithm proceeds as follows:
- Sort the n intervals in non-decreasing order of finish times, denoted as f_1 \leq f_2 \leq \cdots \leq f_n.
- Initialize an empty set S to store the selected intervals.
- Select the first interval (with the earliest finish time) and add it to S.
- For each subsequent interval i, if its start time s_i is at or after the finish time of the last interval in S, add it to S.
- Repeat until all intervals are considered.
This procedure can be expressed in pseudocode as:
GREEDY-INTERVAL-SELECTOR(intervals):
    sort intervals by increasing finish time
    S = empty set
    last_finish = -infinity
    for each interval i in sorted order:
        if start_time(i) >= last_finish:
            add i to S
            last_finish = finish_time(i)
    return S
The algorithm runs in O(n \log n) time, dominated by the initial sorting step, with subsequent compatibility checks taking O(n) time.[1]
The optimality of this greedy choice is established via an exchange argument. Let G = (g_1, \ldots, g_k) be the greedy solution and O = (o_1, \ldots, o_m) an optimal solution, both listed in order of finish time, with O chosen among optimal solutions to agree with G on the longest possible prefix, say the first r intervals. If r < \min(k, m), then g_{r+1} finishes no later than o_{r+1}, since the greedy selects the earliest-finishing interval compatible with the common prefix; replacing o_{r+1} with g_{r+1} in O yields a feasible optimal solution agreeing with G on the first r+1 intervals, contradicting the choice of r. Hence O agrees with G on the first \min(k, m) intervals. If m > k, then o_{k+1} would be compatible with g_k, so the greedy would have selected a further interval rather than stopping, a contradiction. Thus |G| = m.[1]
To illustrate the importance of sorting by finish time, consider the following set of four intervals: A ([0,5]), B ([1,3]), C ([3.5,4]), and D ([2,6]). Sorting by earliest finish time yields B, C, A, D. The greedy algorithm selects B (finish 3) and then C (start 3.5 ≥ 3), yielding two non-overlapping intervals. In contrast, sorting by earliest start time and greedily selecting yields A first (start 0), after which no other interval is compatible, yielding only one interval.[1]
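The contrast is easy to check mechanically. The sketch below (Python; the tuple encoding and the helper name greedy_select are illustrative assumptions) runs the same greedy scan under both orderings on the four intervals above:

```python
def greedy_select(intervals, key):
    """Scan intervals in the order given by `key`, keeping each one
    whose start is at or after the finish of the last kept interval."""
    selected, last_finish = [], float("-inf")
    for name, start, finish in sorted(intervals, key=key):
        if start >= last_finish:
            selected.append(name)
            last_finish = finish
    return selected

intervals = [("A", 0, 5), ("B", 1, 3), ("C", 3.5, 4), ("D", 2, 6)]
print(greedy_select(intervals, key=lambda iv: iv[2]))  # finish order: ['B', 'C']
print(greedy_select(intervals, key=lambda iv: iv[1]))  # start order:  ['A']
```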
Weighted Case
In the weighted case of single-resource interval scheduling, each interval j is associated with a positive weight w_j in addition to its start time s_j and finish time f_j, and the objective is to select a compatible subset that maximizes the total weight \sum w_j.[7] Unlike the unweighted case, where a greedy algorithm suffices, the weighted variant requires dynamic programming to achieve optimality, as simply selecting the maximum number of intervals may overlook higher-weight combinations.[7]
The dynamic programming formulation assumes the n intervals are sorted in non-decreasing order of finish times, so f_1 \leq f_2 \leq \cdots \leq f_n. Define OPT(j) as the maximum total weight achievable using a compatible subset from the first j intervals. The base case is OPT(0) = 0. For j \geq 1, the recurrence is
OPT(j) = \max \left( OPT(j-1),\ w_j + OPT(p(j)) \right),
where p(j) is the largest index i < j such that interval i is compatible with j (i.e., f_i \leq s_j); if no such i exists, p(j) = 0. This captures the choice of either excluding interval j or including it and appending the optimal solution up to its latest compatible predecessor.[7]
To compute p(j) efficiently, preprocess the sorted intervals and use binary search on the finish times for each query, requiring O(\log n) time per p(j) and O(n \log n) total preprocessing and computation time. The DP table is then filled in O(n) time iteratively from j=1 to n, yielding an overall time complexity of O(n \log n). A naive implementation without binary search, using linear scans to find p(j), runs in O(n^2) time.[7]
To reconstruct the optimal solution after computing the OPT table, backtrack from j = n: if OPT(j) = w_j + OPT(p(j)), include interval j and recurse on p(j); otherwise, recurse on j-1. This traces the decisions that led to the maximum weight.[7]
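A compact implementation of the recurrence, the binary-search computation of p(j), and the backtracking reconstruction might look as follows (a Python sketch under the sorting conventions above; all function and variable names are illustrative). On the five-interval example that follows, it returns a total weight of 50, selecting the fifth interval alone.

```python
import bisect

def weighted_interval_scheduling(jobs):
    """jobs: list of (start, finish, weight) tuples.
    Returns (max_total_weight, selected_jobs) per the recurrence
    OPT(j) = max(OPT(j-1), w_j + OPT(p(j)))."""
    jobs = sorted(jobs, key=lambda j: j[1])        # sort by finish time
    n = len(jobs)
    finishes = [f for _, f, _ in jobs]

    # p[j]: largest 1-based index i with f_i <= s_j, or 0 if none exists.
    p = [0] * (n + 1)
    for j in range(1, n + 1):
        p[j] = bisect.bisect_right(finishes, jobs[j - 1][0])

    # Fill the DP table iteratively from j = 1 to n.
    opt = [0] * (n + 1)
    for j in range(1, n + 1):
        opt[j] = max(opt[j - 1], jobs[j - 1][2] + opt[p[j]])

    # Backtrack from j = n to recover the selected intervals.
    selected, j = [], n
    while j > 0:
        if jobs[j - 1][2] + opt[p[j]] >= opt[j - 1]:  # include interval j
            selected.append(jobs[j - 1])
            j = p[j]
        else:                                         # exclude interval j
            j -= 1
    return opt[n], list(reversed(selected))
```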
As an illustrative example, consider five intervals sorted by finish time, with start times s = [0, 1, 2, 3, 0], finish times f = [1, 2, 3, 4, 5], and weights w = [10, 10, 10, 10, 50]. The compatible predecessors are p = [0, 1, 2, 3, 0]. The DP table is computed as follows:
| j | OPT(j-1) | w_j + OPT(p(j)) | OPT(j) |
|---|---|---|---|
| 1 | 0 | 10 + 0 = 10 | 10 |
| 2 | 10 | 10 + 10 = 20 | 20 |
| 3 | 20 | 10 + 20 = 30 | 30 |
| 4 | 30 | 10 + 30 = 40 | 40 |
| 5 | 40 | 50 + 0 = 50 | 50 |
The optimal solution selects the fifth interval alone, for total weight 50. In contrast, applying the unweighted greedy strategy (earliest finish time, disregarding weights) selects the first four intervals, yielding only 40—demonstrating why it fails here.[7]
Grouped Interval Scheduling
Decision Problem Complexity
The decision version of the grouped interval scheduling problem asks whether there exists a selection of exactly one interval from each group such that no two selected intervals overlap, thereby forming a feasible schedule that represents all groups.[8]
This problem is NP-complete when the maximum group size is at least 3. The proof relies on a reduction from the numerical three-dimensional matching problem, demonstrating that deciding feasibility becomes computationally intractable as group sizes increase beyond 2.[6] Specifically, Nakajima and Hakimi established the NP-completeness for cases where each group (or job) has 3 or more possible intervals, even under additional constraints like unit lengths.[8]
In contrast, the decision problem is solvable in polynomial time when all groups contain at most 2 intervals. This can be achieved by modeling the instance as a flow network, where groups are represented on one side and discretized time slots (derived from sorted endpoints of all intervals) on the other, with capacities ensuring at most one interval per group and no overlaps in time slots. The maximum flow in this network determines if a feasible assignment exists for all groups, computable using algorithms like Hopcroft-Karp for bipartite matching in O(n^{1.5}) time, where n is the number of intervals.[9][6]
The key dichotomy theorem states that the grouped interval scheduling decision problem is NP-complete if the maximum group size is at least 3 and polynomial-time solvable otherwise. This result, originating from foundational work in the 1980s and refined in subsequent analyses, highlights the threshold at group size 2 for tractability.[8][9]
Optimization Problem Hardness
The optimization version of grouped interval scheduling requires selecting a maximum-cardinality (unweighted) or maximum-weight set of non-overlapping intervals such that at most one interval is chosen from each group.[10]
This problem is NP-hard even in the unweighted case when each group contains at least two intervals. The proof follows from showing that the decision variant—determining whether at least k groups can be covered by selecting one non-overlapping interval per group—is NP-complete for maximum group size 2, via a reduction establishing equivalence to known NP-complete problems in scheduling.[11]
Furthermore, the problem is APX-hard (formerly known as MAX SNP-hard) when the maximum group size is at least 2, implying no polynomial-time approximation scheme exists unless P=NP. This hardness arises from an L-reduction from MAX-3-SAT-3.[11]
In the special case where all groups have size 1, the problem reduces to the classic single-resource interval scheduling maximization, which is solvable exactly in polynomial time: via a greedy algorithm for the unweighted case and dynamic programming for the weighted case, both in O(n \log n) time after sorting.[1]
Notably, a gap exists relative to the decision problem of whether a selection covering one interval from every group (without overlaps) is feasible: this covering decision is polynomial-time solvable when the maximum group size is at most 2 (via network flow or matching techniques), yet the optimization problem of maximizing the number of covered groups remains NP-hard in this setting.[6]
Approximation Techniques
Greedy and Polynomial Approximations
In grouped interval scheduling, a simple greedy algorithm provides a 2-approximation for the unweighted maximization problem. The algorithm proceeds by repeatedly selecting the compatible interval (non-overlapping with previously selected intervals and from an unused group) that finishes earliest among all remaining candidates; after selection, all intervals from that group are discarded. This extends the classic earliest-finish-time greedy strategy for single-resource interval scheduling by incorporating the group constraint to ensure at most one selection per group. The algorithm runs in O(n \log n) time when implemented efficiently using a priority queue or by presorting intervals by finish time and scanning linearly.
The approximation guarantee is proven by a charging argument. Partition the optimal solution's intervals by whether their group was used by the greedy: at most |G| optimal intervals come from the greedy-selected groups (at most one per group). Each optimal interval from an unselected group must overlap some greedy selection that finishes no later than it does (otherwise the greedy, finding its group unused, would have selected it), and therefore contains that selection's finish point; since optimal intervals are pairwise disjoint, each of the |G| greedy finish points lies in at most one of them, bounding these intervals by |G| as well. Thus, the optimal value is at most 2|G|, yielding the 2-approximation ratio. This charging argument directly adapts the proof for the single-resource case while accounting for the group restrictions. The bound is tight, as there exist instances where the greedy selects half the optimal number.
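A minimal sketch of this grouped greedy (Python; the triple encoding of intervals with group identifiers is an assumption for illustration), using the presort-and-scan implementation mentioned above:

```python
def grouped_greedy(intervals):
    """intervals: list of (group_id, start, finish).
    Earliest-finish greedy taking at most one interval per group;
    a 2-approximation for the unweighted grouped problem."""
    used_groups, selected = set(), []
    last_finish = float("-inf")
    for group, start, finish in sorted(intervals, key=lambda iv: iv[2]):
        if group not in used_groups and start >= last_finish:
            selected.append((group, start, finish))
            used_groups.add(group)
            last_finish = finish
    return selected
```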
An improved approximation algorithm achieves a ratio arbitrarily close to e/(e-1) \approx 1.582 for the general unweighted case, using a combination of block partitioning of the timeline and LP rounding.[13]
For special cases where each group contains at most two intervals, the problem admits an exact polynomial-time solution via network flow modeling for maximum cardinality selection. The intervals can be represented as vertices in a graph, with jobs (groups) as edges connecting their two possible intervals, and additional structure from the interval ordering allows maximum matching computation in O(n^2) time using standard flow algorithms, ensuring non-overlapping selections. This solvability holds more broadly for the decision version of selecting a fixed number of intervals when the maximum intervals per group is two.
Linear Programming Methods
Linear programming provides a powerful framework for approximating the grouped interval scheduling problem, where intervals are partitioned into groups and at most one interval per group can be selected, subject to non-overlap constraints, to maximize the total weight.
The standard LP relaxation introduces variables x_i \in [0,1] for each interval i, representing the fractional selection of interval i. The objective is to maximize \sum_i w_i x_i, where w_i is the weight of interval i. Constraints ensure non-overlap by requiring \sum_{i \in C} x_i \leq 1 for each maximal clique C in the interval conflict graph (equivalently, for each time point covered by the intervals, though the clique formulation yields a polynomial-sized LP). Additionally, group constraints enforce \sum_{i \in g} x_i \leq 1 for each group g. This LP can be solved in polynomial time using interior-point methods or the ellipsoid algorithm, as the constraint matrix has a polynomial number of inequalities due to the structure of interval graphs.[14]
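To make the relaxation concrete, the following sketch (Python with SciPy; the tuple encoding and function name are illustrative assumptions, and intervals are taken as half-open so that every maximal clique of the conflict graph contains some interval's start point) builds the clique and group constraints and solves the LP:

```python
import numpy as np
from scipy.optimize import linprog

def grouped_lp_relaxation(intervals):
    """intervals: list of (group_id, start, finish, weight).
    Maximizes sum w_i x_i subject to clique and group constraints,
    with 0 <= x_i <= 1. One clique constraint per distinct start
    time t: the intervals with s_i <= t < f_i sum to at most 1."""
    n = len(intervals)
    weights = np.array([w for _, _, _, w in intervals], dtype=float)

    rows = []
    for t in sorted({s for _, s, _, _ in intervals}):     # clique constraints
        rows.append([1.0 if s <= t < f else 0.0 for _, s, f, _ in intervals])
    for g in {g for g, _, _, _ in intervals}:             # group constraints
        rows.append([1.0 if gi == g else 0.0 for gi, _, _, _ in intervals])

    res = linprog(c=-weights, A_ub=np.array(rows), b_ub=np.ones(len(rows)),
                  bounds=[(0.0, 1.0)] * n, method="highs")
    return res.x, -res.fun          # fractional solution, LP optimum
```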
The integrality gap of this relaxation is at most 2, meaning the optimal LP value is at most twice the optimal integer solution, even with the added group constraints tightening the feasible region compared to the standard interval graph independent set LP (which has gap 1).[14] With groups, the gap remains bounded by 2, though it can be tighter for instances with low group densities (few intervals per group).
To obtain an integral solution, randomized rounding techniques can be applied to the LP solution. One approach selects, for each group independently, an interval i \in g with probability x_i / \sum_{j \in g} x_j (or simply x_i if the group sum is normalized), then discards conflicting selections greedily by earliest deadline or similar tie-breaking. This yields an expected value at least half the LP optimum, giving a 4-approximation overall due to the gap, but refinements like conditional probabilities to respect overlap constraints improve it to a constant factor. For the special case of equal weights per job (where weights are uniform within groups), an LP-based rounding achieves a 2-approximation by solving the relaxation and deterministically selecting non-conflicting intervals guided by fractional values.[14] Deterministic pipage rounding, which iteratively rounds fractional variables while preserving the objective under the multilinear extension, can also be adapted for partition matroid constraints (groups) combined with the interval structure, yielding O(1)-approximations such as 2.5 in cases with bounded interval densities.[15]
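One way to realize the per-group rounding step just described, applied to the fractional solution x from the preceding LP sketch (a hedged sketch; earliest-finish conflict resolution is one of the tie-breaking options mentioned above):

```python
import random
from collections import defaultdict

def round_lp_solution(intervals, x):
    """intervals: list of (group_id, start, finish, weight); x: LP values.
    Each group independently picks interval i with probability x[i]
    (group sums are at most 1, so a group may pick nothing); conflicts
    among the picks are then discarded greedily by earliest finish."""
    by_group = defaultdict(list)
    for idx, (g, _, _, _) in enumerate(intervals):
        by_group[g].append(idx)

    candidates = []
    for idxs in by_group.values():
        r, acc = random.random(), 0.0
        for i in idxs:                      # pick i with probability x[i]
            acc += x[i]
            if r < acc:
                candidates.append(i)
                break

    selected, last_finish = [], float("-inf")
    for i in sorted(candidates, key=lambda i: intervals[i][2]):
        if intervals[i][1] >= last_finish:  # keep only non-overlapping picks
            selected.append(i)
            last_finish = intervals[i][2]
    return selected
```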
When the number of intervals per group is fixed (say, at most k), more advanced techniques yield better guarantees. For k=2, approximations of 4/3 are possible via LP relaxations over feasible block schedules followed by randomized rounding. Recent improvements include 1.5-approximations for low-density instances (where maximum overlap is bounded), using local search on partial solutions combined with LP guidance to swap intervals within groups for improved non-overlap satisfaction. For fixed k, constant-factor approximations exist, such as k^k / (k^k - (k-1)^k) + \epsilon, via iterative block partitioning and LP rounding.[16]
Classic Scheduling Variants
The activity selection problem is mathematically identical to the unweighted case of interval scheduling, in which the objective is to select the maximum number of mutually non-overlapping activities, each defined by a fixed time interval, to maximize resource utilization on a single processor. This classic problem is optimally solved using a greedy algorithm that sorts the activities by their finishing times and iteratively selects the one with the earliest finish time that does not conflict with previously chosen activities, achieving an optimal solution in O(n \log n) time due to sorting. The greedy strategy for this problem emerged in the mid-20th century alongside foundational work on greedy algorithms.[1][17]
In contrast, the weighted interval scheduling variant assigns a profit or weight to each interval and seeks to maximize the total weight of a non-overlapping subset, which can be efficiently computed using dynamic programming by considering intervals in sorted order and computing the maximum value up to each point. Notably, extensions like the job interval selection problem—where each job offers multiple possible execution intervals—render the weighted version NP-hard, with the 0-1 knapsack problem reducing to it as a special case when all job deadlines coincide.[6][18]
A related classic variant is deadline scheduling, where each job specifies a release time r_i (earliest start) and a deadline d_i (latest finish), often with unit processing time, transforming the problem into selecting non-overlapping intervals [r_i, d_i] to meet all deadlines without violation. This is equivalent to standard interval scheduling through a time-reversal transformation: negate all times to swap roles, setting the new finish time f_i' = -r_i and start time s_i' = -d_i, then apply the greedy earliest-finish heuristic on the reversed timeline to check feasibility or maximize on-time jobs. Such formulations arise in real-time systems and are APX-hard in general when processing times vary, but polynomial for fixed unit times via the above reduction.[6][19]
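The reversal can be spelled out directly. In the sketch below (Python; windows are treated as the intervals themselves, per the formulation above, and all names are illustrative), each window [r_i, d_i] is mapped to (s', f') = (-d_i, -r_i) and the earliest-finish greedy runs on the reversed timeline, which amounts to a latest-start scan of the original one:

```python
def reversed_greedy(windows):
    """windows: list of (release, deadline) pairs treated as intervals.
    Applies the time reversal s' = -d, f' = -r, then the standard
    earliest-finish greedy; returns the selected (release, deadline) pairs."""
    reversed_intervals = [(-d, -r) for r, d in windows]
    selected, last_finish = [], float("-inf")
    for s, f in sorted(reversed_intervals, key=lambda iv: iv[1]):
        if s >= last_finish:
            selected.append((-f, -s))   # map back to original coordinates
            last_finish = f
    return selected
```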
Storage allocation represents another foundational connection, modeling memory requests as time intervals during which a block of storage must be exclusively reserved, requiring non-overlapping assignments along a linear timeline to avoid conflicts. In the offline dynamic storage allocation problem, requests arrive with predefined intervals of need, and the task is to assign contiguous memory blocks such that no two active requests overlap in both time and address space, solvable via interval graph coloring techniques for the unweighted case. This variant differs from higher-dimensional packing problems, as it confines conflicts to a one-dimensional timeline rather than multi-resource bins.[6][20]
These classic variants highlight key distinctions from more advanced interval scheduling extensions: they assume single-resource usage without job grouping or multiple identical processors, maintaining one-dimensional temporal conflicts unlike two-dimensional bin packing, where spatial dimensions introduce additional overlap constraints.[6]
Graph-Theoretic Connections
Interval graphs provide a fundamental graph-theoretic framework for modeling interval scheduling problems. An interval graph is an undirected graph where each vertex corresponds to an interval on the real line, and an edge exists between two vertices if and only if their corresponding intervals overlap. In this representation, the problem of selecting a maximum set of non-overlapping intervals—known as the unweighted interval scheduling problem—translates directly to finding a maximum independent set in the interval graph, as an independent set consists precisely of vertices whose intervals do not intersect.[21]
For the unweighted case, the maximum independent set in an interval graph can be computed efficiently using a greedy algorithm that sorts the intervals by their ending times and iteratively selects the one with the earliest end time that does not overlap with previously selected intervals. This approach runs in O(n \log n) time, where n is the number of intervals, and is optimal due to the perfect graph properties of interval graphs, where the size of the maximum independent set equals the minimum clique cover number.[22]
In the weighted case, the maximum weight independent set problem on interval graphs, which corresponds to weighted interval scheduling, can be solved using dynamic programming. Since interval graphs are chordal, a class in which perfect elimination orders enable efficient computation, the algorithm processes vertices in a sorted order by interval endpoints, computing the maximum weight selectable set up to each point in O(n) time after sorting. This leverages the tree-like structure implicit in the chordal elimination scheme, avoiding the exponential complexity of general graphs.[23][24]
The grouped variant of interval scheduling, where intervals are partitioned into groups and at most one interval per group can be selected without overlaps, introduces additional constraints that can be modeled by augmenting the interval graph with clique structures for each group. This formulation ties into graph coloring aspects, as the minimum number of groups needed to schedule all intervals relates to the chromatic number of the conflict graph, which for interval graphs equals the size of the maximum clique; the optimization under group constraints, however, is NP-hard in general (as discussed above), with dynamic programming extending only to restricted cases such as singleton groups. The decision version's complexity connects to coloring problems in more general intersection graphs, highlighting NP-hardness when group overlaps deviate from interval structure.[25]
Interval graphs can be recognized in linear time using PQ-tree algorithms, which test for the consecutive ones property in the adjacency matrix and construct an interval representation if valid. This recognition, originally developed in 1976, enables efficient verification of whether a given graph admits an interval model, facilitating applications in diverse fields. In bioinformatics, interval graphs model DNA fragment overlaps for physical mapping and copy number variation detection, where maximum independent sets identify non-overlapping genomic regions of interest.[26][27]
Extensions and Applications
Multi-Resource Scheduling
Multi-resource interval scheduling generalizes the single-resource problem to k identical machines, where the goal is to assign a subset of given intervals to the machines such that no two intervals assigned to the same machine overlap in time, maximizing the number of selected intervals in the unweighted case or the total weight in the weighted case.[28] This problem arises in settings like parallel processing where multiple processors can handle non-overlapping tasks simultaneously, with each processor representing a resource. The single-resource case corresponds to k=1, which is solvable optimally via greedy selection in the unweighted case or dynamic programming in the weighted case.[6]
In the unweighted case, an optimal solution can be computed in O(n \log n) time using a greedy algorithm that sorts the intervals by their finish times and assigns each interval, whenever some machine is free by its start time, to the free machine that became available latest (a best-fit rule); assigning instead to the earliest-available machine can needlessly block later intervals with early start times. This approach, akin to multiprocessor scheduling heuristics, ensures no machine is idle when a compatible interval is available. For fixed k, the weighted variant is solvable in polynomial time via dynamic programming, where the state tracks the current interval and the availability times of the k machines, leading to O(n^2 k) time complexity by considering compatible assignments in sorted order.[6] List scheduling and bin-packing-inspired heuristics, such as sorting intervals by length and assigning to the least-loaded machine, provide practical near-optimal solutions for larger k, though the exact greedy method is preferred for optimality.[6]
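A sketch of the unweighted k-machine greedy under the best-fit rule just described (Python; a plain sorted list stands in for a balanced search structure, so list updates make this O(nk) rather than O(n \log n), and all names are illustrative):

```python
import bisect

def k_machine_greedy(intervals, k):
    """intervals: list of (start, finish) pairs; k identical machines.
    Processes intervals by finish time and assigns each, if possible,
    to the free machine whose last finish is latest but still <= start
    (best fit). Returns the number of intervals scheduled."""
    avail = [float("-inf")] * k             # last finish time per machine
    count = 0
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        i = bisect.bisect_right(avail, start) - 1
        if i >= 0:                          # some machine is free by `start`
            avail.pop(i)                    # take the best-fitting machine
            bisect.insort(avail, finish)    # it is now busy until `finish`
            count += 1
    return count
```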
When all intervals must be scheduled on k machines, the focus shifts to minimizing the makespan, defined as the maximum total length of intervals assigned to any single machine (since non-overlapping assignments ensure the busy time equals the sum of lengths). This problem is NP-hard, but a 2-approximation algorithm exists using a greedy assignment strategy that sorts intervals by decreasing length and assigns each to the machine with the current minimum load that can accommodate it without overlap. For large k, a polynomial-time approximation scheme (PTAS) achieves (1 + ε)-approximation for any fixed ε > 0, leveraging structural properties of interval graphs and dynamic programming on subsets.
The grouped multi-resource variant requires selecting at most one interval per group (where each group represents alternatives for a task) such that the selected intervals can be assigned to k machines without overlaps per machine, maximizing the number of represented groups. This problem is NP-hard even for k=2 and groups containing at least 2 intervals, as it generalizes the single-machine group interval scheduling, which reduces from 3-partition.[16]
Real-World Uses
Interval scheduling finds extensive application in resource allocation problems where tasks or events occupy time intervals and must be selected to avoid overlaps. In meeting room booking systems, the unweighted interval scheduling algorithm is commonly used to maximize the number of reservations for a single room by greedily selecting the meeting that ends earliest among compatible options. For instance, Google Calendar employs variants of this approach to handle basic scheduling without priorities, ensuring efficient use of shared spaces. When priorities are involved, such as assigning higher-value events like executive meetings over routine ones, weighted interval scheduling is applied to optimize total value, as implemented in enterprise tools like Microsoft Outlook's resource booking features.
Broadcast scheduling leverages interval scheduling to manage airtime for television and radio channels, grouping advertisements or programs into non-overlapping slots to maximize revenue. In this context, the problem is modeled as selecting weighted intervals (e.g., ad spots with revenue values) across multiple channels, using greedy approximations to ensure high-value content airs without conflicts, as seen in systems developed by broadcasters like the BBC for dynamic playlist generation.
More recent applications include machine learning hyperparameter tuning, where trials are scheduled as time intervals on computational resources to avoid overlaps and maximize completed experiments within deadlines; frameworks like Ray Tune employ interval scheduling variants for this purpose in distributed training setups. Similarly, in the 2020s, electric vehicle (EV) charging station allocation has adopted interval scheduling to assign charging slots as non-overlapping intervals, optimizing for user demand and grid capacity.
Practical implementations often rely on open-source libraries for efficiency. Google's OR-Tools provides solvers for weighted and grouped variants, enabling custom optimizations in production environments like logistics and event planning.