Activity selection problem
The activity selection problem is a classic combinatorial optimization problem in computer science, where one is given a set of n activities, each defined by a start time s_i and a finish time f_i (with s_i < f_i), and the objective is to select the largest possible subset of these activities such that no two selected activities overlap in time—meaning, for any two compatible activities i and j, either s_i \geq f_j or s_j \geq f_i.[1]
This problem is canonically solved using a greedy algorithm, which first sorts the activities in non-decreasing order of their finish times (requiring O(n \log n) time) and then iteratively selects the activity with the earliest finish time that does not conflict with previously chosen activities (achieving O(n) time for the selection phase after sorting).[2] The algorithm's optimality is proven via the greedy choice property—there always exists an optimal solution that includes the activity finishing earliest—and optimal substructure, where the subproblem after the greedy choice yields an optimal solution for the remainder.[1] As detailed in standard algorithmic texts, this approach guarantees a maximum cardinality solution and serves as an introductory example of greedy methods in algorithm design.
The activity selection problem has practical applications in resource scheduling, such as allocating time slots for jobs on a single machine, meeting room reservations, or radio/TV program selection to maximize coverage without overlaps.[2] It exemplifies how greedy strategies can efficiently solve problems with matroid structure, though extensions like weighted variants (where activities have profits) require dynamic programming or other techniques for optimality.
Problem Description
The activity selection problem, also known as the interval scheduling problem, involves selecting a maximum number of non-overlapping activities from a given set to schedule on a single resource.[3][2]
Formally, there are n activities, indexed by i = 1 to n, where each activity i is represented as an interval with a start time s_i and a finish time f_i, satisfying s_i < f_i.[3] The input consists of a list of these activities, which may be provided in sorted or unsorted order.[2] The objective is to select a subset of these activities of maximum cardinality such that no two selected activities overlap, meaning they are pairwise compatible.[3][2]
Two activities i and j (with i \neq j) are compatible if their intervals do not overlap, i.e., either f_i \leq s_j or f_j \leq s_i.[3] Key assumptions include that activities cannot be preempted (they must run contiguously from start to finish) and that only a single resource is available, preventing concurrent execution of overlapping activities.[3][2] This formulation models scenarios such as scheduling jobs on a single machine or allocating time slots in a resource-constrained environment.
Illustrative Example
To illustrate the activity selection problem, consider a set of six activities, each defined by an integer start time s_i and finish time f_i (with s_i < f_i), as follows:
| Activity | Start Time (s_i) | Finish Time (f_i) |
|---|
| 1 | 1 | 4 |
| 2 | 3 | 5 |
| 3 | 0 | 6 |
| 4 | 5 | 7 |
| 5 | 3 | 8 |
| 6 | 5 | 9 |
[4]
Two activities i and j are compatible if either f_i \leq s_j or f_j \leq s_i; otherwise, they overlap and cannot both be selected.[4]
The goal is to find a maximum-cardinality subset of mutually compatible activities. The single-activity subsets each have size 1. For size-2 subsets, examples include {1, 4} (since f_1 = 4 \leq s_4 = 5), {1, 6} ($4 \leq 5), {2, 4} (f_2 = 5 \leq 5), and {2, 6} ($5 \leq 5). No activity is compatible with 3 after its finish time of 6, as all remaining activities start at or before 5. Similarly, subsets starting with 5 yield no further compatible activities. No size-3 (or larger) compatible subset exists, as any pair finishes by time 9 with no subsequent non-overlapping activity. Thus, the maximum cardinality is 2, achieved by subsets such as {1, 4} or {2, 6}.[4]
A timeline visualization highlights the overlaps: imagine a horizontal axis from time 0 to 9, with activity 3 spanning the broadest interval (0–6) and overlapping all others; activity 1 (1–4) overlaps only 2 and 5; activities 4 and 6 (both starting at 5) do not overlap with 1 or 2 but overlap each other.[4]
Certain selections fail to achieve the maximum. For instance, choosing activity 3 first (covering 0–6) leaves no room for any other, yielding only size 1, because its late finish blocks all subsequent starts at 3 or 5. In contrast, starting with activity 1 or 2 (finishing by 4 or 5) allows pairing with 4 or 6, reaching size 2 and suggesting that earlier-finishing activities enable more options.[4]
Greedy Solution
Algorithm Description
The greedy algorithm for the unweighted activity selection problem selects a maximum set of mutually compatible activities by prioritizing those that finish as early as possible, thereby maximizing opportunities for subsequent selections. The core idea involves sorting all activities in non-decreasing order of their finish times f_i and then greedily picking the next compatible activity that starts no earlier than the finish time of the last selected one.
This method follows three high-level steps: first, sort the activities such that f_1 \leq f_2 \leq \cdots \leq f_n; second, initialize the selection with the first activity (index 1); third, iterate through the remaining activities, adding an activity j to the selection if its start time s_j \geq f_k, where k is the index of the most recently selected activity.[5]
The rationale for this greedy choice lies in the fact that selecting the activity with the earliest finish time minimizes the time blocked for future activities, thus allowing the greatest potential for additional compatible selections later in the schedule.[6]
In cases of ties where multiple activities share the same finish time, the algorithm assumes distinct finish times for simplicity, but if ties occur, it selects any one arbitrarily without impacting the optimality of the maximum set size.
Pseudocode and Steps
The greedy algorithm for the activity selection problem can be implemented by first sorting the activities by their finish times and then iteratively selecting the next compatible activity with the earliest finish time. The following pseudocode, adapted from the standard formulation, assumes a list of n activities, each represented as a pair (s_i, f_i) where s_i is the start time and f_i is the finish time, with activities indexed from 1 to n. The procedure returns the indices of the selected activities in the original list after sorting.
GREEDY-ACTIVITY-SELECTOR(activities)
n ← length[activities]
if n = 0
return empty set
sort activities by increasing f_i // O(n log n) time
let S be an empty set
add 1 to S // Select the first activity (index 1 after sorting)
last_finish ← f_1
for i = 2 to n
if s_i ≥ last_finish
add i to S
last_finish ← f_i
return S // Indices in the sorted list; map back to original if needed
GREEDY-ACTIVITY-SELECTOR(activities)
n ← length[activities]
if n = 0
return empty set
sort activities by increasing f_i // O(n log n) time
let S be an empty set
add 1 to S // Select the first activity (index 1 after sorting)
last_finish ← f_1
for i = 2 to n
if s_i ≥ last_finish
add i to S
last_finish ← f_i
return S // Indices in the sorted list; map back to original if needed
To illustrate the execution, consider a small example with four activities: A1 (start=1, finish=4), A2 (start=3, finish=5), A3 (start=0, finish=7), A4 (start=8, finish=10). First, sort by finish times: A1 (1-4), A2 (3-5), A3 (0-7), A4 (8-10). Initialize S = {1} (A1), last_finish = 4. For i=2 (A2), s_2=3 < 4, skip. For i=3 (A3), s_3=0 < 4, skip. For i=4 (A4), s_4=8 ≥ 4, add 4 to S, last_finish=10. The selected set S = {1,4}, corresponding to A1 and A4.
The algorithm handles edge cases gracefully. For an empty list (n=0), it returns an empty set. If all activities overlap (e.g., all start before the finish of the first), it selects only the first activity after sorting (the one with the earliest finish time). If all activities are non-overlapping (each subsequent start time is at or after the previous finish time), it selects all n activities.
Analysis of Correctness
Greedy Choice Property
The greedy choice property for the activity selection problem asserts that, given a nonempty set S of activities sorted by increasing finish times, there exists an optimal solution that includes the activity g with the earliest finish time f_g. This property ensures that selecting the activity that finishes first among those compatible with previously chosen activities does not preclude achieving a globally optimal solution maximizing the number of selected activities.[4]
To prove this, consider an arbitrary optimal solution O to the problem that includes the minimum number of activities incompatible with g. If O already contains g, the property holds directly. Otherwise, assume O does not include g; let O = \{a_1, a_2, \dots, a_k\} where the activities are ordered by start times, so a_1 is the first activity in O and f_{a_1} \geq f_g since g finishes earliest among all activities in S. Construct a new solution O' = \{g, a_2, \dots, a_k\} by replacing a_1 with g. This exchange yields |O'| = |O|, maintaining the solution size. Moreover, O' remains feasible because g is compatible with a_2, \dots, a_k: each a_i for i \geq 2 starts after f_{a_1} \geq f_g, ensuring no overlap with g. Thus, O' is another optimal solution that includes g, establishing the property via this exchange argument.[4]
This key lemma underpins the correctness of the greedy algorithm: the initial choice of the earliest-finishing activity can always be incorporated into some optimal solution without loss of optimality, allowing the recursive application of the same strategy to the remaining compatible activities. The proof relies on the sorted order by finish times and the definition of compatibility, where two activities are compatible if the start time of one is at or after the finish time of the other.[4]
Optimal Substructure Proof
The activity selection problem possesses the optimal substructure property, meaning that an optimal solution to the problem can be constructed by combining an optimal solution to a subproblem with a suitable choice for the current step. Specifically, after selecting the greedy activity g (the one that finishes earliest), the subproblem consisting of all remaining activities compatible with g admits an optimal solution that can be found using the same greedy method.[1][7]
To prove this, let S be the set of n activities sorted by non-decreasing finish times, and let A be any optimal solution for S. By the greedy choice property, there exists an optimal solution that includes g; thus, without loss of generality, let A be such a solution with g \in A. Denote S' = \{ a_i \in S \mid s_i \geq f_g \}, where s_i and f_i are the start and finish times of activity a_i, and g = a_1 is the earliest-finishing activity. Then, A' = A \setminus \{ g \} forms an optimal solution for the subproblem on S'.[1][7]
Suppose, for contradiction, that there exists a solution B' to S' with |B'| > |A'|. Then, the set B = B' \cup \{ g \} is a feasible solution for S, as all activities in B' start after g finishes and are mutually compatible. Moreover, |B| = |B'| + 1 > |A'| + 1 = |A|, which contradicts the optimality of A. Therefore, A' is optimal for S', establishing that the original optimal solution decomposes into g plus an optimal solution to the subproblem.[1][7]
This property holds more generally by induction on the number of activities n. For the base cases, if n = 0, the empty solution is optimal; if n = 1, selecting the single activity is optimal.[8] Assume the greedy method yields an optimal solution for all instances with fewer than n activities. For an instance with n activities, after selecting g, the subproblem on S' has fewer than n activities, so the inductive hypothesis applies, and the greedy solution for S' is optimal. Thus, the overall solution is optimal for S.[7][8]
Together with the greedy choice property—which guarantees an optimal solution includes the earliest-finishing activity—this optimal substructure ensures the greedy algorithm constructs a globally optimal solution, obviating the need for dynamic programming approaches that solve overlapping subproblems exhaustively.[1]
Time and Space Complexity
The greedy algorithm for the activity selection problem requires sorting the activities by their finish times, which takes O(n \log n) time using a standard comparison-based sorting algorithm such as merge sort or heap sort.[9] Following the sort, the algorithm performs a linear scan to select compatible activities, requiring O(n) time to iterate through the sorted list and check start times against the last selected finish time.[10] Thus, the overall time complexity is O(n \log n), as the sorting step dominates.[2]
In the best case, where the input activities are already sorted by finish times, the sorting step is unnecessary, reducing the time complexity to O(n) for the linear selection pass alone.[11] Conversely, in the worst case, the input requires a full sort, maintaining the O(n \log n) bound regardless of the specific activity overlaps.[12]
The space complexity of the algorithm is O(1) additional space beyond the input and output, assuming an in-place sorting method like heapsort and storing only indices or a minimal output list during selection.[13] If the output list of selected activities is considered part of the space usage and no in-place sort is used, it becomes O(n) in the worst case to hold the result.[14]
In contrast to the greedy approach, a brute-force solution that enumerates all possible subsets of activities to find the maximum compatible set has exponential time complexity, specifically O(2^n \cdot n), due to checking $2^n subsets and verifying compatibility in linear time per subset.[15] This makes the greedy polynomial-time method vastly more efficient for large n.[16]
Practical Considerations
In practical implementations of the unweighted greedy algorithm for the activity selection problem, activities are typically represented using simple data structures such as arrays or lists, where each activity is stored as a tuple or pair containing its start time and finish time. This structure allows efficient access during sorting and selection phases. For example, the algorithm begins by sorting the list of activities based on their finish times using a custom comparator, which ensures that the greedy choice prioritizes activities ending earliest. In Python, this sorting step can be implemented as sorted(activities, key=lambda x: x[1]), where x[1] refers to the finish time in each activity tuple.
Since input activities are often provided in unsorted order, the algorithm requires a one-time preprocessing sort by finish times before iterating to select compatible activities; this sort dominates the overall runtime but enables linear-time selection afterward. To preserve original indices if needed for output (e.g., reporting activity identifiers), implementations may store indices alongside times during sorting or use a stable sorting method that maintains relative order. When multiple activities share the same finish time (ties), the choice among them does not affect optimality, so ties can be broken arbitrarily; however, for consistency or to favor certain preferences, a stable sort or secondary sorting key by start time can be employed to prioritize activities that begin later, minimizing idle time.[17][18]
The activity selection problem finds real-world applications in resource allocation scenarios, such as scheduling jobs on a single machine to maximize throughput or allocating a single meeting room to multiple events without conflicts. In CPU scheduling, for instance, it models selecting non-overlapping tasks to optimize processor utilization under unweighted assumptions. Common implementation pitfalls include assuming the input is already sorted by finish times, which can lead to incorrect selections if overlooked, and failing to properly verify non-overlap conditions during the greedy iteration. Additionally, when dealing with floating-point times for precision in real-time systems, comparisons between start and finish times must account for potential rounding errors, often using a small epsilon threshold (e.g., 1e-9) to avoid false overlaps or misses due to machine precision limits.[19][11][20]
Variants and Extensions
Weighted Activity Selection
In the weighted activity selection problem, each activity i is associated with a positive weight w_i > 0 in addition to its start time s_i and finish time f_i, and the objective is to select a compatible subset of activities (no two overlap) that maximizes the total weight \sum w_i rather than the number of activities.[21]
The standard greedy approach of selecting activities by earliest finish time, which is optimal for the unweighted case, fails in the weighted variant because it may select low-weight activities that prevent selecting higher-weight alternatives. A standard counterexample is an early-finishing activity with weight 1 that overlaps with two later non-overlapping activities each with weight 100; the greedy selects the weight-1 activity, yielding 1, while the optimal selects the two weight-100 activities, totaling 200.[21]
===== END CLEANED SECTION =====