Fact-checked by Grok 2 weeks ago

Activity selection problem

The activity selection problem is a classic combinatorial optimization problem in computer science, where one is given a set of n activities, each defined by a start time s_i and a finish time f_i (with s_i < f_i), and the objective is to select the largest possible subset of these activities such that no two selected activities overlap in time—meaning, for any two compatible activities i and j, either s_i \geq f_j or s_j \geq f_i. This problem is canonically solved using a greedy algorithm, which first sorts the activities in non-decreasing order of their finish times (requiring O(n \log n) time) and then iteratively selects the activity with the earliest finish time that does not conflict with previously chosen activities (achieving O(n) time for the selection phase after sorting). The algorithm's optimality is proven via the greedy choice property—there always exists an optimal solution that includes the activity finishing earliest—and optimal substructure, where the subproblem after the greedy choice yields an optimal solution for the remainder. As detailed in standard algorithmic texts, this approach guarantees a maximum cardinality solution and serves as an introductory example of greedy methods in algorithm design. The activity selection problem has practical applications in resource scheduling, such as allocating time slots for jobs on a single machine, meeting room reservations, or radio/TV program selection to maximize coverage without overlaps. It exemplifies how greedy strategies can efficiently solve problems with matroid structure, though extensions like weighted variants (where activities have profits) require dynamic programming or other techniques for optimality.

Problem Description

Formal Definition

The activity selection problem, also known as the interval scheduling problem, involves selecting a maximum number of non-overlapping activities from a given set to schedule on a single resource. Formally, there are n activities, indexed by i = 1 to n, where each activity i is represented as an interval with a start time s_i and a finish time f_i, satisfying s_i < f_i. The input consists of a list of these activities, which may be provided in sorted or unsorted order. The objective is to select a subset of these activities of maximum cardinality such that no two selected activities overlap, meaning they are pairwise compatible. Two activities i and j (with i \neq j) are compatible if their intervals do not overlap, i.e., either f_i \leq s_j or f_j \leq s_i. Key assumptions include that activities cannot be preempted (they must run contiguously from start to finish) and that only a single resource is available, preventing concurrent execution of overlapping activities. This formulation models scenarios such as scheduling jobs on a single machine or allocating time slots in a resource-constrained environment.

Illustrative Example

To illustrate the activity selection problem, consider a set of six activities, each defined by an integer start time s_i and finish time f_i (with s_i < f_i), as follows:
ActivityStart Time (s_i)Finish Time (f_i)
114
235
306
457
538
659
Two activities i and j are compatible if either f_i \leq s_j or f_j \leq s_i; otherwise, they overlap and cannot both be selected. The goal is to find a maximum-cardinality subset of mutually compatible activities. The single-activity subsets each have size 1. For size-2 subsets, examples include {1, 4} (since f_1 = 4 \leq s_4 = 5), {1, 6} ($4 \leq 5), {2, 4} (f_2 = 5 \leq 5), and {2, 6} ($5 \leq 5). No activity is compatible with 3 after its finish time of 6, as all remaining activities start at or before 5. Similarly, subsets starting with 5 yield no further compatible activities. No size-3 (or larger) compatible subset exists, as any pair finishes by time 9 with no subsequent non-overlapping activity. Thus, the maximum cardinality is 2, achieved by subsets such as {1, 4} or {2, 6}. A timeline visualization highlights the overlaps: imagine a horizontal axis from time 0 to 9, with activity 3 spanning the broadest interval (0–6) and overlapping all others; activity 1 (1–4) overlaps only 2 and 5; activities 4 and 6 (both starting at 5) do not overlap with 1 or 2 but overlap each other. Certain selections fail to achieve the maximum. For instance, choosing activity 3 first (covering 0–6) leaves no room for any other, yielding only size 1, because its late finish blocks all subsequent starts at 3 or 5. In contrast, starting with activity 1 or 2 (finishing by 4 or 5) allows pairing with 4 or 6, reaching size 2 and suggesting that earlier-finishing activities enable more options.

Greedy Solution

Algorithm Description

The greedy algorithm for the unweighted activity selection problem selects a maximum set of mutually compatible activities by prioritizing those that finish as early as possible, thereby maximizing opportunities for subsequent selections. The core idea involves sorting all activities in non-decreasing order of their finish times f_i and then greedily picking the next compatible activity that starts no earlier than the finish time of the last selected one. This method follows three high-level steps: first, sort the activities such that f_1 \leq f_2 \leq \cdots \leq f_n; second, initialize the selection with the first activity (index 1); third, iterate through the remaining activities, adding an activity j to the selection if its start time s_j \geq f_k, where k is the index of the most recently selected activity. The rationale for this greedy choice lies in the fact that selecting the activity with the earliest finish time minimizes the time blocked for future activities, thus allowing the greatest potential for additional compatible selections later in the schedule. In cases of ties where multiple activities share the same finish time, the algorithm assumes distinct finish times for simplicity, but if ties occur, it selects any one arbitrarily without impacting the optimality of the maximum set size.

Pseudocode and Steps

The greedy algorithm for the activity selection problem can be implemented by first sorting the activities by their finish times and then iteratively selecting the next compatible activity with the earliest finish time. The following pseudocode, adapted from the standard formulation, assumes a list of n activities, each represented as a pair (s_i, f_i) where s_i is the start time and f_i is the finish time, with activities indexed from 1 to n. The procedure returns the indices of the selected activities in the original list after sorting.
GREEDY-ACTIVITY-SELECTOR(activities)
    n ← length[activities]
    if n = 0
        return empty set
    sort activities by increasing f_i  // O(n log n) time
    let S be an empty set
    add 1 to S  // Select the first activity (index 1 after sorting)
    last_finish ← f_1
    for i = 2 to n
        if s_i ≥ last_finish
            add i to S
            last_finish ← f_i
    return S  // Indices in the sorted list; map back to original if needed
To illustrate the execution, consider a small example with four activities: A1 (start=1, finish=4), A2 (start=3, finish=5), A3 (start=0, finish=7), A4 (start=8, finish=10). First, sort by finish times: A1 (1-4), A2 (3-5), A3 (0-7), A4 (8-10). Initialize S = {1} (A1), last_finish = 4. For i=2 (A2), s_2=3 < 4, skip. For i=3 (A3), s_3=0 < 4, skip. For i=4 (A4), s_4=8 ≥ 4, add 4 to S, last_finish=10. The selected set S = {1,4}, corresponding to A1 and A4. The algorithm handles edge cases gracefully. For an empty list (n=0), it returns an empty set. If all activities overlap (e.g., all start before the finish of the first), it selects only the first activity after sorting (the one with the earliest finish time). If all activities are non-overlapping (each subsequent start time is at or after the previous finish time), it selects all n activities.

Analysis of Correctness

Greedy Choice Property

The greedy choice property for the activity selection problem asserts that, given a nonempty set S of activities sorted by increasing finish times, there exists an optimal solution that includes the activity g with the earliest finish time f_g. This property ensures that selecting the activity that finishes first among those compatible with previously chosen activities does not preclude achieving a globally optimal solution maximizing the number of selected activities. To prove this, consider an arbitrary optimal solution O to the problem that includes the minimum number of activities incompatible with g. If O already contains g, the property holds directly. Otherwise, assume O does not include g; let O = \{a_1, a_2, \dots, a_k\} where the activities are ordered by start times, so a_1 is the first activity in O and f_{a_1} \geq f_g since g finishes earliest among all activities in S. Construct a new solution O' = \{g, a_2, \dots, a_k\} by replacing a_1 with g. This exchange yields |O'| = |O|, maintaining the solution size. Moreover, O' remains feasible because g is compatible with a_2, \dots, a_k: each a_i for i \geq 2 starts after f_{a_1} \geq f_g, ensuring no overlap with g. Thus, O' is another optimal solution that includes g, establishing the property via this exchange argument. This key lemma underpins the correctness of the greedy algorithm: the initial choice of the earliest-finishing activity can always be incorporated into some optimal solution without loss of optimality, allowing the recursive application of the same strategy to the remaining compatible activities. The proof relies on the sorted order by finish times and the definition of compatibility, where two activities are compatible if the start time of one is at or after the finish time of the other.

Optimal Substructure Proof

The activity selection problem possesses the optimal substructure property, meaning that an optimal solution to the problem can be constructed by combining an optimal solution to a subproblem with a suitable choice for the current step. Specifically, after selecting the greedy activity g (the one that finishes earliest), the subproblem consisting of all remaining activities compatible with g admits an optimal solution that can be found using the same greedy method. To prove this, let S be the set of n activities sorted by non-decreasing finish times, and let A be any optimal solution for S. By the greedy choice property, there exists an optimal solution that includes g; thus, without loss of generality, let A be such a solution with g \in A. Denote S' = \{ a_i \in S \mid s_i \geq f_g \}, where s_i and f_i are the start and finish times of activity a_i, and g = a_1 is the earliest-finishing activity. Then, A' = A \setminus \{ g \} forms an optimal solution for the subproblem on S'. Suppose, for contradiction, that there exists a solution B' to S' with |B'| > |A'|. Then, the set B = B' \cup \{ g \} is a feasible solution for S, as all activities in B' start after g finishes and are mutually compatible. Moreover, |B| = |B'| + 1 > |A'| + 1 = |A|, which contradicts the optimality of A. Therefore, A' is optimal for S', establishing that the original optimal solution decomposes into g plus an optimal solution to the subproblem. This property holds more generally by induction on the number of activities n. For the base cases, if n = 0, the empty solution is optimal; if n = 1, selecting the single activity is optimal. Assume the greedy method yields an optimal solution for all instances with fewer than n activities. For an instance with n activities, after selecting g, the subproblem on S' has fewer than n activities, so the inductive hypothesis applies, and the greedy solution for S' is optimal. Thus, the overall solution is optimal for S. Together with the greedy choice property—which guarantees an optimal solution includes the earliest-finishing activity—this optimal substructure ensures the greedy algorithm constructs a globally optimal solution, obviating the need for dynamic programming approaches that solve overlapping subproblems exhaustively.

Performance and Implementation

Time and Space Complexity

The greedy algorithm for the activity selection problem requires sorting the activities by their finish times, which takes O(n \log n) time using a standard comparison-based sorting algorithm such as merge sort or heap sort. Following the sort, the algorithm performs a linear scan to select compatible activities, requiring O(n) time to iterate through the sorted list and check start times against the last selected finish time. Thus, the overall time complexity is O(n \log n), as the sorting step dominates. In the best case, where the input activities are already sorted by finish times, the sorting step is unnecessary, reducing the time complexity to O(n) for the linear selection pass alone. Conversely, in the worst case, the input requires a full sort, maintaining the O(n \log n) bound regardless of the specific activity overlaps. The space complexity of the algorithm is O(1) additional space beyond the input and output, assuming an in-place sorting method like heapsort and storing only indices or a minimal output list during selection. If the output list of selected activities is considered part of the space usage and no in-place sort is used, it becomes O(n) in the worst case to hold the result. In contrast to the greedy approach, a brute-force solution that enumerates all possible subsets of activities to find the maximum compatible set has exponential time complexity, specifically O(2^n \cdot n), due to checking $2^n subsets and verifying compatibility in linear time per subset. This makes the greedy polynomial-time method vastly more efficient for large n.

Practical Considerations

In practical implementations of the unweighted greedy algorithm for the activity selection problem, activities are typically represented using simple data structures such as arrays or lists, where each activity is stored as a tuple or pair containing its start time and finish time. This structure allows efficient access during sorting and selection phases. For example, the algorithm begins by sorting the list of activities based on their finish times using a custom comparator, which ensures that the greedy choice prioritizes activities ending earliest. In Python, this sorting step can be implemented as sorted(activities, key=lambda x: x[1]), where x[1] refers to the finish time in each activity tuple. Since input activities are often provided in unsorted order, the algorithm requires a one-time preprocessing sort by finish times before iterating to select compatible activities; this sort dominates the overall runtime but enables linear-time selection afterward. To preserve original indices if needed for output (e.g., reporting activity identifiers), implementations may store indices alongside times during sorting or use a stable sorting method that maintains relative order. When multiple activities share the same finish time (ties), the choice among them does not affect optimality, so ties can be broken arbitrarily; however, for consistency or to favor certain preferences, a stable sort or secondary sorting key by start time can be employed to prioritize activities that begin later, minimizing idle time. The activity selection problem finds real-world applications in resource allocation scenarios, such as scheduling jobs on a single machine to maximize throughput or allocating a single meeting room to multiple events without conflicts. In CPU scheduling, for instance, it models selecting non-overlapping tasks to optimize processor utilization under unweighted assumptions. Common implementation pitfalls include assuming the input is already sorted by finish times, which can lead to incorrect selections if overlooked, and failing to properly verify non-overlap conditions during the greedy iteration. Additionally, when dealing with floating-point times for precision in real-time systems, comparisons between start and finish times must account for potential rounding errors, often using a small epsilon threshold (e.g., 1e-9) to avoid false overlaps or misses due to machine precision limits.

Variants and Extensions

Weighted Activity Selection

In the weighted activity selection problem, each activity i is associated with a positive weight w_i > 0 in addition to its start time s_i and finish time f_i, and the objective is to select a compatible subset of activities (no two overlap) that maximizes the total weight \sum w_i rather than the number of activities. The standard greedy approach of selecting activities by earliest finish time, which is optimal for the unweighted case, fails in the weighted variant because it may select low-weight activities that prevent selecting higher-weight alternatives. A standard counterexample is an early-finishing activity with weight 1 that overlaps with two later non-overlapping activities each with weight 100; the greedy selects the weight-1 activity, yielding 1, while the optimal selects the two weight-100 activities, totaling 200. ===== END CLEANED SECTION =====

References

  1. [1]
    Activity-Selection Problem - Kent
    Greedy Algorithm for Selection Problem. I. Sort the input activities by increasing finishing time. f1 ≤ f2 ≤ . . . ≤ fn. II Call GREEDY-ACTIVITY-SELECTOR (Sif).
  2. [2]
    [PDF] 1 Greedy Algorithms
    May 22, 2017 · The activity selection problem has many applications, most notably in scheduling jobs to run on a single machine. 1.1.1 Optimal Substructure.
  3. [3]
    [PDF] CMSC 451: Lecture 5 Greedy Algorithms for Scheduling
    Here is a formal problem definition. Interval scheduling problem: Given a set R of n requests with start-finish times [si,fi] for 1 ≤ i ≤ n, determine a ...<|control11|><|separator|>
  4. [4]
    Introduction to Algorithms - MIT Press
    A comprehensive update of the leading algorithms text, with new material on matchings in bipartite graphs, online algorithms, machine learning, and other ...
  5. [5]
  6. [6]
    [PDF] V. Greedy Algorithms
    A greedy algorithm makes the best choice at the moment, without regard for future consequences, using a 'take what you can get now' strategy.
  7. [7]
    [PDF] Graduate Algorithms - Computer Science
    Activity Selection problem: Prove Greedy Choice. Prove Optimal Substructure. Page 10. 10-8: Proving Greedy Choice. Let a1 be the activity that ends first – ...
  8. [8]
    [PDF] Greedy Algorithms: Theory Optimization Problems - Cal Poly
    Theorem 1. Problem Activity Selection has optimal substructure prop- erty. Proof. Consider an instance P of the Activity Selection problem with input. A ...
  9. [9]
    [PDF] Lecture 1 : Greedy Algorithms CLRS 16.1-16.2 March 28, 2002 1 ...
    Mar 28, 2002 · • Running time is obviously O(n log n). 2. Page 3. • ús algorithm correct˝. - ‚utput is set of non-overlapping activities, but is it the ...
  10. [10]
    [PDF] Greedy Algorithms
    Activity Selection Problem. • Scheduling a resource among several ... ≤ fn ; Can be sorted in O (n log n ) time. Procedure for activity selection (from CLRS).
  11. [11]
    Activity Selection Problem - Greedy Algorithm - Studytonight
    The Activity Selection Problem is an optimization problem which deals with the selection of non-conflicting activities that needs to be executed by a single ...Missing: paper | Show results with:paper
  12. [12]
    Activity Selection Problem Questions and Answers - Sanfoundry
    Hence, the best-case time complexity for the activity selection problem will be O(n).
  13. [13]
    Greedy Algorithm for Activity Selection - AlgoDaily
    There is no additional intermediate space involved for storing an array or matrix, making it O(1) space complexity.
  14. [14]
    Activity Selection Problem - Scaler Blog
    Sep 26, 2024 · Time and Space Complexity for the Greedy approach when the given set of activities are not sorted is O(nlogn) and O(1) respectively. This second ...
  15. [15]
    [PDF] Dynamic Programming - cs.Princeton
    Weighted Activity Selection: Brute Force. INPUT: N, s1,…,sN , f1,…,fN , w1,…,wN. Sort jobs by increasing finish times so that f1 ≤ f2 ≤ ... ≤ fN. Compute ...
  16. [16]
    4.1 Activity-Selection Problem - HKU
    This leads to the activity selection problem: There are a set of ... Introduction to Algorithms, Second Edition. The MIT Press, Massachusetts ...
  17. [17]
    [PDF] Chapter 16: Greedy Algorithms
    The activity-selection problem is the problem of selecting the largest set of mutually compatible activities. In this algorithm the activities are first sorted ...
  18. [18]
    16.1 An activity-selection problem - CLRS Solutions - walkccc.me
    Suppose that instead of always selecting the first activity to finish, we instead select the last activity to start that is compatible with all previously ...Missing: seminal paper<|control11|><|separator|>
  19. [19]
    [PDF] 1 Greedy Algorithms - Stanford University
    May 18, 2016 · The activity selection problem has many applications, most notably in scheduling jobs to run on a single machine. 1.1.1 Optimal Substructure.Missing: implementation | Show results with:implementation
  20. [20]
    Activity Selection Problem: The Ultimate Guide - HeyCoach | Blogs
    Common Mistakes to Avoid · Ignoring Sorting: Forgetting to sort activities by finish time is like trying to bake a cake without preheating the oven. · Overlapping ...
  21. [21]
    [PDF] ‣ weighted interval scheduling ‣ segmented least squares ...
    Greedy algorithm is correct if all weights are 1. Observation. Greedy algorithm fails spectacularly for weighted version. 8 weight = 999 weight ...
  22. [22]
    [PDF] 1 Maximum Independent Set Problem in Graphs
    The algorithm is simple and runs in linear time but is not obvious. To see this consider the weighted problem for intervals. The standard algorithm to solve.
  23. [23]
    [PDF] 1 A Greedy Algorithm for Scheduling Jobs with Deadlines and Profits
    Theorem 1 The schedule output by the greedy algorithm is optimal, that is, it is feasible and the profit is as large as possible among all feasible solutions.
  24. [24]
    A note on sequencing jobs with deadlines problem - ScienceDirect
    Two simple algorithms for sequencing jobs with deadline problems are proposed. The purpose is the maximization of the total profit earned when a job is ...
  25. [25]
    [PDF] APPROXIMATION ALGORITHMS FOR SCHEDULING PROBLEMS
    This thesis describes efficient approximation algorithms for some NP-Hard deterministic machine scheduling and related problems. An approximation algorithm ...
  26. [26]