Multilevel feedback queue
A multilevel feedback queue (MLFQ) is a CPU scheduling algorithm employed in operating systems to efficiently manage process execution by dividing the ready queue into multiple priority levels, where processes can dynamically move between queues based on their observed CPU usage and behavior, approximating shortest-job-first scheduling without prior knowledge of process burst times.[1] Originally described by Fernando J. Corbató and colleagues in 1962 as part of the Compatible Time-Sharing System (CTSS), the MLFQ scheduler addresses the challenges of mixed workloads, including interactive and batch jobs, by prioritizing short or I/O-bound processes while ensuring progress for longer-running ones.[1]

Key to its operation are several rules: processes begin in the highest-priority queue and are scheduled using round-robin (RR) within each queue; upon exhausting a queue's time quantum, a process is demoted to the next lower-priority queue, which typically has a larger quantum; and mechanisms such as periodic priority boosts prevent indefinite starvation of low-priority jobs.[1][2] This adaptive feedback mechanism allows MLFQ to favor responsive, short jobs, reducing average response times, while compute-intensive jobs gradually descend to lower queues, balancing throughput and fairness without requiring advance predictions of job lengths.[2][1]

Implementations vary across systems: early versions appeared in Multics and BSD UNIX, while modern examples include Solaris (with up to 60 queues and configurable time slices from 20 ms to hundreds of milliseconds) and elements of Windows NT scheduling.[1][3] Despite its strengths, MLFQ can risk process starvation if boosts are infrequent, though tunable parameters such as maximum wait times mitigate this in practice.[2][3]
Introduction
Definition and Purpose
A multilevel feedback queue (MLFQ) is a class of CPU scheduling algorithms employed in operating systems to manage process execution through a hierarchy of priority-based queues, where processes can dynamically shift between queues according to their runtime behavior. This structure enables the system to observe and adapt to process characteristics, such as CPU burst lengths, without requiring advance knowledge of job durations. Originally developed for time-sharing systems, MLFQ approximates the efficiency of shortest-job-first (SJF) scheduling by initially treating all processes as potentially short-running, thereby promoting fairness and efficiency in multiprogrammed environments.[4][1]

The primary purpose of MLFQ is to enhance user responsiveness by prioritizing short, interactive, or I/O-bound processes, which typically require quick execution slices, while mitigating the risk of long-running CPU-bound processes monopolizing system resources. By demoting processes that exhibit prolonged CPU usage to lower-priority queues, MLFQ prevents system hogging and maintains high overall throughput, ensuring that the majority of jobs complete in a timely manner even under varying workloads. This dynamic adaptation supports balanced performance in interactive computing scenarios, where low response times for foreground tasks are critical alongside efficient batch processing.[1][4]

At its core, MLFQ addresses the shortcomings of static scheduling algorithms, such as round-robin, which apply uniform time quanta to all processes regardless of their I/O or CPU intensity, often leading to suboptimal responsiveness for mixed workloads. Through feedback mechanisms that adjust priorities based on empirical usage patterns, MLFQ incorporates learning to better allocate CPU time, fostering a more equitable and performant system that evolves with process demands.[1]
Historical Development
The multilevel feedback queue (MLFQ) scheduling algorithm originated in 1962 with the work of Fernando J. Corbató and colleagues at MIT, as described in their paper on the Compatible Time-Sharing System (CTSS). This system, running on modified IBM 709 hardware, pioneered time-sharing by allowing multiple users to access the computer interactively through remote terminals, a departure from the era's dominant batch processing paradigms.[5]

A key milestone was the first practical implementation of MLFQ in CTSS, which supported up to 30 simultaneous users by dynamically adjusting process priorities to favor short, interactive tasks over long-running computations.[5] This innovation enabled efficient resource sharing on a single machine, fundamentally shifting computing from sequential job batches to concurrent, user-driven sessions and laying groundwork for modern multitasking environments. For these contributions to time-sharing, Corbató received the 1990 ACM A.M. Turing Award.[6]

The MLFQ built upon earlier multilevel queue concepts from 1950s batch processing systems, where fixed-priority queues managed job streams to optimize throughput.[7] It was further refined in the 1960s within Multics, the successor to CTSS co-developed by MIT, Bell Labs, and General Electric, to accommodate growing interactive workloads in a multi-user utility computing model.[5] While influencing variants in later operating systems like Unix, the algorithm's core structure has endured as a foundational approach without significant modifications since the 1970s.[1]
Fundamentals of Process Scheduling
Basic Concepts in CPU Scheduling
In operating systems, processes execute through a cycle of CPU and I/O bursts, where a CPU burst represents the time a process requires the processor for computation before performing input/output operations or blocking.[8] CPU bursts vary in length; short bursts are typical of I/O-bound processes, which spend most time waiting for I/O and thus interact frequently with the system, while long bursts characterize CPU-bound processes that perform intensive calculations with minimal I/O.[9]

Key performance metrics for evaluating scheduling effectiveness include turnaround time, defined as the interval from process arrival to completion; waiting time, the duration spent in the ready queue excluding actual execution; and response time, the period from arrival until the process first receives CPU attention.[8] These metrics balance goals like maximizing CPU utilization and throughput while minimizing delays, particularly in systems handling diverse process types.[9]

Fundamental scheduling algorithms provide baselines for managing process execution. First-come, first-served (FCFS) allocates the CPU to the longest-waiting ready process in arrival order, offering simplicity and fairness without preemption.[8] Its strengths lie in low overhead and ease of implementation, but it suffers from the convoy effect, where a single long CPU-bound process delays many short I/O-bound ones, leading to poor average waiting times (e.g., 17 ms in example workloads).[8]

Shortest-job-first (SJF) prioritizes processes with the smallest estimated CPU burst, achieving optimality in minimizing average waiting and response times for non-preemptive scenarios assuming accurate burst predictions.[8] While SJF improves throughput over FCFS (e.g., reducing average wait to 7 ms), it risks indefinite starvation for longer jobs if short ones continually arrive and requires prior knowledge of burst times, which is often unavailable.[8]

Round-robin (RR) scheduling assigns fixed time quanta (e.g., 10-100 ms) to each ready process in a circular queue, preempting upon expiration to promote fairness and responsiveness.[8] It excels in time-sharing environments by preventing starvation and supporting interactive workloads, yielding balanced waiting times (e.g., 5.66 ms with appropriate quanta).[8] However, RR incurs context-switching overhead, and poorly chosen quanta can mimic FCFS's convoy effect for CPU-bound processes or degrade throughput via excessive preemptions in mixed settings.[8]

Static algorithms like FCFS, SJF, and RR falter in mixed workloads combining I/O-bound and CPU-bound processes, as fixed ordering or quanta fail to adapt to varying burst patterns, resulting in inefficiencies such as prolonged waits for interactive tasks or underutilized CPU during I/O phases.[10] For instance, FCFS and RR exhibit convoy delays when CPU-intensive jobs block shorter ones, while SJF's reliance on static estimates exacerbates starvation without dynamic adjustments.[11] This underscores the need for adaptive scheduling that observes process behavior over time and dynamically adjusts priorities based on measured bursts, enhancing responsiveness and utilization across heterogeneous demands.[9]
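The waiting-time figures above can be reproduced with a short simulation. The sketch below assumes the classic textbook workload of three jobs with 24 ms, 3 ms, and 3 ms CPU bursts, all arriving at time 0; it yields the 17 ms FCFS average and the roughly 5.66 ms RR average with a 4 ms quantum (the 7 ms SJF figure cited above comes from a different four-job example).

```python
from collections import deque

def fcfs_waits(bursts):
    """Waiting time of each job when run strictly in arrival order."""
    waits, clock = [], 0
    for burst in bursts:
        waits.append(clock)
        clock += burst
    return waits

def sjf_waits(bursts):
    """Non-preemptive SJF: run jobs in order of increasing burst length."""
    order = sorted(range(len(bursts)), key=lambda i: bursts[i])
    waits, clock = [0] * len(bursts), 0
    for i in order:
        waits[i] = clock
        clock += bursts[i]
    return waits

def rr_waits(bursts, quantum):
    """Round-robin with a fixed quantum; wait = completion time - burst."""
    remaining, finish = list(bursts), [0] * len(bursts)
    ready, clock = deque(range(len(bursts))), 0
    while ready:
        i = ready.popleft()
        run = min(quantum, remaining[i])
        clock += run
        remaining[i] -= run
        if remaining[i] > 0:
            ready.append(i)          # quantum used up: back of the queue
        else:
            finish[i] = clock
    return [finish[i] - bursts[i] for i in range(len(bursts))]

bursts = [24, 3, 3]                                  # CPU bursts in ms
print(sum(fcfs_waits(bursts)) / len(bursts))         # 17.0
print(sum(sjf_waits(bursts)) / len(bursts))          # 3.0
print(sum(rr_waits(bursts, 4)) / len(bursts))        # 5.666...
```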
Evolution to Multilevel Queues
Single-queue scheduling algorithms, such as round-robin (RR), treat all processes equally by allocating CPU time in fixed quanta to each ready process, which ensures fairness but often results in inefficiencies for mixed workloads combining interactive and CPU-bound jobs.[1] For instance, short interactive tasks may suffer long wait times behind long-running batch processes, degrading response times, while simple FCFS scheduling exacerbates convoy effects where short jobs are delayed by a single long job.[1]

To address these issues, multilevel queue scheduling emerged as an advancement, partitioning processes into multiple distinct queues based on fixed attributes like process type or priority at the time of creation.[1] Each queue operates independently with its own scheduling policy: higher-priority queues, typically for interactive or system processes, might employ RR with short quanta to prioritize responsiveness, while lower-priority queues for batch jobs could use FCFS to minimize overhead for long computations.[1] This hierarchical structure allows the scheduler to favor critical tasks without uniformly applying a single algorithm, improving overall system balance in early multiprogrammed environments.[1]

Despite these benefits, multilevel queues suffer from rigid assignments, where processes cannot migrate between queues even if their behavior evolves; for example, an initially interactive process turning CPU-intensive remains stuck in a high-priority queue, potentially starving others.[1] This lack of adaptability limits responsiveness to dynamic workload changes, highlighting the need for feedback mechanisms to enable priority adjustments based on observed execution patterns.[1] Early implementations in 1960s multiprogramming systems demonstrated how fixed assignments could constrain flexibility, paving the way for dynamic reclassification in subsequent designs.[1]
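A minimal sketch of such a fixed (non-feedback) multilevel queue follows; the two classes, the 10 ms quantum, and the function names are illustrative rather than drawn from any particular system. It shows both the per-queue policies and the rigid assignment that feedback later relaxes.

```python
from collections import deque

queues = {
    "interactive": deque(),   # higher priority: round-robin, short quantum
    "batch": deque(),         # lower priority: first-come, first-served
}

def admit(job, job_class):
    # The class is fixed at creation time; the job never migrates,
    # even if its behavior later changes.
    queues[job_class].append(job)

def dispatch():
    # Strict priority between queues: batch jobs run only when no
    # interactive job is ready.
    if queues["interactive"]:
        return queues["interactive"].popleft(), 10    # 10 ms RR slice
    if queues["batch"]:
        return queues["batch"].popleft(), None        # run to completion
    return None, None
```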
The MLFQ Algorithm
Queue Structure and Priorities
The multilevel feedback queue (MLFQ) employs a hierarchical arrangement of multiple queues, each corresponding to a distinct priority level, to manage process scheduling efficiently. Systems typically implement a small, fixed number of queues, with three being a common configuration that balances simplicity against the ability to distinguish process behaviors. The queues are ordered such that the uppermost queue holds the highest priority, and priority levels decrease progressively down the structure, ensuring that higher-priority queues always preempt those below them. This fixed hierarchy implies priorities through queue position rather than assigning explicit numerical values to individual processes.[1]

Upon entering the system, all processes are placed in the topmost (highest-priority) queue, regardless of their initial characteristics. This initial assignment favors interactive or short-duration tasks, such as I/O-bound processes that require quick CPU bursts. The scheduler selects the process from the head of the highest-priority non-empty queue, only proceeding to lower queues when upper ones are vacant. In this way, lower queues, like the bottommost one, execute solely when all superior queues lack ready processes.[1]

A representative 3-queue structure illustrates this organization: the first queue prioritizes new or short jobs expected to complete rapidly, the second queue handles medium-duration tasks, and the third queue accommodates long-running processes that demand sustained CPU time. This setup, first conceptualized in early time-sharing systems, allows the scheduler to allocate resources based on queue level, with higher queues receiving immediate attention to minimize response times for critical workloads.[1]
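A minimal sketch of this structure (the queue count and function names are illustrative): new jobs enter the top queue, and dispatch always draws from the highest non-empty level.

```python
from collections import deque

NUM_LEVELS = 3                          # queue 0 is the highest priority
queues = [deque() for _ in range(NUM_LEVELS)]

def admit(job):
    # Every newly arrived process starts in the topmost queue.
    queues[0].append(job)

def next_to_run():
    # Scan from the top: dispatch the head of the highest non-empty queue;
    # a lower queue runs only when every queue above it is empty.
    for level, queue in enumerate(queues):
        if queue:
            return level, queue.popleft()
    return None, None                   # nothing is ready
```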
Process Movement and Feedback Rules
The multilevel feedback queue (MLFQ) scheduling algorithm incorporates a dynamic feedback mechanism that enables processes to move between queues based on their observed CPU usage patterns, allowing the system to adaptively prioritize short, interactive jobs while managing longer, CPU-intensive ones. This movement is governed by a set of core rules that observe process behavior over time, approximating the performance of shortest-job-first (SJF) scheduling without prior knowledge of burst lengths. Short CPU bursts keep processes in higher-priority queues for quick execution, whereas prolonged CPU usage leads to demotion, ensuring efficient resource allocation in time-sharing environments.[1]

Central to this mechanism is the demotion rule: when a process exhausts its allocated time quantum at a given queue level, regardless of how many scheduling quanta it has received there, it is moved to the next lower-priority queue. This feedback penalizes CPU-bound processes by reducing their priority, preventing them from monopolizing the CPU and allowing interactive processes to progress. For instance, a process starting in the highest-priority queue that fully consumes its quantum will shift downward, reflecting its longer burst characteristics.[1]

Promotion rules counterbalance demotion to ensure fairness and prevent indefinite starvation. New processes entering the system are always placed in the highest-priority queue, giving them an initial opportunity for rapid execution. Additionally, to address potential starvation of low-priority processes, a periodic boost mechanism elevates all processes back to the top queue after a fixed time period S, effectively implementing aging based on system-wide wait times. Processes that voluntarily yield the CPU, such as those blocking on I/O operations, typically remain at their current queue level upon resumption, though some variants promote them to a higher queue to favor I/O-bound jobs. Together these rules form a simple decision procedure: higher-priority queues are serviced first, equal-priority processes share the CPU round-robin, and movements are triggered solely by quantum exhaustion or system timers, without regard to intra-queue order.[1]

The feedback principle underlying these movements relies on historical CPU burst observations to dynamically classify processes, favoring those with short bursts (e.g., interactive tasks) by retaining them in upper queues while sinking CPU-intensive jobs lower. This empirical approximation of SJF enhances responsiveness in mixed workloads, as demonstrated in early time-sharing systems where short jobs complete swiftly without explicit burst predictions.[1]
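Continuing the sketch from the previous section (the queues list and NUM_LEVELS), the core feedback rules might be expressed as follows; the boost interval and the choice to keep I/O-blocked jobs at their level are illustrative assumptions.

```python
def on_quantum_expired(job, level):
    # Demotion rule: a job that used its entire quantum drops one level,
    # unless it is already in the bottom queue.
    queues[min(level + 1, NUM_LEVELS - 1)].append(job)

def on_io_complete(job, level):
    # Common variant: a job that blocked for I/O resumes at the same level;
    # some implementations instead promote it to favor I/O-bound work.
    queues[level].append(job)

BOOST_INTERVAL_MS = 1000      # the period S; an illustrative value

def priority_boost():
    # Every S time units, move all jobs back to the top queue so that
    # long-running jobs stranded in lower queues cannot starve.
    for queue in queues[1:]:
        while queue:
            queues[0].append(queue.popleft())
```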
Intra-Queue Scheduling
In multilevel feedback queue (MLFQ) scheduling, once the highest-priority non-empty queue is selected for execution, processes within that queue are managed using queue-specific algorithms designed to balance responsiveness and throughput. Higher-priority queues, intended for interactive or short jobs, typically employ round-robin (RR) scheduling with short time quanta to ensure quick response times for user-facing processes.[1] For example, a top-priority queue might use RR with a 10 ms quantum, allowing multiple processes to share the CPU in a circular manner without long waits.[1] This approach preempts a running process after its quantum expires, returning it to the end of the same queue for fair sharing among peers.[1]

Lower-priority queues, which handle CPU-bound or long-running jobs, often use first-come, first-served (FCFS) or RR with longer quanta to prioritize overall system throughput over individual responsiveness. In FCFS queues, the process at the head runs to completion without intra-queue preemption, minimizing context-switch overhead for batch-oriented workloads.[12] Alternatively, RR in these queues may employ quanta of 100 ms or more, enabling some interleaving while favoring sustained execution.[1] The choice between FCFS and extended RR in lower queues depends on system goals, with FCFS common in the base (lowest) queue to ensure completion of demoted processes.[12]

Queue selection follows a strict priority order: the scheduler always dispatches from the highest non-empty queue, preempting any lower-queue process if a higher-priority one becomes ready.[1] This inter-queue preemption occurs immediately upon arrival to a higher queue, suspending the current execution and switching contexts.[1] Within an RR queue, ties in readiness are resolved through standard circular queuing, where processes are served in the order they were enqueued, with each getting an equal turn up to the quantum limit.[1] In the base queue using FCFS, no such intra-queue ties arise, as execution proceeds sequentially until all processes finish.[12]
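A small sketch of these per-queue policy choices and the inter-queue preemption check, continuing the earlier Python fragments; the per-level quanta are illustrative values only.

```python
# Per-level policy: top queues use short round-robin slices; the bottom
# queue uses FCFS (no intra-queue preemption), marked here by a None quantum.
QUANTUM_MS = {0: 10, 1: 100, 2: None}

def quantum_for(level):
    return QUANTUM_MS[level]

def should_preempt(running_level):
    # Inter-queue preemption: a running job is interrupted as soon as any
    # higher-priority queue becomes non-empty.
    return any(queues[l] for l in range(running_level))
```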
Configuration and Tuning
Time Quantum Assignments
In multilevel feedback queue (MLFQ) scheduling, time quanta are assigned differently to each queue to optimize for varying process behaviors, with shorter durations in higher-priority queues to favor interactive or short-burst tasks and longer durations in lower-priority queues to accommodate CPU-intensive workloads. This variation ensures that short jobs receive quick service without excessive waiting, while long-running jobs progress gradually without monopolizing the CPU. The original MLFQ design in the Compatible Time-Sharing System (CTSS) used a fixed base quantum $q$ but allowed a process at level $\ell$ (with levels numbered downward from the highest priority) to run for an effective slice of $2^\ell \times q$, doubling the allowable runtime per level to reflect growing tolerance for larger bursts.[13]

The assignment of time quanta is primarily determined by expected process burst times, balancing the trade-off between system responsiveness and overhead from context switches. Quanta that are too short in higher queues (e.g., under 10 ms) lead to frequent preemptions and increased context-switching costs, which can degrade overall throughput, whereas excessively long quanta reduce interactivity for users awaiting a response. In practice, higher-priority queues typically use quanta of 8-10 ms to approximate human perception limits for responsiveness, while lower queues extend to 16-32 ms or more, and the bottom queue may employ an infinite quantum under first-come, first-served (FCFS) to complete batch jobs efficiently. These choices stem from empirical tuning based on workload characteristics, as overly aggressive short quanta exacerbate switching overhead in systems with high process density.[1]

Tuning quanta often involves scaling them progressively across queues, such as doubling per level in a three-queue system, for example 8 ms for the top queue (prioritizing quick interactive tasks), 16 ms for the middle (medium bursts), and 32 ms or an infinite quantum for the bottom (long CPU-bound jobs), to align with anticipated burst distributions and ensure fairness. This exponential growth allows short processes to complete within the initial small quanta, while longer ones accumulate service time across levels without starvation. The cumulative CPU time a process can consume before being demoted past level $k$ is the sum of the effective quanta up to that level, $\sum_{i=0}^{k} 2^i q = (2^{k+1} - 1)q$, highlighting how the early levels contribute minimally to delay for short jobs while longer jobs accumulate service across levels. Demotion occurs when a process exhausts its assigned quantum in a queue, prompting re-evaluation at the next lower-priority level. In real-world implementations like Solaris, quanta range from 20 ms at the highest priority to hundreds of milliseconds at the lowest, configurable via system tables to adapt to specific hardware and load profiles.[1][14]
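As a concrete illustration of the doubling rule (the 8 ms base quantum is only an example), the per-level slice and cumulative service can be tabulated:

```python
BASE_QUANTUM_MS = 8                       # illustrative base quantum q

def quantum_at(level):
    # Effective slice at a level: 2**level * q, i.e. 8, 16, 32, ... ms.
    return (2 ** level) * BASE_QUANTUM_MS

def cumulative_service(level):
    # Total CPU time available before a job is demoted past `level`:
    # sum_{i=0}^{k} 2**i * q = (2**(k+1) - 1) * q.
    return (2 ** (level + 1) - 1) * BASE_QUANTUM_MS

for k in range(3):
    print(k, quantum_at(k), cumulative_service(k))
# prints: 0 8 8 / 1 16 24 / 2 32 56
```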
Priority Adjustment Mechanisms
In multilevel feedback queue (MLFQ) scheduling, priority adjustment mechanisms are essential for maintaining fairness and preventing issues such as starvation, where lower-priority processes might wait indefinitely without CPU access. The primary mechanism is aging, which periodically increases the priority of processes that have been waiting excessively in lower queues. This involves monitoring the wait time of a process and promoting it to a higher-priority queue after a predefined interval, ensuring that even CPU-bound or long-running jobs eventually receive service. For instance, under Rule 5 of a standard MLFQ formulation, all jobs are elevated to the topmost queue every S time units, where S serves as a tunable parameter, often set to around 100 milliseconds or 1 second, to balance responsiveness and equity.[1]

In practical systems such as Solaris, aging is implemented through a starvation check that examines process wait times at regular intervals, such as every second via the ts_update routine. If a process's dispatch wait time (ts_dispwait) exceeds its maximum wait threshold (ts_maxwait), its priority is boosted to a higher level, typically ts_lwait (e.g., 50 or above), with the default ts_maxwait set to 0 for most priorities but extended to 32000 ticks at the lowest priority (59) to guarantee eventual promotion. This approach ensures no process starves indefinitely, as the aging rate can be parameterized, for example promoting after 100 ms of wait, to adapt to workload characteristics. Demotion thresholds may also extend beyond single-quantum exhaustion: in refined MLFQ variants, a process is moved down only after consuming its full time allotment at the current level, even if that allotment is spread across multiple scheduling quanta, preventing premature penalization of processes with variable bursts.[15][1]
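A generic aging pass in the spirit of these mechanisms might look as follows, continuing the earlier sketches (queues, NUM_LEVELS, and deque); the 100 ms threshold and the enqueued_at timestamp field are hypothetical.

```python
from collections import deque

MAX_WAIT_MS = 100                      # illustrative aging threshold

def aging_pass(now_ms):
    # Promote any job that has waited too long at a lower level by one
    # level, so low-priority jobs cannot be starved indefinitely.
    for level in range(1, NUM_LEVELS):
        kept = deque()
        while queues[level]:
            job = queues[level].popleft()
            if now_ms - job.enqueued_at >= MAX_WAIT_MS:   # hypothetical field
                queues[level - 1].append(job)
            else:
                kept.append(job)
        queues[level] = kept
```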
Additional adjustments enhance fairness in specialized MLFQ configurations. For example, tracking an estimated burst time, often denoted as τ, allows dynamic priority recalculation based on historical CPU usage, promoting short-burst interactive jobs while demoting those exceeding their τ estimate. Optional integration of lottery scheduling within queues allocates CPU shares probabilistically to low-priority processes, using tickets proportional to wait time to boost their chances without rigid promotions. These mechanisms, while not universal, address edge cases such as priority inversion and expose tunable parameters that implementations can adjust for specific environments.[1]
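A minimal sketch of these two optional refinements, assuming a job record that carries a wait_time field; the exponential-averaging constant and the ticket weighting are illustrative choices rather than part of any standard MLFQ definition.

```python
import random

def update_burst_estimate(tau, last_burst, alpha=0.5):
    # Exponential average of observed CPU bursts, a common way to maintain
    # the per-job estimate denoted tau above.
    return alpha * last_burst + (1 - alpha) * tau

def lottery_pick(jobs):
    # Probabilistic selection among low-priority jobs: tickets proportional
    # to accumulated wait time, so long waiters win the CPU more often.
    tickets = [max(1, job["wait_time"]) for job in jobs]
    return random.choices(jobs, weights=tickets, k=1)[0]
```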