Interpolation search
Interpolation search is an algorithm for locating a target value in a sorted array, particularly effective for uniformly distributed data, where it estimates the probable position of the target by linearly interpolating between the values at the current search interval's endpoints.[1] Introduced by W. W. Peterson in 1957 as a method for addressing records in random-access storage, the algorithm iteratively narrows the search range by calculating the next probe position using the formula pos = low + \left( \frac{(x - array[low]) \times (high - low)}{array[high] - array[low]} \right), where low and high are the interval bounds and x is the target value.[1] This approach mimics human intuition, such as searching a telephone directory by estimating a name's position based on alphabetical distribution.[2]
In 1978, Yehoshua Perl, Alon Itai, and Haim Avni provided a theoretical analysis demonstrating that, under the assumption of uniform key distribution, interpolation search requires an average of \log \log N probes for an array of size N, outperforming binary search's O(\log N) in asymptotic terms for large, uniformly sorted datasets.[2] However, its worst-case performance degrades to O(N) on non-uniform or adversarially constructed data, and practical implementations often incur higher computational overhead per probe compared to binary search, making it less favorable in general-purpose scenarios despite its theoretical efficiency.[3] Subsequent research has addressed these limitations through optimized variants and adaptations for dynamic or non-uniform data, enhancing its applicability in various search scenarios.[3]
Overview
Definition
Interpolation search is a searching algorithm designed to locate a specific key within a sorted array of records ordered by numerical key values. Unlike simpler methods, it employs linear interpolation between the minimum and maximum key values in the current search interval to estimate and probe a more likely position for the target key, particularly effective under the assumption of a uniform distribution of keys across the array. This approach allows the algorithm to adaptively narrow down the search space based on the relative value of the target key, potentially requiring fewer comparisons than fixed-interval probing techniques.[2]
The algorithm presupposes two key prerequisites: the array must be strictly sorted in ascending or descending order by the numerical keys, and the keys themselves must be comparable numerical values suitable for arithmetic interpolation. Without these conditions—such as in unsorted data or non-numerical keys—the interpolation estimation would fail to produce meaningful probe positions, rendering the method inapplicable. These requirements ensure that the proportional positioning derived from the key values accurately reflects the array's structure.[1]
The primary motivation for interpolation search stems from its ability to outperform binary search in scenarios where key values provide additional distributional information, enabling guesses closer to the actual position of the target rather than always bisecting the interval. This mirrors the intuitive process humans use when searching a physical dictionary, estimating a word's location based on its alphabetical proximity to known endpoints rather than mechanically flipping to the middle. As a result, it offers a more value-informed strategy for probing in uniformly distributed sorted data.[4]
Interpolation search was first described by W. W. Peterson in 1957 as a technique for efficient addressing in random-access storage systems.[1]
History
The interpolation search algorithm was first proposed by W. W. Peterson in his 1957 paper "Addressing for Random-Access Storage," published in the IBM Journal of Research and Development.[5] In this work, Peterson introduced the core idea of estimating the position of a target key in a sorted array using linear interpolation between the endpoints, motivated by the need to optimize searches in random-access storage devices like early magnetic drums and disks.[5] By the late 1970s, the algorithm's potential was further highlighted in analyses of ordered file searches, emphasizing its utility under assumptions of uniform key distribution.
Key theoretical advancements came in 1978 with Y. Perl, A. Itai, and H. Avni's analysis, which proved an average-case time complexity of O(log log n) for uniformly distributed data, solidifying interpolation search's theoretical efficiency over binary search in ideal scenarios.[2] Practical considerations were explored in subsequent decades, including a 2021 study by A. S. Mohammed, Ş. E. Amrahov, and F. V. Çelebi, which proposed a hybrid interpolated binary search algorithm and evaluated its implementations on ordered datasets with varying distributions, demonstrating improved performance in real-world computing environments.[6]
Recent developments as of 2025 have focused on dynamic variants and novel reinterpretations, such as the Search-o-Sort framework proposed by Anurag Dutta et al., which integrates searching and sorting operations to handle arbitrary data more flexibly while building on interpolation principles.[7] These extensions address limitations in static arrays by enabling updates and non-uniform adaptations, reflecting ongoing interest in refining the algorithm for modern data structures.
Algorithm Description
Basic Procedure
Interpolation search operates on a sorted array of elements, typically numerical, to locate a target value by iteratively narrowing the search interval based on an estimated position within the current bounds. The algorithm begins with initialization of the search boundaries: the low index is set to the first element of the array (usually 0), and the high index is set to the last element (n-1, where n is the array length).[3]
The search proceeds in a loop as long as the low index is less than or equal to the high index, ensuring the target could potentially lie within the current interval. In each iteration, first check if the target is less than array[low] or greater than array[high]; if so, terminate the search indicating the target is not present. Otherwise, if array[low] equals array[high], check if the target equals this value: if yes, return low; else, terminate indicating not found. Then, calculate an estimated position using interpolation based on the values at the low and high bounds (detailed in the position estimation formula section). The value at this probed position is then compared to the target.[3][1]
If the probed value equals the target, the algorithm terminates successfully by returning the position. Otherwise, the bounds are adjusted: if the target is less than the probed value, the high index is updated to one less than the probed position to search the left subinterval; if greater, the low index is set to one greater than the probed position to search the right subinterval. This process repeats, progressively reducing the search space until the target is found or the interval collapses (low > high), in which case the target is not present and -1 or a similar indicator is returned.[3][1]
To illustrate, consider a sorted array A = [10, 20, 30, 40, 50] (indices 0 to 4) searching for the target 35. Initialize low = 0 and high = 4. Since 10 ≤ 35 ≤ 50, proceed. The first probe estimates position 2 (A[2] = 30), which is less than 35, so update low to 3. Now with low = 3 and high = 4, the range check finds 35 < A[3] = 40, so the search terminates without finding the target.[3]
In interpolation search, the position estimation step calculates a probable index for the target value within the current search interval by assuming a linear relationship between array indices and values. This approach treats the sorted array segment from index low to high as approximating a straight line passing through the points (low, array[low]) and (high, array[high]), where array[low] \leq target \leq array[high]. The estimated position pos is derived by solving for the x-coordinate on this line where the y-value equals the target, yielding the formula:
pos = low + \frac{(target - array[low]) \times (high - low)}{array[high] - array[low]}
[10]
The derivation follows from the line equation y = mx + c, where the slope m = \frac{array[high] - array[low]}{high - low} and intercept c = array[low] - m \cdot low; substituting y = target and solving for the index gives the proportional position. This linear interpolation estimates where the target fits proportionally along the value range, potentially probing closer to the actual location under uniform distribution assumptions.[10]
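Spelling out the substitution, the solving step reads:
target = m \cdot pos + c, \qquad m = \frac{array[high] - array[low]}{high - low}, \qquad c = array[low] - m \cdot low
pos = \frac{target - c}{m} = low + \frac{(target - array[low]) \times (high - low)}{array[high] - array[low]}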
The range check prior to computation (target between array[low] and array[high]) ensures the formula's assumption holds and prevents invalid positions; the equal values check (handled in the basic procedure) avoids division by zero.[10]
For illustration, consider searching for target 45 in the sorted array [10, 20, 30, 40, 50, 60, 70] with initial low = 0, high = 6. Since 10 ≤ 45 ≤ 70, apply the formula: pos = 0 + \frac{(45 - 10) \times (6 - 0)}{70 - 10} = \frac{35 \times 6}{60} = 3.5, which is typically floored to index 3 (value 40). Since 40 < 45, the new interval becomes low = 4, high = 6, and the process repeats.[10]
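A minimal C++ sketch of just this estimation step, reproducing the worked example above (the helper name probePosition and the use of a 64-bit intermediate are illustrative choices, not part of the cited description):
cpp
#include <iostream>

// Estimate the probe index for `target` within arr[low..high], assuming the
// array is sorted ascending and arr[low] <= target <= arr[high]; integer
// division performs the flooring described above. The caller must ensure
// arr[high] != arr[low] to avoid division by zero.
int probePosition(const int* arr, int low, int high, int target) {
    long long num = static_cast<long long>(target - arr[low]) * (high - low);
    long long den = arr[high] - arr[low];
    return low + static_cast<int>(num / den);
}

int main() {
    int arr[] = {10, 20, 30, 40, 50, 60, 70};
    // (45 - 10) * (6 - 0) / (70 - 10) = 210 / 60 = 3 after flooring
    std::cout << probePosition(arr, 0, 6, 45) << std::endl;  // prints 3
    return 0;
}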
Comparison with Other Searches
Versus Binary Search
Binary search operates by repeatedly dividing the search interval in half, always probing the midpoint of the current range, which guarantees an O(log n) time complexity independent of the distribution of keys in the sorted array.[2]
In contrast, interpolation search adaptively estimates the probe position based on the target key's value relative to the minimum and maximum keys in the interval, using linear interpolation to predict a more precise location, which can result in fewer probes than binary search when the data meets certain conditions.[2]
Interpolation search excels on uniformly distributed datasets, such as arrays containing timestamps or evenly spaced integers, where it reduces the average number of probes from O(log n) in binary search to O(log log n).[2]
Binary search, however, outperforms interpolation search on non-uniform distributions or small arrays, where the arithmetic computations for position estimation in interpolation add significant overhead that diminishes or reverses its advantages.[3] Empirical evaluations confirm that optimized variants of interpolation search can achieve up to 4 times the speed of binary search on large uniform in-memory arrays due to reduced memory accesses, but binary search remains more robust overall for general cases.[3]
Versus Linear Search
Linear search, also known as sequential search, examines each element in a sorted or unsorted array sequentially from the beginning until the target key is located or the end of the array is reached, resulting in a time complexity of O(n) for both the average and worst cases in an array of n elements.[11]
In comparison, interpolation search exploits the assumption of uniform key distribution within a sorted array to estimate the target's position more intelligently, achieving an average-case time complexity of O(log log n), which provides substantial efficiency gains over linear search for large datasets.[2]
However, linear search remains suitable for very small arrays due to its minimal implementation overhead and the negligible performance difference in such cases, as well as for unsorted data where interpolation search cannot be applied without prior sorting.[12][11]
Historically, interpolation search emerged as an enhancement to the straightforward but slow linear search by incorporating key values and statistical distribution assumptions for position estimation, thereby bridging the gap to more balanced, distribution-agnostic techniques like binary search.[2]
Time Complexity
The time complexity of interpolation search varies depending on the distribution of keys in the sorted array. In the best case, the algorithm achieves O(1) performance when the first probe lands exactly on the target key.[3]
Under the assumption of a uniform distribution of keys, the average-case time complexity is O(\log \log n), where n is the number of elements in the array. This improvement over binary search's O(\log n) arises because each probe leverages the key values to estimate the target's position more precisely than a fixed midpoint, reducing the search space more rapidly on average.[2]
In the worst case, interpolation search degenerates to O(n) time complexity, occurring when the keys follow a skewed distribution, such as exponential growth, causing probes to consistently fall near one end of the interval and mimic a linear scan.[3]
A sketch of the average-case analysis for uniform distributions involves bounding the expected error of each probe: a probe into a subinterval of m keys lands, in expectation, within about \sqrt{m} positions of the target, so the interval size shrinks roughly as n, \sqrt{n}, n^{1/4}, \ldots, reaching a constant after about \log_2 \log_2 n probes. Perl, Itai, and Avni formalize this using supermartingale theory and optional sampling, showing that the expected number of probes T satisfies E(T) \leq \log_2 \log_2 n + O(1).[2]
The practical time complexity is influenced by the cost of computing probe positions, which involves arithmetic operations like divisions and multiplications, potentially outweighing the benefit of fewer memory accesses compared to binary search in hardware where such computations are expensive.[3]
Space Complexity and Practical Considerations
Interpolation search requires only a constant amount of auxiliary space, typically O(1), as it operates directly on the input array using a few variables such as the lower and upper bounds of the search interval (low and high) and the estimated position (pos).[3]
In practice, the position estimation formula involves arithmetic operations that can introduce floating-point precision issues, particularly with large array sizes or values near the limits of representable integers, leading to potential off-by-one errors in the probe position. To mitigate this, implementations often employ integer or fixed-point arithmetic alternatives, which avoid floating-point conversions and maintain exactness while tolerating minor precision loss since probes are speculative.[3]
The algorithm is most suitable for searching large, static, uniformly distributed sorted arrays, such as database indices or in-memory lookup tables, where its adaptive probing can yield significant speedups over binary search—up to 4x on uniform datasets in benchmarks. It should be avoided for dynamic datasets requiring frequent insertions or deletions, or for small arrays (under a few thousand elements), where the overhead of calculations exceeds the benefits compared to simpler linear search.[3]
On modern hardware as of 2025, the performance advantages of interpolation search can diminish for very large arrays if cache misses induced by non-sequential probes outweigh the speed of arithmetic computations, especially on systems with large but latency-sensitive caches. Additionally, the algorithm is not inherently suitable for parallelization in single-key searches due to its sequential dependency on probe outcomes, though modifications enable parallelism for multi-key queries across threads.[3]
Assumptions and Adaptations
The uniform distribution requirement for interpolation search stipulates that the keys in the sorted array are evenly spaced or follow a linear probability distribution, such as when generated from uniform random numbers across the range bounded by the array's minimum and maximum values. This assumption posits that the expected position of any key is proportional to its value relative to the endpoints, enabling the algorithm to make informed guesses about probe locations.[2]
The requirement is fundamental because the core position estimation formula derives the probe index by interpolating linearly between the array bounds, relying on the uniformity to ensure that the fraction (k - \min)/( \max - \min) accurately reflects the key's relative position in the distribution. Violations of this assumption cause the formula to produce skewed estimates, concentrating probes in non-representative areas and potentially degrading the search to linear time complexity, O(n), in pathological cases.[2]
To verify the uniform distribution assumption prior to applying interpolation search, practitioners can employ statistical goodness-of-fit tests, such as the chi-squared test, on the spacings between sorted keys or on binned frequency counts of key values to check for deviations from expected uniformity.[13]
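As a rough illustration, such a check on binned key counts might look like the following sketch (the helper name chiSquaredUniformity and the default of 10 bins are arbitrary illustrative choices; a real test would compare the returned statistic against a chi-squared critical value for bins − 1 degrees of freedom at the chosen significance level):
cpp
#include <vector>
#include <cstddef>

// Sketch: bin the sorted keys into equal-width ranges over [min, max] and
// compute the chi-squared statistic against the uniform expectation n / bins.
// A large statistic suggests the uniformity assumption is violated.
double chiSquaredUniformity(const std::vector<double>& sortedKeys, std::size_t bins = 10) {
    const std::size_t n = sortedKeys.size();
    if (n == 0 || bins == 0) return 0.0;
    const double lo = sortedKeys.front();
    const double hi = sortedKeys.back();
    if (hi == lo) return 0.0;  // all keys equal: nothing to test
    std::vector<std::size_t> counts(bins, 0);
    for (double k : sortedKeys) {
        std::size_t b = static_cast<std::size_t>((k - lo) / (hi - lo) * bins);
        if (b >= bins) b = bins - 1;  // clamp the maximum key into the last bin
        ++counts[b];
    }
    const double expected = static_cast<double>(n) / bins;
    double chi2 = 0.0;
    for (std::size_t c : counts) {
        const double d = static_cast<double>(c) - expected;
        chi2 += d * d / expected;
    }
    return chi2;
}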
Non-uniform distributions lead to biased probing, where the algorithm expends more steps in regions of key clustering, elevating the average case probe count beyond the ideal logarithmic bounds.[2]
In his original formulation, W. W. Peterson introduced interpolation search in 1957 under the explicit assumption of uniform key distribution to optimize record addressing in random-access storage environments, where such regularity was deemed plausible for practical files.
Interpolation search assumes a uniform distribution of keys for optimal performance, but real-world datasets often exhibit skewness, such as Zipf-like distributions in word frequencies or name lists. To address this, adaptations modify the position estimation to incorporate distribution knowledge or fallback mechanisms, maintaining efficiency without requiring full resorting.
One approach involves frequency-based estimation, where the interpolation formula is adjusted using the cumulative distribution function (CDF) or empirical probabilities derived from the data. For instance, in datasets following Zipf's law—common in natural language processing tasks like searching word lists—the estimation leverages higher probabilities for frequent items to bias probes toward denser regions. The Three-Point Interpolation (TIP) algorithm exemplifies this by using three data points (left, middle, right) to compute a non-linear position estimate via a quadratic fit, given by:
\text{expected} = x_1 + \frac{y_1 (x_1 - x_2)(x_1 - x_0)(y_2 - y_0)}{y_2 (x_1 - x_2)(y_0 - y_1) + y_0 (x_1 - x_0)(y_1 - y_2)}
where x_i are indices and y_i = V[x_i] - y^* with y^* the target value; this yields 2-3x speedups over binary search on skewed datasets like Wikipedia word frequencies.
In non-numeric contexts, such as alphabet-ordered tables, arithmetic interpolation search adapts the method for non-uniform key distributions by estimating positions based on letter or prefix frequencies rather than pure value arithmetic. For a phone book, where names starting with common letters like 'S' or 'J' cluster more densely, the algorithm precomputes frequency tables to scale the interpolation proportionally, avoiding the linear assumption and reducing probes in sparse regions. This maintains near-logarithmic performance despite skewness, though it requires O(n) preprocessing for frequency computation.
Hybrid approaches further mitigate risks by integrating binary search as a safeguard. The Interpolated Binary Search (IBS) algorithm, for example, performs an initial interpolation estimate, then refines the interval using a binary midpoint adjusted toward the estimate—computed as \text{Mid} = (\text{Inter} + \text{right})/2—effectively switching to binary-like halving after the probe. This ensures O(log n) worst-case time even on highly skewed data, while achieving O(log log n) average-case on uniform inputs, with experiments showing 2.31 iterations on average versus 4.73 for pure binary search on small non-uniform arrays.[6]
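A minimal sketch in the spirit of this hybrid, assuming a sorted ascending integer array (the probe rule below applies Mid = (Inter + right)/2 when the target lies above the interpolation estimate and the mirrored (left + Inter)/2 when it lies below; the published IBS algorithm's exact update rules may differ, and the function name hybridSearch is illustrative):
cpp
// Sketch of an interpolation/binary hybrid: each round makes one interpolation
// probe, then a midpoint probe biased toward that estimate, so the interval
// shrinks by a constant fraction even on skewed data.
int hybridSearch(const int* arr, int n, int x) {
    int left = 0, right = n - 1;
    while (left <= right && x >= arr[left] && x <= arr[right]) {
        if (arr[left] == arr[right])
            return (arr[left] == x) ? left : -1;
        // Interpolation estimate within the current interval.
        long long num = static_cast<long long>(x - arr[left]) * (right - left);
        int inter = left + static_cast<int>(num / (arr[right] - arr[left]));
        if (arr[inter] == x) return inter;
        int probe;
        if (arr[inter] < x) {
            left = inter + 1;
            probe = (inter + right) / 2;  // Mid = (Inter + right) / 2
        } else {
            right = inter - 1;
            probe = (left + inter) / 2;   // mirrored rule for the left side
        }
        if (probe < left || probe > right) continue;  // interval already passed the probe
        if (arr[probe] == x) return probe;
        if (arr[probe] < x) left = probe + 1;
        else right = probe - 1;
    }
    return -1;
}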
These adaptations introduce trade-offs, including higher per-iteration computation from multi-point fits or frequency lookups, and potential O(n) setup costs for probability tables, which may outweigh benefits in fully dynamic or memory-constrained environments. Nonetheless, they enable robust deployment in applications with known distributional biases, such as database indexing or text retrieval.
Variants
Dynamic Interpolation Search
The standard interpolation search algorithm operates on static, sorted arrays and does not efficiently support insertions or deletions, as these operations would require rebuilding the entire structure to maintain the assumptions of uniform distribution for optimal performance. Dynamic variants address this limitation by integrating interpolation-based probing into tree structures, enabling updates while preserving the sub-logarithmic search efficiency of the original algorithm. These extensions approximate the key distribution at each level of the tree, allowing searches to skip large portions of the data similar to static interpolation search.[14]
A prominent example is the Interpolation Search Tree (IST), first presented in 1985 and published in 1993 by Kurt Mehlhorn and Athanasios Tsakalidis, which organizes elements into a multiway tree where each node stores a subfile in an array and uses interpolation search to navigate to the appropriate child subtree. The tree's degree varies based on the size of the subfile, with representatives sampled to estimate the inverse cumulative distribution function, enabling probes that align with the data's statistical properties. Unlike strictly balanced trees, the IST relies on lazy rebuilding of subtrees to handle insertions and deletions without immediate global reorganization.[14][15]
Upon insertion or deletion, elements are initially marked or added to leaf-level bins, and subtrees are rebuilt only when the number of changes exceeds a threshold (typically a fraction of the subtree size, such as W/4 where W is the window size), ensuring the approximation of uniformity remains valid across levels. This rebalancing preserves the probabilistic guarantees of interpolation search by redistributing elements to maintain smooth density estimates in subintervals. The process avoids full tree traversal for updates by localizing changes to affected paths.[14]
The IST achieves expected search time of O(\log \log n) for random inputs following smooth probability distributions, with worst-case search bounded by O((\log n)^2); insertions and deletions have amortized cost O(\log n), and expected amortized O(\log \log n) under similar conditions. A 2006 refinement extends these bounds to hold with high probability for a broader class of distributions, including power-law and binomial, while supporting O(1) worst-case updates when positions are known; a 2020 revisit by Kaporis et al. further improves the structure, achieving O(\log \log n) search time with high probability for distributions that may include duplicates, while maintaining the same O(1) worst-case update bound.[14][16][17] Space usage remains linear at \Theta(n).
These dynamic structures find application in in-memory databases for secondary indexing of frequently updated datasets, such as alphabetic keys in dictionaries or priority queues requiring fast predecessor and successor queries in constant time. They also support sequential access in linear time, making them suitable for scenarios where sub-logarithmic search outweighs the overhead of occasional rebalancing.[14][16]
Interpolation-Sequential Search
The interpolation-sequential search is a hybrid algorithm that performs an initial probe using interpolation to estimate the position of the target key in a sorted array, followed by a linear (sequential) scan in the vicinity of that probe to locate the exact item if the initial guess misses.[18] This approach leverages the efficiency of interpolation for the first step while relying on sequential search for refinement, making it suitable for static, sorted arrays where exact positioning is uncertain.[3]
Key benefits include robustness to moderate non-uniformity in data distributions, as the bounded sequential phase prevents the repeated probing failures that can degrade pure interpolation search to linear time in adversarial cases.[18] In uniform distributions, it achieves an average time complexity of O(\sqrt{n}) due to the expected interpolation error leading to a sequential scan of that order, avoiding the full O(n) worst-case behavior of standard interpolation while maintaining predictability.[3]
This variant is particularly useful for small to medium-sized arrays, where the overhead of sequential scanning remains low, or as a fallback mechanism in adaptive search systems that switch from interpolation upon detecting distribution irregularities.[3]
The algorithm was first formally described and analyzed in the late 1970s, building on earlier table lookup techniques, with proposals emphasizing its role in reducing access costs in scenarios like external storage where initial probes minimize I/O operations for non-ideal distributions.[18] Compared to pure interpolation search, it requires more probes overall due to the linear phase but provides guaranteed bounds that prevent pathological slowdowns, trading logarithmic efficiency for reliability in practice.[3]
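A compact sketch of this variant for a sorted ascending integer array (the single clamped interpolation probe over the whole array and the two-direction scan are the illustrative choices here; the function name interpolationSequentialSearch is not from the cited sources):
cpp
// Sketch: one interpolation probe, then a sequential scan from the probe
// toward the target until the key is found or provably absent.
int interpolationSequentialSearch(const int* arr, int n, int x) {
    if (n == 0 || x < arr[0] || x > arr[n - 1]) return -1;
    if (arr[0] == arr[n - 1]) return (arr[0] == x) ? 0 : -1;
    // Single interpolation probe over the whole array.
    long long num = static_cast<long long>(x - arr[0]) * (n - 1);
    int pos = static_cast<int>(num / (arr[n - 1] - arr[0]));
    if (pos < 0) pos = 0;
    if (pos > n - 1) pos = n - 1;
    while (pos < n - 1 && arr[pos] < x) ++pos;  // scan right while elements are too small
    while (pos > 0 && arr[pos] > x) --pos;      // scan left while elements are too large
    return (arr[pos] == x) ? pos : -1;
}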
Applications and Examples
Book-Based Searching
Interpolation search draws a direct analogy from the manual process of locating a word in a physical dictionary or a name in a telephone directory. When searching for a term like "Smith," a person does not flip randomly or check every page sequentially; instead, they estimate the approximate location by considering the alphabetical position relative to the full range from "A" to "Z." For instance, since "S" is the 19th letter, one might calculate it to be roughly three-quarters through the book (19/26 ≈ 0.73) and open near that proportion of the pages to begin scanning locally.[19][20]
In adapting this to textual data, the search keys—such as letters or words—are mapped to numeric values using character codes (e.g., ASCII or Unicode ordinals) to enable the interpolation formula. However, letter frequencies are non-uniform, with vowels like "E" appearing far more often than rare consonants like "Z," which can skew the uniform distribution assumption of the basic algorithm. To mitigate this, precomputed skip tables or sectional indexes (analogous to thumb tabs in physical dictionaries dividing the book into letter groups) allow jumping to probable regions more efficiently, refining the estimate iteratively.[21][19]
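One simple way to feed textual keys into the interpolation formula is to pack a fixed-length prefix of each word into an integer whose numerical order matches alphabetical order, as in the following sketch (the base-27 encoding, three-character width, and the helper name prefixCode are illustrative assumptions rather than a standard scheme):
cpp
#include <string>
#include <cctype>

// Sketch: map the first `width` letters of a word to a base-27 integer
// ('a'..'z' -> 1..26, anything else -> 0), so that alphabetical order of the
// prefixes matches numerical order of the codes. The codes can then stand in
// for array values in the interpolation formula, although letter frequencies
// remain non-uniform.
long long prefixCode(const std::string& word, int width = 3) {
    long long code = 0;
    for (int i = 0; i < width; ++i) {
        int v = 0;
        if (i < static_cast<int>(word.size())) {
            int c = std::tolower(static_cast<unsigned char>(word[i]));
            if (c >= 'a' && c <= 'z') v = c - 'a' + 1;
        }
        code = code * 27 + v;
    }
    return code;
}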
This approach mirrors manual searching techniques used in printed directories from the mid-20th century, as first conceptualized in early addressing methods for storage systems.[1] In modern contexts, the principle underlies position estimation in e-book readers for sorted text indexes and persists in legacy file systems or database indexing where data uniformity holds approximately.[3]
Real-World Use Cases
In database indexing, interpolation search is employed to accelerate lookups in sorted structures, particularly when keys follow a uniform distribution, such as timestamps or sequential IDs in large logs. By estimating probe positions based on value ranges, it minimizes the number of comparisons and disk seeks compared to binary search, which is beneficial for on-disk storage in SQL query processing. For instance, within B-tree nodes, interpolation search can reduce cache faults to as few as one per operation under ideal conditions, enhancing overall query performance in systems handling skewed or uniform data.[22][3]
Research has shown that integrating interpolation search into implementations like LevelDB, a key-value store used in production systems such as Google Chrome and Android, can yield up to 38% improvement in random read benchmarks for a 9 GB database and 7% for a 35 GB one, with search times accelerating by factors of 1.46x to 3x depending on data uniformity. Similarly, research adaptations to libraries supporting database-like operations, such as NumPy's searchsorted function (leveraged by Pandas for data analysis), have demonstrated speedups of up to 4.8x on uniform datasets and 3x on non-uniform ones using variants of interpolation search, highlighting its potential role in efficient in-memory indexing.[3]
Historically, interpolation search originated in early computing for random-access storage systems. In 1957, W. W. Peterson described the method in the context of IBM's random-access file systems, where it enabled efficient retrieval of records from ordered tapes or drums by estimating positions based on key distributions, addressing the limitations of sequential scanning in emerging direct-access hardware. This foundational application laid the groundwork for its use in modern storage hierarchies.[1]
Despite these advantages, interpolation search is rarely deployed standalone in large-scale systems due to its sensitivity to non-uniform distributions, where performance can degrade to linear time. In big data environments, B-trees remain the preferred indexing structure for their balance of insert/delete efficiency and logarithmic search guarantees, often hybridizing interpolation within nodes for uniform subsets while falling back to binary search otherwise. As of 2025, its niche persists in specialized in-memory or embedded scenarios with predictable data patterns, but broad adoption is tempered by the dominance of robust tree-based indexes in distributed databases.[22]
Implementation
Pseudocode
The basic interpolation search algorithm operates on a sorted array of numeric values in ascending order, estimating the position of the target value based on linear interpolation assuming a uniform distribution. The following pseudocode implements the core procedure, handling edge cases such as an empty array, target values outside the array bounds, and convergence to a single element (low equals high). It returns the index of the target if found, or -1 otherwise. This representation follows the foundational description of the algorithm.[2]
function interpolationSearch(sortedArray, target):
    // Assume sortedArray is sorted in ascending order with numeric elements
    n = length(sortedArray)
    if n == 0:
        return -1  // Empty array: target not found
    low = 0
    high = n - 1
    // Early exit if target is outside array bounds
    if target < sortedArray[low] or target > sortedArray[high]:
        return -1
    while low <= high:
        // Handle case where search space is a single element
        if low == high:
            if sortedArray[low] == target:
                return low
            else:
                return -1
        // Estimate position using interpolation formula
        // pos = low + floor( (target - sortedArray[low]) * (high - low) / (sortedArray[high] - sortedArray[low]) )
        diff = sortedArray[high] - sortedArray[low]
        if diff == 0:
            // All elements equal in current range: perform linear scan or return -1 if no match
            for i from low to high:
                if sortedArray[i] == target:
                    return i
            return -1
        pos = low + floor( (target - sortedArray[low]) * (high - low) / diff )
        // Check if estimated position is out of current bounds (degenerate case)
        if pos < low or pos > high:
            return -1
        if sortedArray[pos] == target:
            return pos  // Target found at estimated position
        // Adjust search space based on comparison
        if sortedArray[pos] < target:
            low = pos + 1  // Search right subarray
        else:
            high = pos - 1  // Search left subarray
    return -1  // Target not found after exhausting search space
This pseudocode incorporates safeguards for division by zero (when all elements in the range are equal) by falling back to a linear check, though in practice, the uniform distribution assumption minimizes such occurrences.[2]
Sample Code
To illustrate the practical application of interpolation search, the following provides an iterative implementation in C++ for a sorted array of integers. This version uses integer arithmetic with long long intermediates to compute the estimated position, preventing overflow while avoiding floating-point operations for exact precision in integer domains. It includes bounds checking to ensure the position stays within the array limits and handles the case of equal boundary values to avoid division by zero. If the key is not found after the search concludes, the function returns -1.[2]
cpp
#include <iostream>
#include <string>  // for std::to_string used in the test driver below

int interpolation_search(const int* arr, int n, int x) {
    int low = 0;
    int high = n - 1;
    while (low <= high) {
        // Handle equal boundary values to avoid division by zero
        if (arr[low] == arr[high]) {
            if (arr[low] == x) {
                return low;
            } else {
                return -1;
            }
        }
        // Integer arithmetic for position estimation
        long long numerator = static_cast<long long>(x - arr[low]) * (high - low);
        long long denominator = arr[high] - arr[low];
        int pos = low + static_cast<int>(numerator / denominator);
        // Bounds checking
        if (pos < low) pos = low;
        if (pos > high) pos = high;
        if (arr[pos] == x) {
            return pos;
        } else if (arr[pos] < x) {
            low = pos + 1;
        } else {
            high = pos - 1;
        }
    }
    return -1;
}
The code can be tested with a simple main function demonstrating both successful and unsuccessful searches on a uniformly distributed sorted array:
cpp
int main() {
    int arr[] = {10, 20, 30, 40, 50};
    int n = sizeof(arr) / sizeof(arr[0]);
    // Search for existing element (30)
    int index = interpolation_search(arr, n, 30);
    std::cout << "Search for 30: " << (index != -1 ? "Found at index " + std::to_string(index) : "Not found") << std::endl;
    // Search for non-existing element (25)
    index = interpolation_search(arr, n, 25);
    std::cout << "Search for 25: " << (index != -1 ? "Found at index " + std::to_string(index) : "Not found") << std::endl;
    return 0;
}
This implementation uses an int for the array length (sufficient for typical in-memory array bounds) and long long intermediates for the position calculation to maintain precision without floating-point conversions. To compile and run, save the combined code to a file (e.g., interpolation_search.cpp) and use a C++ compiler such as g++: g++ -std=c++11 interpolation_search.cpp -o interp_search && ./interp_search. The expected output is:
Search for 30: Found at index 2
Search for 25: Not found
These examples confirm the function's behavior: it locates the key at the correct index when present and correctly reports absence otherwise, assuming the input array is sorted and uniformly distributed as required by the algorithm.[2]