Post correspondence problem
The Post correspondence problem (PCP) is a classic undecidable decision problem in computability theory, introduced by Emil L. Post in 1946.[1] It involves a finite collection of pairs of non-empty strings over a binary alphabet \{a, b\}, denoted as (g_1, g_1'), (g_2, g_2'), \dots, (g_n, g_n'), where each g_i and g_i' is a row of a's and b's.[1] The question is whether there exists a finite sequence of indices i_1, i_2, \dots, i_m (with m \geq 1) such that the concatenation g_{i_1} g_{i_2} \cdots g_{i_m} equals g_{i_1}' g_{i_2}' \cdots g_{i_m}'.[1] This setup can be visualized using "dominos" where each pair represents a tile with a top string and a bottom string, and a solution corresponds to a non-empty sequence of tiles forming matching concatenations.[2] Post proved the undecidability of PCP in his 1946 paper "A variant of a recursively unsolvable problem" by showing a reduction from the unsolvable decision problem for normal systems over \{a, b\}.[1] The proof involves a series of formal transformations that map instances of normal systems to PCP instances, ensuring that a solution exists if and only if the original system has a certain computable derivation.[1] This establishes that no algorithm can determine solvability for all finite instances, placing PCP among the foundational examples of recursively unsolvable problems, alongside the halting problem.[2] The significance of PCP lies in its simplicity and utility as a reduction tool for demonstrating undecidability in other domains, particularly formal language theory and logic.[2] For example, reductions from PCP have been used to prove that determining ambiguity in context-free grammars is undecidable, as well as undecidability of problems like whether the intersection of two context-free languages is empty.[2] It also connects to broader computability results, such as those involving Turing machines and recursively enumerable sets.[2] Variants of PCP include the modified PCP (MPCP), which requires any solution to start with a designated pair (often the first), facilitating reductions from problems like Turing machine acceptance.[3] The bounded PCP, where the sequence length is restricted to at most k tiles (with k given as input), is decidable and NP-complete.[4] Other restrictions, such as marked PCP (where paired strings differ in their first letter), can be decidable, while generalizations like the n-permutation PCP explore further computational boundaries.[5]Overview
Informal Description
The Post correspondence problem, introduced by mathematician Emil L. Post in 1946, can be intuitively visualized as a matching puzzle using domino-like tiles. Each tile consists of two halves: the top half bears a non-empty string of symbols from a finite alphabet containing at least two distinct symbols, while the bottom half bears another such string. The objective is to arrange a finite sequence of these tiles—repetitions allowed, and not all tiles required—such that the concatenated sequence of top strings exactly equals the concatenated sequence of bottom strings.[1][2] Given a finite set of such tile pairs as input, the problem poses the question of whether any matching sequence exists. This setup resembles everyday string-matching games but hides profound implications for what computers can and cannot compute.[6] As a cornerstone of theoretical computer science, the Post correspondence problem exemplifies an undecidable decision problem, where no general algorithm can determine the existence of a solution for every possible instance.[1]Historical Background
The Post correspondence problem (PCP) was introduced by Emil L. Post in 1946 as part of his investigations into undecidable problems in computability theory, emerging during the 1930s–1940s era when researchers like Alan Turing and Alonzo Church were formalizing the limits of recursive functions and mechanical computation.[7] Post, who had earlier developed tag systems in 1921 and explored normal systems in the 1940s, sought simpler variants of known unsolvable problems to illuminate the boundaries of algorithmic solvability, drawing on reductions from problems like the equivalence of normal algorithms.[8] This work positioned PCP as a more accessible example of undecidability compared to Turing's 1936 halting problem, while building on Post's broader efforts to characterize recursively unsolvable decision problems in symbolic logic and string manipulation.[9] The problem first appeared in Post's short paper titled "A Variant of a Recursively Unsolvable Problem," published in the Bulletin of the American Mathematical Society (volume 52, pages 264–268), where he sketched its formulation and outlined a proof of undecidability via reduction from the unsolvability of normal systems, as established by Church in 1936.[7] Although the 1946 note provided the core argument, the full detailed proof remained unpublished during Post's lifetime and appeared only in 1994 as part of his collected works, Solvability, Provability, Definability: The Collected Works of Emil L. Post.[10] This delay did not diminish the problem's immediate impact, as it offered a concrete, finite-instance decision problem whose undecidability highlighted the non-recursive nature of certain string-matching tasks in formal systems. Post's PCP influenced subsequent developments in computability, notably through Marvin Minsky's 1961 paper "Recursive Unsolvability of Post's Problem of 'Tag' and Other Topics in Theory of Turing Machines," which established the undecidability of related tag systems using two-tape non-writing automata, further advancing proofs of universality in restricted computational models. Furthermore, PCP served as a key tool in establishing the undecidability of the halting problem for linear bounded automata, as demonstrated in early 1960s reductions that encoded PCP instances into space-bounded computations, underscoring its versatility in proving limits for restricted Turing machine variants.[11]Formal Definition
Standard Formulation
The Post correspondence problem (PCP), introduced by Emil Post in 1946, is a decision problem in computability theory.[1] In its standard formulation, an instance consists of a finite collection of pairs of strings over a finite alphabet \Sigma with |\Sigma| \geq 2. Specifically, given a positive integer k and non-empty strings u_i, v_i \in \Sigma^+ for i = 1, \dots, k, the problem asks whether there exists a finite sequence of indices i_1, i_2, \dots, i_m (with m \geq 1 and each i_j \in \{1, \dots, k\}, allowing repetitions) such that the concatenation u_{i_1} u_{i_2} \cdots u_{i_m} = v_{i_1} v_{i_2} \cdots v_{i_m}.[1] This setup is often visualized using "tiles," where each pair (u_i, v_i) is represented as a domino-like tile with u_i on the top and v_i on the bottom, denoted as \frac{u_i}{v_i}. A solution corresponds to a finite sequence of such tiles placed end-to-end, where the concatenated top strings match the concatenated bottom strings exactly. The requirement that all strings u_i and v_i are non-empty ensures no trivial empty concatenations satisfy the equation, preventing degenerate cases.[1] The alphabet \Sigma must contain at least two distinct symbols to guarantee the problem's undecidability, as proven by Post via reduction from the unsolvability of the decision problem for normal systems over a binary alphabet; with a unary alphabet (|\Sigma| = 1), the problem becomes decidable.[1] In Post's original presentation, the alphabet was binary (\Sigma = \{a, b\}) and strings were explicitly non-null, but the generalization to any |\Sigma| \geq 2 preserves the core structure and undecidability.[1] The decision question is thus: Does a solution sequence exist for the given instance?Alternative Formulations
The Post correspondence problem can be reformulated using monoid morphisms to emphasize its algebraic structure. Consider an alphabet A = \{a_1, \dots, a_k\} corresponding to the tile indices, and an alphabet \Sigma for the strings. Define two morphisms \phi, \psi : A^* \to \Sigma^* by setting \phi(a_i) = u_i and \psi(a_i) = v_i, where (u_i, v_i) is the i-th tile in the standard formulation. The problem then asks whether there exists a non-empty word w \in A^* such that \phi(w) = \psi(w). This equivalent view, known as the equalizer problem for morphisms, connects PCP to the study of free monoids and their homomorphisms.[12] An alternative graph-theoretic formulation represents the problem as follows: construct a directed graph with a single vertex and k self-loops, one for each tile, where the i-th edge is labeled with u_i on the "top" and v_i on the "bottom." A solution corresponds to a non-empty path (sequence of edges) such that the concatenation of the top labels equals the concatenation of the bottom labels. This model highlights the combinatorial search aspect of PCP and facilitates reductions to other graph problems in theoretical analyses.[13] While solutions to PCP instances may extend infinitely—particularly if there exists a tile with u_i = v_i, allowing arbitrary repetitions—the decision problem specifically concerns the existence of finite solutions. The undecidability result applies to the finite case, as infinite extensions do not affect the core question of whether a match exists using finitely many tiles. Emil Post's original presentation focused on finite sequences of corresponding strings, establishing this emphasis.[1]Examples
Solvable Instance
A solvable instance of the Post Correspondence Problem (PCP) can be illustrated with a small set of tiles over the alphabet {0,1}, demonstrating how a sequence of tile indices leads to matching concatenations on the top and bottom strings. Consider the following four tiles, where each tile is represented as a pair (u_i, v_i) with u_i as the top string and v_i as the bottom string:| Index | Top (u_i) | Bottom (v_i) |
|---|---|---|
| 1 | 1 | 10 |
| 2 | 0 | 10 |
| 3 | 010 | 01 |
| 4 | 11 | 1 |