Fact-checked by Grok 2 weeks ago

Post correspondence problem

The Post correspondence problem (PCP) is a classic undecidable in , introduced by Emil L. Post in 1946. It involves a finite collection of pairs of non-empty strings over a \{a, b\}, denoted as (g_1, g_1'), (g_2, g_2'), \dots, (g_n, g_n'), where each g_i and g_i' is a row of a's and b's. The question is whether there exists a finite sequence of indices i_1, i_2, \dots, i_m (with m \geq 1) such that the concatenation g_{i_1} g_{i_2} \cdots g_{i_m} equals g_{i_1}' g_{i_2}' \cdots g_{i_m}'. This setup can be visualized using "dominos" where each pair represents a with a top string and a bottom string, and a solution corresponds to a non-empty sequence of tiles forming matching concatenations. Post proved the undecidability of PCP in his 1946 paper "A variant of a recursively unsolvable problem" by showing a reduction from the unsolvable decision problem for normal systems over \{a, b\}. The proof involves a series of formal transformations that map instances of normal systems to PCP instances, ensuring that a solution exists if and only if the original system has a certain computable derivation. This establishes that no algorithm can determine solvability for all finite instances, placing PCP among the foundational examples of recursively unsolvable problems, alongside the halting problem. The significance of lies in its simplicity and utility as a tool for demonstrating undecidability in other domains, particularly theory and logic. For example, reductions from have been used to prove that determining in context-free grammars is undecidable, as well as undecidability of problems like whether the intersection of two context-free languages is empty. It also connects to broader computability results, such as those involving Turing machines and recursively enumerable sets. Variants of PCP include the modified PCP (MPCP), which requires any solution to start with a designated pair (often the first), facilitating reductions from problems like Turing machine acceptance. The bounded PCP, where the sequence length is restricted to at most k tiles (with k given as input), is decidable and NP-complete. Other restrictions, such as marked PCP (where paired strings differ in their first letter), can be decidable, while generalizations like the n-permutation PCP explore further computational boundaries.

Overview

Informal Description

The Post correspondence problem, introduced by mathematician Emil L. Post in , can be intuitively visualized as a matching puzzle using domino-like s. Each consists of two halves: the top half bears a non-empty of symbols from a finite containing at least two distinct symbols, while the bottom half bears another such . The objective is to arrange a finite sequence of these s—repetitions allowed, and not all s required—such that the concatenated sequence of top s exactly equals the concatenated sequence of bottom s. Given a of such tile pairs as input, the problem poses the question of whether any matching sequence exists. This setup resembles everyday string-matching games but hides profound implications for what computers can and cannot compute. As a cornerstone of , the Post correspondence problem exemplifies an , where no general can determine the existence of a solution for every possible instance.

Historical Background

The Post correspondence problem (PCP) was introduced by Emil L. Post in 1946 as part of his investigations into undecidable problems in computability theory, emerging during the 1930s–1940s era when researchers like Alan Turing and Alonzo Church were formalizing the limits of recursive functions and mechanical computation. Post, who had earlier developed tag systems in 1921 and explored normal systems in the 1940s, sought simpler variants of known unsolvable problems to illuminate the boundaries of algorithmic solvability, drawing on reductions from problems like the equivalence of normal algorithms. This work positioned PCP as a more accessible example of undecidability compared to Turing's 1936 halting problem, while building on Post's broader efforts to characterize recursively unsolvable decision problems in symbolic logic and string manipulation. The problem first appeared in Post's short paper titled "A Variant of a Recursively Unsolvable Problem," published in the Bulletin of the (volume 52, pages 264–268), where he sketched its and outlined a proof of undecidability via reduction from the unsolvability of normal systems, as established by in 1936. Although the 1946 note provided the core argument, the full detailed proof remained unpublished during Post's lifetime and appeared only in 1994 as part of his collected works, Solvability, Provability, Definability: The Collected Works of L. Post. This delay did not diminish the problem's immediate impact, as it offered a concrete, finite-instance whose undecidability highlighted the non-recursive nature of certain string-matching tasks in formal systems. Post's PCP influenced subsequent developments in , notably through Marvin Minsky's 1961 "Recursive Unsolvability of Post's Problem of '' and Other Topics in Theory of ," which established the undecidability of related systems using two-tape non-writing automata, further advancing proofs of universality in restricted computational models. Furthermore, PCP served as a key tool in establishing the undecidability of the for linear bounded automata, as demonstrated in early 1960s reductions that encoded PCP instances into space-bounded computations, underscoring its versatility in proving limits for restricted variants.

Formal Definition

Standard Formulation

The Post correspondence problem (PCP), introduced by Emil Post in , is a in . In its standard formulation, an instance consists of a finite collection of pairs of strings over a finite \Sigma with |\Sigma| \geq 2. Specifically, given a positive k and non-empty strings u_i, v_i \in \Sigma^+ for i = 1, \dots, k, the problem asks whether there exists a finite sequence of indices i_1, i_2, \dots, i_m (with m \geq 1 and each i_j \in \{1, \dots, k\}, allowing repetitions) such that the u_{i_1} u_{i_2} \cdots u_{i_m} = v_{i_1} v_{i_2} \cdots v_{i_m}. This setup is often visualized using "tiles," where each pair (u_i, v_i) is represented as a domino-like tile with u_i on the top and v_i on the bottom, denoted as \frac{u_i}{v_i}. A solution corresponds to a finite sequence of such s placed end-to-end, where the concatenated top strings match the concatenated bottom strings exactly. The requirement that all strings u_i and v_i are non-empty ensures no trivial empty concatenations satisfy the equation, preventing degenerate cases. The alphabet \Sigma must contain at least two distinct symbols to guarantee the problem's undecidability, as proven by Post via reduction from the unsolvability of the decision problem for normal systems over a binary alphabet; with a unary alphabet (|\Sigma| = 1), the problem becomes decidable. In Post's original presentation, the alphabet was binary (\Sigma = \{a, b\}) and strings were explicitly non-null, but the generalization to any |\Sigma| \geq 2 preserves the core structure and undecidability. The decision question is thus: Does a solution sequence exist for the given instance?

Alternative Formulations

The Post correspondence problem can be reformulated using morphisms to emphasize its . Consider an A = \{a_1, \dots, a_k\} corresponding to the tile indices, and an alphabet \Sigma for the strings. Define two morphisms \phi, \psi : A^* \to \Sigma^* by setting \phi(a_i) = u_i and \psi(a_i) = v_i, where (u_i, v_i) is the i-th in the standard formulation. The problem then asks whether there exists a non-empty word w \in A^* such that \phi(w) = \psi(w). This equivalent view, known as the equalizer problem for morphisms, connects PCP to the study of free monoids and their homomorphisms. An alternative graph-theoretic formulation represents the problem as follows: construct a with a single and k self-loops, one for each , where the i-th is labeled with u_i on the "top" and v_i on the "bottom." A corresponds to a non-empty (sequence of ) such that the of the top labels equals the concatenation of the bottom labels. This model highlights the combinatorial search aspect of PCP and facilitates reductions to other graph problems in theoretical analyses. While solutions to PCP instances may extend infinitely—particularly if there exists a with u_i = v_i, allowing arbitrary repetitions—the specifically concerns the existence of finite solutions. The undecidability result applies to the finite case, as infinite extensions do not affect the core question of whether a match exists using finitely many tiles. Emil Post's original presentation focused on finite sequences of corresponding strings, establishing this emphasis.

Examples

Solvable Instance

A solvable instance of the Post Correspondence Problem () can be illustrated with a small set of tiles over the {0,1}, demonstrating how a sequence of tile indices leads to matching concatenations on the top and bottom strings. Consider the following four tiles, where each tile is represented as a pair (u_i, v_i) with u_i as the top string and v_i as the bottom string:
IndexTop (u_i)Bottom (v_i)
1110
2010
301001
4111
One is the sequence of indices 1, 2, 1, 3, 3, 4. Concatenating the top strings gives: 1 + 0 + 1 + 010 + 010 + 11 = 10101001011. Similarly, concatenating the bottom strings yields: 10 + 10 + 10 + 01 + 01 + 1 = 10101001011. Since the resulting strings are identical, this sequence solves the instance. This example highlights how repetitions of tiles (indices 1, 3 used multiple times) enable the lengths and contents to balance out, forming the required match. Such instances are verifiable by direct computation, contrasting with the general undecidability of .

Unsolvable Instance

A classic example of an unsolvable instance of the Post correspondence problem uses two tiles over the binary alphabet \{0, 1\}: tile A with top string "0" and bottom string "01", and tile B with top string "100" and bottom string "001". For a single tile, neither matches: tile A produces top "0" (length 1) and bottom "01" (length 2), while tile B produces top "100" (length 3) and bottom "001" (length 3), but the strings differ ("100" ≠ "001"). Sequences of multiple tiles also fail to match. Any sequence starting with tile B mismatches immediately, as the top begins with "1" while the bottom begins with "0". Sequences starting with tile A lead to persistent imbalances: the bottom string is always longer or has more 1's than the top. Specifically, tile A adds zero 1's to the top but one 1 to the bottom, while tile B adds one 1 to each; thus, after the initial tile A (which creates a +1 difference in 1's on the bottom), every addition of tile A increases this difference by 1, and tile B preserves it, ensuring the number of 1's on the bottom always exceeds that on the top by at least 1, preventing equality. This instance illustrates the of the problem, where even small sets of tiles require careful of lengths and counts to confirm unsolvability, in contrast to solvable instances that permit balancing through specific sequences.

Undecidability

Proof Overview

The undecidability of the Post correspondence problem () is demonstrated through a reduction from the (or equivalently, the acceptance problem for s), which is known to be undecidable. This reduction shows that an algorithm capable of solving PCP could be used to determine whether a given halts on a specified input, leading to a contradiction. The construction encodes the computation of the as a sequence of PCP tiles, where a valid correspondence exists the machine halts (or accepts) in finitely many steps. The key theorem states that is undecidable for any with at least two symbols. This result was originally proved by Emil Post in 1946, who reduced the unsolvable problem of whether a normal system over \{a, b\} a word beginning with a followed by an equal number of a's and b's to . The proof involves encoding the productions and s of the normal system into PCP tiles such that a solution corresponds to a valid in the normal system. Subsequent proofs reduce directly from the acceptance problem by first mapping to a modified PCP (requiring solutions to begin with a specific ) and then to standard using additional marker tiles. In outline, the proof structures the tiles to represent the initial machine configuration, each transition rule as a pair of strings that align successive instantaneous descriptions (separated by markers), and halting conditions via matching tiles that signal without further extension. If the machine halts, a finite sequence of tiles produces a ; otherwise, any attempt at matching either mismatches or requires an infinite sequence, which is impossible for a solution. This simulation ensures the undecidability carries over from the to PCP.

Reduction Technique

The undecidability of the Post correspondence problem () is established via reductions from undecidable problems in equivalent models of , such as Turing machines or Post normal systems. One common approach reduces the acceptance problem for Turing machines to PCP, often via an intermediate modified PCP (MPCP) that requires solutions to start with a designated . This facilitates encoding the deterministic computation path, where tiles simulate configurations and transitions, ensuring a match only if the machine accepts the input. Post's original 1946 proof used a reduction from the halting problem for normal systems, a restricted form of production systems. An instance of a normal system—consisting of an alphabet, productions of the form gP \to Ph where g is a fixed symbol and P, h strings—is mapped to PCP tiles over an extended alphabet. The tiles encode the leftmost g and productions to simulate derivations, with a solution existing if and only if the normal system derives a word of the form a a^* b^* with equal numbers of a's and b's. This preserves undecidability as normal systems are Turing-complete. Another technique reduces from the halting problem for tag systems, a special case of normal systems introduced by . Tag systems consist of an , deletion number v \geq 1, and production rules mapping each to a string; starts with an initial string and repeatedly deletes the first v symbols (using only the first for production) and appends the corresponding string until the length is less than v. The reduction constructs tiles to simulate finite halting s: initial tiles set the starting configuration, production tiles apply rules by consuming the leading and appending the production (using markers to track positions and ensure alignment), copy tiles propagate unchanged symbols, and halt tiles match only terminal configurations. A solution exists if and only if the tag system halts. Marvin Minsky's result showed that 2-tag systems with are , allowing undecidability proofs with small instances.

Variants

Bounded Variant

The bounded variant of the Post correspondence problem, also known as the bounded PCP (BPCP), considers an instance consisting of a of , each specified by a pair of strings (u_i, v_i) over a \Sigma for i = 1 to n, along with a positive k. The question is whether there exists a non-empty of at most k indices i_1, \dots, i_m (with $1 \leq m \leq k and repetitions allowed) such that the u_{i_1} \cdots u_{i_m} = v_{i_1} \cdots v_{i_m}. This variant is decidable, in contrast to the standard unbounded PCP. It belongs to the complexity class NP because a nondeterministic polynomial-time verifier can guess a sequence of at most k indices and check the string equality by concatenating the corresponding strings, which takes time polynomial in the input size (as the total length of the guessed concatenation is at most k times the maximum string length in the instance). BPCP is NP-hard, establishing its NP-completeness, via a from known NP-complete problems such as the problem, where the dominoes encode subsets to ensure a matching sequence corresponds to a valid cover without exceeding the bound k. The problem remains NP-complete even over a alphabet when the input lengths are encoded in (compact form), highlighting its hardness independent of alphabet size. In practice, BPCP instances with small values of k (relative to n) can be solved exhaustively by enumerating all possible sequences of length up to k, which requires time O(n^k) but becomes infeasible for large k due to the .

Other Extensions

The (PCP) admits several extensions that generalize its formulation while preserving undecidability in many cases. One such variant is the PCP, where each tile consists of a vector of strings (one for each component or track), and a solution requires the concatenations to match exactly in every component simultaneously. This multi-component setup captures parallel computations or multi-track simulations, and the problem remains undecidable over alphabets with at least two symbols, as it encompasses the standard PCP as a single-component case. Another extension is the one-sided PCP, in which the top strings are fixed—often to an identity-like encoding, such as single symbols representing tile indices—and the task reduces to determining whether a sequence of bottom strings concatenates to match this fixed top sequence. This variant, akin to the modified PCP (MPCP) used in undecidability proofs, is still undecidable in general, even over binary alphabets, because it allows encoding of Turing machine computations where the top tracks the sequence of steps and the bottom simulates the tape contents. The periodic PCP introduces constraints on solutions by seeking sequences that are periodic (repeating a fixed ) or imposing limits on repetitions in the index sequence. While the existence of any is undecidable, restricting to periodic solutions relates to the dual PCP, which asks whether there exist distinct s g and h such that g(\alpha) = h(\alpha) for a given word \alpha, with at least one morphism being non-periodic to avoid trivial cases. The dual variant is decidable for certain marked or immersive morphisms but undecidable in broader settings, highlighting connections to periodicity in word equalizers. The has profound relations to other undecidable problems, serving as a key tool in reductions. It proves the undecidability of for context-free s: given a PCP instance, one constructs a whose derivations correspond to potential matches, making the ambiguous if and only if a solution exists. Similarly, PCP reduces to the mortality problem in matrix semigroups, determining whether the arises from products of given integer matrices; this holds undecidable even for semigroups generated by two 48×48 matrices, with the reduction encoding tile matches via matrix multiplications that "mortalize" only on successful correspondences. Post-2000 developments have deepened PCP's ties to on words, exploring equalizers of s and pattern avoidance in solutions. For instance, the simultaneous PCP—requiring a common word in the equalizers of multiple pairs—is undecidable over monoids but decidable for immersions of groups, with algorithms computing bases for solution sets. These advances also address short-word instances, showing undecidability when image lengths are bounded by 2, and extend to non-commutative structures like groups, where PCP solvability informs word problem complexities.

References

  1. [1]
    A VARIANT OF A RECURSIVELY UNSOLVABLE PROBLEM
    A VARIANT OF A RECURSIVELY UNSOLVABLE PROBLEM. EMIL L. POST. By a string on a, 6 we mean a row of a's and 6's such as baabbbab. It may involve only a, or 6 ...
  2. [2]
    [PDF] 6.8. The Post Correspondence Problem - CIS UPenn
    The Post correspondence problem (due to Emil Post) is another undecid- able problem that turns out to be a very helpful tool for proving problems in logic or ...
  3. [3]
    [PDF] The Post Correspondence Problem
    Jun 1, 2021 · 15. 1 Introduction. First introduced by Emil Post in 1946, the Post Correspondence Problem (𝑃𝐶𝑃) is a. well-known example of an undecidable ...
  4. [4]
    Intractability - Algorithms, 4th Edition
    Bounded Post Correspondence Problem. Given a post correspondence problem ... If Y is NP-complete and X is in NP then X is NP-complete. If X is NP ...
  5. [5]
    [PDF] Decidability and Undecidability of Marked PCP - CWI
    Marked PCP, where words differ in the first letter, is decidable. However, 2-marked PCP, where words differ in the first two letters, is undecidable.
  6. [6]
    [PDF] Lecture 26: Post's Correspondence Problem and Tilings
    Apr 24, 2008 · Post's Correspondence Problem (PCP) is the problem of deciding whether a set of dominos has a match or not. The modified Post's Correspondence ...
  7. [7]
    None
    ### Summary of the Post Correspondence Problem (PCP) Definition and Undecidability
  8. [8]
    from Normal Systems to the Correspondence Decision Problem
    In 1946, Emil Leon Post (Bulletin of Amer. Math. Soc. 52 (1946), 264-268) introduced his famouscorrespondence decision problem, nowadays known as the Post ...
  9. [9]
    [PDF] Emil Post and His Anticipation of Gödel and Turing
    Oct 3, 2024 · The correspondence problem can be viewed as a problem about free semigroups, and in 1947, Post showed the unsolvability of an even more ...
  10. [10]
    [PDF] Recursive Unsolvability of Post's Problem of "Tag" and other Topics ...
    May 14, 2007 · It proved possible to adapt the Post system to yield a representation in the. "Tag" form as described in 92 below. 1. Two-tape non-writing ...
  11. [11]
    [PDF] 6.8 The Post Correspondence Problem - UPenn CIS
    Theorem 6.8.1 (Emil Post, 1946) The Post correspondence problem is undecidable, provided that the alphabet Σ has at least two symbols.
  12. [12]
    [PDF] On Some Modifications and Applications of the Post ... - UTUPub
    Another equivalent formulation of the problem is by (monoid) morphisms: given two morphisms ℎ, 𝑔 : 𝐴* → 𝐴* does there exist a non-empty word 𝑤 ∈ 𝐴* such that ℎ ...
  13. [13]
    [PDF] introduction to the theory of computation, second edition
    5.18 Show that the Post Correspondence Problem is undecidable over the binary alpha- bet Z = {o,1}. 5.19 In the silly Post Correspondence Problem, SPCP, in ...
  14. [14]
    [PDF] Post's Correspondence Problem - Stanford InfoLab
    ✦ Example: if 10 and 011 is the first pair, also add to the PCP instance the pair. *1*0* and *0*1*1. Finally, since the strings from the first list will.
  15. [15]
    [PDF] pcp.pdf
    Example: PCP – (3). ◇Suppose we add a third pair, so the instance becomes: 1 = (0, 01); 2 = (100, 001); 3 = (110, 10). ◇Now 1,3 is a solution; both strings are.
  16. [16]
  17. [17]
    The Post Correspondence Problem and equalisers for certain ... - arXiv
    Feb 18, 2020 · The Post Correspondence Problem and equalisers for certain free group and monoid morphisms. A marked free monoid morphism is a morphism for ...Missing: equivalent | Show results with:equivalent
  18. [18]
    LNCS 7907 - On the Dual Post Correspondence Problem
    The Dual Post Correspondence Problem is of particular interest for the re- ... only periodic solutions. The following proposition provides a tool for demon ...
  19. [19]
    [PDF] Chapter 8 The Post Correspondence Problem; Applications to ...
    An in- stance of the Post Correspondence problem (for short,. PCP) is given by two sequences U = (u1,...,um) and. V = (v1,...,vm), of strings ui,vi ∈ Σ∗. 551 ...Missing: alternative morphism graph equations
  20. [20]
    Mortality in Matrix Semigroups - jstor
    Theorem 6. The mortality problem is undecidable for the semigroups generated by two integer matrices of dimension 24. Still, 24 is an impressive dimension for ...
  21. [21]
    [PDF] The Post Correspondence Problem and Equalisers for Certain Free ...
    A marked free monoid morphism is a morphism for which the image of each generator starts ... and Post Correspondence Problem, 2008. URL: http://www-eupm ...
  22. [22]
    Post Correspondence Problem for short words - ScienceDirect
    We prove that the generalized Post Correspondence Problem is undecidable for instances where the lengths of the image words are at most 2 and the number of ...
  23. [23]
    Post's Correspondence Problem for hyperbolic and virtually ...
    Sep 9, 2023 · Post's Correspondence Problem (the PCP) is a classical decision problem in theoretical computer science that asks whether for pairs of free ...