Fact-checked by Grok 2 weeks ago

Expression problem

The Expression Problem is a fundamental challenge in programming language design concerning the extensibility of statically typed data abstractions, where one seeks to define a datatype by cases and add both new cases to the datatype and new functions over it without recompiling or modifying existing code, while preserving static type safety such as avoiding casts. Coined by Philip Wadler in an unpublished 1998 paper circulated via email, it serves as a benchmark for evaluating how well a language supports modular extensions to both data representations and behaviors. At its core, the problem illustrates a tension between functional and object-oriented paradigms: in functional languages, datatypes are typically defined via algebraic data types with fixed cases but extensible functions through , whereas object-oriented languages define extensible subclasses (new cases) but fixed methods (functions) per class, requiring mechanisms like the to add new operations. A classic example involves an initial datatype for arithmetic expressions with cases like constants and an ; extensions might add sum operations as new cases or pretty-printing as new functions, each demanding independent . This duality underscores broader issues in software extensibility, influencing designs in languages supporting generics, traits, or modular type systems. Since its introduction, the Expression Problem has inspired numerous solutions across paradigms, including object algebras in functional languages like and for composable extensions, generics in Java and C# for type-safe additions, and lightweight modular staging techniques in C++ for staged interpretations. These approaches often simplicity for full extensibility in both directions, with ongoing research exploring hybrids like open classes or platform-aware programming to address real-world modularity needs.

Definition and Motivation

Core Definition

The expression problem arises in the context of statically typed programming languages, where data abstraction is achieved through mechanisms like algebraic data types or class hierarchies to encapsulate representations and behaviors while ensuring compile-time type safety. This problem highlights tensions in extending such abstractions modularly without compromising type guarantees or requiring modifications to unrelated code. At its core, the expression problem refers to the challenge of extending a data type system along two orthogonal axes: adding new data representations, such as variants or cases to existing types, and adding new operations or functions that operate on those types. Representation extensibility allows for the introduction of novel cases without altering prior type definitions, while behavior extensibility enables the definition of additional functions over established types, all while preserving static type safety, modularity, and the ability to avoid recompilation of existing modules. As formally articulated by , the expression problem is "a new name for an old problem": the goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static (e.g., no casts). This formulation underscores the difficulty of achieving dual extensibility in a way that maintains the integrity of the across independent extensions.

Importance in Language Design

The expression problem serves as a for evaluating the extensibility and of programming language paradigms, particularly in statically typed systems, by revealing fundamental tensions between closed-world assumptions—such as those embodied in sealed classes or exhaustive case analysis—and open-world assumptions, like extensible interfaces or . In closed-world models, the complete set of data variants is fixed at , enabling optimizations like but restricting independent extensions without recompilation or type alterations. Conversely, open-world models support incremental additions but often at the cost of reduced or performance, as they cannot assume completeness of cases. This dichotomy underscores how language type systems balance predictability against flexibility in design. Addressing the expression problem is crucial for modularity, as unresolved extensibility barriers result in brittle codebases where adding new data representations or behaviors requires pervasive modifications, leading to maintenance challenges and scalability issues. Such limitations particularly impede the development of domain-specific languages (DSLs), where independent extensions for new syntax or semantics are essential, as well as plugin architectures and library evolution, which rely on composable abstractions without tight coupling. In practice, this forces developers into anticipatory designs that over-engineer for unforeseen needs, reducing the adaptability of software systems over time. The expression problem sharply critiques the inherent asymmetries in dominant paradigms: (OOP) facilitates adding new operations to existing types through or interfaces but struggles to incorporate new types without refactoring hierarchies or violating encapsulation. In contrast, (FP) excels at extending types via algebraic data types or sum types but complicates adding operations, as it typically requires updating pattern-matching constructs across dispersed modules, potentially breaking existing implementations. These trade-offs highlight how OOP prioritizes behavioral extension while FP emphasizes structural growth, often at the expense of the opposite dimension. Beyond critiques, the expression problem has profoundly shaped language features and practices, inspiring advancements like enhanced algebraic data types for controlled extensibility in and traits for decoupling types from operations in hybrid systems such as . It also reinforces the open-closed principle in , advocating for designs open to extension yet closed to modification through mechanisms like separate compilation and type-safe dispatch.

Historical Development

Origins and Early Ideas

The conceptual foundations of the expression problem trace back to early explorations in data abstraction during the 1970s and 1980s, where programming language designers grappled with the challenges of structuring and extending data types in interactive and modular systems. In procedural paradigms, such as those in languages like ALGOL or early C, extensibility was often achieved through ad-hoc modifications to global structures or functions, lacking systematic support for independent evolution of data representations and operations. Similarly, modular approaches in languages like Modula-2 emphasized encapsulation but treated extensions as language-specific mechanisms, such as separate compilation units, without addressing the tension between adding new data variants and new behaviors. These paradigms highlighted the need for disciplined abstraction but revealed inherent biases toward either centralizing data or decentralizing procedures, setting the stage for more formal inquiries into dual extensibility. A pivotal early contribution came from John C. Reynolds in 1975, who examined user-defined types and procedural data structures as complementary methods for data in interactive languages. Reynolds argued that user-defined types, which centralize representation within a single definition (e.g., a set as a of constructors like empty, full, or limited), allow efficient extensions to the type's internal structure by modifying only the definition, while keeping operations consistent. Conversely, procedural data structures decentralize representation across functions (e.g., sets as predicates over integers), enabling novel, distributed implementations but restricting primitive operations to single-item access, thus complicating multi-argument functions like union. This duality underscored the trade-offs in extensibility: centralized types favored operational uniformity at the expense of representational flexibility, while procedural approaches promoted representational variety but hindered operational integration. Reynolds' analysis, rooted in prior work on functional , illustrated how traditional type disciplines enforced levels but struggled to balance both axes of extension without unified mechanisms. Building on these ideas, William R. Cook in 1990 contrasted (OOP), which he termed procedural data abstraction, with abstract data types (ADTs) to expose limitations in and interfaces for extensible designs. In OOP, as exemplified by Smalltalk's collections, facilitates adding new operations (e.g., deriving a length method from a base list class) through decentralized method dispatch, but encapsulation barriers prevent optimizations across representations, such as efficient binary operations on mixed types like intervals and streams. ADTs, typical in functional languages like ML, hide representations behind type definitions, making it straightforward to add new operations but requiring global updates to all functions when introducing new constructors (e.g., extending lists to include intervals demands revising head and tail for every case). Cook's examination of collection hierarchies revealed how OOP's subclassing often leads to inconsistent behaviors, such as methods being overridden or deleted, while ADTs impose rigidity on data evolution, demonstrating the asymmetry in supporting data versus behavioral extensions. These pre-1990s developments from procedural, modular, early , and functional paradigms influenced the recognition of extensibility as a core challenge in language design, where traditional type systems inherently favored extension along one dimension—either data types or operations—over the other, often resulting in brittle or ad-hoc solutions. This insight into the dual nature of the problem would later be crystallized and named by in 1998.

Naming and Formalization

The term "Expression Problem" was coined by in a 1998 post to the java-genericity mailing list, where he described it as "a new name for an old problem" in the context of language design challenges encountered during discussions at the ECOOP '98 conference. Inspired by talks on bridging functional and object-oriented paradigms, Wadler formalized the problem as the difficulty of extending a datatype defined by cases—such as adding new cases to the datatype or new functions operating over it—without recompiling existing code and while preserving static . These ECOOP '98 discussions involved key figures including Shriram Krishnamurthi, Matthias Felleisen, and Daniel P. Friedman, who explored synthesizing functional programming's case-based extensibility with object-oriented inheritance through mixin-based approaches implemented in DrScheme, a precursor to the Racket programming language. Wadler's post outlined an early prototype solution using Generic Java (GJ) with parametric types and virtual types, presenting Lisp-inspired pseudocode for expression datatypes like additions and multiplications alongside visitor patterns to demonstrate the extensibility barriers. Concurrently, Krishnamurthi advanced prototypes for extensible visitors, enabling modular additions to expression evaluators without altering core class hierarchies, as detailed in their collaborative work on promoting reuse across paradigms. A significant formalization milestone came in 2012 with William R. Cook's ECOOP paper, which revisited the Expression Problem through the lens of versus , proposing object algebras as a practical in languages like and C# to achieve bidirectional extensibility without recompilation. Cook framed the problem as a tension between closed-world (adding behaviors to existing types) and open-world (adding types to existing behaviors), building on Wadler's definition to emphasize its implications for modular in mainstream object-oriented languages.

Illustrating the Problem

Classic Arithmetic Expression Example

The Expression Problem is commonly illustrated through a basic arithmetic expression system that supports literals (constant numbers) and as its core constructs. This base system can be defined using an , where expressions are represented abstractly as either a literal value, such as Lit 5 for the 5, or an of two subexpressions, such as Add (Lit 1) (Lit 2) for the expression 1 + 2. A fundamental on these expressions is , which computes the numerical result by recursively traversing the structure. In , this can be expressed using a case-based dispatch:
[function](/page/Function) eval(expression):
    case expression of
        Lit(n) → return n
        Add(left, right) → return eval(left) + eval(right)
For instance, applying eval to Add (Lit 1) (Lit 2) yields 3, as it evaluates the literals and sums them. This setup establishes a self-contained foundation for handling simple arithmetic but anticipates growth, such as incorporating multiplication for expressions like 1 + 2 * 3 or adding a pretty-printing operation to render the expression as a string.

Demonstrating Extensibility Barriers

To illustrate the extensibility barriers inherent in the Expression Problem, consider extending the base arithmetic expression datatype from the classic example, which typically includes literals and addition operations along with an evaluation function. Adding a new representation, such as a multiplication operation denoted as Mult (Lit 2) (Lit 3), requires modifying the datatype definition to include a new case, for instance, data Exp = Lit Int | Add Exp Exp | Mult Exp Exp. However, this change necessitates updating the existing evaluation function to handle the new case, such as through pattern matching:
haskell
eval :: Exp -> Int
eval (Lit n) = n
eval (Add x y) = eval x + eval y
eval (Mult x y) = eval x * eval y  -- New case added
If the resides in a separate , this modification breaks , requiring recompilation of dependent code or direct alteration of the original implementation. Similarly, introducing a new behavior, such as pretty-printing expressions to produce readable strings, demands defining a function that dispatches on all existing datatype cases. For the base expressions, this might appear as:
haskell
print :: Exp -> String
print (Lit n) = show n
print (Add x y) = "(" ++ print x ++ " + " ++ print y ++ ")"
This requires exhaustive coverage of the datatype's cases, and any prior clients using the original datatype must be recompiled to incorporate the new , or the datatype itself must be altered to support the via an , propagating changes across modules. The dual extension—adding both a new representation like and a new behavior like pretty-printing simultaneously—exacerbates the issue, as the new must account for the new case, and vice versa, forcing modifications to the core datatype and all associated in violation of the open-closed principle, which advocates extending behavior without altering existing code. For instance, after adding Mult, the pretty-printing would need an additional clause:
haskell
print (Mult x y) = "(" ++ print x ++ " * " ++ print y ++ ")"
This interdependence ensures that independent extensions cannot proceed without coordinated changes to the original codebase. In statically typed languages, these barriers are compounded by constraints, where dispatch to new cases or operations cannot occur without mechanisms like dynamic casting or , which introduce errors, , or loss of compile-time guarantees. For example, attempting to evaluate or print an unrecognized at may result in exceptions or incomplete coverage, undermining the language's .

Proposed Solutions

Object-Oriented Approaches

Object-oriented approaches to the expression problem primarily rely on the , which leverages to separate the addition of new behaviors from the existing data representations, allowing new operations to be defined without altering the original . The pattern, formalized in the seminal work on , defines an acceptor interface in the base expression class and a interface with type-specific methods for each operation, enabling polymorphic dispatch based on both the object type and the visitor type. This structure facilitates easy extension for new operations, such as or pretty-printing, by implementing a new visitor class that traverses the expression tree and applies the desired logic at each node. In a typical implementation using languages like Java or C#, the expression hierarchy starts with an abstract Exp class or interface featuring subclasses such as Lit for literals and Add for binary addition, each overriding an accept method to invoke the appropriate visitor method. For instance, to evaluate an expression, an IEvalVisitor interface declares methods like visitLit(Lit exp) returning the literal's value and visitAdd(Add exp) recursively evaluating and summing the operands; the eval operation is then performed by calling exp.accept(new EvalVisitor()) on the root node. Adding a new operation, such as XML serialization, requires only a new visitor implementing the interface with tailored logic for each expression type, preserving the open-closed principle for behaviors while keeping the data types closed. However, the visitor pattern exhibits limitations when extending the data types themselves, as introducing a new expression variant like Mult for necessitates updating the with a new visitMult and modifying all existing implementations to handle it, leading to widespread code changes. Closed class hierarchies, enforced by features like sealed classes in C# or final classes in , further hinder the addition of new representations without violating encapsulation or resorting to , which William Cook critiques as diverging from traditional inheritance-based object-oriented paradigms in favor of more flexible but less integrated mechanisms. Attempts to mitigate these issues, such as extensible visitors using additional dispatch layers, introduce significant complexity and runtime overhead, underscoring the pattern's asymmetry in favoring operation extension over type extension.

Functional and Type-Class-Based Solutions

In languages such as and , the expression problem is often addressed using algebraic data types (ADTs), which facilitate straightforward extension of the data representation but impose costs on function updates. An ADT for arithmetic expressions might be defined as data Exp = Lit [Int](/page/INT) | Add Exp Exp, where Lit represents a literal integer and Add a binary addition; adding a new constructor like Mult Exp Exp for multiplication requires only modifying the type definition, preserving through exhaustive in functions like evaluation or pretty-printing. However, extending functionality—such as introducing a new operation like —necessitates updating every existing function to handle all constructors via , which can propagate changes across the codebase and hinder . Haskell's type class system offers a partial solution by inverting this dynamic, enabling easy addition of new behaviors without altering the data types while still requiring updates for new type variants. To achieve extensibility in both directions, techniques like data types à la carte define separate types for each variant (e.g., newtype Lit = Lit Int and data Add e = Add e e), allowing type class instances for each, such as instance Eval Lit where eval (Lit i) = VInt i and instance Eval (Add e) where eval (Add e1 e2) = ... (assuming e supports Eval); variants are then composed modularly using coproducts and functors to form the full expression type. This allows new operations (e.g., a Pretty class for printing) to be added by defining a new class with instances for existing types, avoiding modifications to the individual variant definitions. Conversely, introducing a new expression type demands providing instances for all relevant classes, ensuring completeness but potentially leading to repetitive boilerplate. This approach, refined in techniques like data types à la carte, composes expression variants modularly using type classes and functors, balancing extensibility in both directions for interpreters. In dynamically typed functional languages like (and its descendant Racket), extensible variants can be implemented using structs combined with , which provide a form of higher-order class composition for incremental extension without rigid hierarchies. Mixins function as class-to-class transformations that add or override methods, allowing independent extensions to be composed modularly; for instance, a base expression struct can be augmented with evaluation behavior via a , and further extended with printing via another, all without recompiling core definitions. This design, introduced in early work on Scheme's object system, supports and reuse in expression-like domains, though it trades static guarantees for flexibility in untyped contexts. These functional approaches excel in type safety and compositional purity, particularly for representation extension in ADTs and behavior addition via type classes, but face trade-offs in scalability: the need to maintain exhaustive instances or matchings can result in code duplication or "expression explosion"—an proliferation of variant-specific definitions—in large, evolving systems where numerous operations interact with many constructors.

Hybrid and Modern Techniques

Hybrid techniques for addressing the expression problem integrate elements from object-oriented and functional paradigms to achieve greater extensibility, allowing both new data variants and operations to be added independently without modifying existing code. Object algebras exemplify this approach by leveraging generic interfaces to define algebraic signatures for expressions, enabling modular interpretations that separate syntax from semantics. In object algebras, an abstract factory interface, such as ExpAlg[T], specifies the constructors for expression components, including methods like lit(int n): T for literal values and add(T left, T right): T for addition. Clients implement this interface to provide specific behaviors; for instance, an evaluation algebra might return an Int by implementing lit to return the integer value and add to perform arithmetic summation. Separately, a representation algebra, such as one building abstract syntax trees, constructs Tree nodes via the same interface. This design ensures that expressions are built uniformly while allowing multiple algebras to interpret the same syntax differently, thus solving the expression problem through parametric polymorphism and subtyping. Implementations in languages supporting interfaces and generics, like or 8 and later, use traits or interfaces for ExpAlg[T]. For example, in :
scala
[trait](/page/Trait) ExpAlg[T] {
  def lit(n: [Int](/page/INT)): T
  def add(l: T, r: T): T
}

// Evaluation algebra
class EvalAlg extends ExpAlg[Int] {
  def lit(n: [Int](/page/INT)): Int = n
  def add(l: Int, r: Int): Int = l + r
}

// Tree algebra
class TreeAlg extends ExpAlg[Tree] {
  def lit(n: [Int](/page/INT)): Tree = Lit(n)
  def add(l: Tree, r: Tree): Tree = Add(l, r)
}
Expressions are then constructed using a chosen , such as val onePlusTwo = alg.add(alg.lit(1), alg.lit(2)), and interpreted by invoking the appropriate instance. This facilitates independent extensions: new operations (e.g., pretty-printing) can be added via a new implementation, while new (e.g., ) extend the interface without altering clients. In , a language, the expression problem is addressed using enums for data (e.g., enum AstNode { [Integer](/page/Integer)(i64), Add([Box](/page/Box)<AstNode>, [Box](/page/Box)<AstNode>) }), for operations (e.g., trait [Stringify](/page/String) { fn stringify(&self) -> [String](/page/String); }), and generics with bounds to enable extensibility. For example, a can operate on a type Node: [Stringify](/page/String) + Dump to support multiple operations without hard-coding. objects ([Box](/page/Box)<dyn [Trait](/page/Trait)>) allow , but solutions often require a "broker" enum type and From implementations for composition across crates. While this provides static and modularity, limitations include boilerplate for crossing domains (e.g., enum to object) and the need for workarounds like the unstable Unsize for full . As of 2025, 's approaches are discussed for their balance of performance and extensibility in safe . Other hybrid approaches include procedural embeddings using closures to encapsulate behaviors, where expression constructors return functions that defer evaluation until supplied with an interpreter, blending imperative flexibility with functional composition. Algebraic effects in languages like Koka and offer another synergy, treating operations as effectful computations handled modularly, though primarily for rather than direct data extensibility. Modern developments extend these hybrids to scalable domain-specific languages (DSLs) and architectures, where object algebras enable modular DSL design by composing independent language fragments. For instance, in DSLs for descriptions or questionnaires, algebras allow plugins to add syntax and semantics without global recompilation, enhancing in large systems. Post-2012 advancements, however, highlight critiques of added conceptual overhead in complex scenarios, such as composing multiple algebras for intricate DSLs, potentially increasing boilerplate compared to pure paradigms. Generalized variants of the expression problem, as explored in recent analyses, reveal its manifestations in everyday programming beyond datatypes, such as extensions, prompting robust solutions emphasizing type-safe .

References

  1. [1]
    Expression Problem Philip Wadler
    The two phases of the problem are clumped into two classes, LangF and Lang2F, each of which defines several mutually recursive inner classes and interfaces. In ...
  2. [2]
    (PDF) The Expression Problem Revisited - ResearchGate
    Aug 7, 2025 · This paper investigates the solution space within the framework of the soon-to-be mainstream generic extensions of C# and the Java programming language.
  3. [3]
    Solving the Expression Problem in C++, á la LMS - ResearchGate
    Our solution is a C++ transliteration of how Lightweight Modular Staging solves the Expression Problem. It, furthermore, gives a C++ encoding to object algebras ...
  4. [4]
    The Expression Problem Revisited Four new solutions using generics
    Aug 7, 2025 · Open classes provide a solution to the extensibility problem of object-oriented programming languages, allowing the modular addition of both ...
  5. [5]
    The expression problem in platform-aware programming
    Sep 19, 2025 · We restate a fundamental problem in software extensibility called The Expression Problem for platform-aware programming to show how a solution ...
  6. [6]
    [PDF] Independently Extensible Solutions to the Expression Problem
    Independently Extensible Solutions to the Expression Problem. Matthias Zenger. Martin Odersky. École Polytechnique Fédérale de Lausanne. 1015 Lausanne ...
  7. [7]
    Compositional Programming - ACM Digital Library
    ... Expression Problem (EP), il ... We present a language design, called CP, which is proved to be type-safe ...
  8. [8]
    [PDF] Creating and using domain-specific language features
    This enables a solution to a form of the “expression problem”; if one ex- tension defines new attributes for host language productions and another adds new ...
  9. [9]
    Towards virtual traits in Scala - ACM Digital Library
    Virtual traits are class-valued object attributes and can be redefined within subtraits. ... "The expression problem, Scandinavian style". In: ON MECHANISMS FOR ...
  10. [10]
    Nested Family Polymorphism with Extensible Variant Types
    Apr 29, 2024 · The Expression Problem epitomizes the difficulty [Wadler et al. 1998] ... A language design should allow a derived family to add new ...<|control11|><|separator|>
  11. [11]
    [PDF] User-Defined Types and Procedural Data Structures as ...
    Various notions of procedural (or functional) data structures have been developed [Reynolds 70*; Landin 65; Balzer 67*]. In this approach, the abstract form of ...
  12. [12]
    [PDF] Object-Oriented Programming Versus Abstract Data Types
    In 1975, John Reynolds published a paper called “User-defined data types and procedural data struc-.Missing: Discipline | Show results with:Discipline
  13. [13]
  14. [14]
    [PDF] Extensibility for the Masses - UT Computer Science
    Abstract. This paper presents a new solution to the expression problem. (EP) that works in OO languages with simple generics (including Java or C#).
  15. [15]
    Extensibility for the Masses - SpringerLink
    This paper presents a new solution to the expression problem (EP) that works in OO languages with simple generics (including Java or C#).
  16. [16]
    [PDF] Expression Problem - Haskell.org
    Jul 29, 2009 · Philip Wadler coined the term: The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one ...Missing: original | Show results with:original
  17. [17]
    The Expression Problem and its solutions - Eli Bendersky's website
    May 12, 2016 · There are a number of ways the expression problem as displayed in this article can be solved in Clojure using the language's built-in features.
  18. [18]
    Visitor - Refactoring.Guru
    The Visitor pattern suggests that you place the new behavior into a separate class called visitor, instead of trying to integrate it into existing classes. The ...
  19. [19]
    8.11 The Expression Problem and the Visitor Pattern - Programming 2
    The visitor pattern is suitable to solve this problem because it allows to factor out the new functionality into a separate visitor class. It uses the accept/ ...Missing: approaches | Show results with:approaches
  20. [20]
    Data types à la carte | Journal of Functional Programming
    Mar 18, 2008 · Data types à la carte - Volume 18 Issue 4. ... The Expression Problem. Accessed at http://homepages.inf.ed.ac.uk/wadler/papers/expression ...
  21. [21]
    Classes and mixins | Proceedings of the 25th ACM SIGPLAN ...
    In this paper we develop a model of class-to-class functions that we refer to as mixins. A mixin function maps a class to an extended class by adding or ...Missing: problem | Show results with:problem
  22. [22]
    Solutions to the Expression Problem Using Object Algebras
    The expression problem (EP) [1] is a classical problem in programming ... Java (authors: Bruno Oliveira and William Cook); F# (authors: Matthew ...
  23. [23]
    [PDF] Algebraic Effects for Functional Programming - Microsoft
    A problem with many effect systems is that the inferred types become very large or difficult to understand – we have extensive experience with large effectful ...
  24. [24]
    Using Object Algebras To Design Embedded Domain Specific ...
    Object algebras allow the developer to define embedded DSLs in an extensible and modular way: DSLs can be broken down into independent pieces and can have ...
  25. [25]
    The Expression Problem, Generalised - Functional Conf 2025
    Nov 17, 2024 · In this talk, we first define The Expression Problem, then analyse examples of the problem. We then generalise from the cases of a datatype ...