Hierarchical database model
The hierarchical database model is a data organization method that structures information in a tree-like hierarchy, where each record (or segment) has exactly one parent and can have multiple children, enforcing strict one-to-many relationships between data entities.[1][2] This model represents data as a rooted tree, with the top-level root segment serving as the entry point and subordinate levels branching downward to dependent records.[3][4]
Developed by IBM in the late 1960s as part of its Information Management System (IMS), the hierarchical model emerged during the era of mainframe computing to handle complex, large-scale data processing needs, particularly for the Apollo space program and early enterprise applications.[1][2] It gained prominence in the 1970s as one of the first commercial database management systems, alongside the network model, with IMS becoming the dominant implementation.[2][3] Navigation in hierarchical databases occurs via procedural, record-at-a-time access, often using languages like DL/I in IMS, which employs commands such as Get Unique, Get Next, and Get Next Within Parent to traverse parent-child paths in a preorder sequence.[3][2] Storage can utilize sequential files, indexed structures like B-trees, or pointers, with database definitions specified through schema languages to maintain physical and some logical data independence.[2]
Key advantages of the hierarchical model include its simplicity in constructing and operating tree-structured data, efficient retrieval for predefined hierarchical queries, and automatic enforcement of data integrity through parent-child dependencies, making it suitable for domains like organizational charts or routine transaction processing in banking systems.[5][4] However, it suffers from rigidity, as relationships must be predefined and cannot easily accommodate many-to-many associations without data replication, leading to update anomalies, storage waste, and inconsistency risks.[4][2] Its navigational, procedural nature also limits query flexibility, requires detailed programming for access, and complicates structural modifications, contributing to its decline in favor of relational models by the 1980s.[5][3]
Despite its limitations, the hierarchical model persists in legacy systems like IMS, which still supports mission-critical applications in industries such as finance and aerospace, and influences modern technologies including XML document storage and geographic information systems (GIS) that leverage tree-based representations.[1][3] Extensions like logical databases in IMS allow virtual views across multiple physical trees to mitigate some of the model's shortcomings relative to relational systems, notably its weak support for many-to-many relationships, demonstrating the model's adaptability in specialized contexts.[2]
Fundamentals
Definition and Overview
The hierarchical database model is a data management approach that organizes information into a tree-like structure, where records are connected through parent-child relationships, allowing each child record to have exactly one parent while a parent can have multiple children.[6] This model represents data as a collection of rooted trees, often forming a forest of such trees, with links associating exactly two records to enforce the hierarchy.[6]
The primary purpose of the hierarchical model is to facilitate efficient storage and retrieval of data exhibiting natural hierarchical relationships, such as those found in organizational charts, bill-of-materials in manufacturing, or file directory systems.[7] Visually, the data structure resembles an upside-down tree, with a single root record at the top branching downward into subordinate records, enabling straightforward navigation along predefined paths.[8]
This model emerged in the 1960s as one of the earliest formalized database paradigms, predating the relational model and influencing subsequent data management systems.[6]
Core Concepts
In the hierarchical database model, data is structured using records as the fundamental nodes, each comprising a collection of fields that hold related information. These records are further divided into segments, which serve as logical groupings of data elements reflecting dependencies between them, such as departments, employees, or projects in an organizational context.[9] A database itself is a collection of multiple such hierarchies, forming a forest of rooted trees where each tree represents a complete set of interconnected records.[6]
The model maintains a clear distinction between logical and physical views to separate user-facing data organization from underlying storage mechanics. Logical records encapsulate the conceptual structure and content visible to applications, defining the hierarchy and relationships without regard to storage details. In contrast, physical records address implementation specifics, including indexing, pointer linkages for navigation, and disk allocation to optimize access efficiency.[6][10]
Fields within segments are generally single-valued, meaning each contains only one data value to ensure data integrity and simplicity. However, the model supports multivalued aspects through repeating groups of child segments under a single parent, enabling one-to-many relationships where a parent record can link to multiple instances of a child type, such as multiple courses under a department. Native many-to-many relationships are not directly supported, necessitating workarounds like data replication or virtual linking to avoid redundancy while maintaining the tree structure.[9][6]
Operations in the hierarchical model presuppose sequential access patterns, as data retrieval inherently follows the predefined hierarchical paths, typically via depth-first or preorder traversal starting from the root to navigate child segments in a top-down, left-to-right sequence.[9][6] This traversal order aligns with the model's tree-like organization, prioritizing efficient linear processing over arbitrary jumps.[6]
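The hierarchical (preorder) sequence described above can be sketched in Python. This is a minimal in-memory illustration, assuming a hypothetical `Segment` class and the Department/Employee/Project example used elsewhere in this article; it shows only the visiting order the model implies, not how IMS actually stores segments.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical segment occurrence: a type name, a key, and ordered children."""
    seg_type: str
    key: str
    children: list[Segment] = field(default_factory=list)


def hierarchical_sequence(root: Segment) -> list[str]:
    """Return segments in preorder: parent first, then each child subtree left to right."""
    order: list[str] = []
    stack = [root]
    while stack:
        seg = stack.pop()
        order.append(f"{seg.seg_type}:{seg.key}")
        # Push children in reverse so the leftmost child is visited next.
        stack.extend(reversed(seg.children))
    return order


# Hypothetical Department/Employee/Project tree.
dept = Segment("Department", "D1", [
    Segment("Employee", "E1", [Segment("Project", "PA"), Segment("Project", "PB")]),
    Segment("Employee", "E2", [Segment("Project", "PC")]),
])

print(hierarchical_sequence(dept))
# ['Department:D1', 'Employee:E1', 'Project:PA', 'Project:PB', 'Employee:E2', 'Project:PC']
```

An explicit stack is used instead of recursion so that very deep hierarchies do not hit Python's recursion limit.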
Data Organization
Tree Structure
The hierarchical database model organizes data into a tree topology, where a single root segment serves as the topmost entry point, from which subordinate segments branch downward in a series of parent-child relationships, culminating in leaf segments that possess no further dependents. This structure enforces a strict one-to-many (1:N) relationship, ensuring that each non-root segment has precisely one parent while allowing parents to have zero or more children, thereby forming a directed acyclic graph resembling an inverted tree. The root segment anchors the entire hierarchy, and traversal occurs from higher to lower levels, reflecting natural containment or composition relationships in the data.[11][3]
A single hierarchical database can accommodate multiple such trees, collectively forming a forest, particularly when distinct hierarchies are maintained in separate physical datasets or through definitions supporting varied root instances. This capability allows for the representation of disjoint data groupings within the overall system, such as independent organizational units, without violating the model's tree-based constraints. Each tree in the forest operates autonomously, with its own root, enabling modular data management while preserving the integrity of individual hierarchies.[3][12]
Storage in the hierarchical model prioritizes efficiency for hierarchical access patterns by placing parent and child segments contiguously on disk, often in sequential order within each database record. Access methods such as Hierarchical Sequential Access Method (HSAM) and Hierarchical Indexed Sequential Access Method (HISAM) facilitate this by storing all segments of a record in physically adjacent locations, optimizing sequential reads and writes along the tree paths. This contiguous arrangement minimizes seek times and supports rapid traversal from parents to children, though it can complicate updates that alter the hierarchy.[13][14][3]
To illustrate, consider a simple organizational hierarchy: a root "Department" segment branches to multiple "Employee" child segments, each of which in turn branches to one or more "Project" leaf segments. This arrangement might be depicted as:
Department (Root)
├── Employee 1
│   ├── Project A
│   └── Project B
└── Employee 2
    └── Project C
Such a structure captures the containment of employees within departments and projects within employee assignments, with data flowing unidirectionally downward from the root.[11]
Parent-Child Relationships
In the hierarchical database model, the parent-child relationship forms the foundational linkage between records, organizing data into a tree structure where each child record is associated with exactly one parent record. This enforces a strict one-to-one or one-to-many relationship, ensuring that no child can belong to multiple parents, which eliminates ambiguity in data ownership but restricts the model's ability to represent complex many-to-many associations without data duplication.[15]
A single parent can have multiple child occurrences, allowing for the representation of one-to-many relationships through repeating groups of child records subordinate to the parent. These repeating groups are implemented as multiple instances of child segments under a given parent segment, such as an employee record containing several dependent job or absence entries, which are linked to facilitate sequential or random access.[16][17]
Linkages between parents and children are typically established using pointer-based mechanisms, such as physical parent pointers from a child to its parent (often stored as addresses or relative byte addresses), pointers from a parent to its first child, and further pointers that chain siblings together. IBM's IMS offers several such options: hierarchical forward pointers, which chain segments in hierarchical (preorder) sequence; physical child first pointers, which lead from a parent to its first child of a given segment type; and physical twin forward pointers, which chain multiple occurrences of the same segment type under one parent, enabling efficient navigation along the tree path.[17][15]
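A minimal Python sketch of pointer-based linkage, loosely modeled on physical parent, child-first, and twin-forward pointers. The `Seg` class and its field names are hypothetical, and for simplicity each parent has a single child chain rather than one chain per child segment type as in IMS.

```python
from typing import List, Optional


class Seg:
    """Hypothetical segment occurrence with illustrative pointer fields (not IMS's physical layout)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: Optional["Seg"] = None       # physical parent pointer
        self.first_child: Optional["Seg"] = None  # child-first pointer
        self.next_twin: Optional["Seg"] = None    # twin-forward pointer


def add_child(parent: Seg, child: Seg) -> None:
    """Link child under parent by appending it to the end of the twin chain."""
    child.parent = parent
    if parent.first_child is None:
        parent.first_child = child
        return
    cur = parent.first_child
    while cur.next_twin is not None:
        cur = cur.next_twin
    cur.next_twin = child


def children(parent: Seg) -> List[str]:
    """Follow the child-first pointer, then the twin chain, collecting names."""
    names: List[str] = []
    cur = parent.first_child
    while cur is not None:
        names.append(cur.name)
        cur = cur.next_twin
    return names


dept = Seg("Dept D1")
for name in ["Emp E1", "Emp E2", "Emp E3"]:
    add_child(dept, Seg(name))

print(children(dept))                          # ['Emp E1', 'Emp E2', 'Emp E3']
print(dept.first_child.next_twin.parent.name)  # Dept D1
```

Appending walks the whole twin chain, which is why real systems also offer a "last child" pointer option to make appends cheap.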
Key constraints in these relationships include the prohibition of cycles, ensuring acyclic tree formation, and the requirement that navigation between siblings or back to parents occurs only through designated pointers or by traversing the hierarchy, preventing direct cross-links outside the defined parent-child paths. This structure maintains data integrity by confining associations to the tree topology but necessitates full path traversal for any non-adjacent access.[15]
Querying and Operations
Navigation and Access Methods
In hierarchical database models, navigation primarily involves traversing the tree structure from the root segment downward through parent-child relationships to reach target data. This path-based approach requires sequential access along predefined hierarchical paths, often using a preorder traversal to follow the leftmost branches first for efficiency. For instance, to retrieve details on an invoice line item, an application might start at the customer root segment, proceed to the associated invoice parent, and then access the child line item segment.[3]
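The path dependence of this navigation can be illustrated with a toy Python sketch. The nested-dictionary layout and the `get_line_item` helper are hypothetical; the point is that a line item is reachable only by naming its customer and invoice first.

```python
from typing import Optional

# Hypothetical customer -> invoice -> line item hierarchy held as nested dicts.
db = {
    "C100": {"invoices": {
        "INV-1": {"lines": {"1": {"sku": "BOLT", "qty": 40}, "2": {"sku": "NUT", "qty": 40}}},
        "INV-2": {"lines": {"1": {"sku": "WASHER", "qty": 10}}},
    }},
}


def get_line_item(customer: str, invoice: str, line: str) -> Optional[dict]:
    """Follow the fixed path root -> invoice -> line item; return None if any hop is missing."""
    cust = db.get(customer)
    if cust is None:
        return None
    inv = cust["invoices"].get(invoice)
    if inv is None:
        return None
    return inv["lines"].get(line)


print(get_line_item("C100", "INV-1", "2"))  # {'sku': 'NUT', 'qty': 40}
```

Finding, say, every line item for a given SKU would require visiting every customer and invoice, which is the ad-hoc query cost the following paragraphs discuss.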
To facilitate faster access without always beginning at the root, hierarchical databases employ secondary indexes that provide alternative entry points based on key values outside the primary hierarchy. These indexes, such as those in IBM's Hierarchical Indexed Direct Access Method (HIDAM), allow direct retrieval of root segments or specific paths using sequence fields, reducing the need for full tree traversal. Secondary indexes can be qualified or unqualified, enabling targeted searches by attributes like customer number or order ID while maintaining the underlying tree integrity.[18]
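The idea of a secondary index as an alternative entry point can be sketched as a separate lookup table from a non-key field to root keys. The `customers` data and helper names are hypothetical, and a real IMS secondary index is a separate database rather than an in-memory dictionary.

```python
from typing import Dict, Optional

# Hypothetical root segments keyed by customer number; order IDs are dependent data.
customers = {
    "C100": {"name": "Acme", "orders": ["O-7", "O-9"]},
    "C200": {"name": "Globex", "orders": ["O-8"]},
}


def build_order_index(roots: dict) -> Dict[str, str]:
    """Secondary index: order ID -> customer (root) key, stored apart from the tree."""
    index: Dict[str, str] = {}
    for cust_key, cust in roots.items():
        for order_id in cust["orders"]:
            index[order_id] = cust_key
    return index


order_index = build_order_index(customers)


def customer_for_order(order_id: str) -> Optional[str]:
    """One index probe replaces a scan of every root and its dependents."""
    key = order_index.get(order_id)
    return None if key is None else customers[key]["name"]


print(customer_for_order("O-8"))  # Globex
```

The tree itself is untouched; the index must simply be kept in step whenever dependents are inserted or deleted.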
Querying in hierarchical models typically uses procedural languages that specify navigation commands along these paths. In IBM's Information Management System (IMS), the Data Language/I (DL/I) interface supports calls like Get Unique (GU) for direct retrieval of a specific segment, Get Next (GN) for sequential retrieval in hierarchical sequence, and Get Next within Parent (GNP) for retrieving the next dependent segment under the currently established parent. These calls, usually issued from application code (e.g., COBOL), use Segment Search Arguments (SSAs) to qualify the path, for example to retrieve all line items under a specific invoice.[3][18]
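To show how positioning drives GU, GN, and GNP, here is a toy Python model over a hierarchical sequence. The `Cursor` class only mirrors the call names; it is not the DL/I programming interface, it ignores SSAs beyond a segment type and key match, and it returns `None` where IMS would return a status code.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str, str]  # (level, segment type, key); level 1 is the root

# Hypothetical database laid out in hierarchical (preorder) sequence.
SEQUENCE: List[Row] = [
    (1, "CUSTOMER", "C100"),
    (2, "INVOICE", "INV-1"),
    (3, "LINEITEM", "1"),
    (3, "LINEITEM", "2"),
    (2, "INVOICE", "INV-2"),
    (3, "LINEITEM", "1"),
    (1, "CUSTOMER", "C200"),
    (2, "INVOICE", "INV-3"),
]


class Cursor:
    """Toy positioning model; names mirror GU/GN/GNP but this is not the IMS API."""

    def __init__(self, seq: List[Row]) -> None:
        self.seq = seq
        self.pos = -1                      # current position in the sequence
        self.parent: Optional[int] = None  # index where parentage is established

    def gu(self, seg_type: str, key: str) -> Optional[Row]:
        """Get Unique: search from the start and establish parentage at the match."""
        for i, (_, t, k) in enumerate(self.seq):
            if t == seg_type and k == key:
                self.pos = self.parent = i
                return self.seq[i]
        return None

    def gn(self, seg_type: Optional[str] = None) -> Optional[Row]:
        """Get Next: move forward from the current position, optionally by type."""
        for i in range(self.pos + 1, len(self.seq)):
            if seg_type is None or self.seq[i][1] == seg_type:
                self.pos = self.parent = i
                return self.seq[i]
        return None

    def gnp(self, seg_type: Optional[str] = None) -> Optional[Row]:
        """Get Next within Parent: like GN, but stop at the end of the parent's subtree."""
        if self.parent is None:
            return None
        parent_level = self.seq[self.parent][0]
        for i in range(self.pos + 1, len(self.seq)):
            if self.seq[i][0] <= parent_level:
                return None  # left the parent's subtree
            if seg_type is None or self.seq[i][1] == seg_type:
                self.pos = i
                return self.seq[i]
        return None


cur = Cursor(SEQUENCE)
cur.gu("INVOICE", "INV-1")  # establishes parentage at invoice INV-1
items = []
while (seg := cur.gnp("LINEITEM")) is not None:
    items.append(seg[2])
print(items)  # ['1', '2']
```

GNP does not reset parentage, so a loop of GNP calls naturally enumerates one parent's dependents and then stops.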
Access patterns in hierarchical databases are optimized for queries that align with the tree structure, such as retrieving entire subtrees or following known parent-child chains, which minimizes I/O through contiguous storage in methods like Hierarchical Sequential Access Method (HSAM). However, they prove inefficient for ad-hoc or non-hierarchical searches, as locating data without a predefined path often requires exhaustive scanning or multiple index lookups.[18]
Insert, Update, and Delete Operations
In hierarchical database models, the insert operation adds a new record as a child of an existing parent record, ensuring adherence to the predefined tree structure and relationship rules. The process involves setting the values for the new record in a work area or template, then specifying the parent under which it will be placed, often via a condition matching the parent's key. Pointers are updated to link the new child to the parent, typically inserting it as the leftmost child or in sequence order if defined by the schema. Hierarchy rules are checked to prevent violations, such as inserting into invalid positions or exceeding cardinality constraints like one-to-many limits. For instance, adding a new employee record under a department parent requires verifying the department exists and updating the parent's child pointer list.[15][19]
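The insert checks above can be sketched as follows, assuming a hypothetical in-memory layout in which each department keeps its employee keys in sequence-field order. The names `departments`, `employees`, and `insert_employee` are illustrative only.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical data: department key -> employee keys kept in sequence-field order.
departments: Dict[str, List[str]] = {"D1": [], "D2": []}
employees: Dict[Tuple[str, str], dict] = {}


def insert_employee(dept_key: str, emp_key: str, data: dict) -> None:
    """Insert an employee segment under an existing department."""
    if dept_key not in departments:
        raise KeyError(f"parent {dept_key!r} not found")  # a child needs a parent
    if (dept_key, emp_key) in employees:
        raise ValueError(f"duplicate key {emp_key!r} under {dept_key!r}")
    bisect.insort(departments[dept_key], emp_key)  # link among twins in key order
    employees[(dept_key, emp_key)] = dict(data)


insert_employee("D1", "E20", {"name": "Lee"})
insert_employee("D1", "E10", {"name": "Kim"})
print(departments["D1"])  # ['E10', 'E20']
```

Keys here are unique only among twins under the same parent, which is one common way a sequence field is scoped.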
The update operation, often termed "replace," modifies the fields of an existing record while preserving the overall tree integrity. It begins by navigating to the target record using a get-hold command to establish a currency pointer, followed by altering the desired non-key fields in the work area. If a key field changes, such as a parent's identifier, the operation may require repositioning all dependent children by deleting and re-inserting them to maintain links, as direct key updates are typically prohibited to avoid breaking navigational paths. This ensures that parent-child relationships remain valid post-modification, with checks for data type consistency and range limits. An example is updating an employee's salary field without affecting its position under the department parent, or adjusting a branch name and cascading pointer adjustments for accounts.[15][3][19]
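A minimal sketch of the get-hold/replace pattern, assuming a hypothetical `SegmentStore` class: a replace is accepted only if it follows a get-hold and leaves the key field unchanged.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (parent key, segment key)


class SegmentStore:
    """Hypothetical store illustrating get-hold followed by replace."""

    def __init__(self, segments: Dict[Key, dict]) -> None:
        self.segments = segments
        self.held: Optional[Key] = None  # target of the next replace

    def get_hold(self, parent: str, key: str) -> dict:
        """Retrieve a segment into a work-area copy and remember it as held."""
        seg = self.segments[(parent, key)]
        self.held = (parent, key)
        return dict(seg)

    def replace(self, work_area: dict) -> None:
        """Write the work area back; key-field changes are rejected."""
        if self.held is None:
            raise RuntimeError("replace without a preceding get-hold")
        if work_area["key"] != self.held[1]:
            raise ValueError("key field cannot change; delete and re-insert instead")
        self.segments[self.held] = work_area
        self.held = None


store = SegmentStore({("D1", "E10"): {"key": "E10", "name": "Kim", "salary": 50000}})
wa = store.get_hold("D1", "E10")
wa["salary"] = 55000  # non-key field: allowed
store.replace(wa)
print(store.segments[("D1", "E10")]["salary"])  # 55000
```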
Deletion removes a record and handles its dependents according to schema rules, commonly cascading to eliminate the entire subtree to uphold referential integrity. The procedure locates the record via a get-hold, then executes the delete, which severs parent pointers and frees the space for the node and all descendants. Orphaning children is possible but constrained, requiring explicit schema allowance and subsequent re-parenting to avoid dangling references. Constraints prevent deletion of roots with dependents unless cascading is enabled, ensuring no incomplete hierarchies remain. For example, deleting a customer record would remove associated order subtrees, maintaining the database's structural consistency.[15][3][19]
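Cascading deletion can be sketched as removing a subtree in preorder and then severing the link from the surviving parent. The dictionary-based `tree` and the `delete_cascade` helper are hypothetical and ignore storage reclamation.

```python
from typing import Dict, List

# Hypothetical tree: each key maps to its ordered child keys.
tree: Dict[str, List[str]] = {
    "CUST-1": ["ORD-1", "ORD-2"],
    "ORD-1": ["LINE-1", "LINE-2"],
    "ORD-2": ["LINE-3"],
    "LINE-1": [], "LINE-2": [], "LINE-3": [],
}
parent_of: Dict[str, str] = {c: p for p, kids in tree.items() for c in kids}


def delete_cascade(node: str) -> List[str]:
    """Delete node and every descendant; return the removed keys in preorder."""
    removed: List[str] = []
    stack = [node]
    while stack:
        cur = stack.pop()
        removed.append(cur)
        stack.extend(reversed(tree.pop(cur)))
    parent = parent_of.get(node)
    if parent is not None:
        tree[parent].remove(node)  # sever the link from the surviving parent
    for key in removed:
        parent_of.pop(key, None)
    return removed


removed = delete_cascade("ORD-1")
print(removed)         # ['ORD-1', 'LINE-1', 'LINE-2']
print(tree["CUST-1"])  # ['ORD-2']
```

Because children can exist only under a parent, no descendant can survive the delete, so no dangling references remain.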
Transaction control in hierarchical models groups insert, update, and delete operations into units of work to guarantee atomicity and consistency, using commit to persist all changes or rollback to undo them entirely. A transaction starts implicitly with the first database call or explicitly via allocation, encompassing multiple modifications across records while locking paths to prevent concurrent interference. Upon commit, all pointer updates and data changes are finalized; rollback reverts the database to the pre-transaction state by reversing operations in reverse order. This mechanism ensures that partial failures do not leave the hierarchy in an inconsistent state, such as orphaned children or broken links, particularly in systems like IMS where DL/I calls form the unit of recovery.[20][18]
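The commit/rollback behavior can be illustrated with an undo log that is replayed in reverse. This is a simplified sketch with a hypothetical `UnitOfWork` class; IMS relies on its own logging and recovery facilities rather than anything like this code.

```python
from typing import Any, Dict, List, Tuple

_MISSING = object()  # marks keys that did not exist before a change


class UnitOfWork:
    """Hypothetical unit of work: changes apply immediately and are undone in reverse on rollback."""

    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db
        self.undo: List[Tuple[str, Any]] = []

    def put(self, key: str, value: Any) -> None:
        self.undo.append((key, self.db.get(key, _MISSING)))
        self.db[key] = value

    def delete(self, key: str) -> None:
        self.undo.append((key, self.db[key]))
        del self.db[key]

    def commit(self) -> None:
        self.undo.clear()  # changes are now permanent

    def rollback(self) -> None:
        while self.undo:
            key, old = self.undo.pop()  # newest change first
            if old is _MISSING:
                self.db.pop(key, None)
            else:
                self.db[key] = old


db = {"D1": "Sales", "D1/E10": "Kim"}
uow = UnitOfWork(db)
uow.put("D1/E20", "Lee")
uow.delete("D1/E10")
uow.rollback()
print(db)  # {'D1': 'Sales', 'D1/E10': 'Kim'}
```

Undoing in reverse order matters: if one key were changed twice, replaying the log oldest-first would restore the intermediate value instead of the original.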
Historical Development
Origins and Early Implementations
The hierarchical database model emerged in the mid-20th century as computing demands grew for organized, efficient data storage on mainframe systems. Its foundations trace back to early file organization methods like IBM's Indexed Sequential Access Method (ISAM), introduced in the 1960s, which enabled indexed access to sequentially stored records and laid the groundwork for structured data hierarchies in subsequent database designs.[21] This approach addressed the limitations of purely sequential file systems by incorporating tree-like indexing, influencing the more sophisticated database models developed later in the decade.[22]
In the mid-1960s, IBM spearheaded innovations in data management to support large-scale applications on its System/360 mainframes, with key contributions from figures like chief architect Vern Watts.[23] Charles Bachman's work at General Electric further shaped the field through the Integrated Data Store (IDS), an early prototype DBMS released around 1964 that introduced navigational access via pointer chains, inspiring hierarchical and related structures for linking records.[22] Bachman's IDS demonstrated the potential for integrated data handling beyond flat files, emphasizing parent-child linkages that became central to hierarchical organization.[24]
The model's first major implementation arrived with IBM's Information Management System (IMS) in 1968, conceived in 1963 to fulfill NASA's requirements for the Apollo space program.[23] IMS was designed for efficient mainframe-based management of high-volume, structured data, such as tracking thousands of rocket parts through bill-of-materials hierarchies, enabling real-time inventory and version control.[23] This system marked a pivotal shift from file-based storage to true database capabilities.[23]
Initial adopters focused on aerospace and government sectors, where the model's tree-like structure excelled at representing complex, one-to-many relationships in mission-critical environments like the Apollo missions.[23] NASA's use of IMS at the Rockwell Space Division in 1968 exemplified its suitability for handling voluminous, hierarchical data in high-stakes projects requiring rapid access and reliability.[23] These applications underscored the model's value in structured, high-volume sectors that needed scalable data navigation, even though the model lacked the flexibility of later paradigms.[25]
Evolution, Decline, and Resurgence
The hierarchical database model reached its peak of widespread adoption during the 1970s, particularly in mainframe environments where it powered large-scale enterprise applications due to its efficiency in handling structured, tree-like data relationships.[26] This dominance was evident in systems like IBM's IMS, which became a standard for transaction processing in industries requiring high performance and reliability.[27] However, the publication of Edgar F. Codd's seminal 1970 paper, "A Relational Model of Data for Large Shared Data Banks," introduced an alternative paradigm that emphasized data independence and declarative querying, setting the stage for a shift away from hierarchical rigidity.
The model's decline accelerated in the 1980s and 1990s as relational databases gained prominence, driven by the standardization of SQL and the need for greater flexibility in data manipulation and schema evolution.[28] Hierarchical systems were increasingly viewed as inflexible, with their parent-child structures complicating ad-hoc queries and adaptations to changing business requirements, leading to a major market shift toward relational systems by the late 1980s.[29] Commercial successes like IBM DB2 and Oracle further solidified relational dominance, rendering pure hierarchical databases largely obsolete for new developments.[26]
A partial resurgence occurred in the 2000s, fueled by the rise of hierarchical data formats like XML and JSON, which echoed the tree-based organization of the original model and found applications in semi-structured data storage and web services.[30] XML databases, in particular, revived interest in hierarchical approaches for handling nested documents, while JSON's adoption in NoSQL document stores extended this trend into the 2010s.[31] Legacy hierarchical systems, such as IMS, continue to support critical operations in banking, telecommunications, and healthcare as of 2025, valued for their proven scalability in high-volume transaction environments.[32][33][34]
As of 2025, the hierarchical model occupies a niche role, including in hierarchical configuration stores such as the Windows Registry and in specialized sectors where data hierarchies align naturally with operational needs.[35] No major new developments in pure hierarchical database technologies have emerged post-2020, with focus instead on modernization and integration of existing systems into hybrid environments.[32]
Notable Examples
IBM Information Management System (IMS)
The IBM Information Management System (IMS), released in 1968, serves as a foundational implementation of the hierarchical database model, originally developed to support the Apollo space program.[23] It integrates IMS Database Manager (IMS/DB), which provides hierarchical data storage and retrieval, with IMS Transaction Manager (IMS/TM), enabling robust transaction processing alongside database operations.[36] This combination allows IMS to manage complex, parent-child data relationships in a tree-like structure, where root segments anchor hierarchies and dependent segments represent subordinate entities.[37]
A core feature of IMS is its Data Language/I (DL/I) call-level interface, which facilitates path-based navigation through the database hierarchy by specifying segment search arguments and qualifiers to traverse from parent to child levels.[18] DL/I supports operations like retrieval, insertion, and modification via calls such as GU (Get Unique) and GNP (Get Next within Parent), optimized for sequential or random access in mainframe environments.[38] Designed for large-scale z/OS mainframes, IMS excels in high-throughput scenarios, processing up to 100,000 transactions per second in benchmarks while maintaining data integrity through locking mechanisms at the segment level.[32]
IMS full-function hierarchical databases store their segments in VSAM entry-sequenced data sets (ESDS) or OSAM data sets, which provide the underlying storage for access methods such as HDAM (Hierarchical Direct Access Method), which locates root segments through a randomizing routine, and HIDAM (Hierarchical Indexed Direct Access Method), which locates them through an index on the root key.[39] These databases can contain up to 15 hierarchical levels and 255 segment types, with pointers linking parent and child segments.[37] To enhance flexibility, IMS implements secondary indexes as separate VSAM-based databases, allowing segments to be accessed through non-primary keys without altering the physical hierarchy; up to 32 secondary indexes can be defined per segment type and up to 1,000 per database.[40]
As of 2025, IBM continues to maintain and update IMS, particularly for mission-critical applications in sectors such as finance and aerospace, where it powers systems handling petabyte-scale data volumes and billions of daily transactions.[32] Its enduring architecture ensures reliability and scalability, with features like High Availability Large Database (HALDB), which partitions full-function databases so they can hold very large datasets.[41]
Other Hierarchical Systems
Early precursors to commercial hierarchical systems include inventory management databases from the 1950s, which evolved into more structured models like IMS for complex applications such as the Apollo program.[25]
In modern operating systems, the Windows Registry exemplifies a hierarchical structure for managing configuration data, introduced with Windows NT 3.1 in 1993. Organized as a tree with root keys branching into subkeys and values, it stores settings for the OS, applications, and hardware in a parent-child manner, enabling efficient navigation similar to a file system.[42]
XML databases represent another variation, leveraging the inherent hierarchical nature of XML documents where tags define parent-child relationships. Oracle XML DB, released in the early 2000s with Oracle 9i Database Release 2, integrates this model into a relational environment, mapping XML hierarchies to database objects for storage, querying, and indexing of semi-structured data.[43][44]
In industry applications, hierarchical models persist in specialized domains requiring structured relationships. For instance, Amdocs' billing systems in telecommunications employ hierarchical structures to manage complex customer and service hierarchies, supporting B2B scenarios with interactive, multi-level billing presentations as of 2023.[45] Similarly, in healthcare, hierarchical databases organize patient records into tree-like layers—such as facilities, departments, and individual encounters—for access control and data management, with ongoing use in electronic medical record systems to enforce privacy regulations like HIPAA.[46][47]
Advantages and Disadvantages
Strengths
The hierarchical database model excels in performance due to its tree-like structure, which enables fast sequential access to data through pre-linked parent-child relationships, eliminating the need for complex joins common in other models. This design allows for rapid retrieval of related records by traversing the hierarchy directly, making it particularly efficient for large-scale, batch-oriented processing in environments like mainframe systems. For instance, in IBM's Information Management System (IMS), access methods such as Hierarchical Direct Access Method (HDAM) minimize I/O operations by avoiding index lookups, while Fast Path DEDB configurations can achieve high throughput rates exceeding 11,000 transactions per second in optimized setups.[18] Similarly, the model's generalized retrieval algorithms further reduce access times by narrowing the search space efficiently.[48]
Data integrity is a core strength, as the one-to-many relationships inherent in the hierarchy ensure that each child record is linked to exactly one parent, which keeps entity associations consistent and avoids redundancy for data that is naturally one-to-many. For such data, the structure supports single-instance storage, reducing duplication while allowing multiple users to access the same information without conflicts. In IMS implementations, features like fine-grained locking and program isolation isolate transactions, minimizing deadlocks and ensuring modifications are committed only at synchronization points, thereby upholding database consistency even under concurrent access.[18] Parent-child dependencies also facilitate automatic enforcement of referential integrity, as changes to parent records can propagate predictably to dependents.[49]
The model's simplicity makes it intuitive for representing naturally hierarchical data, such as organizational charts, bills of materials, or file systems, where entities exhibit clear containment or subordination. Developers can navigate the structure using straightforward calls like Get Unique or Get Next, which align directly with the logical tree, simplifying application programming without requiring abstract query languages. This conceptual clarity stems from the parent-child paradigm, which promotes easy comprehension and maintenance of data relationships.[18] As noted in early database research, the unified framework of hierarchical access simplifies design across various organizations.[48]
Resource efficiency is evident in low-overhead operations tailored for mainframe environments and read-heavy workloads, where the compact storage of hierarchies optimizes memory and CPU usage. IMS, for example, employs reusable buffers and asynchronous I/O, with DEDB achieving up to 50% reduction in CPU utilization compared to full-function databases, while supporting scalable configurations across multiple processors without data affinities.[18] The absence of redundant links reduces storage needs, and efficient space management parameters like loading factors further enhance utilization in high-volume scenarios.[48] This makes the model suitable for legacy systems handling massive, structured datasets with predictable access patterns.
Weaknesses
The hierarchical database model exhibits significant rigidity in its structure, primarily because it is designed to represent only one-to-one and one-to-many (1:N) relationships efficiently, forcing many-to-many (M:N) relationships to be handled through data duplication or extensive redesign.[50] For instance, in modeling an employee assigned to multiple projects, the employee's record must be duplicated under each relevant project parent, leading to redundancy and potential inconsistencies during updates.[50] This fixed tree-like organization, enforced by parent-child constraints, limits adaptability to complex or non-hierarchical data patterns without altering the entire schema.[9]
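The duplication problem can be made concrete with a small Python sketch, assuming hypothetical project and employee data in which employee E1 belongs to two projects and is therefore stored under both project parents.

```python
# Hypothetical Project -> Employee tree: E1 works on two projects, so it is stored twice.
projects = {
    "P-Alpha": [{"emp": "E1", "phone": "555-0100"}],
    "P-Beta": [{"emp": "E1", "phone": "555-0100"}, {"emp": "E2", "phone": "555-0111"}],
}

# An update that reaches only one copy leaves the duplicates inconsistent.
projects["P-Alpha"][0]["phone"] = "555-0199"

phones_for_e1 = {rec["phone"] for kids in projects.values() for rec in kids if rec["emp"] == "E1"}
print(sorted(phones_for_e1))  # ['555-0100', '555-0199']
```

Avoiding this anomaly requires every update to locate and change all copies, which is exactly the extra application logic the hierarchical model pushes onto programs.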
Querying in hierarchical databases is often inefficient for ad-hoc or exploratory access, as it requires specifying the full navigational path from root to the desired segment, which can involve multiple sequential operations for deep hierarchies.[50] Unlike declarative query languages, the model's procedural navigation—using commands like GET NEXT or GET UNIQUE—demands precise knowledge of the structure, making non-hierarchical queries cumbersome and prone to inefficiency, especially when virtual parent-child relationships rely on pointers rather than inherent sequences.[9] Segment search arguments (SSAs) further complicate this by necessitating complex specifications for filtering, reducing usability for dynamic querying needs.[9]
Maintenance poses substantial challenges due to the model's dependence on predefined hierarchies, where structural changes such as adding new levels or reparenting segments require rewriting application code and potentially reorganizing large portions of the database.[51] Insertions are restricted by the rule that child records cannot exist independently of a parent, often necessitating temporary placeholders or batch operations to maintain integrity.[50] Deletions exacerbate this, as removing a parent automatically cascades to all descendants, risking unintended data loss without careful safeguards.[50] Overall, these factors demand specialized tuning and increase the workload for database administrators.[9]
Scalability limitations become evident in handling unstructured or evolving data schemas, particularly in modern big data environments, where the rigid tree structure struggles to accommodate irregular relationships or schema evolution without performance degradation.[52] As data volumes grow, navigation through deep hierarchies slows, and the inability to easily model M:N links or distributed data leads to bottlenecks in large-scale systems.[50] Implementations like IMS impose hard limits, such as one virtual parent per record type, further constraining expansion in dynamic contexts.[50]
Comparisons with Other Models
Versus Relational Model
The hierarchical database model organizes data into a tree-like structure, where records are linked through parent-child relationships using pointers, enforcing a strict one-to-many hierarchy from a single root.[53] In contrast, the relational model structures data as a collection of tables (relations) with rows and columns, where relationships between tables are established via primary and foreign keys, allowing for more flexible many-to-one or many-to-many associations without inherent tree constraints.[15] This pointer-based navigation in hierarchical models directly traverses predefined paths, while relational models rely on logical keys to maintain referential integrity across independent tables.[53]
Querying in the hierarchical model involves procedural path navigation, such as sequential access along parent-child links (e.g., using commands like "get next" in a preorder traversal), which is efficient for known hierarchical traversals but requires explicit programming for complex queries.[15] The relational model, however, employs declarative SQL queries with joins to combine data from multiple tables, enabling ad-hoc, set-based operations that are more intuitive and flexible for unforeseen queries, though they may incur higher overhead for deeply nested hierarchies.[53] For instance, retrieving a parent's children in a hierarchical system simply follows the stored parent-child path, whereas in SQL the relationship is stated explicitly through the foreign key, either as a selection such as SELECT * FROM Child WHERE ParentID = ? or as a join between the parent and child tables, which the query planner can then optimize.[53]
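Record-at-a-time navigation over a preorder (hierarchic) sequence can be sketched as follows. The cursor is only loosely modeled on the DL/I Get Unique, Get Next and Get Next Within Parent calls; the class names, method names and the "return `None` at the end" convention are assumptions, and real DL/I reports such conditions through status codes rather than return values.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass(eq=False)
class Node:
    seg_type: str
    key: str
    children: list[Node] = field(default_factory=list)


def hierarchic_sequence(root: Node, level: int = 1) -> Iterator[tuple[int, Node]]:
    """Preorder: a parent comes before its children, subtrees left to right."""
    yield level, root
    for child in root.children:
        yield from hierarchic_sequence(child, level + 1)


class Cursor:
    """Hypothetical cursor loosely imitating DL/I GU, GN and GNP calls."""

    def __init__(self, root: Node) -> None:
        self.seq = list(hierarchic_sequence(root))
        self.pos = -1                        # current position in the sequence
        self.parent_pos: int | None = None   # segment that bounds GNP

    def get_unique(self, seg_type: str, key: str) -> Node | None:
        for i, (_, node) in enumerate(self.seq):
            if node.seg_type == seg_type and node.key == key:
                self.pos = self.parent_pos = i  # GU establishes parentage here
                return node
        return None

    def get_next(self, seg_type: str | None = None) -> Node | None:
        # GN moves forward through the whole sequence, crossing parent boundaries.
        for i in range(self.pos + 1, len(self.seq)):
            node = self.seq[i][1]
            if seg_type is None or node.seg_type == seg_type:
                self.pos = self.parent_pos = i
                return node
        return None

    def get_next_within_parent(self, seg_type: str | None = None) -> Node | None:
        # GNP only moves forward while still inside the established parent's subtree.
        if self.parent_pos is None:
            return None
        parent_level = self.seq[self.parent_pos][0]
        for i in range(self.pos + 1, len(self.seq)):
            level, node = self.seq[i]
            if level <= parent_level:
                return None  # left the parent's subtree
            if seg_type is None or node.seg_type == seg_type:
                self.pos = i
                return node
        return None


company = Node("COMPANY", "C1", [
    Node("DEPT", "D1", [Node("EMP", "E1"), Node("EMP", "E2")]),
    Node("DEPT", "D2", [Node("EMP", "E3")]),
])

cursor = Cursor(company)
cursor.get_unique("DEPT", "D1")
d1_employees = []
while (emp := cursor.get_next_within_parent("EMP")) is not None:
    d1_employees.append(emp.key)
next_emp = cursor.get_next("EMP")  # GN continues into department D2

print(d1_employees, next_emp.key)  # ['E1', 'E2'] E3
```

The program must spell out each step of the walk; a relational system would instead answer "employees of D1" from a single declarative predicate on a department key.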
Hierarchical data can be represented in relational databases using techniques like the adjacency list model, where a single table stores entities with a self-referencing foreign key (e.g., a ParentID column pointing to the same table's primary key), and hierarchies are reconstructed via recursive self-joins. For example, a table for organizational departments might include columns for DepartmentID, Name, and ParentDepartmentID, with queries like SELECT * FROM Departments WHERE ParentDepartmentID = 1 fetching direct children, and recursive common table expressions (CTEs) in SQL traversing deeper levels.[54] This approach avoids the physical duplication common in hierarchical models for many-to-many scenarios, instead deriving relationships dynamically through joins.[15]
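The adjacency list approach can be tried with Python's built-in `sqlite3` module, whose bundled SQLite supports `WITH RECURSIVE`. This is a minimal sketch: the table and column names follow the Departments example above, while the department rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Departments (
        DepartmentID INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,
        ParentDepartmentID INTEGER REFERENCES Departments(DepartmentID)
    )
""")
conn.executemany(
    "INSERT INTO Departments VALUES (?, ?, ?)",
    [
        (1, "Headquarters", None),   # root: no parent
        (2, "Engineering", 1),
        (3, "Sales", 1),
        (4, "Databases", 2),
        (5, "Compilers", 2),
    ],
)

# Direct children only: one level of the hierarchy.
direct = conn.execute(
    "SELECT Name FROM Departments WHERE ParentDepartmentID = ? ORDER BY DepartmentID",
    (1,),
).fetchall()

# All descendants: the recursive CTE follows the self-reference level by level.
descendants = conn.execute("""
    WITH RECURSIVE subtree(DepartmentID, Name, Depth) AS (
        SELECT DepartmentID, Name, 0 FROM Departments WHERE DepartmentID = ?
        UNION ALL
        SELECT d.DepartmentID, d.Name, s.Depth + 1
        FROM Departments AS d
        JOIN subtree AS s ON d.ParentDepartmentID = s.DepartmentID
    )
    SELECT Name, Depth FROM subtree ORDER BY Depth, DepartmentID
""", (1,)).fetchall()

print(direct)       # [('Engineering',), ('Sales',)]
print(descendants)
```

The second query returns the whole subtree with each department's depth below Headquarters; the parent-child links are derived at query time from the `ParentDepartmentID` values rather than stored as physical pointers.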
The widespread adoption of the relational model over hierarchical systems stemmed from its normalization principles, which systematically reduce data redundancy by decomposing relations into smaller tables free of the dependencies that cause redundancy, thereby addressing the duplication inherent in hierarchical trees, where child data may be replicated across parent branches.[55] Edgar F. Codd's foundational work emphasized that normalization reduces inconsistencies and storage waste, providing a mathematical basis for derivability and consistency that hierarchical models lack due to their rigid structure and potential for replicated segments.[55] This shift enabled scalable, maintainable databases for diverse applications, as relational normalization (e.g., to third normal form) prevents anomalies from updates in replicated data, a common issue in unnormalized hierarchical representations.[15]
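The update anomaly mentioned above can be shown with a toy example. The student and course data are hypothetical, and plain Python dictionaries stand in for both layouts: in the hierarchical one, course data is copied under each student; in the normalized one, it is stored once and referenced by key.

```python
from __future__ import annotations

# Hierarchical layout: each student segment owns its own copy of each course segment.
hierarchy = {
    "alice": [{"course_id": "DB101", "title": "Databases"}],
    "bob": [
        {"course_id": "DB101", "title": "Databases"},
        {"course_id": "OS201", "title": "Operating Systems"},
    ],
}


def titles_in_hierarchy(course_id: str) -> set[str]:
    """Collect every replicated title stored for one course."""
    return {
        seg["title"]
        for segments in hierarchy.values()
        for seg in segments
        if seg["course_id"] == course_id
    }


# Update anomaly: renaming the course under only one parent leaves a stale copy.
for seg in hierarchy["alice"]:
    if seg["course_id"] == "DB101":
        seg["title"] = "Database Systems"

# Normalized layout: course facts stored once; enrollments reference them by key.
courses = {"DB101": "Databases", "OS201": "Operating Systems"}
enrollments = [("alice", "DB101"), ("bob", "DB101"), ("bob", "OS201")]
courses["DB101"] = "Database Systems"  # one update, visible to every enrollment

bob_titles = [courses[c] for s, c in enrollments if s == "bob"]
print(sorted(titles_in_hierarchy("DB101")))  # ['Database Systems', 'Databases']
print(bob_titles)  # ['Database Systems', 'Operating Systems']
```

After the partial update, the hierarchical layout holds two conflicting titles for the same course, while the normalized layout cannot disagree with itself because the title exists in only one place.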
Versus Network Model
The hierarchical database model and the network database model, both prominent pre-relational approaches developed during the late 1960s and early 1970s, differ fundamentally in how they organize data and represent relationships. The hierarchical model structures data in a tree-like format, enforcing a strict one-to-many parent-child relationship where each child record has exactly one parent, which suits hierarchical data such as organizational charts or bill-of-materials systems. In contrast, the network model, formalized by the CODASYL Data Base Task Group (DBTG) in its 1971 report, organizes data as a graph using owner-member sets. Each set type is itself one-to-many, but because a record type can be a member of several sets, many-to-many relationships can be expressed through an intermediate link record that belongs to two sets. This allowed the network model to represent complex interconnections without being confined to a tree topology, addressing limitations in scenarios requiring multiple parent-child associations.[56]
Navigation in these models also highlights their structural disparities. Hierarchical systems rely on sequential traversal along predefined parent-child paths, starting from the root and descending through branches, which simplifies access for tree-structured queries but requires duplicating data for records needing multiple associations. The network model, however, employs pointer-based chains within sets, where programs use currency indicators to follow links from owner records to multiple member records or vice versa, supporting arbitrary path navigation across the graph. This pointer-driven approach in CODASYL systems provides greater expressiveness for interconnected data but demands explicit programming of traversal logic.[56]
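Owner-member navigation with currency can be sketched as follows. This is a simplified Python imitation, not the CODASYL data manipulation language: `find_first`, `find_next` and `find_owner` only mirror the spirit of the DML's FIND verbs, and the record types, set names (`SUPPLIES`, `SUPPLIED_BY`) and the reduced set of currency indicators are all hypothetical. A many-to-many supplier/part relationship is modeled with a link record that is a member of two sets.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Record:
    rtype: str
    key: str
    # As an owner: set name -> ordered chain of member records.
    owned: dict[str, list[Record]] = field(default_factory=dict, repr=False)
    # As a member: set name -> owner record.
    owners: dict[str, Record] = field(default_factory=dict, repr=False)


def connect(set_name: str, owner: Record, member: Record) -> None:
    """Append member to owner's occurrence of set_name and record the back link."""
    owner.owned.setdefault(set_name, []).append(member)
    member.owners[set_name] = owner


class RunUnit:
    """Hypothetical navigator with simplified currency indicators."""

    def __init__(self) -> None:
        self.current: Record | None = None                 # current of run-unit
        self.set_pos: dict[str, tuple[Record, int]] = {}   # current of each set type

    def find_first(self, set_name: str, owner: Record) -> Record | None:
        self.set_pos[set_name] = (owner, -1)
        return self.find_next(set_name)

    def find_next(self, set_name: str) -> Record | None:
        owner, idx = self.set_pos[set_name]
        members = owner.owned.get(set_name, [])
        if idx + 1 >= len(members):
            return None  # end of this set occurrence
        self.set_pos[set_name] = (owner, idx + 1)
        self.current = members[idx + 1]
        return self.current

    def find_owner(self, set_name: str) -> Record:
        # Follow the member-to-owner link of the current record.
        self.current = self.current.owners[set_name]
        return self.current


s1, s2 = Record("SUPPLIER", "S1"), Record("SUPPLIER", "S2")
bolt, nut = Record("PART", "BOLT"), Record("PART", "NUT")
for supplier, part in [(s1, bolt), (s1, nut), (s2, bolt)]:
    link = Record("SUPPLY", f"{supplier.key}-{part.key}")
    connect("SUPPLIES", supplier, link)
    connect("SUPPLIED_BY", part, link)

ru = RunUnit()
parts_of_s1 = []
link = ru.find_first("SUPPLIES", s1)
while link is not None:
    parts_of_s1.append(ru.find_owner("SUPPLIED_BY").key)  # hop to the part owner
    link = ru.find_next("SUPPLIES")

suppliers_of_bolt = []
link = ru.find_first("SUPPLIED_BY", bolt)
while link is not None:
    suppliers_of_bolt.append(ru.find_owner("SUPPLIES").key)
    link = ru.find_next("SUPPLIED_BY")

print(parts_of_s1, suppliers_of_bolt)  # ['BOLT', 'NUT'] ['S1', 'S2']
```

The same link records are reachable from both owners without duplication, but the program itself must manage which set it is walking and where its position in each chain lies.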
In terms of complexity, the hierarchical model's tree constraint makes it simpler to implement and maintain, as it avoids cycles and enforces a clear hierarchy, though the same constraint means it can represent only tree-shaped data. The network model offers more power for modeling real-world graphs with broad relationships, as seen in CODASYL's set-based framework, but introduces higher management overhead due to the need to handle multiple linkages, potential cycles, and intricate pointer maintenance. Historically, these models coexisted in the 1960s-1970s era of mainframe computing, with the network approach evolving as an extension to overcome the hierarchical model's rigidity, while both preceded the relational paradigm's rise in the late 1970s.[56]