Network model
The network model is a type of database management system architecture that represents data as collections of records connected through predefined relationships, enabling each record to have multiple parent and child associations to model complex, many-to-many linkages in a graph-like structure.[1] This model, formalized in the late 1960s by the Conference on Data Systems Languages (CODASYL), uses a schema composed of record types and set types to define the database structure, where sets act as pointers linking owner records (parents) to member records (children).[2]
A generalization of the hierarchical model that lifts its restriction to tree structures, the network model gained prominence in the 1970s through the Database Task Group (DBTG) standard, which drew on Charles Bachman's Integrated Data Store (IDS) and shaped commercial systems such as IDMS.[3] It supports navigational access via procedural queries that traverse links between records, making it suitable for applications requiring efficient handling of interconnected data, such as telecommunications or manufacturing inventories, though it demands detailed knowledge of the database schema for effective querying.[4] Despite its flexibility in representing real-world relationships, the model's complexity in schema design and maintenance contributed to its decline with the rise of the relational model in the 1980s, which offered declarative querying and greater simplicity.[5]
Core Concepts
Definition and Structure
The network model is a database architecture that organizes data in a graph-like structure, where records function as nodes and sets serve as directed edges to represent relationships between them. This approach facilitates the modeling of complex interconnections, including many-to-many relationships, by allowing records to participate in multiple linkages without the rigid parent-child hierarchy of tree structures.[1]
The fundamental structural principle of the network model revolves around owner-member relationships, in which one record type is designated as the owner (or parent) of a set occurrence, and one or more other record types act as members (or children). Each set occurrence links a single owner to zero or more members, enabling flexible navigation across the data graph while maintaining directed associations that support efficient querying of interrelated records. Unlike simpler models limited to single-parentage, this principle allows members to connect to multiple owners through different sets, thereby accommodating real-world scenarios with multifaceted dependencies.[6][1]
Key components of the network model include record types, which are structural definitions analogous to entities and comprise one or more data items (fields) that store specific attribute values, such as names or identifiers. Set types, in turn, specify the relationships between record types, implemented through pointer-based links that physically connect occurrences of owner and member records in storage. These pointers enable direct traversal from owners to members and, in some cases, vice versa, forming the backbone of data access.[1][6]
In graphical representations, the network model's structure is commonly illustrated via data-structure diagrams, which depict record types as rectangular boxes and set relationships as arrows or lines indicating the direction from owner to member. These diagrams highlight the interconnected nature of the database, showing how pointers facilitate navigation paths akin to traversing a directed graph, thus providing a visual schema for understanding the overall topology.[1][6]
Records, Sets, and Relationships
In the CODASYL network model, data is organized into records, which serve as the fundamental units of storage and retrieval. A record type defines the structure of a group of similar records, consisting of named data items or fields that hold atomic values, such as strings or numbers.[1] Records are distinguished as logical or physical: logical records represent the conceptual view accessible to applications, while physical records handle the underlying storage, often implemented as blocks or pages in files.[7] Within relationships, records are further classified as owner records, which act as parents in a set, or member records, which act as children linked to one or more owners.[6]
Sets form the core mechanism for expressing relationships between record types in the network model: a set type links one owner record type to one or more member record types, and each set occurrence connects a single owner record to zero or more member records, establishing a one-to-many association.[1] Set types also specify an ordering for members, such as insertion at the first or last position, or sorting on a key field.[6] Single-parent sets restrict a member to exactly one owner occurrence, while multi-parent structures are achieved indirectly by allowing a member record type to participate in multiple set types, each with a different owner, thus enabling complex many-to-many relationships through intermediary records, as sketched below.[7]
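The intermediary-record pattern can be made concrete with a small sketch in the DBTG-style DDL notation detailed under Technical Implementation below; the STUDENT, COURSE, and ENROLLMENT names are illustrative, not drawn from any cited system. An ENROLLMENT record participates as a member in two set types, one owned by STUDENT and one by COURSE:
RECORD NAME IS ENROLLMENT
01 GRADE PICTURE IS "X(2)".
SET NAME IS STUDENT-ENROLLMENT
OWNER IS STUDENT
MEMBER IS ENROLLMENT.
SET NAME IS COURSE-ENROLLMENT
OWNER IS COURSE
MEMBER IS ENROLLMENT.
A student is thereby related to many courses through its ENROLLMENT members, and a course to many students, without any member ever belonging to two occurrences of the same set type.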
Navigation through these relationships relies on pointers embedded within records, forming circular linked lists or rings that connect an owner to its members for efficient traversal.[7] Currency indicators maintain the position during operations, tracking the current record of a specific type, set type, or the entire run unit (a transaction-like scope), allowing commands to find and retrieve related data by moving forward, backward, or to the first/last position in a set.[6] This pointer-based approach facilitates direct access without full scans, with each set occurrence represented as a self-contained structure.[1]
Several constraints ensure data integrity in sets. Uniqueness rules prohibit a member record from belonging to more than one occurrence of the same set type, preventing duplicates within a single relationship while allowing participation in multiple set types.[7] Cardinality is enforced as one-to-many per set occurrence, with exactly one owner and a variable number of members (zero or more), though system limits may cap the maximum number of members to manage storage.[6] Membership options add further control: mandatory membership requires every member to remain connected to an owner, while optional membership allows a record to exist outside any occurrence of the set, as sketched below.[7]
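In DBTG terms these options pair an insertion mode (AUTOMATIC or MANUAL) with a retention mode (MANDATORY or OPTIONAL). A hedged sketch of the two extremes, reusing the EMPLOYEE and DEPARTMENT record types from the examples later in this article (the CLUB record type and both set names are illustrative):
SET NAME IS DEPT-STAFF
OWNER IS DEPARTMENT
MEMBER IS EMPLOYEE
MANDATORY AUTOMATIC.
SET NAME IS CLUB-MEMBERS
OWNER IS CLUB
MEMBER IS EMPLOYEE
OPTIONAL MANUAL.
Under MANDATORY AUTOMATIC, every stored EMPLOYEE is linked to a DEPARTMENT immediately and stays connected to some department for as long as it exists; under OPTIONAL MANUAL, the application links and unlinks EMPLOYEE records from a CLUB explicitly through DML commands.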
Historical Development
Origins and Early Influences
The network database model emerged in the early 1960s, drawing foundational influences from graph theory, which provided a conceptual framework for representing complex interconnections between data entities as nodes and edges. This mathematical approach, developed in the 18th and 19th centuries but increasingly applied in computer science by the mid-20th century, enabled more flexible modeling of relationships compared to linear or tree-like structures prevalent in earlier file systems.[8] Charles Bachman, while working at General Electric, leveraged these graph-theoretic principles to create the Integrated Data Store (IDS) in 1963, marking the first direct-access database management system and laying the groundwork for the network model.[9] IDS represented data as records linked through sets, allowing navigation across multifaceted associations in a graph-like manner.[10]
Early motivations for the network model stemmed from the growing demands of business computing in the 1960s, where organizations required integrated systems to manage intricate, many-to-many relationships in operational data, such as supply chains and production processes, that rigid file management techniques could not efficiently handle. At the time, data storage relied heavily on sequential tape systems or early navigational databases, but these lacked the ability to support shared data access across departments without redundancy. IBM's Information Management System (IMS), introduced in 1966 as a hierarchical model for the Apollo space program, further highlighted the limitations of tree-structured hierarchies, which struggled with non-parent-child linkages common in real-world business scenarios.[9] Bachman, as a key figure, pioneered the model's development through IDS and its companion Integrated File System (IFS), aiming to enable company-wide data sharing at GE's manufacturing divisions. His innovations, including data structure diagrams for visualizing relationships, earned him the 1973 ACM Turing Award for contributions to database technology.[11]
Initial adoption of the network model occurred in mid-1960s mainframe environments, particularly for manufacturing and inventory management applications where complex interdependencies between parts, suppliers, and production lines necessitated graph-based navigation. GE implemented IDS across its facilities to streamline appliance production data, demonstrating the model's practicality for large-scale, integrated operations on systems like the GE-600 series computers. This early use case influenced subsequent implementations at other industrial firms, establishing the network approach as a viable alternative for handling enterprise-scale data before broader standardization efforts.[12]
CODASYL Standardization
The Conference on Data Systems Languages (CODASYL), established in 1959 by the U.S. Department of Defense to standardize programming languages such as COBOL, turned its attention to database management in the 1960s.[13] Building on early influences such as Charles Bachman's Integrated Data Store (IDS), CODASYL formed the List Processing Task Force in 1965, renamed the Data Base Task Group (DBTG) in May 1967, to develop specifications for a common database management system compatible with COBOL and other languages.[9] The DBTG stabilized its membership in January 1969 under chairman A. Metaxides and published its first proposals that October, laying the groundwork for the network database model.[14]
The pivotal 1971 DBTG Report, released in April and reviewed by the CODASYL Programming Language Committee in May, formalized the network model by defining key components including the schema for overall database structure, subschema for user views, and data storage mechanisms using records and sets to represent complex relationships.[14] This report incorporated 130 of 179 submitted proposals, emphasizing a three-level architecture to separate conceptual, external, and internal data representations.[14] The June 1973 Journal of Development, produced by the Data Description Language Committee (DDLC) formed in 1971, provided updates refining these elements, particularly introducing privacy locks for controlling access to records and items, as well as module concepts for organizing subschemas to enhance security and modularity.[14]
Central to these standards were the Data Definition Language (DDL) specifications for describing database structures independently of host languages, and the Data Manipulation Language (DML) for operations like storing, retrieving, and updating data, both designed to ensure portability across diverse systems and vendors.[14] These vendor-neutral features promoted interoperability, reducing proprietary lock-in and facilitating program migration.[14]
The CODASYL standards significantly influenced the database industry, driving widespread adoption of network model systems in government agencies and large enterprises during the 1970s, as evidenced by commercial implementations like Cullinet's IDMS and other COBOL-integrated solutions that supported complex, many-to-many data relationships in mainframe environments.[15] This peaked in the mid-1970s, with the standards enabling scalable data management for critical applications in sectors requiring robust, pointer-based navigation.[16]
Technical Implementation
Data Definition Language
The Data Definition Language (DDL) in the CODASYL network model provides a formal syntax for defining the logical and physical structure of the database, primarily through schema, subschema, and storage specifications.[14] The schema DDL establishes the overall database blueprint, including record types, data items, and set types that model relationships between records.[17]
Schema definition begins with the RECORD entry, which declares a record type and its constituent data items. The basic syntax is RECORD NAME IS record-name, followed by a subentry for each data item using COBOL-style level numbers to nest groups and elementary items, with clauses like PICTURE for formatting or TYPE for data categories such as arithmetic, character string, or database key.[14] For example:
RECORD NAME IS EMPLOYEE
01 EMP-ID PICTURE IS "9(6)"
01 EMP-NAME PICTURE IS "X(30)"
01 SALARY TYPE IS DECIMAL(7,2).
Data items represent the smallest named units, such as numeric fields or strings, and can include aggregates like vectors via the OCCURS clause for repeating groups.[17] Set types are defined with the SET entry to specify owner-member relationships, using syntax like SET NAME IS set-name, OWNER IS owner-record, MEMBER IS member-record, optionally with an ORDER clause (e.g., ascending on a key field) and membership rules (e.g., MANDATORY AUTOMATIC, under which the DBMS links each new member to its owner automatically and the link cannot later be severed).[14] An example is:
SET NAME IS EMPLOYEE-DEPT
OWNER IS DEPARTMENT
MEMBER IS EMPLOYEE
MANDATORY AUTOMATIC
ORDER IS ASCENDING EMP-ID.
This structure enforces the network's graph-like connections between records.[17]
The subschema DDL creates user-specific views by extracting a subset of the schema, allowing programs to access only relevant records, sets, and data items while hiding others for security and simplicity.[14] Defined with syntax like SUBSCHEMA NAME IS subschema-name WITHIN SCHEMA schema-name PRIVACY KEY IS 'password', it supports modifications such as redefining vectors as fixed arrays or applying privacy locks to restrict access.[17] This enables controlled views without altering the underlying schema.
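A minimal sketch of a subschema entry, assuming the syntax pattern just quoted; the PAYROLL-VIEW and PERSONNEL names, the record list, and the renaming clause are illustrative, and exact spellings differ across the 1971 report and later journals:
SUBSCHEMA NAME IS PAYROLL-VIEW WITHIN SCHEMA PERSONNEL
PRIVACY KEY IS 'PAY-OK'
RECORD EMPLOYEE
01 EMP-ID
01 SALARY RENAMED PAY-RATE.
A program compiled against PAYROLL-VIEW sees only the two listed items of EMPLOYEE (the second under the alias PAY-RATE) and must supply the privacy key to open the view.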
Storage schema details the physical organization, starting with AREA entries like AREA NAME IS area-name [TEMPORARY], which divide the database into logical storage regions where records are assigned via a WITHIN clause in the RECORD definition (e.g., WITHIN MAIN-AREA).[14] Pages serve as fixed-length physical units within areas, managed automatically by the DBMS for record placement.[17] Indexing is handled through clauses like INDEXED in SET entries or SEARCH KEY in records, enabling efficient retrieval based on specified keys.[14]
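A sketch combining these storage clauses, with illustrative area and key names:
AREA NAME IS PERSONNEL-AREA.
RECORD NAME IS EMPLOYEE
WITHIN PERSONNEL-AREA
SEARCH KEY IS EMP-NAME USING INDEX.
The WITHIN clause confines EMPLOYEE occurrences to pages of PERSONNEL-AREA, while the SEARCH KEY clause asks the DBMS to maintain an index on EMP-NAME for keyed retrieval.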
Key DDL elements include locators, implemented as database keys (DBKEYs), which act as unique pointers to record occurrences for direct access; these are declared with TYPE IS DATA-BASE-KEY and managed by the DBMS.[17] Calculated fields, or calc keys, support computed values for storage or access, using LOCATION MODE IS CALC USING key-fields in the RECORD entry to derive record positions dynamically (e.g., hashing on a name field).[14] Rename clauses appear in subschemas to alias data items or records, such as redefining a field name for application-specific use without impacting the schema.[17]
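A sketch of a CALC-located record, assuming the clause forms named above; the CUSTOMER record layout and CUSTOMER-AREA name are illustrative:
RECORD NAME IS CUSTOMER
LOCATION MODE IS CALC USING CUST-ID
DUPLICATES ARE NOT ALLOWED
WITHIN CUSTOMER-AREA
01 CUST-ID PICTURE IS "9(8)"
01 CUSTOMER-NAME PICTURE IS "X(30)"
01 ACCT-REF TYPE IS DATA-BASE-KEY.
Here the DBMS hashes CUST-ID to choose the page for each occurrence, so a record can be located directly from its key without traversing any set, the DUPLICATES clause rejects a second record with the same key, and ACCT-REF holds a database-key locator pointing at another record occurrence.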
Data Manipulation Language
The Data Manipulation Language (DML) in the network model, as defined by the CODASYL Database Task Group (DBTG), is a procedural interface embedded within a host programming language such as COBOL or PL/I, enabling applications to navigate and manipulate records through explicit commands that manage currency pointers and status indicators.[18] Unlike declarative query languages, it requires programmers to specify step-by-step operations, including loops and conditional checks, to traverse the complex graph of records and sets, reflecting the model's emphasis on direct pointer-based access for efficiency in hierarchical or many-to-many relationships.[7]
Navigation in the network model relies on commands that position currency indicators, pointers that track the most recently accessed record of the run unit (the CRU, or current of run-unit) as well as the current record of each record type and each set type, to facilitate traversal. The FIND command locates a specific record or set element, setting the appropriate currency; for example, FIND ANY CUSTOMER USING CUSTOMER-NAME retrieves the first matching customer record, FIND OWNER WITHIN DEPOSITOR positions to the owner record of the current depositor set occurrence, and FIND NEXT ACCOUNT WITHIN DEPOSITOR advances to the next member record linked to the current owner.[18] Once positioned, the GET command transfers the current record's data into the program's user work area (UWA) for processing, as in GET CUSTOMER after a FIND operation.[6] The READY command prepares database areas or realms for access, specifying modes like retrieval or update to enable subsequent operations on sets, ensuring controlled concurrency.[19]
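Assembled into a COBOL host program, a typical retrieval sequence reads as follows; the bank-style record and area names echo the command examples above, error handling is elided, and keyword details vary by implementation:
READY CUSTOMER-AREA USAGE-MODE IS RETRIEVAL.
* place the search key in the user work area
MOVE "SMITH" TO CUSTOMER-NAME.
FIND ANY CUSTOMER USING CUSTOMER-NAME.
GET CUSTOMER.
* position to the first account owned by this customer
FIND FIRST ACCOUNT WITHIN DEPOSITOR.
GET ACCOUNT.
Each FIND resets the relevant currency indicators, so the subsequent GET always copies the record most recently located.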
Insertion operations use the STORE command to add new records to the database, populating the UWA with data before execution; for instance, STORE ACCOUNT creates a new account record, automatically connecting it to an owner set if schema rules dictate mandatory membership.[7] Set membership updates accompany this via CONNECT, which links the new record as a member to a specified owner, such as CONNECT ACCOUNT TO DEPOSITOR after storing an account under a customer.[18] Deletion employs the ERASE command to remove the current record, with options like ERASE ALL CUSTOMER recursively deleting the owner and all connected members; prior to erasure, DISCONNECT severs set links, e.g., DISCONNECT ACCOUNT FROM DEPOSITOR, to maintain referential integrity without cascading deletes unless specified.[6]
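A sketch of the corresponding insert-and-delete sequence in a COBOL host, again with illustrative field names:
* build the new account in the user work area, then store and link it
MOVE "A-101" TO ACCOUNT-NUMBER.
MOVE 550.00 TO BALANCE.
STORE ACCOUNT.
CONNECT ACCOUNT TO DEPOSITOR.
* later: locate the record, unlink it, and remove it
FIND ANY ACCOUNT USING ACCOUNT-NUMBER.
DISCONNECT ACCOUNT FROM DEPOSITOR.
ERASE ACCOUNT.
Because DEPOSITOR membership is manipulated manually here, the DISCONNECT must precede the ERASE; under a MANDATORY membership rule the DBMS would instead reject the DISCONNECT.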
Updates are handled by the MODIFY command, which alters data items in the current record after a positioning FIND and GET; for example, after FIND FOR UPDATE CUSTOMER and GET CUSTOMER, MODIFY CUSTOMER can change an address field, with the system updating currency pointers to reflect the modified instance.[18] This requires explicit "for update" clauses in FIND to lock the record, preventing concurrent modifications.
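A sketch of the modify sequence just described, with an illustrative CUSTOMER-STREET field:
* lock, fetch, change, and write back the current record
FIND FOR UPDATE ANY CUSTOMER USING CUSTOMER-NAME.
GET CUSTOMER.
MOVE "NORTH STREET" TO CUSTOMER-STREET.
MODIFY CUSTOMER.
The MOVE alters only the user work area; MODIFY propagates the changed items back to the stored record.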
The procedural essence of the DML demands application code to orchestrate navigation, often through iterative constructs like while loops checking status flags (e.g., DB-STATUS for success or end-of-set), contrasting sharply with declarative paradigms where queries abstract away pointer management.[7] This approach, while verbose, allows fine-grained control suited to the network model's linked structures.
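The canonical loop walks one set occurrence member by member, testing the status register after each FIND; a sketch in COBOL-85 style, since status code values and the DB-STATUS declaration differ by system:
FIND FIRST ACCOUNT WITHIN DEPOSITOR.
PERFORM UNTIL DB-STATUS NOT = ZERO
    GET ACCOUNT
    DISPLAY ACCOUNT-NUMBER
    FIND NEXT ACCOUNT WITHIN DEPOSITOR
END-PERFORM.
The loop ends when FIND NEXT runs off the end of the set and posts an end-of-set status; a relational system would express the same retrieval as a single declarative query.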
Comparisons and Applications
Differences from Other Models
The network database model differs fundamentally from the hierarchical model in its structural flexibility. While the hierarchical model organizes data into tree-like structures where each child record has exactly one parent, enforcing one-to-many relationships, the network model extends this to arbitrary graphs, permitting records to have multiple parents and supporting many-to-many relationships.[20][16] For instance, in modeling organizational data, a hierarchical approach might assign an employee to a single department as the sole parent, whereas the network model allows that employee record to link to multiple department records simultaneously, reflecting real-world dual reporting lines without duplicating the employee data across trees.[20] This generalization from trees to graphs enables the network model to represent more complex interconnections, though it requires explicit management of links that the hierarchical model avoids through its rigid parent-child hierarchy.[16]
In contrast to the relational model, the network model relies on physical pointers and explicit links between records for navigation, rather than logical associations via shared keys and table joins.[1][21] Relationships in the network model are defined through sets—collections of owner-member record pairs—necessitating procedural traversal commands to follow these links, which can lead to redundancy if the same data must be repeated to accommodate multiple pathways.[1] The relational model, however, employs normalization techniques to eliminate such redundancy by storing data in independent tables connected declaratively through primary and foreign keys, avoiding the need for embedded pointers.[21][16]
Query paradigms further distinguish the two models: the network approach uses a procedural data manipulation language embedded in a host programming language, requiring developers to specify step-by-step navigation (e.g., finding an owner record and then iterating through its set members), without a built-in query optimizer to automate path selection.[1][21] Relational databases, by comparison, support declarative queries in SQL, where users describe desired results and the system optimizer determines efficient join execution plans.[16] This navigational rigidity in the network model contributes to lower data independence, as changes to physical pointer structures can invalidate application code, whereas the relational model's logical table abstraction insulates applications from storage details.[1][21]
Modern and Legacy Usage
The standardization of SQL and the dominance of relational database management systems in the 1980s, which offered data independence and far simpler querying than the navigational access of network models, drove a sharp decline in the network approach's adoption.[22][23] Relational systems handled complex queries more intuitively, without requiring programmers to navigate explicit pointers and sets, leading to broader commercial success and easier maintenance.[24]
Network databases nonetheless persist in legacy mainframe environments, especially COBOL-based systems supporting mission-critical operations in sectors like banking and insurance, where high-throughput transaction processing remains essential.[25] For instance, CA IDMS, a prominent CODASYL-compliant implementation now maintained by Broadcom, continues—as of 2025—to manage structured data in financial applications requiring reliability and performance on IBM z Systems, with release notes updated in October 2025 confirming active support.[26][27] Similar usage extends to telecommunications for handling interconnected records in billing and network management, though often alongside modernization efforts to integrate with cloud analytics.[28]
Migrating from network models to relational databases presents significant challenges due to the intricate pointer-based relationships and set structures, which lack the logical independence of relational schemas and often require extensive reverse engineering to map owner-member links accurately.[29] Tools and methods, such as automated schema transformation and wrapper-based evolution, address these issues by capturing source data schemas, resolving many-to-many complexities, and ensuring data integrity during conversion, but projects can still face high costs and risks from incomplete navigational logic translation.[30][31]
In modern contexts, the network model's emphasis on explicit connectivity has influenced hybrid integrations within NoSQL graph databases, where concepts like record sets prefigure node-edge representations for complex relationship modeling, as seen in systems like Neo4j that build on graph theory while echoing CODASYL's flexible navigation.[32] Niche revivals appear in embedded systems, where the model's efficient, low-overhead pointer traversal suits resource-constrained environments, such as real-time IoT devices combining network-like structures with relational features for optimized data access.[33]