History and Development
Origins and Initial Creation
The development of GEDCOM originated within the Family History Department of The Church of Jesus Christ of Latter-day Saints (LDS Church) in 1984, as part of broader efforts to computerize family history research and facilitate the exchange of genealogical data among Church members and their software tools.[1] This initiative was deeply motivated by the Church's doctrinal emphasis on temple ordinances, including baptisms and endowments for deceased ancestors, which required accurate tracking and sharing of lineage-linked information to support ordinance reservations and avoid duplication.[1][2]
The initial version, known as GEDCOM 1.0, was released in 1984 as a straightforward, human-readable text-based format designed primarily for mainframe computer systems used in the Church's Ancestral File database.[1][3] This format employed line-based records with level indicators and tags to represent hierarchical family structures, enabling the transfer of pedigree and family group data without proprietary software dependencies. Key contributors included members of the LDS Family History Department.[4]
Early adoption of GEDCOM was largely confined to LDS-specific applications, such as the Personal Ancestral File (PAF) software, which the Church released in 1984 to empower members in compiling and submitting family data for temple work. PAF integrated GEDCOM export capabilities starting with version 2.0 in 1985, allowing users to submit digital files directly to Church systems for ordinance tracking and integration into centralized databases.[3] This limited scope reflected GEDCOM's initial focus on internal Church needs before broader genealogical community involvement.
Standardization and Evolution
The GEDCOM specification emerged from collaborative efforts within the Family History Department of The Church of Jesus Christ of Latter-day Saints, with GEDCOM 4.0 released in August 1989 as a key standardized version, building on earlier drafts to define a uniform format for genealogical data exchange.[4] This release marked a shift toward broader industry adoption, moving beyond its initial creation within the LDS Church to encourage participation from external developers and software producers.[5] Prepared by the Projects and Planning Division under Data Administration, the standard emphasized flexibility and compatibility to support the growing ecosystem of genealogical tools.[4]
The evolution of GEDCOM was primarily driven by the need for interoperability among diverse software applications, prompting invitations to commercial vendors to register their products and incorporate the Lineage-Linked GEDCOM Form for seamless data sharing.[5] Notable examples include Broderbund's Family Tree Maker and Leister Productions' Reunion, which integrated GEDCOM support to enable users to transfer family history data across platforms without loss of structure.[6] This vendor involvement helped establish GEDCOM as a de facto industry standard, fostering a wide range of interoperable products while maintaining backward compatibility with prior versions.[5]
Since 2010, FamilySearch, as the steward of the specification, has played a central role in its ongoing maintenance and enhancement, culminating in the release of GEDCOM 7.0 in 2021, with subsequent minor updates continuing as of 2025 to address modern needs.[7][8] Collaborative development accelerated through initiatives like the RootsTech 2020 effort, involving industry stakeholders to update the standard based on GEDCOM 5.5.1.[7] FamilySearch has further promoted open-source contributions by developing the specification in a public GitHub repository and publishing it at gedcom.io, allowing developers to review the text, suggest improvements, and help keep the standard relevant to genealogical research.[7]
Data Model
Hierarchical Records and Levels
GEDCOM employs a tree-like hierarchical structure to organize genealogical data, where information is represented as nested records and substructures. This model uses numeric levels to denote parent-child relationships, beginning with level 0 for top-level records that serve as the primary entities in a family tree. Each subsequent level indicates subordination to the nearest preceding line at a lower level, creating a logical nesting that mirrors familial and event-based connections without requiring a relational database schema.[5][9]
The core record types at level 0 include Individual (INDI) for personal details, Family (FAM) for marital or parental units, and Source (SOUR) for bibliographic references, among others such as Repository (REPO) and Note (NOTE). Each record initiates with a level 0 line followed by a unique cross-reference identifier (XREF), such as 0 @I1@ INDI, which acts as a pointer for linking across the file. Substructures under these records appear at level 1 or higher, encapsulating attributes, events, and multimedia references; for instance, an individual's birth event might nest as 1 BIRT with further details like date at level 2 (2 DATE 15 NOV 1950). This nesting via levels ensures that data like names, occupations, or residences are contextually tied to their parent record.[5][9]
Relationships between records are established through cross-reference pointers rather than duplication, promoting data integrity and efficiency. For example, a Family record links to Individual records via tags like 1 HUSB @I1@ for the husband and 1 CHIL @I2@ for a child, allowing bidirectional navigation without repeating personal details. This pointer system extends to associations, such as an individual's family membership via 1 FAMC @F1@, enabling complex pedigrees while maintaining the hierarchical nesting for intra-record elements like events and notes.[5][9]
Unlike flat-file or tabular databases, GEDCOM's hierarchy emphasizes parent-child nesting to group temporally or thematically related data, such as sequencing life events under an individual or embedding citations within sources. This approach facilitates the representation of irregular, narrative-driven genealogical information, where substructures can vary in depth and cardinality to accommodate diverse family histories.[5][9]
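The nesting rule described in this section can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: the `Node` class and `build_tree` function are hypothetical names introduced here, and the sketch assumes well-formed input in which each line starts with its level number followed by a space.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    level: int
    text: str                       # everything after the level number
    children: list = field(default_factory=list)


def build_tree(lines):
    """Nest GEDCOM lines by level; returns the list of level-0 records."""
    roots = []
    stack = []                      # stack[i] is the currently open node at level i
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        level_str, _, text = raw.partition(" ")
        level = int(level_str)
        if level > len(stack):
            raise ValueError(f"level increases by more than one: {raw!r}")
        node = Node(level, text)
        del stack[level:]           # close every open node at this level or deeper
        if stack:
            stack[-1].children.append(node)   # attach to the nearest shallower line
        else:
            roots.append(node)
        stack.append(node)
    return roots


sample = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 15 NOV 1950",
    "0 @F1@ FAM",
    "1 HUSB @I1@",
]
records = build_tree(sample)
birth = records[0].children[1]
print(birth.text, "->", birth.children[0].text)
```

Keeping one open node per level on a stack means a line only ever needs to look at the entry directly above it, which matches the "nearest preceding line at a lower level" rule.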
Tags, Values, and Pointers
In GEDCOM, tags are short mnemonic codes, typically three or four letters long, that identify the type of data element within a line and give it semantic meaning in the hierarchical structure. Standard tags are uppercase and abbreviated for brevity, such as NAME for a person's name, BIRT for a birth event, or DEAT for a death.[5] The specification's appendix defines the standard tags approved for universal use, while user-defined extensions are prefixed with an underscore (e.g., _MYTAG), which allows customization without conflicting with core elements.[5] Within records, some tags are mandatory, such as GEDC and CHAR in a 5.5.1 header (HEAD) record, while others, like SOUR (source citation), are optional but recommended for verifiability.[5] In GEDCOM 7.0, tags are further formalized with URIs for semantic interoperability (e.g., g7:NAME), enhancing machine readability while keeping the familiar tag names of prior versions.[9]
Values follow the tag on each line, separated by a single space, and represent the actual data content associated with that tag. They are text-based strings, with each line limited to 255 characters in GEDCOM 5.5.1; longer values are extended using continuation tags like CONC (concatenation without newline) or CONT (continuation with newline) to preserve formatting.[5] For example, a name value might appear as John /Doe/, where slashes delimit the surname, or a place as Cove, Cache, Utah, USA.[5] Special characters in values are handled via escape sequences, such as doubling the at sign (@@) to include a literal @, and the @#...@ escape form, which dates use to select a calendar (e.g., @#DJULIAN@).[5] GEDCOM 7.0 removes the CONC tag and character limits, favoring UTF-8 encoding for unrestricted text handling and multi-line CONT for notes.[9]
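The difference between the two continuation tags is easiest to see when a value is reassembled. The following is a minimal sketch: `assemble_value` is a hypothetical helper, and it assumes the caller has already split each line into a (tag, value) pair.

```python
def assemble_value(lines):
    """Join a line's value with its CONT/CONC continuation lines.

    `lines` is a list of (tag, value) pairs: the first pair is the line that
    owns the text, the rest are its continuation sublines in file order.
    """
    _, text = lines[0]
    for tag, value in lines[1:]:
        if tag == "CONT":
            text += "\n" + value    # CONT starts a new line of text
        elif tag == "CONC":
            text += value           # CONC appends directly, with no separator
        else:
            raise ValueError(f"not a continuation tag: {tag}")
    return text


note = assemble_value([
    ("NOTE", "Born in a log ca"),
    ("CONC", "bin."),
    ("CONT", "Moved west in 1850."),
])
print(note)
```

Because CONC inserts nothing between the pieces, a writer that splits a long value has to keep any needed space inside one of the pieces rather than relying on the line break.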
Pointers, also known as cross-reference identifiers (XREFs), enable linkages between records using a unique format enclosed in at signs: @<identifier>@, where the identifier is an alphanumeric string up to 22 characters (e.g., @I123@ for an individual).[5] A record declares its identifier immediately after the level number, as in 0 @I123@ INDI, and other lines refer to that record by giving the same identifier as their value, as in 1 CHIL @I123@ to link a child to an individual record. Each identifier must be unique within a file.[5] Pointers are distinguished from ordinary values by their @...@ delimiters and are used exclusively for referencing, not storing data. In GEDCOM 7.0, pointers support a null value (@VOID@) for optional links and integrate with URI-based tags for extended semantics.[9]
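A small tokenizer makes the two roles of the @...@ form concrete: as a record identifier right after the level, and as a pointer in the value position. This is a sketch under simplifying assumptions; the regular expressions and the `parse_line` helper are illustrative and looser than the grammar in the specification.

```python
import re

LINE_RE = re.compile(
    r"^(?P<level>0|[1-9][0-9]?)"       # level 0-99, no leading zeros
    r"(?: (?P<xref>@[^@ ]+@))?"         # optional record identifier
    r" (?P<tag>[A-Za-z0-9_]+)"          # tag (leading underscore marks an extension)
    r"(?: (?P<value>.*))?$"             # optional value
)
POINTER_RE = re.compile(r"^@[^@ ]+@$")


def parse_line(line):
    """Split one GEDCOM line into level, xref, tag, and value."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    value = match.group("value") or ""
    return {
        "level": int(match.group("level")),
        "xref": match.group("xref"),        # set only where a record is declared
        "tag": match.group("tag"),
        "value": value,
        "is_pointer": bool(POINTER_RE.match(value)),
    }


declared = parse_line("0 @I123@ INDI")
referenced = parse_line("1 CHIL @I123@")
print(declared["xref"], referenced["value"], referenced["is_pointer"])
```

Note that a value such as `user@@example.com` is not treated as a pointer, since the doubled at sign is an escaped literal rather than a delimiter.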
GEDCOM employs specific data types for values to standardize common genealogical elements, parsed line-by-line for efficiency. Dates use a structured format like <calendar> <day> <month> <year>, with escape sequences for calendars (e.g., @#DGREGORIAN@ 3 JAN 2000 for Gregorian), supporting ranges (BET 1904 AND 1915) and approximations (ABT 1920).[5] Places are free-form but conventionally hierarchical (e.g., City, County, State, Country), often paired with a FORM tag for jurisdiction details.[5] Notes allow unstructured text for annotations, continued across lines with CONT to embed research context without altering hierarchy.[5] In GEDCOM 7.0, dates incorporate a PHRASE substructure for free-text renderings such as dual (old style versus new style) dates, and all data types align with XML Schema primitives like xsd:string for broader compatibility.[9] This line-based syntax (level, optional pointer, tag, and value) facilitates simple parsing while accommodating the format's emphasis on portability across systems.[5]
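As a rough illustration of how the date forms mentioned above can be told apart, the sketch below classifies a value as a range, an approximation, or a plain date. It covers only the keywords named in this section plus CAL and EST, assumes the 5.5.1 escape spelling (@#DGREGORIAN@, @#DJULIAN@, and so on), and does not validate the date itself; `classify_date` is a hypothetical helper.

```python
import re

APPROXIMATIONS = {"ABT": "about", "CAL": "calculated", "EST": "estimated"}
CALENDAR_ESCAPE = re.compile(r"^(@#D[A-Z ]+@) (.+)$")
RANGE = re.compile(r"^BET (.+) AND (.+)$")


def classify_date(value):
    """Classify a small subset of GEDCOM 5.5.1 date values."""
    value = value.strip()
    calendar = None                     # None means no escape was given
    match = CALENDAR_ESCAPE.match(value)
    if match:
        calendar, value = match.group(1), match.group(2)
    match = RANGE.match(value)
    if match:
        return {"kind": "range", "calendar": calendar,
                "start": match.group(1), "end": match.group(2)}
    keyword, _, rest = value.partition(" ")
    if keyword in APPROXIMATIONS and rest:
        return {"kind": APPROXIMATIONS[keyword], "calendar": calendar, "date": rest}
    # Anything else (including forms this sketch does not know) passes through.
    return {"kind": "plain", "calendar": calendar, "date": value}


print(classify_date("BET 1904 AND 1915"))
print(classify_date("ABT 1920"))
print(classify_date("@#DGREGORIAN@ 3 JAN 2000"))
```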
File Structure
Header Block
The Header Block is the mandatory initial segment of a GEDCOM file, beginning with the level 0 HEAD tag to delineate the start of the transmission and provide essential metadata for parsers to interpret the file correctly.[5] This block declares the GEDCOM version, source software, character encoding, submitter reference, and optional copyright information, ensuring compatibility across genealogical software systems.[5] By specifying these elements, the Header Block allows receiving applications to validate the file format and handle data appropriately before processing the subsequent body records. The header must include a reference to a submitter record in the body via the SUBM tag.[5]
The structure commences with 0 HEAD, followed by required level 1 substructures such as 1 GEDC containing 2 VERS 5.5.1 to indicate the GEDCOM specification version, and 1 CHAR UTF-8 (valid in 5.5.1 and later; ANSEL or ASCII in earlier versions) to define the character set for text rendering.[5] The source is identified via 1 SOUR <APPROVED_SYSTEM_ID>, often accompanied by 2 VERS <VERSION_NUMBER> for the producing software's version, while 1 SUBM @<XREF:SUBM>@ references the submitter record elsewhere in the file using a unique cross-reference identifier.[5] An optional 1 COPR <COPYRIGHT_GEDCOM_FILE> tag includes a copyright notice to protect the dataset.[5] In GEDCOM 7.0, these elements are retained but with UTF-8 as the exclusive encoding and stricter URI recommendations for the SOUR tag to enhance interoperability.[10]
A representative example of a Header Block in GEDCOM 5.5.1 format is:

    0 HEAD
    1 SOUR Family Historian
    2 VERS 7.0.10
    1 GEDC
    2 VERS 5.5.1
    2 FORM LINEAGE-LINKED
    1 CHAR UTF-8
    1 SUBM @S1@
    1 COPR Copyright 2025 by Example User

The header is followed by the body block, which contains the core genealogical records, including a submitter record such as 0 @S1@ SUBM with details like name.[5]
Common errors in the Header Block include mismatched version declarations between GEDC VERS and the actual file structure, leading to import failures in parsers that enforce strict compliance.[5] Omitting required tags like CHAR or SUBM can also cause data corruption during transmission, as software may default to incompatible encodings or fail to associate the file with a submitter.[10] Proper adherence to these specifications mitigates such issues, promoting reliable exchange of genealogical data.[5]
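A check for the omissions described above can be sketched as follows. The list of required paths covers only the 5.5.1 header tags named in this section (SOUR, GEDC with VERS, CHAR, SUBM); `header_paths` and `missing_header_tags` are hypothetical helpers, and a real validator would check considerably more.

```python
REQUIRED_5_5_1 = ("SOUR", "GEDC.VERS", "CHAR", "SUBM")   # per the text above


def header_paths(lines):
    """Map dotted tag paths inside the HEAD record to their values."""
    paths = {}
    stack = []                  # tags of the currently open header lines
    in_head = False
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 0:
            if in_head:
                break           # the next level-0 line ends the header
            in_head = tag == "HEAD"
            continue
        if in_head:
            del stack[level - 1:]
            stack.append(tag)
            paths[".".join(stack)] = value
    return paths


def missing_header_tags(lines, required=REQUIRED_5_5_1):
    found = header_paths(lines)
    return [path for path in required if path not in found]


header = [
    "0 HEAD",
    "1 SOUR Family Historian",
    "2 VERS 7.0.10",
    "1 GEDC",
    "2 VERS 5.5.1",
    "2 FORM LINEAGE-LINKED",
    "1 CHAR UTF-8",
    "1 SUBM @S1@",
    "0 @S1@ SUBM",
]
print(missing_header_tags(header))
print(missing_header_tags([line for line in header if not line.startswith("1 CHAR")]))
```

The dotted paths keep `SOUR.VERS` (the producing software's version) separate from `GEDC.VERS` (the GEDCOM version), which is the distinction a version-mismatch check would rely on.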
Body Block
The Body Block constitutes the core data payload of a GEDCOM file, immediately following the Header Block and encapsulating all genealogical records in a hierarchical, line-based format.[5] It comprises a series of logical records, each initiated by a level 0 line such as 0 @I1@ INDI for an individual or 0 @F1@ FAM for a family group, with subordinate lines detailing attributes and events.[5] These substructures include event records like 1 BIRT for birth details (potentially nested with 2 DATE for dates or 2 PLAC for places) and attribute records such as 1 SEX M for gender, allowing for multi-level nesting to represent complex relationships and facts.[5] In GEDCOM 7.0, this structure persists with similar leveled lines and substructures, though parsing simplifications like the elimination of line continuations via CONC (replaced by CONT) streamline handling of nested elements.[9]
Records within the Body Block are organized hierarchically by level numbers (ranging from 0 to 99, without leading zeros), where each level indicates subordination to the nearest preceding line with a lower level, enabling a tree-like representation of data.[5] There is no mandated sequence for top-level records across the block, so submitters may arrange them by preference, but substructures within a given record adhere to a conventional order, such as events preceding attributes.[5] Cross-references facilitate interconnections between records through unique pointers (e.g., @<XREF:INDI>@): a family record links to its children's individual records via lines such as 1 CHIL @I2@, and each child's record points back to the family via 1 FAMC @F1@.[5] This pointer system ensures data cohesion without requiring physical adjacency, supporting bidirectional relationships in the genealogy.[9]
Indexing in the Body Block relies implicitly on these pointers rather than explicit indices, as parsers process the file line-by-line to construct a relational graph from the links.[5] Upon encountering a pointer, compliant software resolves it by scanning for the corresponding record elsewhere in the block, building an in-memory model of entities and their associations.[9] This approach accommodates dynamic data volumes but demands efficient parsing to handle potential forward references.[5]
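One common way to cope with forward references is to read the file in two passes: first collect every record identifier, then check each pointer against that index. The sketch below assumes that approach; it is not prescribed by the specification, and `index_records` and `dangling_pointers` are hypothetical helpers with deliberately simplified patterns.

```python
import re

XREF_DEF = re.compile(r"^0 (@[^@ ]+@) (\w+)")
POINTER = re.compile(r"^\d+ \w+ (@[^@ #][^@ ]*@)$")


def index_records(lines):
    """First pass: map each record identifier to its record type."""
    index = {}
    for line in lines:
        match = XREF_DEF.match(line.strip())
        if match:
            index[match.group(1)] = match.group(2)
    return index


def dangling_pointers(lines):
    """Second pass: list (line_number, pointer) pairs that have no target."""
    index = index_records(lines)
    missing = []
    for number, line in enumerate(lines, start=1):
        match = POINTER.match(line.strip())
        # @VOID@ is the GEDCOM 7.0 null pointer and never needs a target.
        if match and match.group(1) != "@VOID@" and match.group(1) not in index:
            missing.append((number, match.group(1)))
    return missing


body = [
    "0 HEAD",
    "1 SUBM @S1@",        # forward reference, resolved by the first pass
    "0 @S1@ SUBM",
    "0 @I1@ INDI",
    "1 FAMS @F1@",
    "0 @F1@ FAM",
    "1 HUSB @I1@",
    "1 WIFE @I2@",        # no @I2@ record in this excerpt
    "0 TRLR",
]
print(index_records(body))
print(dangling_pointers(body))
```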
Due to extensive nesting—particularly in notes (1 NOTE) and source citations (1 SOUR) that can embed further substructures—GEDCOM files in the Body Block phase can expand significantly, often reaching megabytes for large pedigrees.[5] To mitigate memory constraints during processing, GEDCOM 5.5.1 recommends constraining individual logical records to under 32 kilobytes, fitting typical buffers of the era.[5] GEDCOM 7.0 removes such explicit limits on nesting depth or line length (previously capped at 255 characters), permitting greater flexibility at the cost of increased computational demands for deeply nested datasets.[9]
Trailer Block
The Trailer Block serves as the simple closing segment of a GEDCOM file, consisting of a single mandatory line at level 0 formatted as 0 TRLR. This tag specifies the end of the GEDCOM transmission, with no associated value or subordinate structures permitted.[5]
Its primary role is to mark the completion of the data transmission, thereby preventing errors from partial file reads by informing parsers that no further content follows.[10] In some multi-disk or segmented transmissions, it appears only on the final segment to confirm overall completeness.[5] Strict parsers treat the absence of the trailer as an indication of an invalid or incomplete file, often triggering processing errors.[10]
Historically, the trailer evolved from simpler termination indicators in early GEDCOM drafts to a standardized, robust endpoint mechanism, ensuring reliable interchange in versions from 4.0 onward.[4] It directly follows the preceding body records to delineate the file's boundary.[5]
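The completeness check that strict parsers perform can be approximated with a few string comparisons. This is a minimal sketch; `framing_problems` is a hypothetical helper, and the wording of its messages is illustrative.

```python
def framing_problems(text):
    """Report basic framing errors: missing header, missing or early trailer."""
    # Drop a leading byte-order mark and ignore blank lines.
    lines = [ln.strip() for ln in text.lstrip("\ufeff").splitlines() if ln.strip()]
    problems = []
    if not lines or lines[0] != "0 HEAD":
        problems.append("file does not start with 0 HEAD")
    if not lines or lines[-1] != "0 TRLR":
        problems.append("file does not end with 0 TRLR")
    if "0 TRLR" in lines[:-1]:
        problems.append("content follows a 0 TRLR line")
    return problems


complete = "0 HEAD\n1 CHAR UTF-8\n0 @I1@ INDI\n0 TRLR\n"
truncated = "0 HEAD\n1 CHAR UTF-8\n0 @I1@ INDI\n1 NAME John /Sm"
print(framing_problems(complete))
print(framing_problems(truncated))
```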
Sample File Excerpt
To illustrate the practical structure of a GEDCOM file, consider the following minimal example, which includes a header block, a basic body with one submitter record, one individual record, and one family record, and a trailer block. This example conforms to the GEDCOM 5.5 standard and demonstrates core syntax elements such as levels, tags, pointers, and values.[5]

    0 HEAD
    1 SOUR PAF
    2 VERS 2.1
    1 DATE 15 NOV 1995
    1 FILE MYFILE.GED
    1 GEDC
    2 VERS 5.5
    2 FORM LINEAGE-LINKED
    1 CHAR ANSEL
    1 SUBM @S1@
    0 @S1@ SUBM
    1 NAME Example User
    0 @I1@ INDI
    1 NAME John /Smith/
    1 SEX M
    1 BIRT
    2 DATE 12 MAY 1960
    0 @F1@ FAM
    1 HUSB @I1@
    1 WIFE @I2@
    1 CHIL @I3@
    0 TRLR

This example can be broken down line by line to highlight key components:
- 0 HEAD: Initiates the header block at level 0, marking the start of the file. The level 0 indicates a top-level record.[5]
- 1 SOUR PAF: At level 1 (subordinate to HEAD), this tag identifies the software source ("PAF" for Personal Ancestral File) used to generate the file.[5]
- 2 VERS 2.1: At level 2 (subordinate to SOUR), the VERS tag specifies the version of the source software.[5]
- 1 DATE 15 NOV 1995: Level 1 under HEAD records the file creation date in a standardized format.[5]
- 1 FILE MYFILE.GED: Level 1 under HEAD names the transmission file.[5]
- 1 GEDC: Level 1 under HEAD begins the GEDCOM version details.[5]
- 2 VERS 5.5: Level 2 under GEDC specifies the GEDCOM standard version.[5]
- 2 FORM LINEAGE-LINKED: Level 2 under GEDC defines the file form, here the common lineage-linked structure for family trees.[5]
- 1 CHAR ANSEL: Level 1 under HEAD declares the character set (ANSEL, an older encoding; 5.5.1 and later files often use UTF-8).[5]
- 1 SUBM @S1@: Level 1 under HEAD references the submitter record via unique pointer @S1@.[5]
- 0 @S1@ SUBM: Level 0 starts the body block with the submitter record, declared with pointer @S1@ and the SUBM tag.[5]
- 1 NAME Example User: Level 1 under SUBM provides the submitter's name.[5]
- 0 @I1@ INDI: Level 0 starts an individual record; @I1@ is a unique pointer (xref ID) for referencing, followed by the INDI tag for a person.[5]
- 1 NAME John /Smith/: Level 1 under INDI provides the name, with slashes delimiting the surname.[5]
- 1 SEX M: Level 1 under INDI specifies gender (M for male).[5]
- 1 BIRT: Level 1 under INDI introduces a birth event.[5]
- 2 DATE 12 MAY 1960: Level 2 under BIRT gives the event date.[5]
- 0 @F1@ FAM: Level 0 starts a family record; @F1@ is its pointer, with the FAM tag for a family group.[5]
- 1 HUSB @I1@: Level 1 under FAM links the husband via pointer @I1@.[5]
- 1 WIFE @I2@: Level 1 under FAM links the wife (pointer @I2@ assumes another INDI record, omitted here for brevity).[5]
- 1 CHIL @I3@: Level 1 under FAM links a child (pointer @I3@ assumes another INDI record, also omitted).[5]
- 0 TRLR: Level 0 ends the file, marking the trailer block.[5]
Each line consists of the level number, an optional cross-reference identifier in @...@ format, the tag, and the value; subordinate lines use incremented levels to denote hierarchy, while continuation of long values employs the CONT or CONC tags at the next level with a leading space.[5]
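Going in the other direction, lines like those in the sample can be produced from a nested structure by writing each item at its parent's level plus one. The sketch below assumes a simple dictionary layout (`tag`, `value`, `xref`, `children`) chosen for illustration; it is not a format defined by GEDCOM, and `emit` is a hypothetical helper that performs no escaping or line splitting.

```python
def emit(tag, value="", xref=None, children=(), level=0):
    """Yield GEDCOM lines for one structure and its substructures."""
    parts = [str(level)]
    if xref:
        parts.append(xref)          # identifier goes between level and tag
    parts.append(tag)
    if value:
        parts.append(value)
    yield " ".join(parts)
    for child in children:
        yield from emit(level=level + 1, **child)


individual = {
    "tag": "INDI",
    "xref": "@I1@",
    "children": [
        {"tag": "NAME", "value": "John /Smith/"},
        {"tag": "SEX", "value": "M"},
        {"tag": "BIRT", "children": [{"tag": "DATE", "value": "12 MAY 1960"}]},
    ],
}
lines = list(emit(**individual))
print("\n".join(lines))
```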
Versions
GEDCOM 5.5 and 5.5.1
GEDCOM 5.5, released on January 2, 1996, with errata on January 10, 1996, represented a major update to the standard by adopting the American National Standards Institute (ANSI) ANSEL character set, enabling better handling of diacritical marks and special characters common in international genealogical records.[11] This version introduced refined date formats supporting multiple calendars, including Gregorian, Julian, Hebrew, and French Revolutionary, along with qualifiers such as "about" (ABT), "estimated" (EST), and "calculated" (CAL) for imprecise dates.[5] The place (PLAC) structure was enhanced to include a hierarchical jurisdiction path, specified via a FORM substructure, allowing representations like "Springfield, Sangamon County, Illinois, United States" for greater locational precision.[5]
Key innovations in GEDCOM 5.5 included the Association (ASSO) tag, which links individuals through non-familial relationships like friends, neighbors, or witnesses, using a RELA subtag to describe the nature of the association.[5] It also added the Repository (REPO) record for cataloging sources, complete with call numbers and addresses, improving source management and citation traceability.[5] These features built on earlier versions while maintaining backward compatibility, with most implementations able to parse GEDCOM 5.5 files as a baseline for data exchange.[5]
GEDCOM 5.5.1, formally published on November 15, 2019 after circulating as a draft since October 1999, offered minor corrections and enhancements to address ambiguities in the prior version.[5] It formalized Unicode support, including UTF-8 encoding, to accommodate a broader range of international scripts and reduce reliance on ANSEL.[5] Multimedia integration via Object (OBJE) records was streamlined by eliminating embedded binary data (BLOB) in favor of external file references, with FORM and TYPE substructures specifying formats like JPEG or TIFF for images and audio.[5] Event structures received clarifications, such as refined <<EVENT_DETAIL>> components for attributes like religion (RELI), ensuring more consistent representation of life events.[5]
As of 2025, GEDCOM 5.5 and 5.5.1 continue to dominate genealogy software ecosystems due to their stability, extensive vendor support, and seamless interoperability with legacy datasets, serving as the de facto standard for file exchanges despite the availability of newer specifications.[12]
GEDCOM 7.0
FamilySearch released GEDCOM 7.0 on May 19, 2021, as the latest major revision of the standard for exchanging genealogical data, aiming to address limitations in earlier versions by incorporating modern data handling practices.[7] The specification has undergone minor updates, with version 7.0.16 issued on March 18, 2025, incorporating patches for improved clarity and implementation guidance without altering core data structures.[13] This version maintains the hierarchical line-based format while introducing semantic enhancements to support more precise and extensible data representation.
GEDCOM 7.0 introduces support for structured extensions using URI-defined schemas, enabling JSON-like flexibility for custom data types such as enumerated values and ages, which enhances interoperability across diverse software.[9] It improves semantic data handling, particularly for role-based relationships in events and family structures, allowing explicit definitions of participant roles (e.g., witness, informant) to better capture complex genealogical contexts beyond simple parent-child links.[9]
Key innovations include enhanced multimedia handling through the MULTIMEDIA_RECORD structure and GEDZIP packaging, which bundles external files like images and audio with the GEDCOM stream for seamless transfer.[14] The specification supports approximate and uncertain dates via structures like DatePhrase for expressions of uncertainty (e.g., "about 1850" or ranges with calendars), multiple calendar systems (Gregorian, Julian, Hebrew, French Revolutionary), and period notations, reducing ambiguities in historical records.[9] Place data is augmented with coordinate support using MAP.LATI and MAP.LONG tags for latitude and longitude, facilitating geospatial integration in mapping tools.[9] Internationalization is strengthened by mandating UTF-8 encoding throughout and providing the LANG tag for language specification, ensuring global compatibility without legacy character set issues.[10]
GEDCOM 7.0 has been integrated into FamilySearch's core tools for family tree management and export, with growing support in third-party software such as RootsMagic and Family Historian.[15] It includes mechanisms for backward compatibility, allowing import of GEDCOM 5.5 and 5.5.1 files while mapping legacy structures to new semantics, though some breaking changes require validation during conversion.[14] Since the initial 2021 release, updates have focused on patches for validation rules, expanded handling of research notes through versatile NOTE structures, and refined citation schemas in SOURCE records to better accommodate evidence evaluation and multi-source linking.[16] These revisions, tracked via semantic versioning on the official GitHub repository, emphasize stability and developer tools for conformance testing.[8]
Release Timeline
The development of GEDCOM began in 1984 when the Family History Department of The Church of Jesus Christ of Latter-day Saints (LDS Church) released its first internal version, GEDCOM 1.0, as a proposed standard for exchanging genealogical data within their systems.[15] Subsequent internal iterations, such as version 2.0 in late 1985 and 2.1 in early 1987, were used in software like Personal Ancestral File (PAF) but remained non-public.[3]
The first public release occurred on October 9, 1987, with GEDCOM 3.0, which introduced the lineage-linked form for representing family relationships and was made available for broader adoption by genealogical software developers.[3] This was followed by version 4.0 on August 4, 1989, which refined the structure for wider compatibility.[17] Version 5.0 arrived on September 25, 1991, enhancing lineage-linked structures to better handle complex pedigrees.[17] Interim drafts appeared in the early 1990s, including 5.1 in September 1992 and 5.3 in November 1993, which experimented with features like Unicode support and multimedia but were never finalized.[3]
The major milestone of version 5.5 was released on January 2, 1996 (with errata on January 10), incorporating structured addresses, additional name parts, and contributions from standards bodies like the National Genealogical Society, though it did not achieve formal ANSI ratification.[11] A minor update, GEDCOM 5.5.1, was formally published on November 15, 2019 after circulating as a draft since October 1999, adding support for UTF-8 encoding, email addresses, URLs, and geographic coordinates while maintaining backward compatibility.[11]
No official version 6.0 was ever released; a beta draft proposing XML-based storage was circulated in December 2002 for developer feedback but was abandoned in favor of alternative formats like GEDCOM X.[18] After a long hiatus, FamilySearch released GEDCOM 7.0 on May 19, 2021, as the first major update in over two decades, introducing semantic versioning, improved multimedia handling via GEDZIP packaging, and resolutions to prior ambiguities.[11] This version has seen ongoing patches, with the latest being 7.0.16 on March 18, 2025, focusing on refinements and interoperability.[13]

| Version | Release Date | Status | Key Notes |
|---|---|---|---|
| 1.0 | 1984 | Internal/Proposed | Initial LDS Church development.[15] |
| 3.0 | 1987-10-09 | Public Standard | First public release; lineage-linked form.[3] |
| 4.0 | 1989-08-04 | Standard | Compatibility refinements.[17] |
| 5.0 | 1991-09-25 | Standard | Enhanced structures.[17] |
| 5.5 | 1996-01-02 | Standard | Address and name improvements (errata 1996-01-10).[11] |
| 5.5.1 | 2019-11-15 | Standard | Encoding and metadata additions; circulated as a draft from October 1999.[11] |
| 7.0 | 2021-05-19 | Standard | Semantic versioning; GEDZIP support; latest patch 7.0.16 (2025-03-18).[10][13] |
Key Features
Multimedia Integration
GEDCOM supports the integration of multimedia elements, such as images, audio, and documents, primarily through the OBJE record type, which allows genealogical software to reference or embed media files associated with individuals, families, or events.[5] The OBJE record is defined at level 0 as 0 @O1@ OBJE, serving as a container for media details without storing the actual file data in earlier versions.[5] Subordinate tags within the OBJE record include 1 FILE photo.jpg to specify the file path or reference, followed by 2 FORM JPG to indicate the media format, ensuring compatibility across systems.[5]
Linking multimedia to core records occurs via a pointer tag, such as 1 OBJE @O1@ under an individual's (INDI) or family's (FAM) event structure, enabling direct association without duplicating file information.[5] In GEDCOM 5.5, optional embedding via binary large objects (BLOB) was supported, but this was deprecated in 5.5.1 and later versions, limiting integration to external file references to maintain file portability and simplicity.[5] Additional metadata, such as descriptive notes via 1 NOTE This is a family portrait from 1950, can accompany the OBJE to provide context like captions.[5]
GEDCOM 7.0 maintains external file references for multimedia but introduces GEDZIP, a ZIP archive format with .gdz extension, to bundle the GEDCOM file and associated media files using local paths (e.g., media/filename), enabling self-contained transmission particularly useful for web-based applications.[9] This version also expands metadata options, including NOTE for detailed captions and CROP substructures under MULTIMEDIA_LINK to specify image coordinates for cropping or zooming; for example, beneath a 1 OBJE @O1@ link, a 2 CROP line can carry 3 TOP 10, 3 LEFT 20, 3 HEIGHT 100, and 3 WIDTH 150.[9] Legacy limitations persist in older implementations, where only references are supported, potentially complicating data transfer if files are not bundled separately.[9]
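Because a GEDZIP file is an ordinary ZIP archive, it can be opened with a standard ZIP library. The sketch below uses Python's `zipfile` module and assumes, per my reading of the GEDCOM 7.0 specification, that the dataset is stored in the archive under the name `gedcom.ged`; `read_gedzip` is a hypothetical helper, and the demonstration archive is built in memory with placeholder bytes rather than a real image.

```python
import io
import zipfile

DATASET_NAME = "gedcom.ged"     # assumed entry name for the GEDCOM data


def read_gedzip(source):
    """Return (gedcom_text, media_names) from a GEDZIP archive.

    `source` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        text = archive.read(DATASET_NAME).decode("utf-8-sig")
        media = sorted(
            name for name in archive.namelist()
            if name != DATASET_NAME and not name.endswith("/")
        )
    return text, media


# Build a tiny archive in memory to exercise the reader.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(DATASET_NAME, "0 HEAD\n1 GEDC\n2 VERS 7.0\n0 TRLR\n")
    archive.writestr("media/photo.jpg", b"\xff\xd8placeholder")
buffer.seek(0)

text, media = read_gedzip(buffer)
print(text.splitlines()[0], media)
```

Decoding with `utf-8-sig` tolerates an optional byte-order mark at the start of the dataset; the media names returned are the local paths that FILE lines in the dataset would refer to.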
Common use cases include attaching photographs to family (FAM) records to visualize group portraits or events, and linking audio files to individual (INDI) records for oral histories, such as digitized recordings of personal narratives.[19] For instance, a sound bite of an ancestor's story can be referenced alongside a scanned photo, enriching the genealogical context without altering the core text-based structure.[20]