GEDCOM - Grokipedia


History and Development

Origins and Initial Creation

The development of GEDCOM began in 1984 within the Family History Department of The Church of Jesus Christ of Latter-day Saints (LDS Church). It was part of broader efforts to computerize family history research and to let Church members exchange genealogical data between their software tools.^[1] The initiative was motivated by the Church's doctrinal emphasis on temple ordinances, including baptisms and endowments for deceased ancestors. These required accurate tracking and sharing of lineage-linked information to support ordinance reservations and avoid duplicated work.^[1]^[2]

The initial version, GEDCOM 1.0, was released in 1984 as a straightforward, human-readable text format designed primarily for the mainframe systems behind the Church's Ancestral File database.^[1]^[3] It used line-based records with level numbers and tags to represent hierarchical family structures, so pedigree and family group data could be transferred without dependence on proprietary software. Key contributors were staff of the LDS Family History Department.^[4]

Early adoption was largely confined to LDS-specific applications such as Personal Ancestral File (PAF), which the Church released in 1984 to help members compile and submit family data for temple work. PAF added GEDCOM export starting with version 2.0 in 1985, allowing users to submit digital files directly to Church systems for ordinance tracking and integration into centralized databases.^[3] This limited scope reflected GEDCOM's initial focus on internal Church needs before the wider genealogical community became involved.

Standardization and Evolution

The GEDCOM specification continued to develop within the Family History Department of The Church of Jesus Christ of Latter-day Saints. GEDCOM 4.0, released in August 1989, became a key standardized version, building on earlier drafts to define a uniform format for genealogical data exchange.^[4] It was prepared by the Projects and Planning Division under Data Administration and emphasized flexibility and compatibility to support the growing ecosystem of genealogical tools.^[4] The release also marked a shift toward broader industry adoption, as the Church encouraged participation from external developers and software producers.^[5]

Interoperability among diverse applications was the main driver of GEDCOM's evolution: commercial vendors were invited to register their products and implement the Lineage-Linked GEDCOM Form so data could move between programs.^[5] Notable examples include Broderbund's Family Tree Maker and Leister Productions' Reunion, which added GEDCOM support so users could transfer family history data across platforms without losing its structure.^[6] This vendor involvement made GEDCOM a de facto industry standard and fostered a wide range of interoperable products, while new versions maintained backward compatibility with earlier ones.^[5]

In the post-2010 era, FamilySearch, as steward of the specification, has taken the central role in its maintenance and enhancement. This culminated in GEDCOM 7.0 in 2021, with minor updates continuing as of 2025 to address modern needs.^[7]^[8] Collaborative development accelerated through efforts such as the RootsTech 2020 initiative, which brought industry stakeholders together to update the standard starting from GEDCOM 5.5.1.^[7] FamilySearch also invites open contributions: the specification is published at gedcom.io and maintained in a public GitHub repository, where developers can review it, suggest improvements, and help keep it relevant to genealogical research.^[7]

Data Model

Hierarchical Records and Levels

GEDCOM organizes genealogical data as a tree of nested records and substructures. Numeric levels express the parent-child relationships: level 0 marks a top-level record, one of the primary entities in a family tree, and every other line is subordinate to the nearest preceding line whose level is one lower. This nesting mirrors familial and event-based connections without requiring a relational database schema.^[5]^[9]

The core record types at level 0 include Individual (INDI) for personal details, Family (FAM) for marital or parental units, and Source (SOUR) for bibliographic references, along with others such as Repository (REPO) and Note (NOTE). Each record begins with a level 0 line carrying a unique cross-reference identifier (XREF), as in 0 @I1@ INDI, which other lines use to point to that record. Substructures appear at level 1 or higher and hold attributes, events, and multimedia references; for instance, an individual's birth event might appear as 1 BIRT with its date one level deeper (2 DATE 15 NOV 1950). Because of these levels, names, occupations, residences, and similar data are always tied to the record that contains them.^[5]^[9]

Relationships between records are expressed through cross-reference pointers rather than duplicated data, which promotes integrity and efficiency. A Family record links to Individual records with lines such as 1 HUSB @I1@ for the husband and 1 CHIL @I2@ for a child, and each Individual can point back to its families (1 FAMC @F1@ for the family in which the person is a child, FAMS for a family in which the person is a spouse), so software can navigate in both directions without repeating personal details. This pointer system supports complex pedigrees while hierarchical nesting still groups intra-record elements such as events and notes.^[5]^[9]

Unlike flat-file or tabular databases, GEDCOM's hierarchy groups temporally or thematically related data, such as the life events of an individual or the citations attached to a fact. This suits irregular, narrative-driven genealogical information, because substructures can vary in depth and number to accommodate diverse family histories.^[5]^[9]
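The level rule above (each line belongs to the nearest preceding line one level up) can be implemented with a stack. The following is a minimal Python sketch, assuming well-formed lines of the form `level [@xref@] tag [value]` separated by single spaces; `Node`, `split_line`, and `build_tree` are hypothetical names chosen for illustration, not part of any GEDCOM library.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One GEDCOM line plus the lines nested under it (hypothetical helper type)."""
    level: int
    xref: Optional[str]
    tag: str
    value: str
    children: list = field(default_factory=list)


def split_line(raw: str) -> tuple[int, Optional[str], str, str]:
    """Split 'level [@xref@] tag [value]' on single spaces (assumes well-formed input)."""
    level, _, rest = raw.strip().partition(" ")
    xref = None
    if rest.startswith("@"):  # '0 @I1@ INDI': the identifier precedes the tag
        xref, _, rest = rest.partition(" ")
    tag, _, value = rest.partition(" ")
    return int(level), xref, tag, value


def build_tree(lines: list[str]) -> list[Node]:
    """Nest lines by level number and return the level-0 records."""
    roots: list[Node] = []
    stack: list[Node] = []  # stack[i] is the currently open node at level i
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        level, xref, tag, value = split_line(raw)
        node = Node(level, xref, tag, value)
        del stack[level:]  # close any nodes at this level or deeper
        if level == 0:
            roots.append(node)
        elif len(stack) == level:
            stack[-1].children.append(node)  # parent is exactly one level up
        else:
            raise ValueError(f"line skips a level: {raw!r}")
        stack.append(node)
    return roots


records = build_tree([
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 15 NOV 1950",
    "0 @F1@ FAM",
    "1 HUSB @I1@",
])
print([r.tag for r in records])  # ['INDI', 'FAM']
print(records[0].children[1].children[0].value)  # 15 NOV 1950
```

Pointers such as `@I1@` are kept as plain values here; resolving them into links between records is a separate step.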

Tags, Values, and Pointers

In GEDCOM, tags are short mnemonic codes, typically three or four letters, that identify the type of data on a line and give it meaning within the hierarchy. Standard tags are uppercase abbreviations such as NAME for a person's name, BIRT for a birth event, or DEAT for a death.^[5] The specification's appendix defines the standard tags approved for general use; user-defined extensions are prefixed with an underscore (e.g., _MYTAG), which allows customization without conflicting with core elements.^[5] Some substructures are mandatory in their context, such as NAME in a submitter (SUBM) record, while others, such as SOUR citations, are optional but recommended for verifiability.^[5] GEDCOM 7.0 additionally identifies each structure type by a URI (abbreviated, for example, as g7:NAME), which improves machine readability while keeping the familiar tag names.^[9]

Values follow the tag on each line, separated by a single space, and hold the actual data for that tag. In GEDCOM 5.5.1 they are text strings, and a whole line may not exceed 255 characters; longer values are continued on subordinate CONC lines (concatenation without a line break) or CONT lines (continuation with a line break).^[5] For example, a name value might appear as John /Doe/, where slashes delimit the surname, and a place as Cove, Cache, Utah, USA.^[5] A literal at sign inside a value is written as @@, and escape sequences of the form @#...@ mark special interpretations, most notably the calendar of a date (e.g., @#DJULIAN@).^[5] GEDCOM 7.0 removes the CONC tag and the line-length limit, requires UTF-8, and keeps CONT for multi-line text such as notes.^[9]

Cross-reference identifiers (XREFs) link records using an identifier enclosed in at signs, @<identifier>@, of up to 22 characters in GEDCOM 5.5.1, such as @I123@ for an individual.^[5] An identifier appears after the level number to label a record, as in 0 @I123@ INDI, and the same string appears as a pointer value after a tag to refer to that record, as in 1 CHIL @I123@ within a family. Each identifier must be unique within a file.^[5] The @...@ delimiters distinguish pointers from ordinary values, and pointers are used only for referencing, never for storing data. GEDCOM 7.0 adds a null pointer, @VOID@, for links whose target is unknown or absent.^[9]

GEDCOM also defines specific formats for common genealogical values, which parsers interpret line by line. Dates follow a structured form such as <calendar> <day> <month> <year>, where the calendar escape is optional (e.g., @#DGREGORIAN@ 3 JAN 2000 for Gregorian), and support ranges (BET 1904 AND 1915) and approximations (ABT 1920).^[5] Places are free-form but conventionally hierarchical (e.g., City, County, State, Country), and a FORM tag can describe the jurisdiction levels.^[5] Notes hold unstructured text, continued across lines with CONT, so research context can be recorded without altering the hierarchy.^[5] In GEDCOM 7.0, a date can carry a PHRASE substructure holding the original free-text wording, which covers cases such as old-style/new-style dual dates, and some data types are identified by XML Schema URIs, such as xsd:string for text.^[9] This line syntax (level, optional cross-reference identifier, tag, and optional value) keeps parsing simple and supports the format's emphasis on portability across systems.^[5]
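The line grammar described above (level, optional cross-reference identifier, tag, optional value) maps naturally onto a regular expression. The Python sketch below is one possible reading of the 5.5.1 rules rather than a complete implementation: the pattern, the `GedLine` tuple, and the `fold_continuations` helper are hypothetical names; the pattern accepts underscore-prefixed extension tags and rejects levels with leading zeros; and the folding step appends CONC values directly and CONT values after a newline.

```python
import re
from typing import NamedTuple, Optional

# level (no leading zeros), optional @xref@, tag (standard or _EXTENSION), optional value
LINE_RE = re.compile(
    r"^\s*(?P<level>0|[1-9][0-9]*) "
    r"(?:(?P<xref>@[^@ ]+@) )?"
    r"(?P<tag>_?[A-Za-z0-9]+)"
    r"(?: (?P<value>.*))?$"
)


class GedLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: str


def parse_line(raw: str) -> GedLine:
    """Parse one physical line; raises ValueError if it does not fit the grammar."""
    m = LINE_RE.match(raw.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not a GEDCOM line: {raw!r}")
    return GedLine(int(m["level"]), m["xref"], m["tag"], m["value"] or "")


def fold_continuations(lines: list[GedLine]) -> list[GedLine]:
    """Merge CONC (no newline) and CONT (newline) lines into the line they extend."""
    out: list[GedLine] = []
    for line in lines:
        if line.tag in ("CONC", "CONT") and out and line.level == out[-1].level + 1:
            sep = "\n" if line.tag == "CONT" else ""
            out[-1] = out[-1]._replace(value=out[-1].value + sep + line.value)
        else:
            out.append(line)
    return out


raw_lines = ["0 @N1@ NOTE This is contin", "1 CONC ued here", "1 CONT Second line"]
note = fold_continuations([parse_line(s) for s in raw_lines])[0]
print(repr(note.value))  # 'This is continued here\nSecond line'
```

The example splits a word across a CONC boundary on purpose: CONC joins values with no separator, so the split point is invisible after folding.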

File Structure

Header Block

The Header Block is the mandatory first segment of a GEDCOM file. It begins with the level 0 HEAD tag, which marks the start of the transmission, and supplies the metadata parsers need to interpret the rest of the file.^[5] In GEDCOM 5.5.1 it declares the GEDCOM version, the source software, the character encoding, a submitter reference, and optional copyright information. These let receiving applications validate the format and handle the data appropriately before processing the body records.^[5] The submitter reference (SUBM) must point to a submitter record in the body.^[5]

The structure starts with 0 HEAD, followed by required level 1 substructures: 1 GEDC containing 2 VERS 5.5.1 to state the specification version, and 1 CHAR UTF-8 to define the character set (UTF-8 is valid from 5.5.1 onward; earlier versions used values such as ANSEL or ASCII).^[5] The producing software is identified by 1 SOUR <APPROVED_SYSTEM_ID>, usually with a subordinate 2 VERS <VERSION_NUMBER> giving that software's own version, and 1 SUBM @<XREF:SUBM>@ points to the submitter record elsewhere in the file.^[5] An optional 1 COPR <COPYRIGHT_GEDCOM_FILE> line carries a copyright notice for the dataset.^[5] GEDCOM 7.0 keeps this general layout but makes UTF-8 the only permitted encoding, which makes a character-set declaration unnecessary.^[10] A representative Header Block in GEDCOM 5.5.1 format is:

0 HEAD
1 SOUR Family Historian
2 VERS 7.0.10
1 GEDC
2 VERS 5.5.1
2 FORM LINEAGE-LINKED
1 CHAR UTF-8
1 SUBM @S1@
1 COPR Copyright 2025 by Example User

The header is followed by the body block, which contains the core genealogical records, including the submitter record that the header points to, such as 0 @S1@ SUBM with details like the submitter's name.^[5] A common header error is a GEDC VERS value that does not match the file's actual structure, which causes import failures in parsers that enforce strict compliance.^[5] The header contains two VERS lines: the one under GEDC gives the specification version, while the one under SOUR gives the producing program's version. Omitting required lines such as CHAR or SUBM can also corrupt data on import, because software may fall back to an incompatible encoding or fail to associate the file with a submitter.^[10] Following the specification closely avoids these issues and keeps the exchange of genealogical data reliable.^[5]
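A receiving application can check these header requirements before reading the body. The Python sketch below assumes the file has already been split into lines and checks only the GEDCOM 5.5.1 rules discussed here: HEAD comes first, GEDC carries a VERS, CHAR is present, and SUBM points to a submitter record that exists. `check_header` is a hypothetical helper; a real validator would cover many more rules.

```python
def check_header(lines: list[str]) -> list[str]:
    """Return problems found in a GEDCOM 5.5.1 header (empty list if none)."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    if not lines or lines[0] != "0 HEAD":
        return ["file does not start with '0 HEAD'"]
    # The header is every line after '0 HEAD' up to the next level-0 line.
    header: list[str] = []
    for line in lines[1:]:
        if line.startswith("0 "):
            break
        header.append(line)
    found: dict[str, str] = {}
    parent = None  # tag of the most recent level-1 line
    for line in header:
        level, _, rest = line.partition(" ")
        tag, _, value = rest.partition(" ")
        if level == "1":
            parent = tag
            found.setdefault(tag, value)
        elif level == "2" and parent == "GEDC" and tag == "VERS":
            found["GEDC.VERS"] = value  # ignore the VERS under SOUR (program version)
    problems: list[str] = []
    if "GEDC.VERS" not in found:
        problems.append("missing 1 GEDC / 2 VERS")
    elif found["GEDC.VERS"] != "5.5.1":
        problems.append(f"declared version {found['GEDC.VERS']!r}, expected '5.5.1'")
    if "CHAR" not in found:
        problems.append("missing 1 CHAR")
    subm = found.get("SUBM", "")
    if not (len(subm) > 2 and subm.startswith("@") and subm.endswith("@")):
        problems.append("missing or malformed 1 SUBM pointer")
    elif f"0 {subm} SUBM" not in lines:
        problems.append(f"header points to {subm}, but no such SUBM record exists")
    return problems


header = ["0 HEAD", "1 SOUR Family Historian", "2 VERS 7.0.10", "1 GEDC", "2 VERS 5.5.1",
          "2 FORM LINEAGE-LINKED", "1 CHAR UTF-8", "1 SUBM @S1@",
          "0 @S1@ SUBM", "1 NAME Example User", "0 TRLR"]
print(check_header(header))  # []
```

Tracking the current level-1 parent is what keeps the program's version (7.0.10 under SOUR) from being mistaken for the GEDCOM version.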

Body Block

The Body Block is the core data payload of a GEDCOM file. It follows the Header Block and holds all genealogical records in the same hierarchical, line-based format.^[5] It consists of a series of logical records, each starting with a level 0 line such as 0 @I1@ INDI for an individual or 0 @F1@ FAM for a family group, followed by subordinate lines for attributes and events.^[5] Substructures include events such as 1 BIRT for a birth (with nested 2 DATE and 2 PLAC lines for its date and place) and attributes such as 1 SEX M. They can nest several levels deep to represent complex relationships and facts.^[5] GEDCOM 7.0 keeps this structure of leveled lines, though it simplifies parsing by removing CONC and leaving CONT as the only continuation tag.^[9]

Records are organized by level numbers (0 to 99, written without leading zeros), where each line is subordinate to the nearest preceding line with a lower level, giving a tree-like representation of the data.^[5] The specification does not mandate an order for top-level records, so submitters can arrange them as they prefer, while substructures within a record conventionally follow a consistent order, such as events before attributes.^[5] Cross-references connect records through unique pointers (e.g., @<XREF:INDI>@): a family record lists its children with lines such as 1 CHIL @I2@, and each child's individual record points back with 1 FAMC @F1@.^[5] Because links are made by identifier, related records need not be adjacent in the file, and relationships can be followed in both directions.^[9]

The Body Block has no explicit index; parsers process the file line by line and build a graph of records from the pointers.^[5] When compliant software encounters a pointer, it resolves it against the record carrying the matching identifier elsewhere in the block, building an in-memory model of entities and their associations.^[9] Since a pointer may appear before the record it refers to, parsers must handle such forward references efficiently.^[5]

Because notes (1 NOTE) and source citations (1 SOUR) can nest further substructures, the body of a large pedigree often runs to several megabytes.^[5] To ease memory constraints, GEDCOM 5.5.1 recommends keeping individual logical records under 32 kilobytes, matching typical buffer sizes of the era.^[5] GEDCOM 7.0 drops explicit limits on nesting depth and line length (previously 255 characters), allowing more flexibility at the cost of higher computational demands for deeply nested datasets.^[9]
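One common way to handle forward references is to index first and resolve second. In the Python sketch below, the first pass records which level-0 identifiers exist, and the second checks every pointer against that index, so a pointer that appears before its target needs no special handling. `resolve_pointers` and its return shape are hypothetical. As a simplification, it treats any value consisting only of `@ID@` as a pointer and treats the 7.0 `@VOID@` placeholder as intentionally empty.

```python
import re

POINTER = re.compile(r"^@[^@ ]+@$")  # a whole value of the form @ID@ (simplifying assumption)


def resolve_pointers(lines: list[str]) -> tuple[dict[str, str], list[tuple[int, str]]]:
    """Return (identifier -> record tag, [(line number, dangling pointer), ...])."""
    index: dict[str, str] = {}
    parsed = []
    # Pass 1: index every level-0 record by its cross-reference identifier.
    for number, raw in enumerate(lines, start=1):
        level, _, rest = raw.strip().partition(" ")
        xref = None
        if rest.startswith("@"):
            xref, _, rest = rest.partition(" ")
        tag, _, value = rest.partition(" ")
        parsed.append((number, value))
        if level == "0" and xref:
            index[xref] = tag  # e.g. "@I1@" -> "INDI"
    # Pass 2: every pointer value must name an indexed record (forward refs included).
    dangling = [
        (number, value)
        for number, value in parsed
        if POINTER.match(value) and value != "@VOID@" and value not in index
    ]
    return index, dangling


sample = ["0 HEAD", "1 SUBM @S1@", "0 @F1@ FAM", "1 HUSB @I1@", "1 WIFE @I2@",
          "1 CHIL @I3@", "0 @S1@ SUBM", "1 NAME Example User",
          "0 @I1@ INDI", "1 NAME John /Smith/", "0 TRLR"]
index, dangling = resolve_pointers(sample)
print(dangling)  # [(5, '@I2@'), (6, '@I3@')]
```

Here `@S1@` and `@I1@` are both used before their records appear and still resolve, while the two individuals that are never defined are reported.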

Trailer Block

The Trailer Block serves as the simple closing segment of a GEDCOM file, consisting of a single mandatory line at level 0 formatted as 0 TRLR. This tag specifies the end of the GEDCOM transmission, with no associated value or subordinate structures permitted.^[5] Its primary role is to mark the completion of the data transmission, thereby preventing errors from partial file reads by informing parsers that no further content follows.^[10] In some multi-disk or segmented transmissions, it appears only on the final segment to confirm overall completeness.^[5] Strict parsers treat the absence of the trailer as an indication of an invalid or incomplete file, often triggering processing errors.^[10] Historically, the trailer evolved from simpler termination indicators in early GEDCOM drafts to a standardized, robust endpoint mechanism, ensuring reliable interchange in versions from 4.0 onward.^[4] It directly follows the preceding body records to delineate the file's boundary.^[5]
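Detecting a missing or misplaced trailer takes only a few lines. This Python sketch (the name `trailer_problem` is hypothetical) treats a file as complete only if its last non-blank line is exactly `0 TRLR`. Because the trailer takes no value or substructures, it also flags a TRLR line that carries a value and any content after the trailer. It does not model multi-segment transmissions, where only the final segment carries a trailer.

```python
from typing import Optional


def trailer_problem(lines: list[str]) -> Optional[str]:
    """Describe a trailer problem, or return None if the file ends correctly."""
    content = [ln.strip() for ln in lines if ln.strip()]
    if not content:
        return "empty file"
    # Positions of level-0 TRLR lines (a TRLR with a value still counts here).
    positions = [i for i, ln in enumerate(content) if ln.split(" ")[:2] == ["0", "TRLR"]]
    if not positions:
        return "no '0 TRLR' line: file may be truncated"
    first = positions[0]
    if content[first] != "0 TRLR":
        return "TRLR must not carry a value"
    if first != len(content) - 1:
        return f"{len(content) - 1 - first} line(s) after the trailer"
    return None


print(trailer_problem(["0 HEAD", "0 @I1@ INDI", "1 NAME John /Smith/"]))
# no '0 TRLR' line: file may be truncated
```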

Sample File Excerpt

To illustrate the practical structure of a GEDCOM file, consider the following minimal excerpt. It contains a header block; a body with one submitter record, one individual record, and one family record; and a trailer block. It follows GEDCOM 5.5 syntax and demonstrates core elements such as levels, tags, pointers, and values. The family record points to individuals @I2@ and @I3@ whose records are omitted for brevity, so a complete file would also need those records.^[5]

0 HEAD
1 SOUR PAF
2 VERS 2.1
1 DATE 15 NOV 1995
1 FILE MYFILE.GED
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED
1 CHAR ANSEL
1 SUBM @S1@
0 @S1@ SUBM
1 NAME Example User
0 @I1@ INDI
1 NAME John /Smith/
1 SEX M
1 BIRT
2 DATE 12 MAY 1960
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 TRLR

This example can be broken down line by line to highlight key components:

0 HEAD: Initiates the header block at level 0, marking the start of the file. The level 0 indicates a top-level record.^[5]
1 SOUR PAF: At level 1 (subordinate to HEAD), this tag identifies the software source ("PAF" for Personal Ancestral File) used to generate the file.^[5]
2 VERS 2.1: At level 2 (further subordinate), the VERS tag specifies the version of the source software.^[5]
1 DATE 15 NOV 1995: Level 1 under HEAD records the file creation date in a standardized format.^[5]
1 FILE MYFILE.GED: Level 1 under HEAD names the transmission file.^[5]
1 GEDC: Level 1 under HEAD begins the GEDCOM version details.^[5]
2 VERS 5.5: Level 2 under GEDC specifies the GEDCOM standard version.^[5]
2 FORM LINEAGE-LINKED: Level 2 under GEDC defines the file form, here the common lineage-linked structure for family trees.^[5]
1 CHAR ANSEL: Level 1 under HEAD declares the character set (ANSEL, an older encoding; 5.5.1 and later files often use UTF-8).^[5]
1 SUBM @S1@: Level 1 under HEAD references the submitter record via unique pointer @S1@.^[5]
0 @S1@ SUBM: Level 0 starts the submitter record with pointer @S1@ and SUBM tag.^[5]
1 NAME Example User: Level 1 under SUBM provides the submitter's name.^[5]
0 @I1@ INDI: Level 0 starts an individual record in the body; @I1@ is its unique pointer (xref ID) for referencing, followed by the INDI tag for a person.^[5]
1 NAME John /Smith/: Level 1 under INDI provides the name, with slashes delimiting surname.^[5]
1 SEX M: Level 1 under INDI specifies gender (M for male).^[5]
1 BIRT: Level 1 under INDI introduces a birth event.^[5]
2 DATE 12 MAY 1960: Level 2 under BIRT gives the event date.^[5]
0 @F1@ FAM: Level 0 starts a family record; @F1@ is its pointer, with FAM tag for family group.^[5]
1 HUSB @I1@: Level 1 under FAM links the husband via pointer @I1@.^[5]
1 WIFE @I2@: Level 1 under FAM links the wife (pointer @I2@ assumes another INDI record, omitted here for brevity).^[5]
1 CHIL @I3@: Level 1 under FAM links a child (pointer @I3@ assumes another INDI).^[5]
0 TRLR: Level 0 ends the file, marking the trailer block.^[5]

When working with GEDCOM 5.5 files, keep each line within 255 characters, counting the level, cross-reference identifier, tag, value, and delimiters, to ensure compatibility across systems. Whitespace rules are strict: a line begins with the level number (writers should not add leading spaces, although readers are expected to ignore them), followed by an optional xref ID in @...@ format, the tag, and an optional value, each separated by a single space. Subordinate lines use incremented levels to denote hierarchy, and a long value is continued on CONT or CONC lines placed one level below the line they extend.^[5]
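On the writing side, the 255-character limit means long values must be split. The Python sketch below shows one way to do it for GEDCOM 5.5.x under stated assumptions. `emit_value` is a hypothetical helper. It measures length in characters rather than bytes (a byte-based limit would need the encoded length) and turns embedded newlines into CONT lines. It breaks over-long pieces into CONC lines and avoids split points next to a space, a precaution in case a reader drops spaces at line ends.

```python
MAX_LINE = 255  # GEDCOM 5.5.x line limit; counted here in characters (assumption)


def emit_value(level: int, tag: str, value: str) -> list[str]:
    """Write one value as GEDCOM lines, using CONT for newlines and CONC for overflow."""
    out: list[str] = []
    for i, segment in enumerate(value.split("\n")):
        prefix = f"{level} {tag}" if i == 0 else f"{level + 1} CONT"
        while True:
            room = MAX_LINE - len(prefix) - 1  # minus the space before the value
            if len(segment) <= room:
                out.append(f"{prefix} {segment}" if segment else prefix)
                break
            cut = room
            # Back up so neither side of the split is a space (line-end spaces may be lost).
            while cut > 1 and (segment[cut - 1] == " " or segment[cut] == " "):
                cut -= 1
            out.append(f"{prefix} {segment[:cut]}")
            segment = segment[cut:]
            prefix = f"{level + 1} CONC"  # later pieces join without a newline
    return out


print(emit_value(1, "NOTE", "First paragraph\nSecond paragraph"))
# ['1 NOTE First paragraph', '2 CONT Second paragraph']
```

Reading the output back with the CONC/CONT rules (append CONC values directly, CONT values after a newline) reproduces the original text.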

Versions

GEDCOM 5.5 and 5.5.1

GEDCOM 5.5, released on January 2, 1996, with errata on January 10, 1996, was a major update to the standard. It adopted the American National Standards Institute (ANSI) ANSEL character set, which handles the diacritical marks and special characters common in international genealogical records.^[11] It refined date formats to support multiple calendars, including Gregorian, Julian, Hebrew, and French Revolutionary, together with qualifiers for imprecise dates such as "about" (ABT), "estimated" (EST), and "calculated" (CAL).^[5] The place (PLAC) structure gained a hierarchical jurisdiction path, described by a FORM substructure, allowing values such as "Springfield, Sangamon County, Illinois, United States" for greater locational precision.^[5]

Key innovations in GEDCOM 5.5 included the Association (ASSO) tag, which links individuals through non-familial relationships such as friends, neighbors, or witnesses, with a RELA subtag describing the nature of the association.^[5] It also added the Repository (REPO) record for cataloging where sources are held, with call numbers and addresses, improving source management and citation traceability.^[5] These features built on earlier versions while maintaining backward compatibility, and most implementations can parse GEDCOM 5.5 files as a baseline for data exchange.^[5]

GEDCOM 5.5.1, released on November 15, 2019, made minor corrections and enhancements to resolve ambiguities in the prior version.^[5] It formalized Unicode support, including UTF-8 encoding, to accommodate a broader range of international scripts and reduce reliance on ANSEL.^[5] Multimedia handling in Object (OBJE) records was streamlined by dropping embedded binary data (BLOB) in favor of external file references. A FORM substructure gives the file format (such as JPEG or TIFF images or WAV audio), and a TYPE substructure gives the kind of medium.^[5] Event structures received clarifications, such as refined <<EVENT_DETAIL>> components for attributes like religion (RELI), for more consistent representation of life events.^[5]

As of 2025, GEDCOM 5.5 and 5.5.1 still dominate genealogy software ecosystems because of their stability, extensive vendor support, and interoperability with legacy datasets. They remain the de facto standard for file exchange despite the availability of newer specifications.^[12]
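The date features described above (calendar escapes, approximation qualifiers, and BET ... AND ranges) can be recognized with a small classifier. The following Python sketch covers only a subset of the 5.5/5.5.1 date grammar. Note that the grammar spells the "calculated" qualifier CAL. The sketch also handles BEF/AFT and parenthesized date phrases, which the same grammar defines. `classify_date`, `split_calendar`, and the dictionary layout are hypothetical, and days, months, and years are not validated against their calendars.

```python
import re

ESCAPE = re.compile(r"^@#D(?P<cal>[A-Z ]+)@ ")  # e.g. "@#DJULIAN@ ", "@#DFRENCH R@ "
RANGE = re.compile(r"^BET (?P<start>.+) AND (?P<end>.+)$")


def split_calendar(text: str) -> tuple[str, str]:
    """Return (calendar, rest); a date without an escape defaults to Gregorian."""
    m = ESCAPE.match(text)
    if m:
        return m["cal"], text[m.end():]
    return "GREGORIAN", text


def classify_date(value: str) -> dict[str, str]:
    """Classify a GEDCOM 5.5-style date value (a subset of the full grammar)."""
    text = value.strip()
    if text.startswith("(") and text.endswith(")"):
        return {"kind": "phrase", "text": text[1:-1]}  # free-text date phrase
    rng = RANGE.match(text)
    if rng:
        cal, start = split_calendar(rng["start"])
        _, end = split_calendar(rng["end"])
        return {"kind": "range", "calendar": cal, "start": start, "end": end}
    qualifier, _, rest = text.partition(" ")
    if qualifier in ("ABT", "CAL", "EST"):
        cal, date = split_calendar(rest)  # the escape follows the qualifier
        return {"kind": "approximate", "qualifier": qualifier, "calendar": cal, "date": date}
    if qualifier in ("BEF", "AFT"):
        cal, date = split_calendar(rest)
        return {"kind": "bounded", "qualifier": qualifier, "calendar": cal, "date": date}
    cal, date = split_calendar(text)
    return {"kind": "simple", "calendar": cal, "date": date}


for v in ["ABT 1920", "BET 1904 AND 1915", "@#DJULIAN@ 12 MAR 1701", "BEF @#DHEBREW@ 1 TSH 5700"]:
    print(v, "->", classify_date(v))
```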

GEDCOM 7.0

FamilySearch released GEDCOM 7.0 on May 19, 2021, as the latest major revision of the standard for exchanging genealogical data, aiming to address limitations of earlier versions with modern data-handling practices.^[7] The specification has since received minor updates, with version 7.0.16 issued on March 18, 2025, adding clarifications and implementation guidance without altering core data structures.^[13] The version keeps the hierarchical line-based format while adding semantic enhancements for more precise and extensible data representation.

GEDCOM 7.0 supports structured extensions: extension tags can be tied to URI-defined definitions, and the specification defines typed values such as enumerations and ages, which improves interoperability across software.^[9] It also improves the handling of roles in events and family structures. Participants' roles (e.g., witness or godparent) can be stated explicitly, capturing genealogical contexts beyond simple parent-child links.^[9]

Key innovations include richer multimedia handling through the MULTIMEDIA_RECORD structure and GEDZIP packaging, which bundles external files such as images and audio with the GEDCOM data for transfer.^[14] Dates can be approximate or uncertain (e.g., "about 1850" or ranges), can use several calendars (Gregorian, Julian, Hebrew, French Revolutionary), can express periods, and can carry a free-text PHRASE, reducing ambiguity in historical records.^[9] Places can carry latitude and longitude through the MAP structure's LATI and LONG substructures, supporting geospatial integration in mapping tools.^[9] Internationalization is strengthened by mandating UTF-8 throughout and using standard language tags in LANG, avoiding legacy character-set issues.^[10]

FamilySearch has integrated GEDCOM 7.0 into its core tools for family tree management and export, and support is growing in third-party software such as RootsMagic and Family Historian.^[15] Software implementing 7.0 can import GEDCOM 5.5 and 5.5.1 files by mapping legacy structures to the new semantics, although some breaking changes mean conversions should be validated.^[14] Since the initial 2021 release, updates have focused on validation rules, more versatile NOTE structures for research notes, and refined citation structures in SOURCE records that better support evidence evaluation and multi-source linking.^[16] These revisions are tracked with semantic versioning in the official GitHub repository, with an emphasis on stability and developer tools for conformance testing.^[8]
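Because 7.0 is not fully backward compatible, an importer usually needs to know which version it is reading before interpreting the body. The Python sketch below reads the GEDC VERS value from the header and applies two 7.0 differences mentioned in this article: CONC lines are not allowed, and @VOID@ means "no target" only in 7.0. `detect_version`, `version_issues`, and `_fields` are hypothetical helpers, and the checks are illustrative rather than a full conversion.

```python
def _fields(raw: str) -> tuple[str, str, str]:
    """Return (level, tag, value), dropping a BOM and any @xref@ before the tag."""
    level, _, rest = raw.strip().lstrip("\ufeff").partition(" ")
    if rest.startswith("@"):
        _, _, rest = rest.partition(" ")
    tag, _, value = rest.partition(" ")
    return level, tag, value


def detect_version(lines: list[str]) -> str:
    """Return HEAD.GEDC.VERS, e.g. '5.5.1' or '7.0'."""
    in_gedc = False
    for raw in lines:
        level, tag, value = _fields(raw)
        if level == "0" and tag != "HEAD":
            break  # reached the body without finding a version
        if level == "1":
            in_gedc = tag == "GEDC"  # skip the VERS under SOUR
        elif level == "2" and in_gedc and tag == "VERS":
            return value
    raise ValueError("no HEAD.GEDC.VERS line found")


def version_issues(lines: list[str]) -> list[str]:
    """Flag constructs that do not fit the file's declared version."""
    version = detect_version(lines)
    is_v7 = version.split(".")[0] == "7"
    issues: list[str] = []
    for number, raw in enumerate(lines, start=1):
        _, tag, value = _fields(raw)
        if is_v7 and tag == "CONC":
            issues.append(f"line {number}: CONC is not allowed in GEDCOM {version}")
        if not is_v7 and value == "@VOID@":
            issues.append(f"line {number}: @VOID@ only means 'no target' in GEDCOM 7.0")
    return issues


v7 = ["0 HEAD", "1 GEDC", "2 VERS 7.0", "0 @F1@ FAM", "1 HUSB @VOID@",
      "1 NOTE A", "2 CONC B", "0 TRLR"]
print(detect_version(v7), version_issues(v7))
```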

Release Timeline

The development of GEDCOM began in 1984, when the Family History Department of The Church of Jesus Christ of Latter-day Saints (LDS Church) produced its first internal version, GEDCOM 1.0, as a proposed standard for exchanging genealogical data within its systems.^[15] Subsequent internal iterations, such as version 2.0 in late 1985 and 2.1 in early 1987, were used in software like Personal Ancestral File (PAF) but were not made public.^[3] The first public release came on October 9, 1987, with GEDCOM 3.0, which introduced the lineage-linked form for representing family relationships and was made available to genealogical software developers.^[3] Version 4.0 followed on August 4, 1989, refining the structure for wider compatibility, and version 5.0 arrived on September 25, 1991, enhancing lineage-linked structures to handle complex pedigrees better.^[17] Interim drafts appeared in the early 1990s, including 5.1 in September 1992 and 5.3 in November 1993, which experimented with features such as Unicode support and multimedia but were never finalized.^[3]

Version 5.5, a major milestone, was released on January 2, 1996 (with errata on January 10). It added structured addresses and additional name parts and drew on contributions from organizations such as the National Genealogical Society, although it never received formal ANSI ratification.^[11] A minor update, GEDCOM 5.5.1, followed on November 15, 2019, adding support for UTF-8 encoding, email addresses, URLs, and geographic coordinates while maintaining backward compatibility.^[11] No official version 6.0 was ever released; a beta draft proposing XML-based storage circulated in December 2002 for developer feedback but was abandoned in favor of alternative formats such as GEDCOM X.^[18] After a long hiatus, FamilySearch released GEDCOM 7.0 on May 19, 2021, the first major update in over two decades, introducing semantic versioning, GEDZIP packaging for multimedia, and resolutions to earlier ambiguities.^[11] The version has received ongoing patches, the latest being 7.0.16 on March 18, 2025, focused on refinements and interoperability.^[13]

Version	Release Date	Status	Key Notes
1.0	1984	Internal/Proposed	Initial LDS Church development.^[15]
3.0	1987-10-09	Public Standard	First public release; lineage-linked form.^[3]
4.0	1989-08-04	Standard	Compatibility refinements.^[17]
5.0	1991-09-25	Standard	Enhanced structures.^[17]
5.5	1996-01-02	Standard	Address and name improvements (errata 1996-01-10).^[11]
5.5.1	2019-11-15	Standard	Encoding and metadata additions.^[11]
7.0	2021-05-19	Standard	Semantic versioning; GEDZIP support; latest patch 7.0.16 (2025-03-18).^[10]^[13]

Key Features

Multimedia Integration

GEDCOM integrates multimedia such as images, audio, and documents mainly through the OBJE record type, which lets genealogical software reference (or, in earlier versions, embed) media files associated with individuals, families, or events.^[5] An OBJE record starts at level 0, as in 0 @O1@ OBJE, and describes the media; in earlier versions it did not store the actual file data.^[5] Its subordinate lines include 1 FILE photo.jpg to give the file path or reference, followed by 2 FORM JPG to state the media format so that other systems can handle it.^[5] Other records link to the media with a pointer, such as 1 OBJE @O1@ directly under an individual (INDI) or family (FAM) record, or one level deeper under one of their events, so file information is never duplicated.^[5] GEDCOM 5.5 also supported optional embedding as binary large objects (BLOB), but 5.5.1 and later versions dropped this and allow only external file references, keeping files portable and simple.^[5] Additional metadata, such as a caption in 1 NOTE This is a family portrait from 1950, can accompany the OBJE record.^[5]

GEDCOM 7.0 keeps external file references for multimedia but introduces GEDZIP, a ZIP archive with the .gdz extension that bundles the GEDCOM data with its media files, referenced by local paths such as media/filename. This makes self-contained transmission possible, which is particularly useful for web-based applications.^[9] The version also expands metadata options, including NOTE for detailed captions and a CROP substructure on multimedia links that selects a region of an image for cropping or zooming. For example, 1 OBJE @O1@ can be followed by 2 CROP with 3 TOP 10, 3 LEFT 20, 3 HEIGHT 100, and 3 WIDTH 150.^[9] Older implementations support only references, which can complicate data transfer when media files are not bundled separately.^[9]

Common use cases include attaching photographs to family (FAM) records to show group portraits or events, and linking audio files to individual (INDI) records for oral histories, such as digitized recordings of personal narratives.^[19] For instance, a recording of an ancestor's story can be referenced alongside a scanned photo, enriching the genealogical context without altering the core text-based structure.^[20]
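A GEDZIP package can be opened with any ZIP library. The Python sketch below uses the standard `zipfile` module to read the dataset and check that each FILE reference names an entry in the archive. It rests on assumptions worth verifying against the 7.0 specification. It assumes the dataset is stored as `gedcom.ged` at the archive root, that bundled media are referenced by relative, possibly percent-encoded, paths, and that values with a URL scheme (such as https://) point outside the archive. `missing_media` is a hypothetical helper.

```python
import io
import zipfile
from urllib.parse import unquote, urlparse


def missing_media(archive: zipfile.ZipFile) -> list[str]:
    """Return FILE references that do not match an entry in a GEDZIP archive."""
    names = set(archive.namelist())
    # Assumption: the dataset lives at the archive root as gedcom.ged.
    text = archive.read("gedcom.ged").decode("utf-8-sig")  # tolerate a leading BOM
    missing = []
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) == 3 and parts[1] == "FILE":
            ref = parts[2]
            if urlparse(ref).scheme:  # e.g. https://..., stored outside the archive
                continue
            if unquote(ref) not in names:  # assumption: paths may be percent-encoded
                missing.append(ref)
    return missing


# Usage: build a small archive in memory, leaving one referenced image out.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("gedcom.ged", "0 HEAD\n1 GEDC\n2 VERS 7.0\n"
                "0 @O1@ OBJE\n1 FILE media/photo.jpg\n2 FORM image/jpeg\n"
                "0 @O2@ OBJE\n1 FILE media/letter%20scan.png\n2 FORM image/png\n0 TRLR\n")
    zf.writestr("media/photo.jpg", b"...")
with zipfile.ZipFile(buffer) as zf:
    print(missing_media(zf))  # ['media/letter%20scan.png']
```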

Source Citations and Conflicting Data

In GEDCOM, source citations are primarily handled through the SOUR tag, which allows users to attribute specific pieces of genealogical data to their evidentiary origins. A citation attached directly to a record appears at level 1 (e.g., 1 SOUR @S1@), while one attached to an event or fact sits one level below that event (e.g., 2 SOUR @S1@ under 1 BIRT). In either case it points through a cross-reference identifier to a separate SOURCE_RECORD. This SOURCE_RECORD contains detailed metadata, such as the source's title (TITL tag), author (AUTH tag), publication details (PUBL tag), and repository information (REPO tag), so a source is described once rather than repeated in every citation.^[5]^[9] To enhance citation precision, substructures under SOUR include the PAGE tag for specifying the exact location within the source (e.g., 3 PAGE 45 under an event-level citation) and, in 5.5.1, a TEXT line nested under DATA for excerpting relevant verbatim content (e.g., 4 TEXT Birth date as 12 May 1920). Multiple SOUR tags can be attached to a single fact, accommodating variant interpretations from different sources, such as conflicting birth dates from census records versus vital certificates. The QUAY tag further assesses citation reliability on a scale from 0 (unreliable evidence or estimated data) to 3 (direct and primary evidence), aiding in evaluating evidential weight.^[5]^[9] For conflicting data, GEDCOM allows discrepancies, such as variant event dates or places, to be recorded in separate event structures, each with its own source citations, preserving evidential context without merging or overwriting information. The ASSO tag links an individual to an associated person such as a witness (e.g., 1 ASSO @I2@ with a subordinate 2 RELA Witness), while the NOTE tag provides explanatory commentary on discrepancies (e.g., 2 NOTE Conflicting date likely due to transcription error). In GEDCOM 7.0, these mechanisms remain core, and mandatory UTF-8 makes multilingual notes easier to exchange, though 7.0 adds no dedicated tag for flagging conflicts explicitly.
Pointers via cross-references (@XREF@) facilitate linking these elements across records.^[5]^[9] Best practices emphasize retaining all sourced variants to avoid data loss during file exchanges, and many programs add a vendor-defined _UID tag (standardized as UID in GEDCOM 7.0) so that individual records and changes can be tracked across software implementations. This approach ensures traceability, since overwriting conflicting data can obscure research provenance, while structured citations promote interoperability among genealogy applications.^[5]^[9]
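A minimal Python sketch of this citation layout follows. It assumes a well-formed 5.5.1-style file, ignores CONC/CONT continuations, and uses hypothetical helper names (parse_lines, citations) and hypothetical sample data in which two conflicting BIRT structures each keep their own SOUR pointer, PAGE, and QUAY rather than being merged.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    level: int
    tag: str
    value: str = ""
    xref: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def parse_lines(text: str) -> List[Node]:
    """Build a tree of level-0 records from GEDCOM lines (CONC/CONT are not folded)."""
    roots: List[Node] = []
    stack: List[Node] = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        parts = raw.strip().split(" ", 2)
        level, xref = int(parts[0]), None
        if parts[1].startswith("@"):  # "0 @S1@ SOUR": the xref precedes the tag
            xref = parts[1]
            tag, _, value = (parts[2] if len(parts) > 2 else "").partition(" ")
        else:
            tag, value = parts[1], (parts[2] if len(parts) > 2 else "")
        node = Node(level, tag, value, xref)
        while stack and stack[-1].level >= level:  # climb back up to this line's parent
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots


def citations(record: Node) -> List[Tuple[str, str, str, str]]:
    """List (event tag, source pointer, PAGE, QUAY) for every event-level citation."""
    rows = []
    for event in record.children:
        for sour in (c for c in event.children if c.tag == "SOUR"):
            page = next((c.value for c in sour.children if c.tag == "PAGE"), "")
            quay = next((c.value for c in sour.children if c.tag == "QUAY"), "")
            rows.append((event.tag, sour.value, page, quay))
    return rows


# Hypothetical data: two conflicting BIRT structures, each backed by its own source.
SAMPLE = """\
0 @I1@ INDI
1 NAME Anna /Berg/
1 BIRT
2 DATE 12 MAY 1920
2 SOUR @S1@
3 PAGE 45
3 DATA
4 TEXT Birth date as 12 May 1920
3 QUAY 3
1 BIRT
2 DATE 21 MAY 1920
2 SOUR @S2@
3 PAGE Entry 112
3 QUAY 2
2 NOTE Conflicting date likely due to transcription error
0 @S1@ SOUR
1 TITL Civil birth register
0 @S2@ SOUR
1 TITL Parish baptismal register
"""

if __name__ == "__main__":
    for row in citations(parse_lines(SAMPLE)[0]):
        print(row)
```

Because each conflicting date keeps its own citation, a reviewing program can weigh the QUAY 3 register entry against the QUAY 2 baptismal record instead of silently choosing one.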

Internationalization Support

GEDCOM provides mechanisms for handling international data, enabling the representation of genealogical information across diverse languages, scripts, and cultural contexts. Early versions, such as 5.5, primarily relied on the ANSEL character set, an 8-bit extension of ASCII designed for Latin-based languages with diacritics, declared in the header's CHAR tag (e.g., "1 CHAR ANSEL"); 5.5.1 also permits values such as ASCII and UTF-8.^[5] ANSEL supported most Western European characters but had limitations for non-Latin scripts.^[5] In contrast, GEDCOM 7.0 mandates UTF-8 encoding exclusively, aligning with ISO/IEC 10646:2020 and RFC 3629, to accommodate a broader range of Unicode characters, including those from Asian, African, and Middle Eastern languages.^[10] Files in this version use the .ged extension and may begin with a U+FEFF byte-order mark.^[10] The LANG tag facilitates language identification, allowing users to specify the human language of textual content. In GEDCOM 5.5, this is a level 1 tag in the header whose value is a language name (e.g., "1 LANG English"), indicating the primary language for reading or writing the data, with examples including English, French, and Hebrew.^[5] GEDCOM 7.0 replaces these names with BCP 47 language tags (e.g., "en", "es", "he"), used in the header and in structures like SUBM.LANG or NOTE.LANG to denote the language of specific elements, supporting multilingual documents.^[10] This tag aids in processing and display, particularly for locale-specific formatting.
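The encoding rules above can be sketched as a small Python detector. The function names are hypothetical, and the policy is an assumption: a UTF-8 byte-order mark or an absent CHAR line is read as UTF-8 (as in 7.0), while a 5.5.x CHAR value is reported as declared. Python's standard library has no ANSEL codec, so such files are rejected here rather than decoded.

```python
import codecs


def declared_charset(data: bytes) -> str:
    """Guess the character set of a GEDCOM file from its header (sketch).

    A UTF-8 byte-order mark wins; otherwise the header's "1 CHAR" value is used.
    A file with neither is treated as UTF-8, the only encoding GEDCOM 7.0 allows.
    UTF-16 files (the UNICODE option of 5.5) are not detected by this sketch.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    for raw in data.splitlines():
        line = raw.decode("ascii", errors="replace").strip()
        if line.startswith("0 ") and line != "0 HEAD":
            break  # first record after the header: no CHAR line was found
        if line.startswith("1 CHAR "):
            return line[len("1 CHAR "):].strip().upper()
    return "UTF-8"


def decode_gedcom(data: bytes) -> str:
    """Decode the file when Python has a matching built-in codec."""
    charset = declared_charset(data)
    if charset == "UTF-8":
        return data.decode("utf-8-sig")  # also strips a leading BOM if present
    if charset == "ASCII":
        return data.decode("ascii")
    raise NotImplementedError(f"no built-in codec for GEDCOM charset {charset}")
```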
Place names are structured hierarchically using the PLAC tag, listing jurisdictions from smallest to largest (e.g., "2 PLAC City, County, State, Country"), with no abbreviations permitted to ensure clarity.^[5] GEDCOM 5.5.1 added phonetic and romanized variants via FONE and ROMN substructures, each with a TYPE tag naming the phonetisation or romanization method.^[5] GEDCOM 7.0 replaces these with TRAN substructures under PLAC, each tied to its own LANG tag, so that a place recorded in Japanese (e.g., "2 PLAC 千代田, 東京, 日本" with "3 LANG ja") can carry an English rendering (e.g., "3 TRAN Chiyoda, Tokyo, Japan" with "4 LANG en"), fully leveraging UTF-8 for non-Latin scripts like Cyrillic, Arabic, or Chinese characters.^[10] Date handling accommodates cultural calendars through escape sequences. GEDCOM 5.5 uses DATE_CALENDAR_ESCAPE values such as @#DGREGORIAN@ for Gregorian, @#DJULIAN@ for Julian, and @#DHEBREW@ for Hebrew (e.g., "@#DHEBREW@ 1 TSH 5700", where TSH is the specification's month code for Tishrei), supporting events in non-Gregorian systems like the Hebrew lunisolar calendar.^[5] French Revolutionary dates use @#DFRENCH R@.^[5] Calendars outside this list, such as the Chinese lunisolar calendar, have no defined escape, so a date like a Chinese New Year must fall back on a free-text date phrase.^[5] In GEDCOM 7.0, calendars are formalized with URIs (e.g., g7:cal-GREGORIAN, g7:cal-HEBREW), and date text that does not fit the date grammar, such as a dual dating, can be preserved in a PHRASE substructure, which may also give a Hebrew month like תִּשְׁרֵי in its original script.^[10] GEDCOM 7.0 handles right-to-left (RTL) text, as in Hebrew or Arabic place names and dates, by relying on UTF-8 and Unicode-aware rendering libraries rather than proprietary extensions.^[10] Locale-specific sorting can be guided by LANG tags, letting applications apply appropriate collation rules (e.g., French accent ordering or Hebrew alphabetical order), which in practice means relying on standard Unicode algorithms for consistency.^[10] These features support GEDCOM's use in global genealogy, from European diacritics to East Asian ideographs.
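Converting a simple 5.5.1 calendar date into 7.0 form can be sketched in Python as below, using the full escape spellings from the 5.5.1 grammar (@#DGREGORIAN@, @#DJULIAN@, @#DHEBREW@, @#DFRENCH R@). The function name is hypothetical, and the 7.0 output (a calendar keyword such as HEBREW or FRENCH_R before the date, with GREGORIAN left implicit) reflects one reading of the 7.0 date grammar that should be checked against the specification. Ranges, epochs, and dual years are out of scope, and month codes are not validated against the calendar.

```python
import re

# GEDCOM 5.5/5.5.1 calendar escapes and the calendar keywords assumed for 7.0.
ESCAPE_TO_V7 = {
    "@#DGREGORIAN@": "GREGORIAN",
    "@#DJULIAN@": "JULIAN",
    "@#DHEBREW@": "HEBREW",
    "@#DFRENCH R@": "FRENCH_R",
}

# Optional escape, optional day, optional month code, then a year.
SIMPLE_DATE = re.compile(
    r"^(?:(?P<esc>@#D[A-Z ]+@)\s+)?(?:(?P<day>\d{1,2})\s+)?"
    r"(?:(?P<month>[A-Z]{3,4})\s+)?(?P<year>\d{1,4})$"
)


def to_v7(date55: str) -> str:
    """Rewrite a simple 5.5.1 date such as '@#DHEBREW@ 1 TSH 5700' in 7.0 style."""
    m = SIMPLE_DATE.match(date55.strip().upper())
    if not m:
        raise ValueError(f"unsupported date form: {date55!r}")  # e.g. a dual year
    esc = m.group("esc")
    if esc and esc not in ESCAPE_TO_V7:
        raise ValueError(f"unsupported calendar escape: {esc}")  # e.g. @#DROMAN@
    calendar = ESCAPE_TO_V7[esc] if esc else "GREGORIAN"
    body = " ".join(p for p in (m.group("day"), m.group("month"), m.group("year")) if p)
    # GREGORIAN is assumed to be the default calendar in 7.0, so it stays implicit.
    return body if calendar == "GREGORIAN" else f"{calendar} {body}"


if __name__ == "__main__":
    for d in ("@#DHEBREW@ 1 TSH 5700", "@#DFRENCH R@ 1 VEND 1", "12 MAY 1920"):
        print(d, "->", to_v7(d))
```

Rejecting unsupported input with ValueError, rather than guessing, leaves the original text available for a PHRASE or a manual review.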

Limitations

Multi-Person Events and Relationships

In GEDCOM, events involving multiple individuals, such as births (BIRT) and marriages (MARR), are structured to link participants through specific tags and pointers. The BIRT event, typically recorded under an individual's (INDI) record, can reference the family of origin via a FAMC (family as child) pointer, which connects the child to a family (FAM) record containing the parents as husband (HUSB) and wife (WIFE). Similarly, the MARR event is placed under a FAM record, whose level 1 HUSB and WIFE pointers identify the spouses; inside the event, HUSB and WIFE substructures record only each spouse's AGE at the time. Roles are not stated on the event itself: in 5.5.1 the ROLE tag is used within a source citation's EVEN substructure to indicate the role (e.g., HUSB or WIFE) a person played in the event the source records. These structures allow events to associate multiple people without embedding full details of all participants in a single event block.^[5] However, GEDCOM's design primarily emphasizes nuclear family units, limiting direct support for more complex multi-person scenarios. Family records (FAM) are constrained to one husband and one wife, requiring multiple FAM records for polygamous or serial relationships, which can fragment data. Adoption and step-relations are handled indirectly: an ADOP event under the INDI record points via FAMC to the adopting family, with a nested ADOP value of HUSB, WIFE, or BOTH indicating which parent adopted, while step-relationships often rely on ASSO (association) tags with RELA (relationship) descriptors or custom underscore-prefixed tags (e.g., _STEPMOTHER) for non-standard ties. This approach, while flexible, often results in incomplete or vendor-specific representations of extended kinships.^[5] GEDCOM 7.0 introduces enhancements to better accommodate multi-person events and non-linear relationships. Semantic roles are expanded through the ROLE tag under ASSO, which replaces RELA and can also appear within events, allowing explicit designations such as WITN (witness), GODP (godparent), or CLERGY for participants beyond primary family members, enabling more precise graphing of interactions.
The specification also improves support for complex topologies by combining multiple FAMS pointers with ASSO links and unique identifiers (UID), facilitating connections in non-nuclear structures like blended families without excessive duplication. These changes aim to model relationships as a more interconnected graph rather than isolated hierarchies. However, as of November 2025, adoption of GEDCOM 7.0 remains limited, with version 5.5.1 continuing as the de facto standard in most genealogy software, meaning these enhancements are not yet widely realized.^[1]^[21] A common challenge in GEDCOM files arises from handling shared events across multiple individuals, leading to redundancy. For instance, a census or residence event shared by a whole household must be repeated in each member's INDI record, and every spouse link is stored twice, once as HUSB or WIFE in the FAM record and once as FAMS in the spouse's INDI record, so the two sides can fall out of sync if not managed carefully. This duplication can inflate file sizes and increase error risks during imports, though unique identifiers in later versions help mitigate some inconsistencies.^[5]
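The FAM/INDI redundancy can at least be checked mechanically. This hypothetical Python sketch collects each spouse link from both sides (HUSB/WIFE in FAM records, FAMS in INDI records) and reports links recorded on only one side; it reads only level-1 pointer lines and leaves the analogous CHIL/FAMC check aside.

```python
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]  # (family xref, individual xref)


def spouse_links(text: str) -> Tuple[Set[Link], Set[Link]]:
    """Collect spouse links as the FAM side (HUSB/WIFE) and the INDI side (FAMS) record them."""
    fam_side: Set[Link] = set()
    indi_side: Set[Link] = set()
    current: Optional[str] = None
    kind: Optional[str] = None
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) < 2:
            continue
        if parts[0] == "0":  # a new record starts: remember its xref and type
            if len(parts) >= 3 and parts[1].startswith("@"):
                current, kind = parts[1], parts[2]
            else:
                current, kind = None, None  # HEAD, TRLR and similar
            continue
        if parts[0] != "1" or current is None or len(parts) < 3:
            continue  # only level-1 pointer lines matter here
        tag, pointer = parts[1], parts[2]
        if kind == "FAM" and tag in ("HUSB", "WIFE"):
            fam_side.add((current, pointer))
        elif kind == "INDI" and tag == "FAMS":
            indi_side.add((pointer, current))
    return fam_side, indi_side


def link_mismatches(text: str) -> Dict[str, List[Link]]:
    """Report spouse links that appear on only one side of the FAM/INDI pair."""
    fam_side, indi_side = spouse_links(text)
    return {
        "missing_FAMS": sorted(fam_side - indi_side),    # FAM names a spouse lacking FAMS
        "missing_spouse": sorted(indi_side - fam_side),  # INDI has a FAMS the FAM does not confirm
    }


SAMPLE = """\
0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Mary /Jones/
0 @I3@ INDI
1 NAME Ann /Lee/
1 FAMS @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
2 DATE 1 JUN 1900
0 TRLR
"""

if __name__ == "__main__":
    print(link_mismatches(SAMPLE))
```

In the sample, @I2@ is named as wife in @F1@ but has no FAMS back-pointer, and @I3@ claims @F1@ although the family does not list her.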

Specification Ambiguities

The GEDCOM standard, particularly in versions 5.5 and 5.5.1, contains several vague definitions that hinder consistent implementation across software applications. For instance, optional tags such as ADDR (address structure) and PLAC (place structure) within event details lack precise guidance on their hierarchical placement relative to other substructures, leading to variations in how location data is encoded and interpreted during file exchanges.^[5] Additionally, the specification does not enforce uniqueness for cross-reference identifiers (XREFs) beyond basic recommendations, allowing duplicate or conflicting pointers that can cause data loss or mislinking when importing files into different genealogy programs.^[5] Version drift exacerbates these issues through the widespread use of private tags, which begin with an underscore (_) to denote non-standard extensions. While the standard permits these for software-specific features, it provides no mechanisms for standardization or documentation, resulting in proprietary data that becomes inaccessible or corrupted in incompatible systems.^[5] This proliferation of undocumented private tags has led to significant interoperability challenges, as applications often ignore or mishandle unrecognized elements without clear fallback rules.^[14] Parsing variances further complicate data exchange, particularly with line continuations using CONC (concatenate) and CONT (continue) tags. 
The semantics specify that CONC joins text without inserting a newline or altering spacing, while CONT adds a newline to preserve paragraph breaks, but implementations vary in handling edge cases like leading or trailing spaces at the split point, often resulting in garbled notes or addresses.^[5] Similarly, date approximations such as ABT (about) are defined as indicating inexact timing, but the lack of quantitative guidance (e.g., whether "ABT 1900" implies a range of years or months) leads to inconsistent sorting, searching, and validation across tools.^[5] GEDCOM 7.0 addresses many of these ambiguities through stricter schemas and explicit validation guidelines. It mandates UTF-8 encoding exclusively, eliminates the CONC tag in favor of CONT for all continuations, and enforces unique XREFs document-wide to prevent duplication errors.^[10] Extensions are now formalized by mapping extension tags to URIs through the header's SCHMA structure and a public registry, reducing version drift by encouraging documented, shareable extensions, while a formal grammar for date values improves compliance and testing via open-source validators. However, these fixes only help where software has adopted 7.0, which as of November 2025 remains limited, with version 5.5.1 still the de facto standard in most genealogy software.^[10]^[14]^[21]
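The CONC/CONT rules can be sketched in Python as below (hypothetical function name and sample data). The sketch keeps trailing spaces on each payload because CONC adds no separator: if an exporter split "Born in the village" after "the " and an importer then stripped trailing whitespace, the result would read "thevillage", which is the kind of garbling described above. Since 7.0 drops CONC, the same function also handles 7.0 files, where only CONT occurs.

```python
from typing import List, Tuple

Line = Tuple[int, str, str, str]  # (level, xref, tag, value)


def join_continuations(lines: List[str]) -> List[Line]:
    """Fold CONC/CONT lines into the value of the line they continue.

    CONT starts a new line of text; CONC appends with no separator, so any space
    at the split point must already be in one of the two payloads. The sketch
    assumes well-formed input in which each CONC/CONT follows the line it extends
    (or that line's earlier continuations).
    """
    out: List[list] = []
    for raw in lines:
        raw = raw.rstrip("\r\n")  # keep other trailing spaces: they may be significant
        level, _, rest = raw.partition(" ")
        xref = ""
        if rest.startswith("@"):  # optional cross-reference before the tag
            xref, _, rest = rest.partition(" ")
        tag, _, value = rest.partition(" ")
        if tag in ("CONC", "CONT") and out:
            out[-1][3] += ("\n" if tag == "CONT" else "") + value
        else:
            out.append([int(level), xref, tag, value])
    return [tuple(item) for item in out]


SAMPLE_LINES = [
    "0 @I1@ INDI",
    "1 NOTE Born in the ",
    "2 CONC village of Ely.",
    "2 CONT Moved to London in 1850.",
    "1 SEX F",
]

if __name__ == "__main__":
    for line in join_continuations(SAMPLE_LINES):
        print(line)
```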

Undated Event Ordering

In GEDCOM files, events within individual (INDI) or family (FAM) records are assumed to be listed in chronological order as the submitter intended; this file order serves as the default sequence, since the specification defines no dedicated sort keys and does not enforce chronological ordering.^[5] This convention relies on the order of equal-level tags to reflect preference, with the first occurrence typically deemed most important.^[5] Undated events, such as a residence or occupation lacking a specific year, pose significant challenges to this assumed timeline, as they lack date values for automated sorting and may appear in arbitrary positions depending on the importing software's reordering logic.^[22] Different genealogy programs handle these events variably (some place them at the end of lists, others keep entry order), potentially disrupting the intended sequence and leading to inconsistent interpretations across tools.^[23] Common workarounds include assigning approximate or ranged dates such as "BET 1900 AND 1910" or adding contextual details in NOTE structures to guide manual review.^[5] GEDCOM 7.0 introduces the SDATE substructure in event details, a sort date that is not to be interpreted as historical, so that undated or ambiguously dated events can keep their intended position without altering the primary DATE value. However, its benefit is limited while adoption lags: as of November 2025, version 5.5.1 remains the de facto standard in most genealogy software.^[9]^[21] These ordering issues affect the reliability of timeline visualizations and narrative generation in genealogy software, where precise event sequences are essential for constructing coherent life stories and avoiding logical inconsistencies in reports.^[23]
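One possible ordering policy can be sketched in Python: use SDATE when present, otherwise DATE, and keep events with neither at the end in file order. The policy, the simplified event dictionaries, and the function names are assumptions for illustration, and date_key only understands Gregorian day/month/year forms, ignoring qualifiers such as ABT or BET ... AND.

```python
import re
from typing import Dict, List, Optional, Tuple

MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}


def date_key(value: Optional[str]) -> Optional[Tuple[int, int, int]]:
    """Crude (year, month, day) key from a Gregorian date value, or None.

    Qualifiers such as ABT, BEF, or BET ... AND are not interpreted: the first
    date found is used, and a missing month or day sorts as 0.
    """
    if not value:
        return None
    m = re.search(r"(?:(\d{1,2}) )?(?:([A-Z]{3}) )?(\d{3,4})", value.upper())
    if not m:
        return None
    day, month, year = m.groups()
    return (int(year), MONTHS.get(month, 0), int(day or 0))


def sort_events(events: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Order events by SDATE if present, else DATE; events with neither go last.

    Python's sort is stable, so events with equal keys, and the unkeyed events
    at the end, keep their original file order.
    """
    keyed, unkeyed = [], []
    for event in events:
        key = date_key(event.get("SDATE")) or date_key(event.get("DATE"))
        if key:
            keyed.append((key, event))
        else:
            unkeyed.append(event)
    keyed.sort(key=lambda pair: pair[0])
    return [event for _, event in keyed] + unkeyed


# Hypothetical events in file order.
EVENTS = [
    {"TAG": "OCCU", "VALUE": "Farmer"},   # no DATE and no SDATE
    {"TAG": "RESI", "SDATE": "1905"},     # undated, but SDATE gives its place
    {"TAG": "CENS", "DATE": "1910"},
    {"TAG": "DEAT", "DATE": "ABT 1950"},
    {"TAG": "BIRT", "DATE": "12 MAY 1880"},
]

if __name__ == "__main__":
    print([e["TAG"] for e in sort_events(EVENTS)])
```

Here the undated residence lands between the birth and the 1910 census because of its SDATE, while the occupation, with no date of either kind, falls to the end.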

Extensions and Alternatives

GEDCOM X

GEDCOM X is an open specification developed by FamilySearch beginning in 2011 for exchanging genealogical data essential to the research process. It defines a conceptual data model and serialization formats in XML and JSON to represent structured information about persons, relationships, sources, and conclusions. Unlike the line-based format of traditional GEDCOM, GEDCOM X incorporates RDF semantics to enable more flexible modeling of relationships and linked data.^[24]^[25]^[26] Key differences from GEDCOM 5.5 include its modular architecture, which allows extensions for specific elements such as places and conclusions, enabling more precise and extensible data representation without altering the core model. GEDCOM X is not backward compatible, since consumers of legacy GEDCOM need a new parser to read it, but dedicated converters transform GEDCOM 5.5 files into GEDCOM X, a conversion the project describes as lossless, facilitating integration with legacy data. This approach addresses the rigidity of traditional GEDCOM by providing a more adaptable framework for modern data exchange.^[27]^[28]^[29] Notable features include versioned resources via standardized headers for metadata like timestamps and revisions, as well as a web-oriented design supporting REST APIs for integration in networked environments. These elements make GEDCOM X suitable for API-driven applications while preserving the integrity of genealogical narratives.^[27]^[30] GEDCOM X has been adopted primarily within FamilySearch's developer platform, where it underpins APIs for data interchange and family tree management, though it serves as a complementary format rather than a complete replacement for GEDCOM.^[29]^[31]
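To illustrate how the GEDCOM X model differs from line-based GEDCOM, the Python sketch below builds a GEDCOM X-style JSON document from simplified 5.5 data. The property names (persons, names, nameForms, fullText, gender, facts, relationships, person1, person2, resource) and the http://gedcomx.org/ type URIs reflect one reading of the GEDCOM X JSON format and should be checked against the specification; the function and its input shape are hypothetical, and this is not the FamilySearch gedcom5-conversion utility.

```python
import json
from typing import Dict, List, Tuple

# A small illustrative subset of GEDCOM 5.5 tags mapped to GEDCOM X type URIs.
FACT_TYPES = {"BIRT": "http://gedcomx.org/Birth", "DEAT": "http://gedcomx.org/Death"}
GENDER_TYPES = {"M": "http://gedcomx.org/Male", "F": "http://gedcomx.org/Female"}


def to_gedcomx(people: Dict[str, dict], couples: List[Tuple[str, str]]) -> dict:
    """Build a GEDCOM X-style document from simplified GEDCOM 5.5 data.

    people maps an id to {"NAME": ..., "SEX": ..., "BIRT": (date, place), ...};
    couples lists (husband id, wife id) pairs taken from FAM records.
    """
    persons = []
    for pid, p in people.items():
        # GEDCOM 5.5 marks the surname with slashes, e.g. "John /Smith/".
        full_text = " ".join(p["NAME"].replace("/", " ").split())
        person = {"id": pid, "names": [{"nameForms": [{"fullText": full_text}]}], "facts": []}
        if p.get("SEX") in GENDER_TYPES:
            person["gender"] = {"type": GENDER_TYPES[p["SEX"]]}
        for tag, uri in FACT_TYPES.items():
            if tag in p:
                date, place = p[tag]
                person["facts"].append(
                    {"type": uri, "date": {"original": date}, "place": {"original": place}})
        persons.append(person)
    relationships = [
        {"type": "http://gedcomx.org/Couple",
         "person1": {"resource": f"#{husb}"}, "person2": {"resource": f"#{wife}"}}
        for husb, wife in couples
    ]
    return {"persons": persons, "relationships": relationships}


PEOPLE = {
    "I1": {"NAME": "John /Smith/", "SEX": "M", "BIRT": ("12 MAY 1880", "Ely, Cambridgeshire, England")},
    "I2": {"NAME": "Mary /Jones/", "SEX": "F"},
}

if __name__ == "__main__":
    print(json.dumps(to_gedcomx(PEOPLE, [("I1", "I2")]), indent=2, ensure_ascii=False))
```

The couple becomes a typed relationship object that references both persons, instead of a FAM record pointed to from each INDI.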

Other Genealogical Data Formats

While GEDCOM remains the de facto standard for genealogical data interchange, several alternative formats have emerged to address its limitations in handling complex structures, multimedia, and modern data privacy needs. These alternatives often leverage XML or simpler structures like CSV and JSON, offering greater flexibility for specific use cases such as open-source software integration or web-based applications.^[32] The Gramps XML format, developed for the open-source Gramps genealogy software, provides a comprehensive XML-based structure for storing and exchanging genealogical data. It supports advanced features like detailed event relationships, references to multimedia objects, and custom attributes that GEDCOM struggles to represent without loss of information. As the native format for Gramps, it enables lossless backups and imports, making it ideal for users prioritizing data integrity in complex family trees.^[33] GenXML serves as another XML-centric format, designed specifically for data exchange between genealogy programs as a more extensible replacement for GEDCOM 5.5. Originating from European development efforts, it emphasizes structured schemas for persons, families, and sources and is used mainly in niche European software.^[34] Simpler open formats such as CSV and JSON have gained traction in web-based genealogy tools for their ease of use in data exports and imports, facilitating quick analysis or integration with spreadsheets, though these lack GEDCOM's hierarchical tagging for relationships and events. CSV exports, supported by tools like Gramps, enable bulk spreadsheet handling but require manual reconstruction of linkages, suiting lightweight web apps over full database migrations.^[35] In comparisons, GEDCOM's plain-text simplicity promotes broad compatibility across legacy software, but XML-based formats like Gramps XML and GenXML offer richer schemas for multimedia and relationships, reducing data loss during transfers.
As of 2025, no single alternative has displaced GEDCOM's dominance, with adoption varying by software ecosystem—XML for robust desktop tools and CSV/JSON for agile web integrations.^[36]
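The linkage problem with CSV can be seen in a short Python sketch that flattens people and families into two tables. The column layout and the function name are hypothetical and do not reproduce the Gramps CSV format; relationships survive only as ID columns that an importer must resolve back into links.

```python
import csv
import io
from typing import Dict, Tuple


def to_csv(people: Dict[str, dict], families: Dict[str, dict]) -> Tuple[str, str]:
    """Flatten people and families into two CSV tables (hypothetical column layout).

    Family links survive only as ID columns, so an importer has to re-link rows.
    """
    people_buf, family_buf = io.StringIO(), io.StringIO()
    people_out = csv.writer(people_buf)
    people_out.writerow(["id", "name", "birth_date", "birth_place"])
    for pid, p in people.items():
        people_out.writerow([pid, p["name"], p.get("birth_date", ""), p.get("birth_place", "")])
    family_out = csv.writer(family_buf)
    family_out.writerow(["family_id", "husband_id", "wife_id", "children_ids"])
    for fid, f in families.items():
        # Children are packed into one cell; the ";" separator is an arbitrary choice.
        family_out.writerow([fid, f.get("husb", ""), f.get("wife", ""), ";".join(f.get("chil", []))])
    return people_buf.getvalue(), family_buf.getvalue()


PEOPLE = {
    "I1": {"name": "John Smith", "birth_date": "12 MAY 1880", "birth_place": "Ely, Cambridgeshire, England"},
    "I2": {"name": "Mary Jones"},
    "I3": {"name": "Ann Smith", "birth_date": "1902"},
}
FAMILIES = {"F1": {"husb": "I1", "wife": "I2", "chil": ["I3"]}}

if __name__ == "__main__":
    people_csv, family_csv = to_csv(PEOPLE, FAMILIES)
    print(people_csv)
    print(family_csv)
```

A spreadsheet user sees I1, I2, and I3 as unrelated rows unless the second table is consulted, which is the manual reconstruction step mentioned above.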

References

[1]
GEDCOM - FamilySearch
GEDCOM is a data structure created by The Church of Jesus Christ of Latter-day Saints for storing and exchanging genealogical information.
[2]
The GEDCOM Standard Release 5.5: Introduction - RootsWeb
The GEDCOM Standard is a technical document written for computer programmers, system developers, and technically sophisticated users. It covers the following ...
[3]
GEDCOM X - FamilySearch
Jun 29, 2023 · GEDCOM X is a series of specifications for an open data model and an open serialization format to exchange data essential to the genealogical research process.
[4]
The History of FamilySearch Indexing
May 7, 2025 · The year 1978 saw further expansion of the indexing effort as volunteer members of the Church (both inside and outside of Utah) were asked to ...
[5]
[PDF] The GEDCOM Standard
GEDCOM was developed by the Family History Department of The Church of Jesus Christ of Latter-day Saints to provide a flexible, ...
[6]
Chapter 6: Computers and Family History Research
The Church's Family History Department Develops and Maintains Computer Resources for Family History Research [6.2]. FamilySearch is a modern miracle. [6.2.1].
[7]
GEDCOM versions
GEDCOM Versions. date, version, status, brief notes. 1984, 1.0, proposed standard, First Version. 1985-12, 2.0, standard, PAF 2.0. 1987-02, 2.1, standard ...
[8]
GEDCOM Genealogy Tools - FamilySearch
Explore FamilySearch GEDCOM 7.0 - now with photo and file support via GEDZip for easier family tree sharing and improved genealogy tools.
[9]
[PDF] GEDCOM 4.0
individuals for whom temple work has been requested, for which the family ... a temple ordinance. HIST. HISTORY. Used to identify a Source: recorded ...
[10]
[PDF] GEDCOM 5.5.1
Nov 15, 2019 · The temple tag and code should always accompany temple ordinance dates. Sometimes the LDS_(ordinance)_DATE_STATUS is used to indicate that an.
[11]
[PDF] The GEDCOM 5.5.5 Specification with Annotations - webtrees
Oct 2, 2019 · The GEDCOM Standard is a technical document written for computer programmers, system ... FamilySearch GEDCOM 4.0 (1989 CE) through 5.5.1 (1999 CE) ...
[12]
Introducing FamilySearch GEDCOM 7
Jan 21, 2022 · Like the previous GEDCOM, FamilySearch GEDCOM 7.0 makes it possible to transfer family tree data from one application or website to another.
[13]
[PDF] The FamilySearch GEDCOM Specification
Dec 31, 2020 · ... Level (p.11) and is used to encode substructure relationships. Any line with level 0 encodes a record or a record-like pseudo-structure. Any ...
[14]
The FamilySearch GEDCOM Specification
GEDCOM was developed by the Family History Department of The Church of Jesus Christ of Latter-day Saints to provide a flexible, uniform format for ...
[15]
[PDF] PRESS RELEASE – GEDCOM 5.5.5 IS A BETTER GEDCOM
Oct 2, 2019 · LEIDEN – 2 October 2019. GEDCOM version 5.5.5 is the first new version of GEDCOM in twenty years. GEDCOM 5.5.1 was introduced on 2 October 1999.
[16]
GEDCOM
GEDCOM is a file format for exchanging genealogical data between different systems. GEDCOM allows you to export your genealogy data from one application, and ...
[17]
https://gedcom.io/specifications/Gedcom5.0.pdf
[18]
What is the FamilySearch GEDCOM 7.0 standard?
May 29, 2025 · GEDCOM (an acronym for Genealogical Data Communications) was created by The Church of Jesus Christ of Latter-day Saints (the Church) in 1984 ...
[19]
FamilySearch GEDCOM Changelog
The now-deprecated use was common in 5.5.1 and is permitted in 7.0, but can prevent extension structures from being adopted as-is as new standard structures in ...
[20]
Releases · FamilySearch/GEDCOM - GitHub
Releases: FamilySearch/GEDCOM. Releases Tags. Releases · FamilySearch/GEDCOM ... 7.0.0 through 7.0.7. Note that days per month is defined by calendar; is ...
[21]
[PDF] GEDCOM 5.0
Sep 25, 1991 · This technical document is written for computer programmers, system developers, and user specialists. It defines a flexible format for ...
[22]
GEDCOM - PGVWiki
Jun 4, 2010 · On December 6, 2002 a beta version of GEDCOM 6.0 was released for developers to study. GEDCOM 6.0 will be the first version to store data in XML ...
[23]
[PDF] THE GEDCOM STANDARD
Aug 21, 1995 · Purposes for Version 5.x. Earlier versions of The GEDCOM Standard were released in October 1987 (3.0) and August. 1989 (4.0). Versions 1 and 2 ...
[24]
Including Scrapbook Items in a GEDCOM file - Ancestral Quest
You have had grandma Jones tell the story of some old photos into your microphone, and you've attached those digitized sound bites to the scanned photos.
[25]
GEDCOM Order of Children - Tamura Jones
Mar 26, 2013 · The GEDCOM specification suggests that GEDCOM export should order children chronologically by birth. The GEDCOM 5.5.
[26]
Sort of a Date « Louis Kessler's Behold Blog
Dec 4, 2011 · where ABT means “about” and is for inexact date. CAL is calculated mathematically, e.g. from an event date and age, and EST is estimated based ...
[27]
[PDF] Modeling Genealogical Domain - SciTePress
... 2012), pages 202-207. ISBN: 978-989-8565-30-3. Copyright c. 2012 ... At the same year, FamilySearch organization outlined a major new project called GEDCOM X.
[28]
FamilySearch/gedcomx: An open data model and an open ... - GitHub
GEDCOM X defines an open data model and an open serialization format for exchanging the genealogical data essential to the genealogical research process.
[29]
GEDCOM X and RDF
Aug 31, 2011 · GEDCOM X is a set of open specifications for exchanging data essential to the genealogical research process.
[30]
GEDCOM X Specifications
The GEDCOM X Standard Header Set specifies the set of metadata terms that are recognized for genealogical resources and the mechanism for providing that ...
[31]
FamilySearch/gedcom5-conversion: Utilities for GEDCOM 5.5 to ...
This utility converts a GEDCOM 5.5 file to a GEDCOM X file. The utility leverages the GEDCOM 5.5 parsing library contributed by Dallan Quass.
[32]
GEDCOM X - FamilySearch API
GEDCOM X is a series of specifications for an open data model and an open serialization format to exchange data essential to the genealogical research process.
[33]
https://gramps-project.org/wiki/index.php/Gramps_XML
[34]
Frequently Asked Questions - GEDCOM X
Is GEDCOM X backwards-compatible? Can GEDCOM X be read by consumers of legacy GEDCOM? No, a new parser is needed. But there is a lossless conversion.
[35]
https://gramps-project.org/wiki/index.php/Gramps_5.1_Wiki_Manual_-_Manage_Family_Trees:_CSV_Import_and_Export
[36]
https://www.tamurajones.net/GEDCOMX.xhtml
[37]
GenXML 2.0 - COSOFT
GenXML is a file format for exchange of data between genealogy programs. It is an alternative to Gedcom 5.5. The idea of GenXML is that: it shall be easy to ...
[38]
Manage Family Trees: CSV Import and Export - Gramps
This format allows you to import/export a spreadsheet of data all at once. The spreadsheet must be in the Comma Separated Value (CSV) format.
[39]
GEDCOM X - Tamura Jones
Dec 12, 2011 · The de facto standard for exchange of genealogical data is GEDCOM , a file format created by FamilySearch, that they stopped maintaining more ...