Chromosome 5

Chromosome 5 is one of the 23 pairs of chromosomes found in the nucleus of human cells, with one copy inherited from each parent. It spans approximately 181 million base pairs of DNA, representing nearly 6 percent of the total genetic material in cells. This submetacentric chromosome is among the largest in the human genome and carries around 900 protein-coding genes that provide instructions for producing proteins essential for various cellular functions, including immune response and cell signaling. Despite its size, chromosome 5 has one of the lowest gene densities in the genome. It contains numerous intrachromosomal duplications, which contribute to segmental repeats, and a lower proportion of coding sequence than most other chromosomes.

Key structural features of chromosome 5 include a prominent cluster of interleukin genes on the long arm (5q), such as IL3, IL5, and IL13, which play critical roles in regulating immune and inflammatory responses. The long arm also houses APC, mutations in which are linked to familial adenomatous polyposis, a condition predisposing individuals to colorectal cancer, and RPS14, which is involved in ribosomal function. Deletions or abnormalities in chromosome 5 are associated with several genetic disorders. For instance, partial deletion of the short arm (5p) causes cri-du-chat syndrome, characterized by intellectual disability, delayed development, and a distinctive cat-like cry in infancy, while interstitial deletions on 5q lead to 5q- syndrome, a subtype of myelodysplastic syndrome featuring macrocytic anemia. Other notable conditions include 5q31.3 microdeletion syndrome, which results in developmental delays, hypotonia, and seizures, and rearrangements involving PDGFRB on 5q32, which are associated with chronic eosinophilic leukemia.
Research on chromosome 5 has advanced through complete sequencing efforts, revealing its evolutionary conservation and role in disease susceptibility, with ongoing studies exploring its contributions to cancer, immune disorders, and neurodevelopmental conditions using high-resolution genomic mapping and related approaches.

General Characteristics

Size and Composition

Chromosome 5 is the fifth largest chromosome in the human genome, spanning 181,538,259 base pairs in the GRCh38.p14 assembly and representing approximately 5.8% of the total haploid genome DNA. It exhibits a submetacentric structure, with the centromere positioned off-center to divide the chromosome into a shorter p arm and a longer q arm. The sequence of chromosome 5 has a GC content of 39.2%, slightly below the average of approximately 41% for the human genome. Repetitive elements account for roughly 46% of its sequence, including substantial contributions from long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs), which are enriched in gene-poor regions and influence the chromosome's overall architecture. This composition, derived from the complete GRCh38 assembly, underscores chromosome 5's low gene density relative to other autosomes.

Karyotypic Features

Chromosome 5 is classified as submetacentric in the human karyotype, featuring a centromere offset from the midpoint that divides it into a shorter p (short) arm of approximately 48 Mb and a longer q (long) arm of approximately 133 Mb. This ratio contributes to its characteristic L-shaped appearance under microscopic examination. In the standard classification system, established for organizing chromosomes by size and centromere position, chromosome 5 holds the fifth position overall and is grouped in category B alongside chromosome 4 as one of the two largest submetacentric autosomes. During arrest in metaphase, when chromosomes are most condensed and visible, chromosome 5 stands out as one of the larger elements in the complement, typically measuring around 6-7% of the total haploid length. Unlike the acrocentric chromosomes (13, 14, 15, 21, and 22), which bear nucleolar organizer regions (NORs) on their short arms for ribosomal RNA production, chromosome 5 lacks such secondary constrictions, which simplifies its structural profile. Its termini are capped by conventional telomere sequences, consisting of repetitive TTAGGG motifs that protect against end-to-end fusions and degradation and so maintain chromosomal integrity across divisions. Cytogenetic banding techniques, such as G-banding, further facilitate its recognition by highlighting differential staining along the arms.
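
The submetacentric label can be checked with a small calculation. The sketch below derives the short-arm length by subtraction from the approximate total and long-arm lengths quoted in this article, then applies the arm-ratio cut-offs commonly attributed to Levan and colleagues (1964). Those cut-offs, the rounded inputs, and the function name are assumptions introduced for illustration; sequence-based arm lengths will not match cytogenetic measurements exactly.

```python
# Approximate figures from the article; the arm-ratio cut-offs are an assumption.
TOTAL_MB = 181.5   # chromosome 5 length, rounded
Q_ARM_MB = 133.0   # long arm, approximate


def classify_by_arm_ratio(p_mb: float, q_mb: float) -> str:
    """Classify centromere position from the long/short arm ratio."""
    ratio = q_mb / p_mb
    if ratio < 1.7:
        return "metacentric"
    if ratio < 3.0:
        return "submetacentric"
    if ratio < 7.0:
        return "subtelocentric"
    return "acrocentric/telocentric"


p_arm_mb = TOTAL_MB - Q_ARM_MB            # short arm obtained by subtraction
arm_ratio = Q_ARM_MB / p_arm_mb           # long arm divided by short arm
centromeric_index = p_arm_mb / TOTAL_MB   # short arm as a fraction of the whole

print(f"p arm ~ {p_arm_mb:.1f} Mb, arm ratio ~ {arm_ratio:.2f}, "
      f"centromeric index ~ {centromeric_index:.2f}")
print(classify_by_arm_ratio(p_arm_mb, Q_ARM_MB))
```

With these inputs the arm ratio falls between 1.7 and 3.0, which is the range the assumed scheme labels submetacentric.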

Genes

Number and Density

Chromosome 5 harbors approximately 900 protein-coding genes, accounting for about 4.5% of the estimated 20,000 protein-coding genes in the human genome, even though the chromosome constitutes roughly 6% of the total genomic length of approximately 3.1 billion base pairs. This distribution underscores the uneven allocation of genetic material across chromosomes. The initial sequencing and analysis of chromosome 5 in 2004 identified 923 protein-coding genes, a figure that has been refined downward with subsequent annotations.

The gene density on chromosome 5 stands at around 5 protein-coding genes per megabase (Mb), calculated from its length of about 181 Mb, which is notably lower than the genome-wide average of approximately 6.8 genes per Mb. This reduced density arises primarily from extensive intergenic regions and from segmental duplications, which occupy about 3.5% of the chromosome and contribute to gene-poor expanses. Examples of such gene-poor regions include large pericentromeric areas that span tens of megabases with minimal functional elements.

Contemporary annotations from projects like Ensembl and GENCODE report a similar count of roughly 900 protein-coding genes, while the total number of annotated loci on chromosome 5 reaches approximately 1,700 once other gene classes and pseudogenes are included; only the protein-coding subset is counted as encoding functional protein products. These updates reflect ongoing refinements in annotation pipelines that integrate manual curation with computational predictions to enhance accuracy.
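
The density and share figures above follow from simple division. The sketch below recomputes them from the rounded values quoted in this section; the constant names are illustrative, and the inputs are approximations rather than authoritative annotation counts.

```python
# Rounded values as quoted in the text above.
CHR5_GENES = 900            # protein-coding genes on chromosome 5
CHR5_LENGTH_MB = 181        # chromosome 5 length in megabases
GENOME_GENES = 20_000       # protein-coding genes genome-wide
GENOME_LENGTH_MB = 3_100    # haploid genome length in megabases

chr5_density = CHR5_GENES / CHR5_LENGTH_MB        # genes per Mb on chromosome 5
genome_density = GENOME_GENES / GENOME_LENGTH_MB  # genes per Mb genome-wide
gene_share = CHR5_GENES / GENOME_GENES            # chromosome 5's share of genes
length_share = CHR5_LENGTH_MB / GENOME_LENGTH_MB  # chromosome 5's share of DNA

print(f"chromosome 5 density: {chr5_density:.2f} genes/Mb")
print(f"genome-wide density:  {genome_density:.2f} genes/Mb")
print(f"share of genes: {gene_share:.1%}, share of length: {length_share:.1%}")
```

Dividing 20,000 genes by 3,100 Mb gives a genome-wide figure somewhat below the 6.8 genes per Mb quoted above. The difference presumably reflects a different denominator, such as sequenced euchromatic length, or a different gene count. Either way, chromosome 5 holds a smaller share of the genes than of the DNA.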

Notable Genes and Clusters

Chromosome 5 harbors several notable genes and gene clusters with significant biological roles. One prominent example is the APC gene, located at 5q22.2, which encodes a tumor suppressor protein that acts as an antagonist of the Wnt signaling pathway, regulating cell proliferation, adhesion, and migration through its interactions with β-catenin and other binding partners. The protein's multidomain structure enables it to localize to multiple subcellular compartments, including the cytoplasm, nucleus, and kinetochores, where it facilitates proper chromosome segregation during mitosis.

Another key gene is SMN1 at 5q13.2, which produces the survival motor neuron protein essential for the assembly of small nuclear ribonucleoproteins (snRNPs) and spliceosomal complexes, thereby supporting pre-mRNA splicing and RNA processing across various cell types, with particularly high expression in motor neurons. This protein also contributes to broader cellular functions, including the maintenance of neuromuscular junctions.

A significant gene cluster on chromosome 5 is the cytokine cluster at 5q31.1, encompassing genes such as IL3, IL4, IL5, and IL13, which encode Th2-type cytokines that orchestrate immune responses, including B-cell activation, differentiation, and IgE class switching, and thereby promote allergic responses. These genes are tightly linked within a 500 kb region, reflecting evolutionary conservation for coordinated expression in immune regulation.

At 5q35.1 lies the NPM1 gene, encoding nucleophosmin 1, a nucleolar phosphoprotein that functions as a molecular chaperone in ribosome biogenesis, facilitating the transport of ribosomal proteins and the maturation of ribosomal subunits while also regulating centrosome duplication. Its ability to shuttle between the nucleolus, nucleoplasm, and cytoplasm underscores its versatile role in maintaining cellular homeostasis.

Among other important genes, CTNND2 at 5p15.2 encodes δ-catenin, a protein critical for neuronal development, where it links cadherin-based cell adhesion to the actin cytoskeleton, supporting synaptic maturation, dendritic spine formation, and neuronal migration during brain development.
Similarly, FBN2 at 5q23.1 produces fibrillin-2, a glycoprotein integral to the extracellular matrix that assembles into microfibrils, providing structural support for elastic fibers in connective tissues and contributing to tissue elasticity and organ morphogenesis.

Role in Disease

Gene-Specific Disorders

Familial adenomatous polyposis (FAP) is an autosomal dominant disorder caused by mutations in the APC gene located at 5q22.2, leading to the development of hundreds to thousands of colorectal adenomas typically starting in adolescence or early adulthood. These mutations are predominantly loss-of-function variants, including nonsense and frameshift alterations that truncate the APC protein, disrupting its role in the Wnt signaling pathway and promoting uncontrolled cell proliferation in the colonic epithelium. The condition affects approximately 1 in 8,300 individuals worldwide, with about 20-30% of cases arising from de novo mutations and the remainder inherited from an affected parent, who passes the variant to each child with 50% probability. Without intervention such as prophylactic colectomy, nearly all individuals with FAP develop colorectal cancer by age 40, highlighting the critical need for genetic screening and surveillance.

Spinal muscular atrophy (SMA) results from biallelic mutations in the SMN1 gene at 5q13.2, following an autosomal recessive inheritance pattern that requires pathogenic variants in both alleles for disease manifestation. The most common molecular pathology involves homozygous deletion of exon 7 in SMN1, accounting for about 95% of cases, while the remaining 5% feature compound heterozygosity with one deletion and a point mutation, such as a missense or splice-site variant. Either way, the result is insufficient survival motor neuron (SMN) protein, which is essential for motor neuron maintenance. SMA has a prevalence of approximately 1 in 10,000 live births, with a carrier frequency of around 1 in 50 in the general population, and disease severity is modulated by the copy number of the nearby SMN2 gene, where higher copy numbers partially compensate for SMN1 loss. De novo mutations occur in about 2% of cases, slightly altering standard recurrence risks for unaffected carrier parents.

Mutations in the FBN2 gene at 5q23.1 underlie congenital contractural arachnodactyly (CCA), a Marfan-like autosomal dominant disorder characterized by joint contractures, arachnodactyly, and related skeletal features, with occasional cardiovascular involvement.
Pathogenic variants, primarily missense mutations or small in-frame deletions affecting residues in the fibrillin-2 protein, disrupt microfibril assembly in connective tissue, leading to tissue fragility; about 25-75% of clinically diagnosed cases harbor identifiable FBN2 mutations. CCA is rare, with prevalence estimated at less than 1 in 10,000 individuals, and most cases are inherited, though de novo mutations contribute significantly given variable expressivity. Unlike classic Marfan syndrome, CCA spares the aorta in most instances, but skeletal and ocular manifestations predominate, necessitating multidisciplinary management.
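
The SMA carrier frequency and birth prevalence quoted in this section can be cross-checked against each other. The sketch below assumes Hardy-Weinberg equilibrium and a single pooled class of pathogenic SMN1 alleles. Both are simplifications introduced here, not claims from the source, and the function name is illustrative.

```python
import math


def allele_freq_from_carrier_freq(carrier_freq: float) -> float:
    """Solve 2q(1 - q) = carrier_freq for the smaller root q."""
    return (1 - math.sqrt(1 - 2 * carrier_freq)) / 2


carrier_freq = 1 / 50                       # carrier frequency quoted for SMA
q = allele_freq_from_carrier_freq(carrier_freq)
affected_freq = q ** 2                      # expected frequency of biallelic genotypes

print(f"pathogenic allele frequency q ~ {q:.4f}")
print(f"expected affected births ~ 1 in {1 / affected_freq:,.0f}")

# For comparison: a child of two carrier parents has a 1 in 4 chance of
# inheriting both pathogenic alleles (recessive), whereas a child of one
# affected FAP parent has a 1 in 2 chance of inheriting the variant (dominant).
recessive_risk = 0.5 * 0.5
dominant_risk = 0.5
```

Under these assumptions, the expected frequency comes out close to the prevalence of roughly 1 in 10,000 live births given above, so the two quoted figures are mutually consistent.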

Chromosomal Abnormalities

Chromosomal abnormalities involving large-scale structural changes to chromosome 5, such as deletions, duplications, and aneuploidies, are associated with a variety of syndromes and conditions, particularly affecting neurodevelopment and hematopoiesis. These alterations often result from de novo mutations or unbalanced translocations during gamete formation or early embryonic development, leading to variable phenotypes depending on the size and location of the affected region. Diagnosis of such abnormalities typically relies on karyotyping to visualize gross chromosomal rearrangements and fluorescence in situ hybridization (FISH) to confirm specific deletions or duplications at targeted loci.

One of the most well-characterized conditions is cri-du-chat syndrome, caused by a terminal deletion in the short arm at 5p15, typically spanning 5-20 Mb of genetic material. This deletion leads to a high-pitched, cat-like cry in infancy, intellectual disability, delayed development, and facial dysmorphisms, with phenotypes varying based on deletion size and the involvement of genes such as CTNND2, whose loss contributes to severe intellectual disability. The incidence is estimated at 1 in 15,000 to 50,000 live births, affecting females slightly more often than males, and most cases arise without familial recurrence.

The 5q- syndrome, a subtype of myelodysplastic syndrome (MDS), results from an interstitial deletion at 5q31-33, disrupting hematopoiesis and leading to macrocytic anemia, normal or elevated platelet counts, and hypolobated megakaryocytes. This condition predominantly affects elderly women and carries a risk of progression to acute myeloid leukemia in approximately 10-20% of cases, with isolated 5q deletions occurring in about 10-15% of all MDS patients. The deletion affects multiple genes critical for hematopoiesis, contributing to the characteristic clinical features.

5q31.3 microdeletion syndrome is a rare condition caused by a small deletion on the long arm of chromosome 5 at 5q31.3, often spanning thousands to millions of base pairs and including the PURA gene.
It is characterized by severely delayed development of speech and motor skills, such as walking; weak muscle tone (hypotonia); feeding difficulties; breathing problems; recurrent seizures; and distinctive facial features including a narrow forehead, tented upper lip, and micrognathia. Brain abnormalities, such as delayed myelination, are also common. Fewer than 10 cases have been reported, and it typically occurs in an autosomal dominant manner.

Partial trisomy 5p, involving duplication of the short arm, is a rare condition resulting in developmental delays, congenital heart defects, seizures, and craniofacial anomalies. Unlike full trisomy 5, which is typically lethal, partial 5p duplications are compatible with life but lead to significant morbidity, with fewer than 100 cases reported in the literature. The severity correlates with the extent of the duplicated segment, usually involving bands 5p13 to 5p15.

Isochromosome 5q, denoted as i(5)(q10), involves loss of the entire short arm (5p) and duplication of the long arm (5q), and has been recurrently observed in myeloid malignancies such as MDS and acute myeloid leukemia. This abnormality leads to monosomy 5p and trisomy 5q, potentially contributing to leukemogenesis through loss of 5p genes and overexpression of 5q oncogenes, often in the context of complex karyotypes. It is less common than 5q deletions but shares prognostic implications in hematologic disorders.

PDGFRB-associated chronic eosinophilic leukemia is a blood cancer caused by chromosomal rearrangements involving the PDGFRB gene at 5q32, such as t(5;12)(q31-33;p12-13), which fuses PDGFRB with partner genes like ETV6 on chromosome 12. The resulting fusion gene drives uncontrolled proliferation of eosinophils and possibly other blood cells, causing persistent eosinophilia, an enlarged spleen or liver, and skin rashes from abnormal immune responses. The exact prevalence is unknown, but the condition shows a strong male predominance (up to 9:1). These somatic rearrangements are acquired and respond well to tyrosine kinase inhibitors like imatinib.

Cytogenetics and Mapping

Banding Patterns

The study of human chromosomes, including chromosome 5, was revolutionized in the early 1970s by the development of banding techniques that revealed distinct light and dark regions along the chromosome arms. G-banding, the most widely used method, was introduced by A. T. Sumner in 1971 through pretreatment of chromosomes followed by Giemsa staining, producing a characteristic pattern of approximately 400 to 850 bands per haploid genome depending on chromosome condensation. This technique preferentially stains AT-rich, late-replicating heterochromatic regions as dark G-positive (G+) bands, which are typically gene-poor, while GC-rich, early-replicating euchromatic regions appear as light G-negative (G-) bands that are gene-rich.

The International System for Human Cytogenomic Nomenclature (ISCN) standardizes the description of these bands for chromosome 5, numbering them sequentially from the centromere to the telomere on the short (p) arm and from the centromere to the telomere on the long (q) arm. On the p arm, bands run from 5p15 at the telomere (subdivided into 5p15.3, 5p15.2, and 5p15.1) through 5p14, 5p13, and 5p12 to 5p11 adjacent to the centromere; on the q arm, they extend from 5q11 (subdivided into 5q11.1 and 5q11.2) through 5q12, 5q13, 5q14, 5q21, 5q22, 5q23.1, 5q23.2, 5q31 (subdivided into 5q31.1, 5q31.2, and 5q31.3), 5q32, 5q33, and 5q34 to 5q35 at the telomere. At the high-resolution (850-band) level, these patterns enable precise localization of structural variants, with G+ bands often showing hierarchical splitting into darker and lighter sub-bands upon stretching, while G- bands remain uniform.

Several bands on chromosome 5 are cytogenetically significant due to their association with critical regions in genetic disorders. The telomeric band 5p15 harbors the core critical region for cri-du-chat syndrome, spanning 5p15.3 to 5p15.2, where deletions produce the syndrome's characteristic phenotype. On the q arm, 5q22 contains the APC tumor suppressor gene locus, frequently mutated in familial adenomatous polyposis.
Band 5q31 is notable for hosting a cytokine gene cluster (including IL3, IL4, IL5, and CSF2) and as the common deleted region in 5q- syndrome, a myelodysplastic disorder involving interstitial deletions typically from 5q31.1 to 5q31.3.
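
Band labels such as 5p15.2 or 5q31.1 pack the chromosome, arm, region, band, and optional sub-band into one string. The sketch below shows one way to split such labels and to order them from the short-arm telomere to the long-arm telomere. It is a simplified reading of the notation for illustration only: the `Band` type, the pattern, and the helper names are hypothetical, and the pattern handles only simple labels, not full ISCN karyotype strings such as translocations or ranges.

```python
import re
from typing import NamedTuple, Optional


class Band(NamedTuple):
    chromosome: str
    arm: str                 # "p" (short arm) or "q" (long arm)
    region: int              # first digit after the arm letter
    band: int                # second digit after the arm letter
    sub_band: Optional[str]  # digits after the dot, if any


# Simple labels only, e.g. "5q31.1", "5p15", "Xq28".
BAND_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$")


def parse_band(label: str) -> Band:
    """Split a simple band label into its components."""
    match = BAND_RE.match(label)
    if match is None:
        raise ValueError(f"not a recognised band label: {label!r}")
    chrom, arm, region, band, sub = match.groups()
    return Band(chrom, arm, int(region), int(band), sub)


def pter_to_qter_key(label: str) -> tuple:
    """Sort key ordering bands of one chromosome from p telomere to q telomere."""
    b = parse_band(label)
    digits = tuple(int(d) for d in (b.sub_band or ""))
    if b.arm == "p":
        # Numbers increase away from the centromere, so reverse them on p.
        return (0, -b.region, -b.band) + tuple(-d for d in digits)
    return (1, b.region, b.band) + digits


labels = ["5q31.1", "5p15.2", "5q22", "5p13"]
print(sorted(labels, key=pter_to_qter_key))
```

Because numbering starts at the centromere on both arms, the key negates the numbers on the p arm so that the most distal short-arm bands sort first.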

Sequencing and Genomic Analysis

The complete sequence of human chromosome 5 was published in 2004 by a team led by the U.S. Department of Energy Joint Genome Institute and the Stanford Human Genome Center, marking a significant milestone in the Human Genome Project. This effort assembled approximately 177.7 million base pairs of finished sequence, with an estimated total length of 181 megabases when including gaps and heterochromatic regions. The analysis identified 923 protein-coding genes, alongside numerous pseudogenes and non-coding elements, highlighting the chromosome's relatively low gene density compared to other human chromosomes. Notably, the sequence revealed extensive intrachromosomal duplications, particularly in the 5q arm, including structurally complex regions such as the 5q13.3 locus.

The sequencing employed a hierarchical approach, utilizing bacterial artificial chromosome (BAC) and fosmid clones from libraries such as RPCI-11 for high-depth coverage in duplicated regions, integrated with radiation hybrid maps and overgo hybridization to resolve gaps. Public sequence data were incorporated to achieve a finished standard, with over 12-fold redundancy in euchromatic regions. Subsequent refinements occurred through the Genome Reference Consortium's efforts, culminating in the GRCh38 assembly released in 2013, which brought the chromosome 5 sequence to 181,538,259 base pairs by filling gaps, correcting misassemblies, and incorporating alternate contigs for structural variants. Further patches, up to GRCh38.p14 in 2022, addressed minor errors and added novel sequences without altering core coordinates.

Key findings from the initial and updated analyses underscore chromosome 5's genomic architecture, including segmental duplications spanning 3.49% (about 6.26 megabases) of the assembled sequence, lower than the genome-wide average of 5.3%, with clusters of high-identity blocks (≥97.5%) in pericentromeric and telomeric areas.
The chromosome appears to be evolutionarily stable, as evidenced by conserved non-coding elements and reduced variability in gene-poor duplicated segments. Comparative analysis revealed strong evolutionary conservation, with large-scale synteny to portions of chromosomes 13 and 14, including conserved gene orders in regions like 5q31–q35 mapping to chromosome 13. These insights, refined in GRCh38, highlight inversion events and conserved regulatory elements across mammals.

The chromosome 5 sequence has facilitated applications in genomics, enabling high-resolution mapping of structural variants and integration with next-generation sequencing for population-scale variant discovery, as demonstrated in projects like the 1000 Genomes Project. This has supported bioinformatics tools for aligning reads to duplicated regions and detecting copy-number variations with improved accuracy.