Chromosome 5

Chromosome 5 is one of the 23 pairs of chromosomes found in the nucleus of human cells, with one copy inherited from each parent. It spans approximately 181 million base pairs of DNA, representing nearly 6 percent of the total genetic material in cells.^[1] This submetacentric chromosome is among the largest in the human genome and encodes around 900 protein-coding genes that provide instructions for producing proteins essential for various cellular functions, including immune response and cell signaling.^[1] Despite its size, chromosome 5 has one of the lowest gene densities in the genome, and it contains numerous intrachromosomal duplications and a lower proportion of coding sequence than other chromosomes.^[2]

Key structural features of chromosome 5 include a prominent cluster of interleukin genes on the long arm (5q), such as IL3, IL5, and IL13, which play critical roles in regulating immune and inflammatory responses.^[1] The long arm also houses APC, mutations in which are linked to familial adenomatous polyposis, a condition predisposing individuals to colorectal cancer, and RPS14, which is involved in ribosomal function.^[1]

Deletions or abnormalities in chromosome 5 are associated with several genetic disorders. Partial deletion of the short arm (5p) causes cri-du-chat syndrome, characterized by intellectual disability, delayed development, and a distinctive cat-like cry in infancy, while interstitial deletions on 5q lead to 5q- syndrome, a subtype of myelodysplastic syndrome featuring macrocytic anemia.^[1] Other notable conditions include 5q31.3 microdeletion syndrome, which results in developmental delays, hypotonia, and seizures, and rearrangements involving PDGFRB on 5q32 associated with chronic eosinophilic leukemia.^[1]

Research on chromosome 5 has advanced through complete sequencing efforts, which have revealed its evolutionary conservation and role in disease susceptibility. Ongoing studies explore its contributions to cancer, immune disorders, and neurodevelopmental conditions using high-resolution genomic mapping and functional genomics approaches.^[2]

General Characteristics

Size and Composition

Chromosome 5 is the fifth largest autosome in the human genome, spanning 181,538,259 base pairs in the GRCh38.p14 assembly and representing approximately 5.8% of the total haploid genome DNA.^[3] It is submetacentric, with the centromere positioned off-center so that it divides the chromosome into a shorter p arm and a longer q arm.^[4] The nucleotide composition of chromosome 5 features a GC content of 39.2%, slightly below the average of approximately 41% for the human genome.^[5] Repetitive elements account for roughly 46% of its sequence, including substantial contributions from long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs), which are enriched in gene-poor regions and influence the chromosome's overall architecture. These figures are derived from the complete GRCh38 assembly, in which chromosome 5 also shows a low gene density compared with other autosomes.^[3]

Karyotypic Features

Chromosome 5 is classified as submetacentric in the human karyotype: its centromere lies off the midpoint and divides it into a shorter p (short) arm of approximately 48 Mb and a longer q (long) arm of approximately 133 Mb.^[6]^[1] This arm ratio contributes to its characteristic L-shaped appearance under microscopic examination. In the standard Denver classification system, established for organizing human chromosomes by size and centromere position, chromosome 5 holds the fifth position overall and is grouped in category B alongside chromosome 4 as one of the two largest submetacentric autosomes.^[7] During metaphase arrest in cell division, when chromosomes are most condensed and visible, chromosome 5 stands out as one of the larger elements in the complement, accounting for roughly 6% of the total haploid genome length.^[8] Unlike the acrocentric chromosomes (13, 14, 15, 21, and 22), which bear nucleolar organizer regions (NORs) on their short arms for ribosomal RNA production, chromosome 5 lacks such secondary constrictions, simplifying its structural profile.^[9] Its termini are capped by conventional telomere sequences, consisting of repetitive TTAGGG motifs that protect against end-to-end fusions and degradation, maintaining chromosomal integrity across divisions.^[10] Cytogenetic banding techniques, such as G-banding, further facilitate its recognition by highlighting differential staining along the arms.^[10]
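
The arm lengths quoted above can be turned into the two usual measures of centromere position, the arm ratio (q/p) and the centromeric index (p/(p+q)). The following is a minimal sketch in Python. The function name and the classification cut-offs (commonly cited arm-ratio classes) are assumptions added for illustration and do not come from the cited sources.

```python
def centromere_metrics(p_mb: float, q_mb: float) -> dict:
    """Return arm ratio, centromeric index and a rough class for arm lengths in Mb."""
    arm_ratio = q_mb / p_mb                    # long arm divided by short arm
    centromeric_index = p_mb / (p_mb + q_mb)   # short arm as a fraction of total length
    # Assumed cut-offs on the arm ratio, used here only for illustration.
    if arm_ratio < 1.7:
        label = "metacentric"
    elif arm_ratio < 3.0:
        label = "submetacentric"
    elif arm_ratio < 7.0:
        label = "subtelocentric"
    else:
        label = "acrocentric"
    return {
        "arm_ratio": arm_ratio,
        "centromeric_index": centromeric_index,
        "label": label,
    }


# Approximate arm lengths for chromosome 5 as quoted in the text (48 Mb and 133 Mb).
chr5 = centromere_metrics(48.0, 133.0)
print(f"arm ratio {chr5['arm_ratio']:.2f}, "
      f"centromeric index {chr5['centromeric_index']:.3f}, "
      f"class {chr5['label']}")
```

With these inputs the arm ratio is about 2.8 and the short arm makes up a little over a quarter of the chromosome, which falls in the submetacentric range under the assumed cut-offs.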

Genes

Number and Density

Chromosome 5 harbors approximately 900 protein-coding genes, about 4.5% of the estimated 20,000 protein-coding genes in the human genome, even though the chromosome constitutes roughly 6% of the total genomic length of approximately 3.1 billion base pairs.^[11] This mismatch reflects the uneven distribution of genes across chromosomes. The initial sequencing and analysis of chromosome 5 in 2004 identified 923 protein-coding genes, a figure that has been refined downward with subsequent annotations. The gene density on chromosome 5 stands at around 5 protein-coding genes per megabase (Mb), based on its length of about 181 Mb, which is lower than the genome-wide average of roughly 6.5 genes per Mb implied by the totals above.^[11] This reduced density arises primarily from extensive intergenic regions and from segmental duplications, which occupy about 3.5% of the chromosome and contribute to gene-poor expanses. Examples of such gene-poor regions include large pericentromeric areas that span tens of megabases and contain few functional elements. Contemporary annotations from projects like Ensembl and GENCODE report a similar count of roughly 900 protein-coding genes, while the total number of annotated loci on chromosome 5 reaches approximately 1,700, encompassing non-coding RNA genes and pseudogenes; of these, only the protein-coding subset is translated into protein.^[12]^[10] These updates reflect ongoing refinements in annotation pipelines that integrate manual curation with computational predictions to enhance accuracy.^[13]
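
The density and share figures in this section follow from simple division. The sketch below reproduces them in Python from the rounded totals quoted in the text; the constant and function names are illustrative, and the genome-wide average obtained this way depends on the assumed totals, so it may differ slightly from published values.

```python
CHR5_LENGTH_BP = 181_538_259       # GRCh38.p14 length quoted in the article
CHR5_PROTEIN_CODING = 900          # approximate gene count quoted in the article
GENOME_LENGTH_BP = 3_100_000_000   # approximate haploid genome length
GENOME_PROTEIN_CODING = 20_000     # approximate genome-wide gene count


def genes_per_mb(genes: int, length_bp: int) -> float:
    """Gene density expressed as genes per megabase (1 Mb = 1,000,000 bp)."""
    return genes / (length_bp / 1_000_000)


chr5_density = genes_per_mb(CHR5_PROTEIN_CODING, CHR5_LENGTH_BP)
genome_density = genes_per_mb(GENOME_PROTEIN_CODING, GENOME_LENGTH_BP)
length_share = CHR5_LENGTH_BP / GENOME_LENGTH_BP            # fraction of genome length
gene_share = CHR5_PROTEIN_CODING / GENOME_PROTEIN_CODING    # fraction of coding genes

print(f"chromosome 5 density: {chr5_density:.2f} genes/Mb")
print(f"genome-wide density:  {genome_density:.2f} genes/Mb")
print(f"share of genome length: {length_share:.1%}")
print(f"share of protein-coding genes: {gene_share:.1%}")
```

The comparison that matters is the last pair of numbers: the chromosome's share of protein-coding genes is smaller than its share of genome length, which is what a below-average gene density means.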

Notable Genes and Clusters

Chromosome 5 harbors several notable genes and gene clusters with significant biological roles. One prominent example is the APC gene, located at 5q22.2, which encodes a tumor suppressor protein that acts as an antagonist of the Wnt signaling pathway, regulating cell proliferation, adhesion, and migration through its interactions with β-catenin and microtubules.^[14] The protein's multidomain structure enables it to localize to multiple subcellular compartments, including the cytoplasm, nucleus, and kinetochores, where it facilitates proper chromosome segregation during mitosis.^[15]

Another key gene is SMN1 at 5q13.2, which produces the survival motor neuron protein essential for the assembly of small nuclear ribonucleoproteins (snRNPs) and spliceosomal complexes, thereby supporting pre-mRNA splicing and RNA processing across various cell types, with particularly high expression in motor neurons.^[16] This protein also contributes to broader cellular functions, such as axonal transport and maintenance of neuromuscular junctions.^[17]

A significant gene cluster on chromosome 5 is the cytokine cluster at 5q31.1, encompassing genes such as IL3, IL4, IL5, and IL13, which encode Th2-type cytokines that orchestrate immune responses, including B-cell activation, eosinophil differentiation, and IgE class switching to promote humoral immunity and allergic inflammation.^[18] These genes are tightly linked within a 500 kb region, reflecting evolutionary conservation for coordinated expression in immune regulation.^[19]

At 5q35.1 lies the NPM1 gene, encoding nucleophosmin 1, a nucleolar phosphoprotein that functions as a molecular chaperone in ribosome biogenesis, facilitating the transport of ribosomal proteins and the maturation of ribosomal subunits while also regulating centrosome duplication and DNA repair.^[20] Its ability to shuttle between the nucleolus, nucleoplasm, and cytoplasm underscores its versatile role in maintaining cellular homeostasis.^[21]

Among other important genes, CTNND2 at 5p15.2 encodes δ-catenin, a protein critical for neuronal development, where it links cadherin-based cell adhesion to the actin cytoskeleton, supporting synaptic maturation, dendritic spine formation, and neuronal migration during brain development.^[22] Similarly, FBN2 at 5q23.3 produces fibrillin-2, a glycoprotein integral to the extracellular matrix that assembles into microfibrils, providing structural support for elastic fibers in connective tissues and contributing to tissue elasticity and organ morphogenesis.^[23]

Role in Disease

Gene-Specific Disorders

Familial adenomatous polyposis (FAP) is an autosomal dominant disorder caused by germline mutations in the APC gene located at 5q22.2, leading to the development of hundreds to thousands of colorectal adenomas typically starting in adolescence or early adulthood.^[24] These mutations are predominantly loss-of-function variants, including nonsense and frameshift alterations that truncate the APC protein, disrupting its role in the Wnt signaling pathway and promoting uncontrolled cell proliferation in the colonic epithelium.^[25] The condition affects approximately 1 in 8,300 individuals worldwide, with about 20-30% of cases arising from de novo mutations and the remainder inherited from an affected parent, who passes the variant to each child with a probability of 50%.^[26] Without intervention such as prophylactic colectomy, nearly all individuals with FAP develop colorectal cancer by age 40, highlighting the critical need for genetic screening and surveillance.^[24]

Spinal muscular atrophy (SMA) results from biallelic mutations in the SMN1 gene at 5q13.2, following an autosomal recessive inheritance pattern that requires pathogenic variants in both alleles for disease manifestation.^[27] The most common molecular pathology involves homozygous deletion of exon 7 in SMN1, accounting for about 95% of cases, while the remaining 5% feature compound heterozygosity with one deletion and a point mutation, such as missense or splice-site variants, leading to insufficient survival motor neuron (SMN) protein essential for motor neuron maintenance.^[28] SMA has an incidence of approximately 1 in 10,000 live births, with a carrier frequency of around 1 in 50 in the general population, and disease severity is modulated by the copy number of the nearby SMN2 gene, where higher copy numbers partially compensate for SMN1 loss.^[29] De novo mutations occur in about 2% of cases, slightly altering standard recurrence risks for unaffected carrier parents.^[27]

Mutations in the FBN2 gene at 5q23.3 underlie congenital contractural arachnodactyly (CCA), a Marfan-like autosomal dominant connective tissue disorder characterized by joint contractures, arachnodactyly, scoliosis, and occasional cardiovascular features such as mitral valve prolapse.^[30] Pathogenic variants, primarily missense mutations or small in-frame deletions affecting cysteine residues in the fibrillin-2 protein, disrupt microfibril assembly in extracellular matrix, leading to tissue fragility; about 25-75% of clinically diagnosed cases harbor identifiable FBN2 mutations.^[31] CCA is rare, with prevalence estimated at less than 1 in 10,000 individuals, and most cases are inherited, though de novo mutations contribute significantly given variable expressivity.^[32] Unlike classic Marfan syndrome, CCA spares the aorta in most instances, but skeletal and ocular manifestations predominate, necessitating multidisciplinary management.^[30]
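
The SMA figures quoted above are consistent with basic autosomal recessive arithmetic. As a rough sketch in Python, assuming random mating and ignoring de novo mutations and other complications, a carrier frequency of 1 in 50 implies an incidence of about 1 in 10,000. The function names are illustrative.

```python
from fractions import Fraction


def recessive_incidence(carrier_frequency: Fraction) -> Fraction:
    """Expected fraction of births affected by an autosomal recessive condition.

    Both parents must be carriers (carrier_frequency squared, assuming random
    mating), and each child of two carriers has a 1 in 4 chance of inheriting
    both pathogenic alleles.
    """
    return carrier_frequency * carrier_frequency * Fraction(1, 4)


def dominant_transmission_risk() -> Fraction:
    """Chance that a heterozygous affected parent passes on a dominant variant."""
    return Fraction(1, 2)


sma_incidence = recessive_incidence(Fraction(1, 50))
print(sma_incidence)                 # prints 1/10000
print(dominant_transmission_risk())  # prints 1/2
```

The same one-in-two transmission figure applies to the autosomal dominant conditions described in this section, such as FAP and CCA, when one parent is affected.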

Chromosomal Abnormalities

Chromosomal abnormalities involving large-scale structural changes to chromosome 5, such as deletions, duplications, and aneuploidies, are associated with a variety of syndromes and conditions, particularly affecting neurodevelopment and hematopoiesis. These alterations often result from de novo mutations or unbalanced translocations during gametogenesis or early embryonic development, leading to variable phenotypes depending on the size and location of the affected region. Diagnosis of such abnormalities typically relies on karyotyping to visualize gross chromosomal rearrangements and fluorescence in situ hybridization (FISH) to confirm specific deletions or duplications at targeted loci.^[33]^[34]

One of the most well-characterized conditions is cri-du-chat syndrome, caused by a terminal deletion in the short arm at 5p15, typically spanning 5-20 Mb of genetic material. This deletion leads to a high-pitched, cat-like cry in infancy, intellectual disability, microcephaly, and facial dysmorphisms, with phenotypes varying based on deletion size and involvement of multiple genes such as CTNND2, whose loss contributes to severe intellectual disability. The incidence is estimated at 1 in 15,000 to 50,000 live births, with a slight female predominance, and most cases arise de novo without familial recurrence.^[35]^[36]^[37]

The 5q- syndrome, a subtype of myelodysplastic syndrome (MDS), results from an interstitial deletion at 5q31-33, disrupting hematopoiesis and leading to macrocytic anemia, normal or elevated platelet counts, and hypolobated megakaryocytes. This condition predominantly affects elderly women and carries a risk of progression to acute myeloid leukemia in approximately 10-20% of cases, with isolated 5q deletions occurring in about 10-15% of all MDS patients. The deletion affects multiple genes critical for erythropoiesis and thrombopoiesis, contributing to the characteristic clinical features.^[38]^[39]^[40]

5q31.3 microdeletion syndrome is a rare condition caused by a small deletion on the long arm of chromosome 5 at 5q31.3, often spanning thousands to millions of base pairs and including the PURA gene. It is characterized by severely delayed development of speech and of motor skills such as walking; weak muscle tone (hypotonia); feeding difficulties; breathing problems; recurrent seizures; and distinctive facial features including a narrow forehead, hypertelorism, tented upper lip, high-arched palate, and micrognathia. Brain abnormalities, such as delayed myelin production, are also common. Fewer than 10 cases have been reported, and it typically occurs de novo in an autosomal dominant manner.^[41]

Partial trisomy 5p, involving duplication of the short arm, is a rare condition often occurring in mosaic form, resulting in developmental delays, congenital heart defects, seizures, and craniofacial anomalies. Unlike full trisomy 5, which is typically lethal, partial 5p duplications are compatible with life but lead to significant morbidity, with fewer than 100 cases reported in the literature. The severity correlates with the extent of the duplicated segment, usually involving bands 5p13 to 5p15.^[42]^[43]

Isochromosome 5q, denoted as i(5)(q10), involves loss of the entire short arm (5p) and duplication of the long arm (5q), and has been recurrently observed in myeloid malignancies such as MDS and acute myeloid leukemia. This abnormality leads to monosomy 5p and trisomy 5q, potentially contributing to leukemogenesis through haploinsufficiency of 5p genes and overexpression of 5q oncogenes, often in the context of complex karyotypes. It is less common than 5q deletions but shares prognostic implications in hematologic disorders.^[44]^[45]

PDGFRB-associated chronic eosinophilic leukemia is a myeloproliferative neoplasm caused by chromosomal rearrangements involving the PDGFRB gene at 5q32, such as t(5;12)(q31-33;p12-13), which fuses PDGFRB with partner genes like ETV6 on chromosome 12. The resulting fusion gene drives uncontrolled proliferation of eosinophils and possibly other blood cells, causing persistent eosinophilia, enlarged spleen or liver, and skin rashes from abnormal immune responses. The exact prevalence is unknown, but it shows a strong male predominance (up to 9:1). These somatic rearrangements are acquired and respond well to tyrosine kinase inhibitors like imatinib.^[46]

Cytogenetics and Mapping

Banding Patterns

Banding patterns of human chromosomes, including chromosome 5, were revolutionized in the early 1970s with the development of staining techniques that revealed distinct light and dark regions along the chromosome arms. G-banding, the most widely used method, was introduced by A. T. Sumner in 1971 through treatment of metaphase chromosomes with trypsin followed by Giemsa staining, producing a characteristic pattern of approximately 400 to 850 bands per haploid genome depending on chromosome condensation.^[47]^[48] This technique preferentially stains AT-rich, late-replicating heterochromatic regions as dark G-positive (G+) bands, which are typically gene-poor, while GC-rich, early-replicating euchromatic regions appear as light G-negative (G-) bands that are gene-rich.^[49]

The International System for Human Cytogenomic Nomenclature (ISCN) standardizes the description of these bands for chromosome 5, numbering regions and bands outward from the centromere toward the telomere on both the short (p) and long (q) arms. On the p arm, bands run from 5p11 adjacent to the centromere through 5p12, 5p13, and 5p14 to 5p15 at the telomere (subdivided into 5p15.1, 5p15.2, and 5p15.3); on the q arm, they extend from 5q11 (subdivided into 5q11.1 and 5q11.2) through 5q12, 5q13, 5q14, 5q15, 5q21, 5q22, 5q23 (subdivided into 5q23.1, 5q23.2, and 5q23.3), 5q31 (subdivided into 5q31.1, 5q31.2, and 5q31.3), 5q32, 5q33, 5q34, to 5q35 at the telomere.^[50]^[49] At high-resolution (850-band) level, these patterns enable precise localization of structural variants, with G+ bands often showing hierarchical splitting into darker and lighter sub-bands upon stretching, while G- bands remain uniform.^[48]^[49]

Several bands on chromosome 5 are cytogenetically significant due to their association with critical regions in genetic disorders. The telomeric band 5p15 harbors the core critical region for cri-du-chat syndrome, spanning 5p15.3 to 5p15.2, where deletions produce the syndrome's characteristic phenotype.^[51] On the q arm, 5q22 contains the APC tumor suppressor gene locus, frequently mutated in familial adenomatous polyposis.^[52] Band 5q31 is notable for hosting a cytokine gene cluster (including IL3, IL4, IL5, and CSF2) and as part of the commonly deleted region in 5q- syndrome, a myelodysplastic disorder involving interstitial deletions typically within 5q31 to 5q33.^[53]^[54]
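
A band name such as 5q31.1 packs several pieces of information into one string: chromosome 5, arm q, region 3, band 1, sub-band 1. The sketch below splits a name into those parts with a regular expression. It is a simplified illustration, not an ISCN parser: the `Band` type, the `parse_band` function, and the pattern are assumptions, and ranges or full karyotype strings are not handled.

```python
import re
from typing import NamedTuple, Optional


class Band(NamedTuple):
    chromosome: str
    arm: str                  # "p" (short arm) or "q" (long arm)
    region: int               # first digit after the arm letter
    band: int                 # second digit after the arm letter
    sub_band: Optional[str]   # digits after the decimal point, if any


# Chromosome (1-2 digits, X or Y), arm letter, region digit, band digit,
# then an optional ".sub-band" suffix.
_BAND_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$")


def parse_band(name: str) -> Band:
    """Split a cytogenetic band name such as '5q31.1' into its components."""
    match = _BAND_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised band name: {name!r}")
    chromosome, arm, region, band, sub_band = match.groups()
    return Band(chromosome, arm, int(region), int(band), sub_band)


for name in ("5p15.2", "5q22", "5q31.1"):
    print(name, "->", parse_band(name))
```

Read this way, 5p15.2 is sub-band 2 of band 5 in region 1 of the short arm, which is why it is spoken as "five p one five point two" rather than "five p fifteen point two".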

Sequencing and Genomic Analysis

The complete sequence of human chromosome 5 was published in 2004 by a team led by the Department of Energy Joint Genome Institute and the Stanford Human Genome Center, marking a significant milestone in the Human Genome Project.^[55] This effort assembled approximately 177.7 million base pairs of finished sequence, with an estimated total length of 181 megabases when including gaps and heterochromatic regions.^[55] The analysis identified 923 protein-coding genes, alongside numerous pseudogenes and non-coding elements, highlighting the chromosome's relatively low gene density compared to other human chromosomes.^[55] Notably, the sequence revealed extensive intrachromosomal duplications, particularly in the 5q arm, including structurally complex regions such as the 5q13.3 locus.^[55]

The sequencing employed a hierarchical shotgun approach, utilizing bacterial artificial chromosome (BAC) and fosmid clones from libraries such as RPCI-11 for high-depth coverage in duplicated regions, integrated with radiation hybrid maps and overgo hybridization to resolve gaps.^[55] Public sequence data were incorporated to achieve a finished quality standard, with over 12-fold redundancy in euchromatic regions.^[55] Subsequent refinements occurred through the Genome Reference Consortium's efforts, culminating in the GRCh38 assembly released in 2013, which extended the chromosome 5 sequence to 181,538,259 base pairs by filling gaps, correcting misassemblies, and incorporating alternate contigs for structural variants.^[56] Further patches, up to GRCh38.p14 in 2022, addressed minor errors and added novel sequences without altering core coordinates.^[3]

Key findings from the initial and updated analyses underscore chromosome 5's genomic architecture, including segmental duplications spanning 3.49% (about 6.26 megabases) of the assembled sequence, lower than the genome-wide average of 5.3%, with clusters of high-identity blocks (≥97.5%) in pericentromeric and telomeric areas.^[55] The chromosome exhibits a relatively low single nucleotide polymorphism rate, consistent with evolutionary stability, alongside conserved non-coding elements and reduced variability in gene-poor duplicated segments.^[55] Comparative genomics revealed strong evolutionary conservation, with large-scale synteny to portions of mouse chromosomes 13 and 14, including conserved gene orders in regions like 5q31–q35 mapping to mouse chromosome 13.^[55] These insights, refined in GRCh38, highlight inversion events and conserved regulatory elements across mammals.^[57]

The chromosome 5 sequence has facilitated applications in genomics, enabling high-resolution mapping of structural variants and integration with next-generation sequencing for population-scale variant discovery, as demonstrated in projects like the 1000 Genomes Project. This has supported bioinformatics tools for aligning reads to duplicated regions and detecting copy-number variations with improved accuracy.^[58]
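
The assembly figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below is a sanity check only, using the rounded values from the text; the variable names are illustrative, and the denominator behind the reported 3.49% is not stated here, so the code simply shows what denominator that percentage would imply.

```python
FINISHED_MB = 177.7           # finished sequence reported for the 2004 assembly
ESTIMATED_TOTAL_MB = 181.0    # estimated length including gaps and heterochromatin
SEGDUP_MB = 6.26              # sequence in segmental duplications
SEGDUP_REPORTED_PCT = 3.49    # reported share of the assembled sequence

# Portion of the estimated chromosome not covered by finished sequence.
unfinished_pct = 100 * (ESTIMATED_TOTAL_MB - FINISHED_MB) / ESTIMATED_TOTAL_MB

# Segmental duplication share if the finished sequence is the denominator.
segdup_pct_of_finished = 100 * SEGDUP_MB / FINISHED_MB

# Denominator that would reproduce the reported percentage exactly.
implied_denominator_mb = SEGDUP_MB / (SEGDUP_REPORTED_PCT / 100)

print(f"unfinished portion: {unfinished_pct:.2f}%")
print(f"segmental duplications vs finished sequence: {segdup_pct_of_finished:.2f}%")
print(f"denominator implied by 3.49%: {implied_denominator_mb:.1f} Mb")
```

The implied denominator falls between the finished length and the estimated total, so the reported percentage is consistent with the other figures to within rounding.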