Chromosome 6

Chromosome 6 is a submetacentric autosome in the human genome, one of the 23 pairs of chromosomes found in the nucleus of most cells. It spans approximately 171 million base pairs and comprises about 5.5% of the total genomic DNA. It contains an estimated 1,050 protein-coding genes, along with many non-coding genes and regulatory elements, and its centromere separates a shorter p arm from a longer q arm.^[1] A defining feature of chromosome 6 is the extended major histocompatibility complex (eMHC) region at band 6p21.3, a dense 7.6 megabase segment housing over 200 genes, among them the HLA genes with more than 42,000 known alleles (as of 2025), which play a central role in antigen presentation, immune recognition, and response to pathogens.^[2]^[3] This region contributes to individual variability in immune function and is pivotal in organ transplantation compatibility, where mismatches can lead to rejection.

Beyond immunity, chromosome 6 harbors genes linked to diverse physiological processes, including neurodegeneration (e.g., PARK2, associated with early-onset Parkinson's disease) and connective tissue integrity (e.g., COL11A2, implicated in Stickler syndrome).^[2]^[4]^[5] Genes on chromosome 6 are implicated in over 120 major human diseases, spanning immune and inflammatory disorders (such as ankylosing spondylitis via HLA-B), cancers, cardiovascular conditions, infectious diseases, and neurological ailments like Alzheimer's disease.^[2] These associations underscore chromosome 6's broad impact on health, with ongoing research focusing on its role in genetic susceptibility and therapeutic targeting, particularly within the polymorphic HLA locus.^[2]

Overview and Characteristics

Physical properties

Human chromosome 6 is a submetacentric autosome measuring approximately 171 million base pairs in length, constituting about 5.5% of the total DNA content in human cells.^[6] It has a short p-arm spanning roughly 61 Mb and a longer q-arm of about 108 Mb, with the centromere between them mediating segregation during cell division. Under G-banding cytogenetic staining, the p-arm is subdivided into bands from 6p25 distally to 6p11 proximally, while the q-arm extends from 6q11 proximally to 6q27 distally, providing a visual map for identifying structural features and abnormalities. As the sixth pair in the standard human karyotype, chromosome 6 ranks among the larger autosomes and becomes discernible under light microscopy in its condensed form during metaphase of mitosis or meiosis.

The nucleotide composition includes an average GC content of approximately 41%, with heterochromatin (repetitive, densely packed DNA) predominantly distributed in pericentromeric and telomeric regions, whereas gene-rich euchromatin occupies much of the arm interiors. A notable euchromatic region on the p-arm at 6p21 houses the major histocompatibility complex (MHC).
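The size figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not a reference calculation: the haploid genome size of roughly 3,100 Mb is an assumption that does not come from this article, and the function names are illustrative.

```python
# Lengths in megabases (Mb), as quoted in the text above.
P_ARM_MB = 61.0
Q_ARM_MB = 108.0
CHR6_TOTAL_MB = 171.0

# Assumption (not from the article): haploid human genome of about 3,100 Mb.
ASSUMED_GENOME_MB = 3100.0


def genome_share_percent(chrom_mb: float, genome_mb: float = ASSUMED_GENOME_MB) -> float:
    """Return the chromosome's share of the haploid genome, in percent."""
    return 100.0 * chrom_mb / genome_mb


def unassigned_mb(total_mb: float, p_mb: float, q_mb: float) -> float:
    """Return the length not covered by the two quoted arm figures.

    With rounded inputs this lumps together the centromeric region
    and rounding error, so it is only a rough consistency check.
    """
    return total_mb - (p_mb + q_mb)


print(f"Share of genome: {genome_share_percent(CHR6_TOTAL_MB):.1f}%")
print(f"Not assigned to either arm: {unassigned_mb(CHR6_TOTAL_MB, P_ARM_MB, Q_ARM_MB):.0f} Mb")
```

Under the assumed genome size, the share comes out at about 5.5%, matching the figure quoted in this section.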

Functional significance

Chromosome 6 plays a pivotal role in human immunity through the major histocompatibility complex (MHC), a gene cluster at the 6p21.3 locus that encodes class I and class II proteins essential for antigen presentation. These proteins display peptide fragments on cell surfaces to T cells, facilitating immune recognition of pathogens and abnormal cells while promoting T-cell activation in adaptive responses.^[7]^[8] The MHC's polymorphic nature allows for diverse antigen-binding capabilities across individuals, underpinning transplant compatibility and disease susceptibility.^[9]

The association of the MHC with chromosome 6 was established in the early 1970s via somatic cell hybridization experiments and family-based linkage studies, which mapped the HLA loci to this chromosome. These findings built on earlier serological observations of HLA antigens, confirming their chromosomal location and linkage to immune function.^[10]

In addition to immunity, chromosome 6 supports fundamental cellular processes, including DNA repair, cell cycle control, and metabolism, thereby maintaining genomic stability. Genes encoding components of the SMC5/6 complex, such as NSMCE3, facilitate DNA damage repair and replication fork progression, preventing chromosomal breakage.^[11] The CDKN1A gene produces p21, a cyclin-dependent kinase inhibitor that regulates cell cycle checkpoints in response to stress. In metabolism, MTHFD1L encodes a mitochondrial enzyme of folate-mediated one-carbon metabolism, supporting the one-carbon transfer reactions needed for nucleotide production and cellular homeostasis. These contributions highlight chromosome 6's broad influence on physiological integrity.

Approximately 1,052 protein-coding genes reside on chromosome 6; their coding sequence makes up about 1-2% of the chromosome, consistent with the human genome's overall coding proportion.^[12] The remaining non-coding DNA harbors regulatory elements, including enhancers that modulate gene expression, particularly within immune-related regions like the MHC.^[1]

Structural and Cytogenetic Features

Banding and karyotype

Chromosome 6 exhibits a distinct banding pattern when visualized using standard cytogenetic techniques, facilitating its identification and analysis in karyotypes. G-banding, the most commonly employed method, involves pretreatment of metaphase chromosomes with trypsin followed by staining with Giemsa, producing alternating light (G-light, gene-rich) and dark (G-dark, gene-poor) bands that reflect differences in AT/GC content and chromatin condensation. High-resolution G-banding ideograms, as standardized by the International System for Human Cytogenomic Nomenclature (ISCN 2024), delineate approximately 23 bands on the short arm (6p) and 27 bands on the long arm (6q) at the 850-band resolution level, allowing precise mapping of structural features from pter to qter.^[13]

The normal karyotype for individuals with two copies of chromosome 6 is denoted as 46,XX for females or 46,XY for males, indicating no visible abnormalities under standard banding. Polymorphic variants, such as 6qh+ (an enlarged heterochromatic region in the proximal long arm near the centromere), are benign heterochromatin expansions observed in a small percentage of the population and are noted in karyotype descriptions like 46,XX,6qh+ when they deviate from the standard size. These variants do not typically affect phenotype but are important for distinguishing normal diversity from pathological changes.^[14]

Additional staining techniques complement G-banding for targeted analysis of chromosome 6. C-banding, which uses alkali treatment and Giemsa staining to highlight constitutive heterochromatin, preferentially stains the centromeric and secondary constriction regions (6qh) rich in satellite DNA repeats. Fluorescence in situ hybridization (FISH) employs probes, such as those targeting centromeric alpha-satellite sequences (D6Z1) or subtelomeric regions, to detect numerical or structural anomalies at higher sensitivity than banding alone; the probes are fluorophore-labeled DNA sequences that hybridize to denatured chromosomal DNA.^[15]

In clinical cytogenetics, banding and related techniques play a crucial role in identifying aneuploidies involving chromosome 6, such as trisomy 6 (47,XX,+6 or 47,XY,+6), which is exceedingly rare and usually presents as mosaicism due to post-zygotic nondisjunction. Mosaic trisomy 6 is frequently detected prenatally via G-banding of amniocytes or chorionic villi, with confirmation by FISH to assess the proportion of affected cells, though full trisomy 6 is typically lethal in utero.^[16]
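The karyotype strings quoted in this section (46,XX, 46,XX,6qh+ and 47,XX,+6) follow a comma-separated layout: chromosome count, sex chromosomes, then any variants. The sketch below splits such strings and picks out whole-chromosome gains or losses. It is a deliberately simplified illustration, not an ISCN parser: the `Karyotype` class and both function names are hypothetical, and mosaic notation (cell lines separated by `/` with bracketed cell counts) is not handled.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Karyotype:
    count: int                      # total chromosome number, e.g. 47
    sex: str                        # sex chromosome complement, e.g. "XX"
    abnormalities: list = field(default_factory=list)  # remaining items, verbatim


def parse_karyotype(text: str) -> Karyotype:
    """Split a simple, non-mosaic karyotype string into its fields."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) < 2 or not parts[0].isdigit():
        raise ValueError(f"not a simple karyotype string: {text!r}")
    return Karyotype(count=int(parts[0]), sex=parts[1], abnormalities=parts[2:])


def whole_chromosome_changes(karyotype: Karyotype) -> dict:
    """Map chromosome label to net copy change for items like '+6' or '-7'.

    Items that are not whole-chromosome gains or losses (for example the
    heteromorphism '6qh+') are ignored.
    """
    changes = {}
    for item in karyotype.abnormalities:
        match = re.fullmatch(r"([+-])(\d{1,2}|X|Y)", item)
        if match:
            step = 1 if match.group(1) == "+" else -1
            changes[match.group(2)] = changes.get(match.group(2), 0) + step
    return changes


trisomy = parse_karyotype("47,XX,+6")
print(trisomy.count, trisomy.sex, whole_chromosome_changes(trisomy))
```

Keeping the variant items as verbatim strings is a deliberate choice: anything the sketch does not understand is preserved rather than silently dropped.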

Centromere and repetitive elements

The centromere of human chromosome 6 is positioned at the boundary between cytogenetic bands 6p11.1 and 6q11.1, serving as the constricted region that facilitates kinetochore assembly and chromosome segregation during cell division. It consists of extensive arrays of alpha-satellite DNA, a major satellite repeat family composed of tandemly repeated 171-bp monomers organized into higher-order repeats (HORs) that can extend 2-4 megabases in length. These alphoid sequences are AT-rich and form the structural foundation for centromeric function, with specific HOR units, such as D6Z1 on chromosome 6, recruiting the centromere-specific histone variant CENP-A to nucleosomes, thereby establishing an epigenetic mark for kinetochore formation and microtubule attachment.^[17]^[18]^[19]

Beyond the centromere, chromosome 6 harbors diverse repetitive elements that constitute a significant portion of its ~171 Mb length, including satellite DNAs, transposable elements, and segmental duplications. Alpha-satellite DNA predominates at the centromere, while other satellites like beta- and gamma-satellites flank pericentromeric regions. Long interspersed nuclear elements (LINEs), primarily LINE-1 sequences averaging 6 kb, and short interspersed nuclear elements (SINEs), dominated by ~300-bp Alu repeats, together account for over 25% of the chromosome's sequence, with LINEs more abundant in AT-rich, gene-poor areas and SINEs enriched in GC-rich, gene-dense segments. Segmental duplications (blocks of >1 kb sequence with ≥90% identity) cover approximately 5% of chromosome 6, clustering near the centromere and telomeres, where they mediate structural instability and copy number variations via unequal recombination.^[20]^[21]^[22]

The telomeres capping the p and q arms of chromosome 6 consist of canonical TTAGGG hexameric repeats arrayed in tandem for 5-15 kb, forming a protective cap that prevents end-to-end fusions and preserves chromosomal stability, with tract length maintained by telomerase activity. Flanking these telomeric tracts are subtelomeric regions, typically 100-500 kb long, characterized by low gene density but enriched in regulatory elements such as enhancers, silencers, and non-coding RNAs that modulate distal gene expression, alongside duplicated blocks that foster evolutionary plasticity.^[23]^[24]

These repetitive elements fulfill essential structural and regulatory roles on chromosome 6. Pericentromeric alpha-satellite and other satellite repeats promote sister chromatid cohesion by providing a heterochromatic scaffold that enhances cohesin complex entrapment and stabilization, particularly in regions requiring robust bipolar spindle attachments during mitosis. Meanwhile, interspersed repeats like LINEs, SINEs, and segmental duplications serve as hotspots for meiotic recombination, where their sequence homology drives double-strand break repair and crossover formation, thereby generating genetic variation, though at the cost of potential rearrangements.^[25]^[26]
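To give a sense of scale for the repeat figures above, the sketch below converts the quoted array lengths into approximate repeat-unit counts and finds the longest uninterrupted TTAGGG run in a sequence string. It is an illustrative calculation only: the function names are hypothetical, the toy sequence is made up for the example, and real arrays contain variant monomers that exact string matching would miss.

```python
import re

ALPHA_MONOMER_BP = 171      # alpha-satellite monomer length quoted in the text
TELOMERE_UNIT = "TTAGGG"    # canonical human telomeric hexamer


def units_in_array(array_bp: int, unit_bp: int) -> int:
    """Return how many whole repeat units fit in an array of the given length."""
    return array_bp // unit_bp


def longest_tandem_run(seq: str, unit: str = TELOMERE_UNIT) -> int:
    """Return the copy number of the longest uninterrupted tandem run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Alpha-satellite arrays of 2-4 Mb correspond to roughly 11,700-23,400 monomers.
print(units_in_array(2_000_000, ALPHA_MONOMER_BP), units_in_array(4_000_000, ALPHA_MONOMER_BP))

# Telomeric tracts of 5-15 kb correspond to roughly 830-2,500 hexamers.
print(units_in_array(5_000, len(TELOMERE_UNIT)), units_in_array(15_000, len(TELOMERE_UNIT)))

# Toy sequence (invented for illustration): a run of 3 copies, then a lone copy.
toy = "ACGT" + "TTAGGG" * 3 + "A" + "TTAGGG"
print(longest_tandem_run(toy))
```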

Genes and Genomic Content

Gene count and density

Human chromosome 6 harbors approximately 1,002–1,034 protein-coding genes, roughly 5% of the genome's estimated 19,000–20,000 such genes overall.^[2] When non-coding genes are included, the chromosome contains approximately 1,900 genes in total.^[1] These figures are derived from comprehensive annotations in Ensembl and GENCODE releases up to 2025, which integrate manual curation, RNA-seq data, and comparative genomics to refine gene models.^[27]

Gene density on chromosome 6 varies markedly along its length, with an overall average of about 6 protein-coding genes per megabase (Mb) across its 171 Mb span, lower than the genome-wide average due to extensive repetitive regions.^[1] The short arm (6p) exhibits higher density, particularly in the major histocompatibility complex (MHC) at 6p21, where over 200 genes are packed into approximately 4 Mb, yielding a density exceeding 50 genes/Mb and underscoring the region's role in immune-related gene concentration.^[28] In contrast, the long arm (6q), especially pericentromeric areas rich in repeats, shows reduced density, often below 4 genes/Mb, reflecting structural constraints on gene placement.

Recent advances in genome assembly, notably the telomere-to-telomere (T2T-CHM13) reference released in 2022, have resolved previous gaps in repetitive sequences on chromosome 6, including centromeric and pericentromeric regions previously underrepresented in GRCh38. These improvements facilitated updated annotations in GENCODE and Ensembl by 2025, incorporating additional evidence from long-read sequencing and proteomics to identify and validate novel genes, with chromosome 6 benefiting from better handling of repeats in these regions.^[27]

The chromosome also includes pseudogenes, such as those from the olfactory receptor cluster on 6p, where multiple paralogous copies contribute to pseudogene accumulation via incomplete processing or mutations. Their classification continues to be refined as annotation improves, separating nonfunctional relics from functional non-coding elements.^[29]
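The density figures in this section follow directly from gene counts divided by span. The short sketch below reproduces that arithmetic using only numbers quoted above; the function name is illustrative, and the MHC values (200 genes in 4 Mb) are the rounded lower bounds from the text rather than exact annotation counts.

```python
def genes_per_mb(gene_count: float, span_mb: float) -> float:
    """Return gene density in genes per megabase."""
    if span_mb <= 0:
        raise ValueError("span_mb must be positive")
    return gene_count / span_mb


CHR6_SPAN_MB = 171

# Chromosome-wide density for the low and high protein-coding estimates.
low = genes_per_mb(1002, CHR6_SPAN_MB)
high = genes_per_mb(1034, CHR6_SPAN_MB)

# MHC region: "over 200 genes" in "approximately 4 Mb" (rounded figures).
mhc = genes_per_mb(200, 4)

print(f"Chromosome-wide: {low:.2f} to {high:.2f} genes/Mb")
print(f"MHC region: at least {mhc:.0f} genes/Mb")
print(f"MHC enrichment over chromosome average: about {mhc / high:.1f}x")
```

On these rounded inputs the MHC is roughly eight times denser than the chromosome as a whole.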

Key gene clusters and notable genes

The major histocompatibility complex (MHC), located at 6p21.3 on the short arm of chromosome 6, represents one of the most gene-dense and polymorphic regions in the human genome, spanning approximately 4 megabases and containing over 200 genes primarily involved in immune function. This cluster includes the human leukocyte antigen (HLA) genes, which are subdivided into class I (HLA-A, HLA-B, and HLA-C) and class II (HLA-DR, HLA-DQ, and HLA-DP) subregions; class I genes encode proteins that present antigens to cytotoxic T cells, while class II genes facilitate antigen presentation to helper T cells. The MHC exhibits extreme polymorphism, with over 42,000 documented alleles across HLA loci as of September 2025, enabling diverse immune responses to pathogens but also influencing susceptibility to immune-mediated conditions.^[30]^[31]^[32] Genomic architecture within the MHC features structural variations, including polymorphic inversions that alter gene order and potentially regulatory elements in specific human populations, as observed in comparative haplotype analyses.

Beyond the MHC, chromosome 6 hosts other significant gene clusters and individual loci with specialized functions. The PARK2 gene (also known as PRKN) at 6q26 encodes parkin, a RING-between-RING E3 ubiquitin ligase that ubiquitinates target proteins for proteasomal degradation and regulates mitochondrial autophagy (mitophagy). The ESR1 gene at 6q25.1 encodes estrogen receptor alpha (ERα), a nuclear receptor that binds estrogen ligands to modulate gene transcription in processes such as reproductive development and cellular proliferation.^[33]

Notable single genes on chromosome 6 include FOXC1 at 6p25.3, which encodes a forkhead box transcription factor critical for the development of ocular structures, particularly the anterior segment of the eye through regulation of neural crest-derived tissues.^[34] Additionally, the PLG gene at 6q26 encodes plasminogen, a glycoprotein zymogen that is cleaved to form plasmin, the primary enzyme in the fibrinolytic system responsible for dissolving blood clots and maintaining vascular patency.^[35]

Evolution and Comparative Aspects

Centromere evolution

The centromere of human chromosome 6 represents an evolutionary new centromere that repositioned from an ancestral location at band 6p22.1 to its current position between 17 and 23 million years ago in the common ancestor of hominoids. This neocentromere formation involved a "jump" that inactivated the original site, leaving behind a fossil centromere marked by pericentromeric heterochromatin and repetitive elements, while activating a new alpha-satellite array at the modern locus. Orthologous regions in great apes, such as chimpanzees and gorillas, retain latent centromere-forming potential, as demonstrated by rare variant human chromosomes where the centromere reactivates at the ancestral 6p22.1 position, suggesting conserved epigenetic competence despite millions of years of divergence.

The alpha-satellite DNA comprising the functional centromere of chromosome 6 has undergone significant diversification in higher-order repeats (HORs) following the human-chimpanzee split approximately 6 million years ago. Phylogenetic analyses of these HORs reveal chromosome-specific evolutionary trajectories, with human chromosome 6 exhibiting unique 15- and 18-monomer HOR structures that differ from those in other primates, indicating rapid post-divergence expansion and sequence homogenization within the human lineage. These HORs form the core of the active centromeric array, spanning several megabases and showing structural plasticity, such as shifts in kinetochore positioning observed between human genome assemblies.

Inactivation of the ancestral centromere on chromosome 6 involved epigenetic silencing mechanisms that established pericentromeric heterochromatin, characterized by histone modifications like H3K9me3 and DNA methylation to suppress centromeric activity at the old site. Studies mapping complete centromeric regions confirm this through the absence of CENP-A nucleosomes at the fossil locus and enrichment of repressive marks, with the inactivated array persisting as a stable heterochromatic block. Updated genomic and epigenetic profiling in the 2020s reinforces that such silencing prevents reactivation, maintaining evolutionary stability despite latent potential.

Population-level analyses reveal minor variations in chromosome 6 centromere size and HOR array length across human ethnic groups, with differences up to several hundred kilobases linked to single-nucleotide polymorphisms in flanking regions. These variations are hypothesized to influence meiotic drive, where stronger or larger centromeres may bias segregation in female meiosis, contributing to subtle allele transmission advantages observed in diverse populations. Such dynamics underscore the ongoing evolutionary pressures on centromeric sequences even within modern humans.

Cross-species comparisons

In comparative genomics, human chromosome 6 (HSA6) exhibits high syntenic conservation with its orthologs in nonhuman primates, particularly within the great apes. HSA6 directly corresponds to chimpanzee chromosome 6 (PTR6), with approximately 98% sequence identity across aligned regions, though structural variations such as pericentric inversions disrupt synteny in the major histocompatibility complex (MHC) region on the short arm.^[36] Similar patterns hold for gorilla chromosome 6 (GGO6) and orangutan chromosome 6 (PON6), where arm ratios (q/p) remain comparable to HSA6 at around 1.5-1.6, preserving overall karyotypic morphology despite minor rearrangements.^[37] These observations stem from haplotype-resolved genome assemblies that highlight shared ancestry dating back to the last common ancestor of great apes approximately 12-16 million years ago.^[38]

In contrast, synteny breaks down significantly in rodents, where HSA6 content is fragmented across multiple chromosomes. The human MHC region on HSA6p21 maps primarily to the distal portion of mouse chromosome 17 (MMU17), encompassing the orthologous H2 complex, which includes class I (H2-K, H2-D) and class II (H2-I) genes as direct counterparts to human HLA loci.^[28] Additional HSA6 segments, including those near the long arm telomere, are dispersed to MMU10 and other autosomes, reflecting extensive rearrangements since the boreoeutherian split around 90 million years ago.^[39] Gene orthologs such as HLA-A/B/C align with H2-K/D/L in mice, maintaining functional equivalence in antigen presentation despite the chromosomal fragmentation.^[40]

Evolutionary breakpoints on HSA6 reveal at least 10 major rearrangement hotspots across mammalian lineages, identified through pairwise synteny alignments between human, mouse, and dog genomes. These hotspots, often enriched in segmental duplications and repetitive elements, account for fission, fusion, and inversion events that reshaped the chromosome since the eutherian radiation.^[41] In carnivores, for instance, portions of HSA6 show partial synteny with dog chromosome 12 (CFA12), including a fusion breakpoint near the HSA6q23 region that links immune-related genes, contrasting with the more intact primate orthologs.^[42] Such breakpoints cluster in pericentromeric and telomeric zones, driving lineage-specific chromosomal evolution.^[43]

Functional divergence in HSA6 is pronounced in immune gene content, with the MHC locus showing expansion in primates relative to non-primates. Human and great ape MHC regions contain over 200 protein-coding genes, including duplicated class I and II loci, compared to the more contracted H2 complex in rodents (fewer than 50 functional equivalents).^[44] Recent 2025 pangenome analyses of ape genomes confirm this primate-specific amplification, attributing it to segmental duplications that enhanced adaptive immunity, while non-primate mammals exhibit gene loss or pseudogenization in homologous regions.^[37] This divergence underscores HSA6's role in species-specific immune evolution.^[45]

Diseases and Clinical Associations

Monogenic disorders

Monogenic disorders associated with chromosome 6 arise from mutations, deletions, or imprinting defects in specific genes or regions, leading to single-locus inheritance patterns such as autosomal dominant or recessive traits. These conditions often manifest in early development or childhood, with phenotypes ranging from ocular and neurological abnormalities to metabolic disruptions. Common mechanisms include point mutations that alter protein function, microdeletions causing haploinsufficiency, and imprinting errors that disrupt gene dosage from the paternal allele. Diagnosis typically involves chromosomal microarray (array CGH) for copy number variants or targeted sequencing for point mutations, with inheritance patterns varying from de novo events to familial transmission.^[46]^[47]

Axenfeld-Rieger syndrome type 3 (RIEG3), an autosomal dominant disorder, results from heterozygous mutations or deletions in the FOXC1 gene at 6p25.3, leading to anterior segment dysgenesis of the eye, including iris hypoplasia, corneal abnormalities, and a high risk of glaucoma. These variants, such as nonsense or frameshift mutations, impair FOXC1's role as a transcription factor essential for ocular development, with affected individuals often exhibiting dental anomalies and redundant periumbilical skin. The condition has a prevalence of approximately 1 in 200,000 people and is frequently identified through ophthalmic examination followed by genetic testing.^[48]^[49]^[50]

Microdeletions encompassing 6p25, known as 6p25 deletion syndrome or 6pter-p24 deletion syndrome, cause contiguous gene syndromes with intellectual disability, hearing loss, and ocular features like coloboma or glaucoma due to haploinsufficiency of multiple genes including FOXC1 and GCNT6. These terminal or interstitial deletions, typically 1-5 Mb in size, occur de novo in most cases and are detected via array CGH, revealing breakpoints within 6p25.3. The syndrome is rare, with fewer than 100 reported cases, and manifests with hypotonia, seizures, and craniofacial dysmorphism from infancy.^[46]^[51]

Parkinson disease type 2 (PARK2), an autosomal recessive form of early-onset parkinsonism, stems from biallelic mutations or deletions in the PRKN (Parkin) gene at 6q26, disrupting mitochondrial quality control and leading to dopaminergic neuron loss. Pathogenic variants include exon rearrangements, point mutations, and homozygous deletions, with onset typically before age 40 and symptoms responsive to levodopa but prone to dyskinesias. PRKN mutations are found in approximately 5-15% of cases of early-onset Parkinson's disease and are confirmed through multiplex ligation-dependent probe amplification or sequencing.^[52]^[53]^[54]

Transient neonatal diabetes mellitus type 1 (TNDM1) is an imprinting disorder at 6q24 involving overexpression of the paternally inherited PLAGL1 (ZAC) gene, often due to paternal uniparental disomy, duplication, or hypomethylation of the maternal allele. This leads to transient hyperglycemia requiring insulin in the neonatal period, with macroglossia and intrauterine growth restriction as common features; remission occurs within months, but 50-70% relapse as type 2 diabetes in adolescence. The condition follows complex inheritance tied to imprinting centers and has a prevalence of about 1 in 200,000-500,000 live births, diagnosed via methylation-specific PCR or array CGH.^[47]^[55]^[56]

Terminal 6q deletions, including those at 6q26-q27, represent another copy number mechanism, resulting in autosomal dominant syndromes with intellectual disability, short stature, and facial dysmorphism due to haploinsufficiency of genes like PRKN and PDE10A. These de novo deletions, spanning 5-20 Mb, are detected by array CGH and occur in fewer than 1 in 1,000,000 births, often presenting with hypotonia and structural anomalies.^[57]^[58]^[59]

Recent advances include CRISPR-Cas9 models in zebrafish that dissect foxc1 regulatory elements, confirming that loss-of-function variants cause iris stromal hyperplasia and anterior segment defects akin to human Axenfeld-Rieger syndrome, enhancing understanding of dosage-sensitive pathways.^[60]
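Several of the deletions and duplications above are described as detected by array CGH, which reports the log2 ratio of test to reference signal. The sketch below shows the idealized ratios expected for common copy number states, plus the diluted signal expected in a mosaic sample. It is a theoretical illustration under stated assumptions, not a calling algorithm: the function names are hypothetical, a diploid reference is assumed, and real arrays typically show compressed ratios and noise.

```python
import math


def expected_log2_ratio(test_copies: float, reference_copies: float = 2.0) -> float:
    """Return the idealized log2(test/reference) for a region.

    Assumes a diploid reference by default. A complete (homozygous)
    deletion has no finite log2 ratio, so non-positive inputs are rejected.
    """
    if test_copies <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive for a finite log2 ratio")
    return math.log2(test_copies / reference_copies)


def mosaic_copies(normal_copies: float, abnormal_copies: float, abnormal_fraction: float) -> float:
    """Return the average copy number when only a fraction of cells is abnormal."""
    if not 0.0 <= abnormal_fraction <= 1.0:
        raise ValueError("abnormal_fraction must be between 0 and 1")
    return (1.0 - abnormal_fraction) * normal_copies + abnormal_fraction * abnormal_copies


# Heterozygous deletion (one copy left) and duplication (three copies).
print(f"deletion:    {expected_log2_ratio(1):+.3f}")
print(f"duplication: {expected_log2_ratio(3):+.3f}")

# Hypothetical mosaic: 30% of cells carry a third copy of the region.
average = mosaic_copies(2, 3, 0.30)
print(f"30% mosaic gain: {expected_log2_ratio(average):+.3f}")
```

The asymmetry between the deletion and duplication values is why single-copy losses are generally easier to see than single-copy gains, and why low-level mosaicism can fall below detection thresholds.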

Complex diseases and cancer links

Chromosome 6 plays a pivotal role in complex diseases through its genetic variants, particularly in the major histocompatibility complex (MHC) region, which encompasses the human leukocyte antigen (HLA) genes and influences immune regulation. Genome-wide association studies (GWAS) have identified strong links between HLA alleles on chromosome 6 and over 100 autoimmune conditions, including type 1 diabetes, celiac disease, rheumatoid arthritis, multiple sclerosis, and systemic lupus erythematosus. For instance, the HLA-DR3/DR4-DQ2/DQ8 genotype confers substantial risk for type 1 diabetes, with high-risk haplotypes yielding odds ratios (OR) of approximately 20-30 in affected populations. Similarly, the HLA-DQ2 allele is a primary susceptibility factor for celiac disease, associated with ORs ranging from 3 to 10 across diverse cohorts, highlighting the MHC's central role in immune-mediated pathologies.^[61]^[62]^[2]^[63]

In cancer, structural alterations on chromosome 6 contribute to oncogenesis and progression across multiple tumor types. Deletions at 6q, often spanning several megabases, are recurrent in prostate and ovarian cancers. Amplifications of 6p occur in up to 30% of melanomas, promoting tumor progression and associating with reduced overall survival, potentially through overexpression of oncogenes in the region. Additionally, chromosome 6 aneuploidy, such as trisomy 6, is observed in approximately 0.4-1.4% of acute myeloid leukemias and myelodysplastic syndromes, sometimes as a sole abnormality, with context-dependent prognostic implications ranging from neutral to adverse outcomes.^[64]^[65]^[66]

Beyond autoimmunity and cancer, variants on chromosome 6 influence heart disease, infectious disease susceptibility, and other multifactorial conditions. For example, polymorphisms near TNFAIP3 at 6q23, which encodes the A20 protein regulating NF-κB signaling and inflammation, are associated with rheumatoid arthritis (e.g., rs6920220, OR ≈1.3) and psoriasis (OR ≈1.2-1.5), exacerbating chronic inflammatory responses. The 2014 Human Proteome Organization Chromosome 6 Consortium documented over 120 associations between chromosome 6 loci and major human diseases, encompassing immune disorders, cancers, cardiovascular conditions like atherosclerosis, and responses to infections such as HIV. Updates from the Human Pangenome Reference Consortium in 2025, incorporating diverse global haplotypes, have refined these findings by improving variant detection in structurally complex regions like the MHC, revealing finer-scale risk alleles for polygenic traits.^[67]^[68]^[2]^[69]

Therapeutically, the HLA region's variability on chromosome 6 necessitates precise matching in organ and stem cell transplants to mitigate alloimmune responses. Mismatches at HLA loci increase acute graft rejection risk by 20-30% in kidney transplants, with HLA-DR and HLA-DQ disparities showing the strongest correlations (relative risk up to 2.5-fold), while HLA-C mismatches elevate graft failure odds by over 3-fold in hematopoietic stem cell transplantation. These quantified risks underscore the importance of high-resolution HLA typing for optimizing long-term graft survival and patient outcomes.^[70]^[71]^[72]
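The odds ratios quoted throughout this section come from comparing how often a variant is carried by cases versus controls. As a minimal sketch of that calculation, the code below computes an odds ratio from a 2x2 table along with an approximate 95% confidence interval using the standard log-scale (Woolf) method. The counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from any study cited here, and the function names are illustrative.

```python
import math


def odds_ratio(case_carriers: int, case_noncarriers: int,
               control_carriers: int, control_noncarriers: int) -> float:
    """Return the odds ratio for carrying a variant in cases versus controls."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


def odds_ratio_ci(case_carriers: int, case_noncarriers: int,
                  control_carriers: int, control_noncarriers: int,
                  z: float = 1.96) -> tuple:
    """Return an approximate confidence interval via the log-scale (Woolf) method.

    Requires all four cells to be non-zero; z = 1.96 gives roughly 95% coverage.
    """
    log_or = math.log(odds_ratio(case_carriers, case_noncarriers,
                                 control_carriers, control_noncarriers))
    standard_error = math.sqrt(1 / case_carriers + 1 / case_noncarriers
                               + 1 / control_carriers + 1 / control_noncarriers)
    return math.exp(log_or - z * standard_error), math.exp(log_or + z * standard_error)


# Hypothetical counts: 60 of 100 cases and 30 of 100 controls carry the allele.
estimate = odds_ratio(60, 40, 30, 70)
lower, upper = odds_ratio_ci(60, 40, 30, 70)
print(f"OR = {estimate:.2f}, approx. 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 suggests an association in the sampled data. Published GWAS estimates additionally adjust for ancestry and multiple testing, which this sketch does not attempt.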