The ROSA26 locus, also known as Gt(ROSA)26Sor, is a genomic site on mouse chromosome 6 at position 113,044,389–113,054,205 (GRCm39 assembly) that functions as a safe harbor for stable transgene integration in genetic engineering.[1] It was discovered in 1991 through a retroviral gene-trap screen in embryonic stem cells and identified as a neutral insertion point where proviral integration did not produce observable phenotypic effects or disrupt essential gene functions.[2] The locus spans approximately 9.8 kb and consists of three exons. It primarily encodes non-coding RNAs with ubiquitous transcriptional activity across tissues and developmental stages, which enables consistent, copy-number-dependent expression of inserted sequences without variegation or silencing.
Subsequent characterization revealed that the ROSA26 locus produces multiple overlapping transcripts, including a nuclear RNA with broad expression, which supports its use for driving reporter genes such as β-galactosidase in trapped alleles. The locus is popular in mouse genetics because targeting it has no adverse effects on viability, fertility, or development, which makes it well suited to generating transgenic models via homologous recombination or zinc finger nucleases. Common applications include knock-in lines expressing Cre recombinase for conditional (Cre/loxP) gene manipulation, fluorescent reporters for lineage tracing, and overexpression of genes or non-coding RNAs for functional studies in development, immunology, and disease modeling.[3] Over 130 such targeted mouse lines have been reported, and modern CRISPR/Cas9 methods have further increased targeting efficiency in embryonic stem cells and beyond.[4] Although the locus was established in mice, orthologous ROSA26 sites have been identified and used for similar purposes in other species, such as rabbits and pigs.[4]
Discovery and History
Initial Identification
The ROSA26 locus was first identified in 1991 by G. Friedrich and P. Soriano at the Roche Institute of Molecular Biology through a promoter-trap screen in mouse embryonic stem cells.[2]
The screen used the ROSAβgeo retroviral gene-trap vector to detect insertion sites associated with promoters active during development. The vector carried a splice acceptor sequence upstream of a reporter gene encoding β-galactosidase (lacZ) fused to neomycin resistance. This design enabled both selection of stable integrants and visualization of expression patterns.[2][5] ROSA26 was isolated as one of these integration sites (from gene trap pool #26), where the vector placed the lacZ reporter under the control of endogenous regulatory elements.[2][5] Early observations in chimeric mice derived from these gene-trapped embryonic stem cells revealed ubiquitous lacZ expression, indicating broad transcriptional activity at the ROSA26 locus across various tissues.[2]
Gene Trap Screen and Early Studies
The ROSA26 locus emerged from a gene-trap mutagenesis screen in mouse embryonic stem cells. The screen used a promoter-trap vector designed to integrate into actively transcribed genes. The vector carried a splice acceptor sequence upstream of β-geo, a fusion gene combining β-galactosidase (lacZ) for histochemical detection with neomycin phosphotransferase for drug selection.[6] β-geo has no promoter of its own, so it is expressed only when the vector splices into a host intron and borrows an endogenous promoter. Selecting for neomycin (G418) resistance therefore enriches for integrations in transcribed genes.[6]
Initial studies by the Soriano group in 1991 validated the ROSA26 insertion through the generation of chimeric mice and germline transmission. X-gal staining revealed β-galactosidase activity in preimplantation embryos, at postimplantation stages (e.g., E9.5), and in adult tissues. These tissues included brain, heart, kidney, liver, spleen, and hematopoietic cells such as B cells, T cells, and myeloid lineages.[6] These findings confirmed broad, constitutive expression across developmental stages and tissue types; staining intensity varied by cell type but was present in all examined samples.[6] Newborn pups displayed intense blue staining in virtually all tissues, underscoring the locus's potential as a neutral reporter platform.[6]
Further work in the Soriano laboratory screened multiple independent ES cell clones from the initial gene-trap library. It identified the ROSAβ-geo26 line, which carried single or low-copy insertions and consistently yielded ubiquitous reporter expression without variegation.[6] This reproducibility established the locus as a reliable site for stable, non-silenced transgene activity.
Further characterization in 1997 revealed that the insertion at ROSA26 disrupted only overlapping non-coding transcripts without affecting nearby protein-coding genes or producing an overt phenotype in homozygous mice, indicating a neutral integration site suitable for genetic engineering.[5]
Genomic Location and Structure
Chromosomal Position
The ROSA26 locus, officially designated Gt(ROSA)26Sor, is situated on mouse chromosome 6 within cytogenetic band 6 A3.1. In the GRCm39 reference assembly it occupies coordinates 113,044,389–113,054,205 on the reverse strand, approximately 113 Mb from the centromere, and spans roughly 9.8 kb. It lies in a gene-poor region between the genes Thumpd3 and Nutf2, so insertions can be made without disrupting essential neighboring genes.[1][7][8]
The mapping of ROSA26 to chromosome 6 was established by genetic linkage analysis in 1997, which showed no recombination with the microsatellite marker D6Mit10 in a panel of intersubspecific backcross progeny. Earlier work had used Southern blotting to verify insertional events during gene trapping and fluorescence in situ hybridization (FISH) to confirm the chromosomal assignment. Both built on the locus's initial isolation by retroviral gene-trap mutagenesis in embryonic stem cells. These pre-sequencing methods provided the foundational localization, which the Mouse Genome Sequencing Consortium later refined to nucleotide resolution in 2002.[5]
The ROSA26 region shows syntenic conservation in other mammals: a homolog lies on human chromosome 3p25.3 (THUMPD3-AS1; GRCh38.p14 coordinates chr3:9,385,264–9,397,494). The human ortholog has been targeted for transgene expression but differs from the mouse locus in some regulatory features.[9][10]
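Genome browsers report coordinates as 1-based, fully closed intervals. BED files and many programming libraries instead use 0-based, half-open intervals. The span length and the file representation of the locus therefore need a one-base adjustment. The Python sketch below shows the arithmetic for the mouse and human coordinates given above. The UCSC-style chromosome names ("chr6", "chr3") and the unstated human strand, written as ".", are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """A genomic interval in 1-based, fully closed coordinates (browser style)."""
    name: str
    chrom: str   # UCSC-style chromosome name (assumption)
    start: int   # first base, 1-based inclusive
    end: int     # last base, 1-based inclusive
    strand: str  # "+", "-", or "." when not stated

    def length_bp(self) -> int:
        # Both endpoints belong to the interval, hence the +1.
        return self.end - self.start + 1

    def to_bed(self) -> str:
        # BED is 0-based and half-open: shift the start down by one, keep the end.
        return "\t".join(
            [self.chrom, str(self.start - 1), str(self.end), self.name, "0", self.strand]
        )


rosa26 = Locus("Gt(ROSA)26Sor", "chr6", 113_044_389, 113_054_205, "-")  # GRCm39
hrosa26 = Locus("THUMPD3-AS1", "chr3", 9_385_264, 9_397_494, ".")       # GRCh38.p14

for locus in (rosa26, hrosa26):
    print(f"{locus.name}: {locus.length_bp():,} bp")
    print(locus.to_bed())
```

The mouse interval therefore covers 113,054,205 - 113,044,389 + 1 = 9,817 bp (about 9.8 kb), and the human interval covers 12,231 bp.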
Sequence Features and Organization
The Gt(ROSA)26Sor gene (NCBI Gene ID 14910; also known as Thumpd3as1) is a non-protein-coding gene spanning approximately 9.8 kb (113,044,389–113,054,205 on the complement strand of chromosome 6, GRCm39 assembly). It produces long non-coding RNAs (lncRNAs) with no protein-coding potential. These are represented by three primary RefSeq transcripts (NR_027008.1, NR_027009.1, and NR_027010.1) generated through alternative splicing.[1][8]
The gene has at least three exons, and the original gene-trap insertion lies within the first intron of the sense-oriented transcripts. Alternative splice sites produce distinct lncRNA variants. These include a 1.17 kb transcript that shares exon 1 with shorter isoforms and a 0.41 kb transcript that splices exon 1 to a downstream exon. An antisense transcript overlaps the locus but is not disrupted by typical targeting strategies. The endogenous promoter, located upstream of exon 1, is constitutive and of medium strength. It lacks a TATA box but contains GC-rich regions and CAAT boxes, consistent with housekeeping-like transcription initiating at multiple start sites.[5][11]
A polyadenylation signal positioned about 20 nucleotides upstream of the poly(A) addition site in the primary sense transcript directs efficient 3' end formation. The locus maintains an open chromatin configuration with few strong regulatory elements such as enhancers or insulators. This promotes stable, ubiquitous accessibility and protects inserted sequences from position-effect variegation.[5][12]
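A polyadenylation signal of the kind described above is typically an AAUAAA-type hexamer sitting a short distance upstream of the poly(A) addition site. The Python sketch below scans a DNA sequence for the canonical AATAAA hexamer and its common ATTAAA variant. It reports each hit's distance from a given cleavage site. The hexamer set, the 10–30 nt window, and the synthetic test sequence are illustrative assumptions, not the annotated ROSA26 3' end.

```python
import re

# Canonical polyadenylation hexamer and its most common variant (DNA alphabet).
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def find_pas(seq: str, cleavage_site: int, window: tuple[int, int] = (10, 30)):
    """Return (hexamer, start, distance) for each PAS hexamer whose 3' end lies
    between window[0] and window[1] nt upstream of `cleavage_site`, the 0-based
    index at which the poly(A) tail would be added."""
    seq = seq.upper()
    lo, hi = window
    hits = []
    for hexamer in PAS_HEXAMERS:
        # A zero-width lookahead also reports overlapping occurrences.
        for m in re.finditer(f"(?={hexamer})", seq):
            distance = cleavage_site - (m.start() + len(hexamer))
            if lo <= distance <= hi:
                hits.append((hexamer, m.start(), distance))
    return sorted(hits, key=lambda hit: hit[1])


# Synthetic 3' end: 4 nt, an AATAAA hexamer, a 20-nt spacer, then the cleavage site.
utr = "GGCC" + "AATAAA" + "C" * 20 + "TTTT"
print(find_pas(utr, cleavage_site=30))  # [('AATAAA', 4, 20)]
```

A real annotation would also weigh downstream GU-rich elements and weaker hexamer variants, which this sketch deliberately ignores.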
Biological Function
Expression Pattern
The ROSA26 locus shows ubiquitous, constitutive expression in mice, driven by its endogenous promoter. This was first observed when gene-trap insertions revealed widespread reporter activity in embryonic stem (ES) cells and early embryos.[2] Expression begins as early as the morula-to-blastocyst transition, around embryonic day 3.5 (E3.5), and persists without silencing through embryonic development, the neonatal period, and adulthood.[5] In ES cells derived from blastocysts, the locus is actively transcribed, enabling reliable reporter detection from the outset of in vitro culture.[2]
Spatially, ROSA26 expression is detected in virtually all tissues examined. These include brain, heart, liver, kidney, lung, pancreas, intestine, muscle, skin, spleen, thymus, bone marrow, cartilage, submandibular gland, trachea, and urinary bladder.[5] Reporter assays indicate particularly robust activity in neural tissues such as the brain, with the exception of olfactory bulb granule cells. Activity is also robust in the germline, including the testis, where staining is uniform in most cell types.[5] However, recent RNA-seq studies have shown that the endogenous ROSA26 lncRNA is not transcribed in the spermatogonial lineage or in Sertoli cells of the adult testis. This suggests that gene-trap reporter expression may not fully reflect endogenous activity in these cell types.[13] Hematopoietic and lymphopoietic cells, from bone marrow to nucleated peripheral blood cells, also show consistent expression. Mature erythrocytes, which lack nuclei, show no detectable activity.[5]
Quantitative assessments indicate moderate basal expression from the ROSA26 promoter. In ES cells it typically gives about 3-fold higher β-galactosidase activity than the phosphoglycerate kinase (PGK) promoter, a common housekeeping benchmark, while remaining weaker than strong viral promoters in targeted transgenes.[5] This output is often described as sufficient for reliable detection but not for overexpression. It remains stable across multiple generations in homozygous mice, with no observed variegation or loss of pattern in fertile lineages.[5][14]
Validation of the expression pattern has relied primarily on β-galactosidase reporter assays using X-gal staining. This staining produces an intense blue precipitate in embryos from E9.5 onward, in neonates, and in adult tissue sections.[5][2] Complementary RT-PCR analyses in the 1990s confirmed locus-derived transcripts in ES cells and embryonic tissues, correlating with staining intensity. These analyses also verified that the non-coding RNAs were disrupted without altering the ubiquitous profile.[5] These methods, established in early gene-trap studies, provide the foundational dataset for the locus's reliability across developmental stages and tissues.[2]
Endogenous Role and Non-Coding RNA
The ROSA26 locus encodes a long non-coding RNA (lncRNA) of approximately 1.8 kb. Its multiple isoforms, such as ENSMUST00000242415 (1,845 bp) and ENSMUST00000332449 (1,738 bp), lack protein-coding potential.[15] This lncRNA is transcribed from a constitutive promoter and is expressed ubiquitously across mouse tissues, but its biological role remains largely unknown.[1][8]
Studies of the endogenous function of the ROSA26 lncRNA have relied mainly on gene-trap insertions and targeted disruptions that interrupt or abolish its transcripts. These experiments show that the locus has no essential role in development or homeostasis: homozygous disruptions yield viable, fertile mice with no gross phenotypic abnormalities.[7][16] For instance, mice carrying the original Gt(ROSA)26Sor gene-trap allele or engineered knock-in modifications at the locus display normal size, behavior, and Mendelian inheritance ratios. This indicates that the lncRNA is either functionally redundant or dispensable.[7][17] Similarly, CRISPR/Cas9-mediated targeting of ROSA26 in embryonic stem cells and subsequent mouse generation yield animals without reported defects, reinforcing its classification as a "neutral" genomic site.[18]
Post-2010 analyses, including RNA-seq profiling across diverse tissues and cell types, have confirmed low-level transcription of the lncRNA in many tissues without identifying specific regulatory effects. However, a 2023 study found it transcriptionally inactive in spermatogenic cells of the adult testis.[13] General lncRNA mechanisms suggest possible roles in chromatin organization or epigenetic scaffolding. No direct evidence links the ROSA26 transcript to processes such as X-chromosome inactivation or genomic imprinting; RNA-seq data show that it is transcribed but reveal no function in these contexts.[4] The absence of a phenotype upon deletion underscores the locus's suitability for exogenous genetic modification, since disrupting its endogenous output imposes no selective constraint.[19]
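Normal Mendelian ratios are usually checked by genotyping the offspring of heterozygous intercrosses and comparing the counts with the 1:2:1 expectation. A minimal Python sketch of that chi-square goodness-of-fit test follows. The genotype counts are hypothetical and are not data from the cited studies. The value 5.991 is the standard chi-square critical value for two degrees of freedom at α = 0.05.

```python
CHI2_CRITICAL_DF2_ALPHA05 = 5.991  # chi-square critical value, 2 df, alpha = 0.05


def mendelian_chi2(wild_type: int, het: int, homozygous: int) -> float:
    """Chi-square statistic for het x het offspring against a 1:2:1 expectation."""
    observed = (wild_type, het, homozygous)
    total = sum(observed)
    expected = (0.25 * total, 0.5 * total, 0.25 * total)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def consistent_with_mendelian(wild_type: int, het: int, homozygous: int) -> bool:
    # Fail to reject the 1:2:1 hypothesis when the statistic is below the critical value.
    return mendelian_chi2(wild_type, het, homozygous) < CHI2_CRITICAL_DF2_ALPHA05


# Hypothetical genotype counts pooled over several litters.
print(round(mendelian_chi2(26, 49, 25), 3))   # 0.06
print(consistent_with_mendelian(26, 49, 25))  # True
print(consistent_with_mendelian(35, 55, 10))  # False: homozygotes under-represented
```

Passing this test only means a deviation was not detected at the chosen sample size; small cohorts can miss modest embryonic lethality.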
Applications in Genetic Engineering
Safe Harbor Properties
The ROSA26 locus is regarded as a safe harbor in the mouse genome because transgenes inserted there have minimal effects on endogenous gene function and are protected from position-effect variegation and transcriptional silencing. Safe harbor loci are characterized by specific criteria:
- a well-defined genomic position in an intergenic or non-essential region;
- an active chromatin environment that supports stable expression;
- the absence of nearby genes whose disruption could cause deleterious effects or oncogenesis.

ROSA26 meets these standards: it resides in a non-protein-coding region that accepts transgene integration without altering critical cellular processes or viability.[20][21][22]
Several properties make ROSA26 reliable for genetic engineering. Its endogenous promoter drives ubiquitous, constitutive expression across cell types and developmental stages, from the morula to adult tissues, so transgene output is predictable with little tissue-specific variability. The locus maintains an open chromatin conformation, with active histone marks such as H3K4me3 at its promoter. This conformation resists silencing and keeps the locus accessible to the transcriptional machinery.
Furthermore, targeted modifications at ROSA26 support efficient germline transmission, producing non-mosaic progeny that inherit the insertion uniformly, which is essential for generating stable transgenic lines.[20][21][23]
Empirical evidence supports ROSA26's safety and efficacy. As of November 2025, more than 1,000 knock-in mouse strains had been documented at the locus since its identification in the 1990s. These strains have shown reliable transgene expression without reported oncogenic risk or gross phenotypic abnormalities in heterozygous or homozygous animals.[20][21][22][7] Initial studies using gene-trap vectors demonstrated widespread β-galactosidase reporter activity in embryos and hematopoietic cells, confirming the locus's neutrality and broad utility for transgenesis. No adverse health effects have been reported across these models, reinforcing its role as a preferred site for long-term genetic studies.[21][18]
A notable limitation is the risk of multiple or unintended integrations, particularly with homology-independent methods, where such events can occur at rates of up to 32%. Rigorous validation is therefore needed to confirm precise, single-copy targeting.[20]
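One common way to screen for extra integrations is quantitative PCR, with copy number estimated by the comparative Ct (2^-ΔΔCt) method. The estimate is made relative to a calibrator clone already known to carry a single copy. The Python sketch below shows that calculation. It assumes roughly 100% amplification efficiency for both the transgene assay and an endogenous reference-gene assay. The Ct values, clone names, and the ±0.3 acceptance window are hypothetical.

```python
from typing import NamedTuple


class CtPair(NamedTuple):
    """qPCR threshold cycles measured on the same genomic DNA sample."""
    transgene: float  # Ct of the transgene amplicon
    reference: float  # Ct of an endogenous gene with a fixed copy number


def copy_number(sample: CtPair, calibrator: CtPair, calibrator_copies: float = 1.0) -> float:
    """Comparative Ct (2^-ddCt) estimate, assuming ~100% efficiency for both assays."""
    d_sample = sample.transgene - sample.reference
    d_calibrator = calibrator.transgene - calibrator.reference
    return calibrator_copies * 2 ** -(d_sample - d_calibrator)


def is_single_copy(cn: float, tolerance: float = 0.3) -> bool:
    # Hypothetical acceptance window; real cut-offs depend on assay noise.
    return abs(cn - 1.0) <= tolerance


# Calibrator: a clone already validated (e.g. by Southern blot) as carrying one copy.
calibrator = CtPair(transgene=24.0, reference=23.0)
clones = {
    "clone_A": CtPair(transgene=24.1, reference=23.1),  # same dCt as the calibrator
    "clone_B": CtPair(transgene=23.0, reference=23.0),  # one cycle earlier: ~2 copies
}
for name, cts in clones.items():
    cn = copy_number(cts, calibrator)
    verdict = "single copy" if is_single_copy(cn) else "extra copies; re-check"
    print(f"{name}: {cn:.2f} copies ({verdict})")
```

A copy-number estimate of one does not by itself show that the copy sits at ROSA26. Junction PCR or Southern blotting is still needed to confirm the location.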
Knock-in and Transgene Insertion Techniques
The Rosa26 locus has been widely used for knock-in strategies based on traditional homologous recombination (HR), in which targeting vectors integrate transgenes through homology arms flanking the insertion site. A common approach employs the Rosa26-loxP-STOP-loxP (LSL) system. In this system, a loxP-flanked transcriptional stop cassette (often containing polyA signals) blocks expression until Cre recombinase excises it, enabling conditional transgene activation. The system was pioneered by inserting reporter genes such as enhanced yellow fluorescent protein (EYFP) and enhanced cyan fluorescent protein (ECFP) downstream of a weak splice acceptor in the first intron of Rosa26. This gave ubiquitous expression after Cre recombination without disrupting endogenous gene function. Targeting vectors are typically linearized and electroporated into embryonic stem (ES) cells, such as those of a C57BL/6 background, followed by selection for the neomycin resistance conferred by the vector. Positive clones are identified by PCR amplification of junction fragments (yielding ~1.5 kb products). They are then confirmed by Southern blotting with probes that detect the expected band shifts (e.g., 4 kb 5' and 9.6 kb 3' fragments after EcoRV digestion). Targeting efficiencies of 30% or higher among resistant clones are achievable.
Advances in CRISPR/Cas9 technology since 2013 have streamlined Rosa26 knock-ins. Guide RNAs (gRNAs) direct Cas9 to make a double-strand break at the locus, which promotes homology-directed repair (HDR) from a donor template. gRNAs are designed against specific sites, such as the XbaI restriction site in Rosa26 intron 1, so that the cut is compatible with standard homology arms of 0.5–1 kb flanking the transgene.
In ES cells, electroporation of CRISPR/Cas9 ribonucleoprotein (RNP) complexes yields knock-in efficiencies exceeding 50% when combined with drug selection (e.g., puromycin or G418) and circular donor plasmids. Circular donors reduce NHEJ-mediated random integration of the donor. For instance, a single electroporation of Cas9 RNP, gRNA, and donor vector into murine ES cells produced up to 87% targeted clones for reporter integration, and dual knock-ins at Rosa26 and other loci reached 65% efficiency.[24] Compared with traditional HR, these methods shorten the ES cell targeting stage from months to weeks, often without extensive ES cell screening.
Common constructs inserted at Rosa26 include:
- fluorescent reporters such as tdTomato or the mT/mG dual reporter for lineage tracing;
- Cre recombinase drivers for conditional genetics;
- functional tools such as optogenetic actuators (e.g., halorhodopsin for neural silencing);
- disease-relevant alleles (e.g., KRAS mutations in lung adenocarcinoma models).

The LSL configuration remains prevalent in CRISPR approaches, as in Rosa26^{LSL-Cas9} lines for secondary editing or Rosa26^{LSL-PD-L1} lines for immune checkpoint studies. A typical protocol proceeds as follows:
1. The donor vector is constructed by Gateway recombination or In-Fusion cloning, incorporating elements such as a CAG promoter, the transgene, and selection markers.
2. The vector is co-delivered with CRISPR components into ES cells by electroporation (e.g., a 500 V, 3 ms pulse in ES cell buffer).
3. Cells undergo 7–10 days of puromycin selection (1–2 μg/mL).
4. Surviving clones are expanded and genotyped by PCR, Southern blotting, or sequencing.
5. Validated ES cell clones (typically 10–20 high-quality lines) are injected into blastocysts for chimera generation and germline transmission, yielding founder mice in 3–6 months.

This process relies on Rosa26's safe harbor status to ensure stable, heritable expression across tissues.
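The CRISPR design step described above can be sketched in Python. It involves choosing a guide whose cut falls near a landmark such as the XbaI site (TCTAGA) in intron 1, then taking homology arms on either side of the cut. The sketch assumes SpCas9 with an NGG PAM and a blunt cut 3 bp upstream of the PAM. The 20 bp proximity threshold, the function names, the placeholder payload, and the random test sequence are hypothetical. They do not represent the real Rosa26 sequence or any published guide.

```python
import random
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
XBAI_SITE = "TCTAGA"  # XbaI recognition sequence (palindromic)
# Lookahead so overlapping hits are all found: 20-nt protospacer then an NGG PAM.
SPCAS9_SITE = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def cas9_sites(seq: str) -> list[tuple[str, str, int]]:
    """Return (protospacer, strand, cut) for every SpCas9 site on both strands.
    `cut` is a 0-based boundary: the blunt cut falls between seq[cut-1] and
    seq[cut], 3 bp 5' of the PAM on the protospacer's strand."""
    n = len(seq)
    sites = [(m.group(1), "+", m.start() + 17) for m in SPCAS9_SITE.finditer(seq)]
    for m in SPCAS9_SITE.finditer(revcomp(seq)):
        # A boundary at index c in the reverse complement is index n - c in `seq`.
        sites.append((m.group(1), "-", n - (m.start() + 17)))
    return sites


def sites_near_xbai(seq: str, max_dist: int = 20) -> list[tuple[str, str, int]]:
    """Keep sites whose cut lies within `max_dist` bp of the centre of an XbaI site."""
    centres = [m.start() + 3 for m in re.finditer(XBAI_SITE, seq)]
    return [s for s in cas9_sites(seq) if any(abs(s[2] - c) <= max_dist for c in centres)]


def hdr_donor(seq: str, cut: int, payload: str, arm: int = 800) -> str:
    """Left homology arm + payload + right homology arm, split at the cut."""
    if cut - arm < 0 or cut + arm > len(seq):
        raise ValueError("not enough flanking sequence for the requested arm length")
    return seq[cut - arm:cut] + payload + seq[cut:cut + arm]


def random_dna(k: int, rng: random.Random) -> str:
    return "".join(rng.choice("ACGT") for _ in range(k))


# Synthetic stand-in for an intron: random background with one planted guide + XbaI site.
rng = random.Random(0)
guide = "GACTGCATCGATCGGATCCA"
seq = random_dna(1000, rng) + guide + "AGG" + XBAI_SITE + random_dna(1000, rng)

for protospacer, strand, cut in sites_near_xbai(seq):
    print(strand, protospacer, "cut at", cut)

# "N" * 50 is a placeholder for a CAG-transgene-polyA cassette.
donor = hdr_donor(seq, cut=1017, payload="N" * 50, arm=800)
print(len(donor))  # 1650
```

Real guide selection would also score off-target matches and on-target activity, which this sketch leaves out.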
Extensions to Other Organisms
Use in Non-Mouse Models
The rat ortholog of the ROSA26 locus, located on chromosome 4, serves as a safe harbor for genetic modification. Since 2015 it has been targeted with CRISPR/Cas9-mediated knock-in strategies to generate reporter lines and enable conditional gene expression in various research contexts, including cardiovascular disease modeling.[25][26] For instance, Cre reporter rat strains with insertions at this locus allow Cre-loxP-mediated lineage tracing, supporting tissue-specific studies of cardiac function and pathology.[26]
In pigs, targeting of the ROSA26 locus has proven valuable for xenotransplantation research. Early studies in 2014 demonstrated its utility for stable overexpression of transgenes, including human-compatible factors that mitigate immune rejection.[27] These approaches allow multiple genetic modifications, such as anti-inflammatory or complement-regulatory genes, to be integrated. The resulting donor organs are better suited to human transplantation while expression remains ubiquitous and consistent.[28] Subsequent optimizations have improved efficiency and enabled Cre-mediated lineage tracing in porcine models to follow cell fate during organ development and rejection.[27]
Zebrafish, which lack a direct ROSA26 homolog, use analogous safe harbor sites for targeted transgene insertion, supporting reproducible transgenesis without disrupting endogenous gene regulation.
These loci allow efficient integration via CRISPR/Cas9 and stable expression of reporters or disease-relevant genes in developmental and disease models.[29] Similar targeting has been achieved with CRISPR/Cas9 in livestock species such as cattle for agricultural and biomedical models.[30]
The rabbit ROSA26 ortholog was identified through sequence homology. It has been targeted for knock-in of reporter genes such as EGFP and of Cre, providing a platform for stable, ubiquitous transgene expression in biotechnology applications. These applications include models for producing therapeutic proteins such as antibodies.[4] The locus supports high-fidelity integration, and the resulting rabbit lines are used in preclinical testing of biologics.[4]
A key challenge in adapting ROSA26 targeting to non-mouse models, especially large animals such as pigs and rabbits, is lower efficiency than in rodents. CRISPR/Cas9 knock-in rates at this locus typically range from 10% to 30% in primary fibroblasts after optimization.[31] Viral vector delivery, such as AAV, can achieve targeted integration rates of approximately 20–30%. However, it usually requires somatic cell nuclear transfer for germline transmission, increasing complexity and cost.[32]
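Knock-in rates like those quoted above translate directly into how many clones must be screened. Consider a simple binomial model in which each clone is correctly targeted independently with probability p. The chance that none of n clones is targeted is (1 - p)^n, so n must satisfy n ≥ ln(1 - confidence) / ln(1 - p). The Python sketch below applies this to the 10–30% fibroblast range. Independence between clones and the 95% confidence target are assumptions.

```python
import math


def clones_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one targeted clone among n) >= confidence,
    assuming each clone is targeted independently with probability p."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    # P(no targeted clone) = (1 - p) ** n must not exceed 1 - confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


for p in (0.10, 0.20, 0.30):
    print(f"targeting rate {p:.0%}: screen at least {clones_needed(p)} clones")
```

At 10%, 20%, and 30% efficiency this gives 29, 14, and 9 clones respectively. In practice more are usually picked, because some clones fail expansion or carry additional random integrations.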
Human ROSA26 Homolog and Targeting
The human homolog of the mouse ROSA26 locus, designated hROSA26, resides on chromosome 3p25.3 within an intron of the long non-coding RNA gene THUMPD3-AS1 (Gene ID: 440944).[33] It was identified in 2007 through computational homology searches aligning mouse Rosa26 sequences to the human genome. This was followed by validation through bacterial artificial chromosome (BAC) cloning and targeting experiments in human embryonic stem (hES) cells.[34] hROSA26 shares partial syntenic conservation with the mouse locus, including an open chromatin structure conducive to ubiquitous transcription. However, it shows weaker baseline expression and lower stability than its murine counterpart.[34][33]
Targeting of hROSA26 in human cells has been achieved by homologous recombination and, more recently, CRISPR/Cas9-mediated knock-in, establishing it as a genomic safe harbor for transgene integration.[34] In induced pluripotent stem cells (iPSCs), hROSA26 serves as an alternative to the AAVS1 locus for stable insertion of therapeutic transgenes, offering multilineage expression while minimizing disruption of endogenous genes. However, AAVS1, located in an intron of PPP1R12C, is often preferred for its consistent activity, whereas hROSA26 requires careful promoter selection to limit expression variability.[33] CRISPR/Cas9 targeting at hROSA26 achieves knock-in efficiencies of approximately 2–7% in HEK293 cells under optimized conditions, such as protocols that enhance homology-directed repair.[35]
Applications of hROSA26 targeting have extended to gene therapy for various diseases. These include preclinical ex vivo correction of hematopoietic stem cells, which exploits the locus's broad accessibility. Despite its utility, hROSA26 is more susceptible than mouse Rosa26 to epigenetic transgene silencing. This has prompted ongoing research into insulator elements and alternative safe harbors for therapeutic use.[33][36]