ROSA26

The ROSA26 locus, also known as Gt(ROSA)26Sor, is a genomic site on mouse chromosome 6 at position 113,044,389–113,054,205 (GRCm39 assembly) that functions as a safe harbor for stable transgene expression in mice. Discovered in 1991 through a retroviral gene-trap screen in embryonic stem cells, it was identified as a neutral insertion point where proviral integration did not produce observable phenotypic effects or disrupt essential functions. The locus spans approximately 9.8 kb, consists of three exons, and primarily encodes non-coding RNAs with ubiquitous transcriptional activity across tissues and developmental stages, enabling consistent, copy-number-dependent expression of inserted sequences without silencing or position-effect variegation. Subsequent characterization revealed that the ROSA26 locus produces multiple overlapping transcripts, including a nuclear RNA with broad expression, which supports its utility for driving reporter genes like lacZ in trapped alleles. Its popularity in mouse genetics stems from the lack of adverse effects on viability, fertility, or development upon targeting, making it well suited to generating transgenic models via homologous recombination or zinc finger nucleases. Common applications include the creation of knock-in lines for Cre/loxP recombinase expression to enable conditional gene manipulation, fluorescent reporters for lineage tracing, and overexpression of genes or non-coding RNAs for functional studies in development and disease modeling. Over 130 such targeted mouse lines have been reported, with modern CRISPR/Cas9 methods further enhancing targeting efficiency in embryonic stem cells and beyond. While primarily established in mice, orthologous ROSA26 sites have been identified and utilized in other species, such as rabbits and pigs, for similar purposes.

Discovery and History

Initial Identification

The ROSA26 locus was first identified in 1991 by G. Friedrich and P. Soriano, working in Soriano's group at the Roche Institute of Molecular Biology, through a promoter-trap screen in mouse embryonic stem cells. The experimental setup involved infecting embryonic stem cells with the ROSAβgeo retroviral gene-trap vector to detect insertion sites associated with promoters that are active during development. This vector included a splice acceptor fused upstream of a reporter gene encoding β-galactosidase (lacZ) and neomycin resistance, enabling both selection of stable integrants and visualization of expression patterns. The ROSA26 locus was initially isolated as one of the integration sites of the inserted provirus (from gene trap pool #26), where the insertion placed lacZ under the control of endogenous regulatory elements. Early observations in chimeric mice derived from these gene-trapped embryonic stem cells revealed ubiquitous lacZ expression, indicating broad transcriptional activity at the ROSA26 locus across various tissues.

Gene Trap Screen and Early Studies

The ROSA26 locus emerged from a gene-trap screen in embryonic stem cells, employing a promoter-trap vector designed for insertion into actively transcribed genes. This vector featured a splice acceptor sequence upstream of the β-geo cassette, a fusion combining β-galactosidase (lacZ) for histochemical detection and neomycin phosphotransferase for selectable resistance. Because the neomycin selection function was itself promoterless, drug selection enriched for trapped integrations. The design allowed the vector to integrate into host introns, harnessing endogenous promoters to drive constitutive expression of the reporter without requiring an internal promoter. Initial studies by the Soriano group in 1991 validated the functionality of the ROSA26 insertion through generation of chimeric mice and germline transmission. X-gal staining revealed ubiquitous β-galactosidase activity in preimplantation embryos, postimplantation stages (e.g., E9.5), and adult tissues, including brain, heart, kidney, liver, spleen, and hematopoietic cells such as B cells, T cells, and myeloid lineages. These findings confirmed broad, constitutive expression across developmental stages and multiple tissue types, with staining intensity varying by cell type but present in all examined samples. Notably, newborn pups displayed intense blue staining in virtually all tissues, underscoring the locus's potential as a neutral reporter platform. Refinements in the Soriano laboratory involved screening multiple independent ES cell clones from the initial gene-trap library, identifying the ROSAβ-geo26 line with single or low-copy insertions that consistently yielded ubiquitous reporter expression without silencing. This repeatability across clones established the locus's reliability for stable, non-silenced activity. Further characterization in 1997 revealed that the insertion at ROSA26 disrupted only overlapping non-coding transcripts without affecting nearby protein-coding genes or producing an overt phenotype in homozygous mice, indicating a neutral integration site suitable for transgenesis.

Genomic Location and Structure

Chromosomal Position

The ROSA26 locus, officially designated Gt(ROSA)26Sor, is situated on mouse chromosome 6 within the cytogenetic band 6 A3.1. In the GRCm39 assembly, it occupies genomic coordinates from 113,044,389 to 113,054,205 bp on the reverse strand, corresponding to a position of approximately 113 Mb from the centromere and spanning roughly 9.8 kb. This location places it in a gene-poor region between the genes Thumpd3 and Nutf2, facilitating its use without disrupting essential nearby genes. The mapping of ROSA26 to chromosome 6 was established through linkage analysis in 1997, which showed no recombination with the marker D6Mit10 in a panel of intersubspecific backcross progeny. Earlier efforts in the 1990s utilized Southern blotting to verify insertional events during gene trapping and fluorescence in situ hybridization (FISH) to confirm its chromosomal assignment, building on the locus's initial isolation via retroviral gene trap in embryonic stem cells. These pre-sequencing methods provided the foundational localization, which was subsequently refined to base-pair resolution by the Mouse Genome Sequencing Consortium in 2002. Syntenic conservation of the ROSA26 region extends to other mammals, with a human homolog identified on chromosome 3p25.3 (THUMPD3-AS1; GRCh38.p14 coordinates chr3:9,385,264–9,397,494). The human ortholog has been targeted for transgene expression but exhibits some differences in regulatory features compared to the mouse locus.
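
As a quick check on the figures above, the quoted span follows directly from the GRCm39 coordinates. The minimal sketch below assumes the coordinates are 1-based and inclusive (the usual display convention in genome browsers and gene records); the function and constant names are illustrative and do not come from any particular library.

```python
def locus_length_bp(start: int, end: int) -> int:
    """Length of a genomic interval given 1-based, inclusive coordinates."""
    if end < start:
        raise ValueError("end must be >= start")
    # Both endpoints are counted, hence the +1.
    return end - start + 1


# GRCm39 coordinates for Gt(ROSA)26Sor as quoted in the text (chromosome 6).
ROSA26_START = 113_044_389
ROSA26_END = 113_054_205

length_bp = locus_length_bp(ROSA26_START, ROSA26_END)
print(f"{length_bp} bp (~{length_bp / 1000:.1f} kb)")  # 9817 bp (~9.8 kb)
```

Under a 0-based, half-open convention (as used in BED files) the same interval would be written with a start one lower, and the length would be a plain difference.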

Sequence Features and Organization

The Gt(ROSA)26Sor locus (NCBI Gene ID 14910), also known as Thumpd3as1, is a non-protein-coding gene that spans approximately 9.8 kb (from 113,044,389 to 113,054,205 on the complement strand of chromosome 6 in the GRCm39 assembly). It produces a long non-coding RNA (lncRNA) lacking any protein-coding potential, with three primary transcripts (NR_027008.1, NR_027009.1, and NR_027010.1) generated through alternative splicing. The gene features a multi-exon structure comprising at least three exons, with the gene trap insertion originally occurring within the first intron of the sense-oriented transcripts. Multiple splice sites enable the production of distinct lncRNA variants, including a 1.17 kb transcript (sharing exon 1 with shorter isoforms) and a 0.41 kb transcript that splices exon 1 to a downstream exon. An antisense transcript overlaps the locus but is not disrupted by typical targeting strategies. The endogenous promoter, located upstream of exon 1, is constitutive and of medium strength, characterized by the absence of a TATA box but the presence of GC-rich regions and CAAT boxes, supporting housekeeping-like transcription initiation at multiple start sites. Key structural elements include a polyadenylation signal positioned about 20 nucleotides upstream of the poly(A) tail in the primary sense transcript, ensuring efficient 3' end formation. The locus maintains an open chromatin configuration with minimal strong regulatory elements such as enhancers or insulators, which promotes stable, ubiquitous accessibility without position-effect variegation for inserted sequences.

Biological Function

Expression Pattern

The ROSA26 locus exhibits ubiquitous and constitutive expression in mice, driven by its endogenous promoter, which was first observed through gene trap insertions that revealed widespread reporter activity in embryonic stem (ES) cells and early embryos. This expression initiates as early as the morula-to-blastocyst stage, corresponding to approximately embryonic day 3.5 (E3.5), and persists without silencing through embryonic development, neonatal stages, and into adulthood. In ES cells derived from blastocysts, the locus supports active transcription, enabling reliable reporter detection from the outset of in vitro culture. Spatially, ROSA26 expression is detected across virtually all tissues examined, including brain, heart, liver, kidney, lung, pancreas, intestine, muscle, skin, spleen, thymus, bone marrow, cartilage, submandibular gland, trachea, and urinary bladder. Reporter assays indicate particularly robust activity in neural tissues, such as the brain (with the exception of olfactory bulb granule cells), and in the germline, including testis, where staining is uniform in most cell types. However, recent RNA-seq studies have shown that endogenous transcription of the ROSA26 lncRNA is absent in the spermatogonial lineage and Sertoli cells of the adult testis, suggesting that reporter expression from gene traps may not fully reflect endogenous activity in these cell types. Hematopoietic and lymphopoietic cells, from bone marrow to peripheral blood nucleated cells, also show consistent expression, though mature erythrocytes lack detectable activity due to the absence of nuclei. Quantitative assessments reveal moderate basal expression levels from the ROSA26 promoter, typically providing 3-fold higher β-galactosidase activity in ES cells compared to the phosphoglycerate kinase (PGK) promoter, a common housekeeping benchmark, while remaining lower than strong viral promoters in targeted transgenes.
This moderate output, often described as sufficient for reliable detection but not overexpression, maintains stability across multiple generations in homozygous mice, with no observed silencing or loss of pattern in fertile lineages. Validation of the expression pattern has relied primarily on reporter assays using X-gal staining, which demonstrates intense blue precipitate in embryos from E9.5 onward, neonates, and adult sections across diverse tissues. Complementary RT-PCR analyses confirmed the presence of locus-derived transcripts in ES cells and embryonic tissues, correlating with reporter staining and verifying the disruption of non-coding RNAs without altering the ubiquitous profile. These methods, established in early gene trap studies, have provided a foundational basis for the locus's reliability in developmental and tissue-wide contexts.

Endogenous Role and Non-Coding RNA

The ROSA26 locus encodes a long non-coding RNA (lncRNA) transcript of approximately 1.8 kb, consisting of multiple isoforms such as ENSMUST00000242415 (1,845 bp) and ENSMUST00000332449 (1,738 bp), which lack protein-coding potential. This lncRNA is transcribed from a constitutive promoter and exhibits ubiquitous expression across tissues, though its precise biological role remains largely unknown. Studies investigating the endogenous function of the ROSA26 lncRNA have primarily relied on gene trap insertions and targeted disruptions, which alter or abolish its expression. These experiments demonstrate that the locus has no essential role in development or homeostasis, as homozygous disruptions result in viable, fertile mice with no gross phenotypic abnormalities. For instance, mice carrying the original Gt(ROSA)26Sor gene trap allele or engineered knock-in modifications at the locus display normal size, behavior, and Mendelian inheritance ratios, indicating functional redundancy or neutrality of the lncRNA. Similarly, CRISPR/Cas9-mediated targeting of ROSA26 in embryonic stem cells and subsequent mouse generation yields animals without reported defects, reinforcing its classification as a "neutral" genomic site. Post-2010 analyses, including expression profiling across diverse tissues and cell types, have confirmed the lncRNA's low-level transcription in many tissues without identifying specific regulatory impacts, though a 2023 study revealed transcriptional inactivity specifically in spermatogenic cells of the adult testis. While general lncRNA mechanisms suggest potential involvement in chromatin organization or epigenetic regulation, no direct evidence links the ROSA26 transcript to processes such as X-chromosome inactivation; instead, the available data portray it as transcriptionally active yet functionally insignificant in these contexts. This lack of phenotype upon disruption underscores the locus's suitability for exogenous genetic modifications, as its endogenous output does not impose selective constraints.

Applications in Genetic Engineering

Safe Harbor Properties

The ROSA26 locus is regarded as a safe harbor in the mouse genome due to its suitability as a neutral insertion site for transgenes, minimizing disruptions to endogenous gene function, position-effect variegation, and transcriptional silencing. Safe harbor loci are characterized by specific criteria, including a well-defined genomic position in intergenic or non-essential regions, an active chromatin environment that supports stable expression, and the absence of nearby genes whose disruption could lead to deleterious effects or oncogenesis. Insertions at ROSA26 fulfill these standards, as the locus resides in a non-protein-coding region that permits transgene integration without altering critical cellular processes or viability. Several advantages contribute to the reliability of ROSA26 for transgenesis. Its endogenous promoter enables ubiquitous and constitutive expression across nearly all cell types and developmental stages, from embryonic morula to adult tissues, giving predictable output with little tissue-specific variability. The locus maintains an open chromatin conformation, enriched with active histone marks at promoter regions, which resists silencing and facilitates consistent accessibility for transcriptional machinery. Furthermore, targeted modifications at ROSA26 support efficient germline transmission, producing non-mosaic progeny that inherit the insertion uniformly, which is essential for generating stable transgenic lines. Empirical evidence underscores ROSA26's safety and efficacy, with over 1,000 knock-in strains documented since its identification in the 1990s (as of November 2025), all exhibiting reliable expression without associated oncogenic risks or gross phenotypic abnormalities in heterozygous or homozygous animals. Initial studies using gene-trap vectors demonstrated widespread β-galactosidase reporter activity in embryos and hematopoietic cells, confirming the locus's neutrality and broad utility for transgenesis.
No adverse health impacts have been reported across these models, reinforcing its role as a preferred site for long-term genetic studies. A notable limitation of using ROSA26 is the risk of multiple or unintended integrations, particularly with homology-independent methods, which requires rigorous validation to confirm precise, single-copy targeting.

Knock-in and Transgene Insertion Techniques

The Rosa26 locus has been widely utilized for knock-in strategies through traditional homologous recombination (HR), which involves designing targeting vectors that integrate via sequence homology arms flanking the insertion site. A common approach employs the Rosa26-loxP-STOP-loxP (LSL) system, where a loxP-flanked transcriptional stop cassette (often containing polyA signals) prevents expression until removed by Cre recombinase, enabling conditional transgene activation. This system was pioneered for inserting reporter genes like enhanced yellow fluorescent protein (EYFP) and enhanced cyan fluorescent protein (ECFP) downstream of a weak splice acceptor in the first intron of Rosa26, allowing ubiquitous expression post-Cre recombination without disrupting endogenous gene function. Targeting vectors are typically linearized and electroporated into embryonic stem (ES) cells, such as those on a C57BL/6 background, followed by selection using neomycin resistance conferred by the vector. Positive clones are identified via PCR amplification of junction fragments (yielding ~1.5 kb products) and confirmed by Southern blotting with probes detecting expected band shifts (e.g., 4 kb 5' and 9.6 kb 3' fragments after EcoRV digestion), achieving targeting efficiencies of 30% or higher among resistant clones. Advancements in CRISPR/Cas9 technology since 2013 have streamlined Rosa26 knock-ins by introducing double-strand breaks at the locus via guide RNAs (gRNAs), promoting homology-directed repair (HDR) with donor templates. gRNAs are designed to target specific sites, such as the XbaI restriction site in Rosa26 intron 1, ensuring precise cuts compatible with standard homology arms of 0.5–1 kb flanking the insert. In ES cells, CRISPR/Cas9 ribonucleoprotein (RNP) complexes delivered by electroporation yield knock-in efficiencies exceeding 50% when combined with drug selection and circular donor plasmids, which minimize random integration. For instance, a single electroporation of Cas9 RNP, gRNA, and donor vector into murine ES cells has resulted in up to 87% targeted clones for reporter knock-ins, with dual knock-ins at Rosa26 and other loci reaching 65% efficiency.
These methods reduce the timeline for generating knock-in mice from months to weeks compared to traditional gene targeting, often bypassing extensive ES cell screening. Common constructs inserted at Rosa26 include fluorescent reporters like tdTomato or mT/mG dual reporters for lineage tracing, Cre drivers for conditional genetics, and transgenes for disease modeling, such as optogenetic tools (e.g., halorhodopsin for neural silencing) or oncogenic alleles (e.g., mutations used in lung adenocarcinoma models). The LSL configuration remains prevalent in conditional approaches, as in Rosa26^{LSL-Cas9} lines for secondary editing or Rosa26^{LSL-PD-L1} for immunology studies. A typical workflow begins with constructing the donor vector via Gateway recombination or In-Fusion cloning, incorporating elements like a promoter, the transgene, and selection markers. The vector is then co-delivered with CRISPR components into ES cells by electroporation (e.g., 500 V, 3 ms pulse in ES cell buffer), followed by 7–10 days of puromycin selection (1–2 μg/mL). Surviving clones are expanded and genotyped by PCR, Southern blot, or sequencing, and validated ES cells (typically 10–20 high-quality lines) are injected into blastocysts for chimera generation and germline transmission, yielding founder mice in 3–6 months. This process leverages Rosa26's safe harbor status to ensure stable, heritable expression across tissues.
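
The targeting efficiencies quoted above translate into a rough estimate of how many drug-resistant clones need to be genotyped. The sketch below is a back-of-the-envelope calculation rather than a published protocol: it assumes each clone is an independent trial with the same probability of being correctly targeted, and it solves 1 − (1 − p)^n ≥ c for the smallest n. The function name and the 95% default are illustrative choices.

```python
import math


def clones_to_screen(efficiency: float, confidence: float = 0.95) -> int:
    """Smallest number of clones n such that the chance of finding at least
    one correctly targeted clone is >= confidence.

    Assumes independent clones with identical targeting probability,
    which is a simplification of real ES cell screening.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if not 0.0 <= confidence < 1.0:
        raise ValueError("confidence must be in [0, 1)")
    if efficiency == 1.0:
        return 1
    # Solve 1 - (1 - p)^n >= c  ->  n >= log(1 - c) / log(1 - p)
    n = math.log(1.0 - confidence) / math.log(1.0 - efficiency)
    return max(1, math.ceil(n))


# Efficiencies taken from the figures quoted in the text.
for label, p in [("HR, ~30%", 0.30), ("RNP + selection, ~50%", 0.50), ("RNP best case, 87%", 0.87)]:
    print(f"{label}: screen at least {clones_to_screen(p)} clones")
```

In practice more clones are usually picked than this minimum, since some targeted clones fail later checks such as karyotyping or copy-number validation.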

Extensions to Other Organisms

Use in Non-Mouse Models

The rat ortholog of the ROSA26 locus serves as a safe harbor for genetic modifications and has been targeted using CRISPR/Cas9-mediated knock-in strategies since 2015 to generate reporter lines and facilitate conditional gene expression in various research contexts, including disease modeling. For instance, Cre reporter strains inserted at this locus enable monitoring of Cre-loxP-mediated lineage tracing, supporting tissue-specific studies, including cardiac research. In pigs, targeting of the ROSA26 locus has proven valuable for xenotransplantation research, with early 2014 studies demonstrating its utility for stable overexpression of transgenes, including human-compatible factors to mitigate immune rejection. These approaches allow integration of multiple genetic modifications, such as anti-inflammatory or complement-regulatory genes, to produce donor organs more suitable for transplantation while maintaining ubiquitous and consistent expression. Subsequent optimizations have enhanced efficiency, enabling Cre-mediated lineage tracing in porcine models to track cell fate during organ development and rejection processes. Zebrafish, lacking a direct ROSA26 homolog, employ analogous safe harbor sites for targeted insertion, which support reproducible transgenesis without disrupting endogenous gene regulation. These loci facilitate efficient integration via CRISPR/Cas9, allowing stable expression of reporters or disease-relevant genes in developmental and disease models. Similar targeting has been achieved in other livestock species using CRISPR/Cas9 for agricultural and biomedical models. The rabbit ROSA26 ortholog has been successfully targeted for knock-in of reporter genes like EGFP and Cre, providing a platform for stable, ubiquitous expression in biomedical applications, including models for therapeutic proteins such as antibodies. This locus's properties enable high-fidelity targeting, supporting rabbit lines used in preclinical testing of biologics.
A key challenge in adapting ROSA26 targeting to non-mouse models, especially large animals like pigs and rabbits, is reduced efficiency compared to mice, with CRISPR/Cas9 knock-in rates at this locus typically ranging from 10% to 30% in primary fibroblasts after optimization. Viral vector delivery, such as AAV, can achieve targeted integration success rates of approximately 20-30% but often requires somatic cell nuclear transfer for germline transmission, increasing complexity and cost.

Human ROSA26 Homolog and Targeting

The human homolog of the mouse ROSA26 locus, designated hROSA26, resides on chromosome 3p25.3 within an intron of the long non-coding RNA gene THUMPD3-AS1 (Gene ID: 440944). This locus was identified in 2007 through computational homology searches aligning mouse Rosa26 sequences to the human genome, followed by validation via bacterial artificial chromosome (BAC) cloning and targeting experiments in human embryonic stem (hES) cells. Although it shares partial syntenic conservation with the mouse locus, including an open chromatin structure conducive to ubiquitous transcription, hROSA26 exhibits weaker baseline expression and reduced stability compared to its murine counterpart. Targeting of hROSA26 in human cells has been facilitated by homologous recombination and, more recently, CRISPR/Cas9-mediated knock-in strategies, establishing it as a genomic safe harbor for transgene integration. In induced pluripotent stem cells (iPSCs), hROSA26 serves as an alternative to the AAVS1 locus for stable insertion of therapeutic transgenes, offering multilineage expression potential while minimizing disruption to endogenous genes. Unlike AAVS1, which is often preferred for its location within PPP1R12C and its consistent activity, hROSA26 requires careful promoter selection to mitigate variability. CRISPR/Cas9 targeting at hROSA26 achieves knock-in efficiencies of approximately 2-7% in HEK293 cells under optimized conditions. Applications of hROSA26 targeting have expanded into gene therapy for various diseases, leveraging the locus's broad accessibility for genetic correction of hematopoietic stem cells in preclinical studies. Despite its utility, hROSA26 is more susceptible to transgene silencing via epigenetic mechanisms than the mouse Rosa26 locus, prompting ongoing research into insulator elements and alternative safe harbors for optimization in therapeutic contexts.