Open reading frame

In molecular biology, an open reading frame (ORF) is a continuous stretch of codons in a DNA or RNA sequence that begins with a start codon (typically AUG in messenger RNA) and terminates at an in-frame stop codon (UAA, UAG, or UGA), with no intervening stop codons that would halt translation. Such a sequence has the potential to be translated by ribosomes into a polypeptide or protein. ORFs can occur in any reading frame of a nucleic acid strand (three forward and three reverse for double-stranded DNA), and their identification relies on the triplet nature of the genetic code, in which each set of three nucleotides specifies an amino acid or a stop signal.

The concept of ORFs became a fundamental tool in genomics during the era of genome sequencing projects, enabling the prediction of protein-coding regions in prokaryotic and eukaryotic genomes alike. In prokaryotes, where genes generally lack introns, long ORFs often correspond directly to functional genes, facilitating rapid annotation of bacterial and archaeal genomes. In eukaryotes, ORF analysis is more complex because of splicing, but ORFs still serve as primary indicators of exons and potential coding sequences, and are often verified with expression data. Length is a critical factor: annotated prokaryotic ORFs are typically longer than 100 codons, which distinguishes them from ORFs that arise by chance in random sequence, while shorter ones, known as small ORFs (sORFs), have gained recognition for encoding regulatory peptides or microproteins that influence cellular processes.

Beyond gene prediction, ORFs are important in viral studies and in metagenomics, where sequence data are scanned for ORFs to uncover novel enzymes or antigens. ORF detection algorithms, such as those in tools like Glimmer, account for organism-specific codon biases and evolutionary conservation to prioritize biologically relevant frames. Recent advances, including ribosome profiling, have revealed that many non-canonical ORFs, previously overlooked, contribute to proteome diversity and disease mechanisms, underscoring their ongoing significance in molecular research.

Definition and Properties

Core definition

An open reading frame (ORF) is a continuous stretch of codons within a DNA or RNA sequence that begins with a start codon, typically ATG in DNA or AUG in RNA, and terminates at an in-frame stop codon, namely TAA, TAG, or TGA in DNA (UAA, UAG, or UGA in RNA), with no intervening stop codons. This structure means the sequence can potentially be read by the translational machinery without premature interruption. ORFs therefore mark segments of genetic material that could be translated into polypeptides by ribosomes, and they serve as key indicators of potential protein-coding regions. In genomic DNA, sequences are scanned for ORFs to predict genes, whereas in RNA transcripts such as mRNA, ORFs correspond directly to the translatable portions of the transcript. Each ORF lies in one of the three possible reading frames of its strand.

The concept of ORFs emerged in the 1970s amid pioneering DNA sequencing efforts, notably through analysis of the bacteriophage φX174 genome, where stretches free of stop codons revealed overlapping genes and potential coding sequences. Regarding length, ORFs annotated as encoding functional proteins are generally at least 100 codons (300 nucleotides) long, though shorter variants known as small ORFs (sORFs or smORFs) occur and may contribute to cellular processes.

Reading frames and structure

In molecular biology, a reading frame refers to the partitioning of a nucleotide sequence into consecutive, non-overlapping triplets known as codons, each consisting of three nucleotides that specify an amino acid or a stop signal during translation. For a single strand of DNA or RNA, three distinct reading frames are possible, determined by the starting position of the first codon: the first frame begins at nucleotide position 1, the second at position 2, and the third at position 3. Reading in a fixed frame keeps codon boundaries consistent throughout translation.

Because genomic DNA is double-stranded, with a sense (coding) strand and an antisense (template) strand, the total number of possible reading frames doubles to six: three forward frames on the sense strand (read 5' to 3') and three reverse frames on the antisense strand (also read 5' to 3', which corresponds to 3' to 5' along the sense strand). Both strands must be considered because genes can be encoded on either one.

Within a specific reading frame, an open reading frame (ORF) is structurally defined as a continuous stretch of codons that begins with an initiation codon (typically ATG in DNA or AUG in RNA) and terminates at the first in-frame stop codon (TAA, TAG, or TGA), without any intervening stop codons. This in-frame continuity means the sequence can, in principle, be translated into a polypeptide without premature termination. Stop codons act as boundaries that segment the frame into discrete potential ORFs; stop triplets that fall in a different frame do not affect the integrity of an in-frame ORF.

The number and length of potential ORFs in one reading frame of a sequence of N nucleotides (approximately N/3 codons) depend on the positions of start and stop codons within that frame. Each start codon paired with the nearest downstream in-frame stop codon defines one ORF, and the total count equals the number of such valid pairs before the sequence ends. The longest ORF often serves as a proxy for a likely functional gene, such as those spanning over 100 codons in prokaryotic genomes. Whether stop codons are sparse or clustered therefore shapes the distribution of ORFs and the size of the proteins they could encode.

The structural implications of reading frames differ between prokaryotes and eukaryotes because of differences in genome organization. In prokaryotes, whose genes generally lack introns, the coding sequence in genomic DNA is colinear with the mRNA, so the reading frame is uninterrupted and frameshift mutations (insertions or deletions whose length is not a multiple of three) disrupt the entire downstream sequence. In contrast, eukaryotic genes contain introns that interrupt the coding sequence in genomic DNA. Accurate splicing is required to excise these non-coding regions and join the exons, reconstructing a continuous in-frame ORF in the mature mRNA; splicing errors can shift the frame and lead to aberrant proteins.
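The pairing rule described above can be illustrated with a short Python sketch. It is a minimal illustration only, assuming an uppercase DNA string, ATG as the sole start codon, and the three standard stop codons; the function names `codons_in_frame` and `orfs_in_frame` are hypothetical and do not come from any particular tool.

```python
# Minimal sketch: enumerate ORFs in one reading frame of a DNA string.
# Assumptions (not from the article): uppercase DNA, ATG-only starts,
# standard stops, 0-based coordinates with an exclusive end.
STOPS = {"TAA", "TAG", "TGA"}
START = "ATG"


def codons_in_frame(seq, frame):
    """Split seq into complete, non-overlapping codons starting at offset frame (0, 1, or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def orfs_in_frame(seq, frame):
    """Pair every start codon with the nearest downstream in-frame stop codon.

    Returns (start, end) tuples; the stop codon is included in the span.
    A start codon with no downstream in-frame stop yields no ORF.
    """
    codons = codons_in_frame(seq, frame)
    orfs = []
    for idx, codon in enumerate(codons):
        if codon != START:
            continue
        for j in range(idx + 1, len(codons)):
            if codons[j] in STOPS:
                orfs.append((frame + 3 * idx, frame + 3 * j + 3))
                break
    return orfs


if __name__ == "__main__":
    example = "ATGAAATAGCATGCCCTGA"
    for frame in range(3):
        print(frame, orfs_in_frame(example, frame))
```

Because every ATG is paired separately, nested ORFs that share a stop codon are all reported; a tool that keeps only the longest ORF per stop codon would retain just the first ATG after the previous stop.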

Types of Open Reading Frames

Canonical ORFs

Canonical open reading frames (ORFs) are the primary protein-coding sequences in mRNA transcripts that align with the annotated coding sequences (CDS) of known or predicted genes. A canonical ORF is typically the longest uninterrupted sequence in the primary reading frame, starting from an AUG initiation codon and terminating at an in-frame stop codon (UAA, UAG, or UGA). These ORFs encode the full-length, functional polypeptides that constitute the main products of protein-coding genes, which makes them the standard baseline for gene expression in both prokaryotes and eukaryotes.

Canonical ORFs are usually long, often exceeding 300 nucleotides (corresponding to more than 100 amino acids), which is above the typical thresholds for protein-coding potential used in annotation pipelines. They are generally positioned downstream of promoter regions that drive transcription and are validated through expression evidence from techniques such as RNA sequencing, ribosome profiling, and mass spectrometry, which confirm their translation into abundant proteins. This association with regulatory elements and empirical support reflects their role in producing essential cellular components, such as enzymes and structural proteins.

Illustrative examples abound in model organisms. In the bacterium Escherichia coli, canonical ORFs form the core of polycistronic operons such as the lac operon, where the lacZ, lacY, and lacA ORFs encode β-galactosidase, lactose permease, and galactoside O-acetyltransferase, respectively, enabling lactose metabolism under inducible conditions. In humans, canonical ORFs often span multiple exons, as exemplified by the TP53 gene, whose ORF produces the 393-amino-acid p53 protein critical for the DNA damage response and tumor suppression.

Canonical ORFs show pronounced evolutionary conservation owing to purifying selection on the functions of their encoded proteins, with sequences more similar across related species than non-coding regions are. This conservation is evident in alignments of orthologous genes, where synonymous substitutions are limited by codon bias and nonsynonymous changes are constrained by functional requirements. In genome annotation, canonical ORFs are prioritized over shorter or out-of-frame spurious ORFs using several criteria: minimal length thresholds, sequence conservation scores (e.g., via PhyloCSF), codon usage patterns matching the organism's bias, and corroborating expression data, so that only biologically relevant candidates are designated as genes.

Alternative ORFs including short and upstream variants

Alternative open reading frames (altORFs) are non-canonical sequences within a transcript that overlap or are nested relative to the primary coding sequence but use different reading frames, allowing the potential encoding of distinct protein isoforms or micropeptides. In contrast to canonical ORFs, they introduce translational diversity from the same genetic locus, often arising from alternative start codons or frame shifts.

Short open reading frames (sORFs) are defined as sequences shorter than 300 nucleotides, typically encoding micropeptides of fewer than 100 amino acids, and have been identified in high abundance across eukaryotic genomes through advances in ribosome profiling and mass spectrometry. These sORFs frequently reside in untranslated regions (UTRs), long non-coding RNAs (lncRNAs), or intergenic spaces, challenging traditional views of non-coding genomic elements. Notable examples of functional micropeptides include myoregulin, a 46-amino-acid micropeptide that modulates sarco/endoplasmic reticulum Ca²⁺-ATPase (SERCA) activity in muscle cells, and micropeptides derived from genes with mitochondrial functions such as MIEF1, which influence mitochondrial dynamics.

Upstream open reading frames (uORFs) are positioned in the 5' UTR of mRNAs, upstream of the main coding sequence, and predominantly function to repress translation of the downstream ORF by impeding ribosome progression or promoting ribosome dissociation. uORFs occur in 46–63% of protein-coding transcripts and can encode bioactive micropeptides that fine-tune cellular responses. In stress-induced pathways, uORF translation can instead enhance downstream gene activation under specific conditions such as eIF2α phosphorylation. For instance, uORFs in the yeast GCN4 transcript regulate expression during amino acid starvation by controlling reinitiation efficiency.

The discovery of these alternative ORFs has accelerated in recent years, with integrated analyses of ribosome profiling data from diverse eukaryotes revealing over 58,000 noncanonical ORFs in humans alone, including thousands of stable sORFs and uORFs translated with 3-nucleotide periodicity. However, challenges persist because their sequence conservation is generally low compared with canonical ORFs, making it difficult to differentiate genuine coding events from translational noise, though functional validation through genetic screens has confirmed cellular roles for high-confidence candidates. Recent work underscores their coding and regulatory potential, with micropeptides from sORFs implicated in stress responses and signaling, such as PIGBOS in mitochondrial integrity.
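As a rough illustration of how uORF candidates can be located from sequence alone, the following Python sketch scans the 5' UTR of an mRNA for AUG codons and extends each to its first in-frame stop. It is a simplified sketch under stated assumptions: only AUG starts are considered, the position of the main CDS start (`cds_start`, 0-based) is supplied by the caller, and the function name `find_uorfs` and the category labels are hypothetical. Sequence-based candidates of this kind still require experimental evidence of translation.

```python
# Minimal sketch: list candidate uORFs for an mRNA with a known main CDS start.
# Assumptions (not from the article): AUG-only initiation, 0-based coordinates,
# exclusive end that includes the stop codon.
STOPS = {"UAA", "UAG", "UGA"}


def find_uorfs(mrna, cds_start):
    """Return (start, end, kind) for every AUG that lies fully inside the 5' UTR.

    kind is "uORF" when the ORF ends at or before the main start codon,
    "N-terminal extension" when it is in frame with the main CDS and runs into it,
    and "overlapping uORF" when it runs into the CDS in a different frame.
    """
    mrna = mrna.upper().replace("T", "U")
    found = []
    for start in range(max(cds_start - 2, 0)):
        if mrna[start:start + 3] != "AUG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, len(mrna) - 2, 3):
            if mrna[pos:pos + 3] in STOPS:
                end = pos + 3
                if end <= cds_start:
                    kind = "uORF"
                elif (cds_start - start) % 3 == 0:
                    kind = "N-terminal extension"
                else:
                    kind = "overlapping uORF"
                found.append((start, end, kind))
                break
    return found


if __name__ == "__main__":
    # Toy transcript: a 3-codon uORF (AUG CCC UAA) upstream of a main CDS at position 15.
    print(find_uorfs("GGAUGCCCUAAGGCCAUGGCUUAA", cds_start=15))
```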

Methods of Detection

Six-frame translation

Six-frame translation is a foundational computational method for identifying potential open reading frames (ORFs) in a nucleotide sequence by translating it in all six possible reading frames (three on the forward strand and three on the reverse-complement strand) to detect continuous stretches of codons from a start codon (typically ATG) to an in-frame stop codon (TAA, TAG, or TGA). This approach relies on the triplet nature of the genetic code, where each frame shifts the reading position by one nucleotide, allowing exhaustive scanning without assuming a specific transcriptional direction or prior annotations.

The process involves four steps. First, the DNA sequence is divided into six frames by offsetting the starting position (positions 1, 2, or 3) on both strands. Second, each frame is scanned for start codons. Third, potential ORFs are extended downstream until the first in-frame stop codon is encountered. Fourth, candidate ORFs are filtered on criteria such as minimum length (e.g., greater than 100 nucleotides) to exclude short, likely non-coding sequences. The result is a set of putative protein sequences for each frame, which can then be analyzed for biological relevance.

One primary advantage of six-frame translation is its simplicity and independence from existing genome annotations, which makes it particularly suitable for de novo analysis of prokaryotic genomes or viral sequences, where coding regions are typically continuous and compact, without introns. It requires no prior knowledge of gene structure, enabling rapid identification of potential coding regions in uncharacterized sequences. Its limitations include a high rate of false positives from spurious ORFs in non-coding regions, especially in eukaryotic genomes where introns and splicing disrupt continuous translation, giving lower specificity than more targeted methods. The approach also expands the search space sixfold, increasing computational demands, and does not account for post-transcriptional processing such as alternative splicing.

Historically, six-frame translation played a key role in early genome projects such as the Human Genome Project in the early 2000s, where it was applied to the entire assembly to generate putative ORFs for proteomic validation and the discovery of novel genes, aiding de novo prediction in the absence of complete annotations. The method also supported the identification of peptides from mass spectrometry data in initiatives like the Human Proteome Organization Plasma Proteome Project, contributing to refined gene models after draft sequencing.
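The four steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation used by any particular tool: it assumes the standard genetic code, ATG as the only start codon, and a length filter expressed in codons (`min_codons`, a hypothetical parameter name). For each stop codon it keeps only the longest ORF, that is, the one beginning at the first ATG after the previous stop.

```python
# Minimal sketch of six-frame translation with a length filter.
# Assumptions (not from the article): standard genetic code, ATG-only starts,
# unknown or ambiguous codons translated as "X".
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code built from the conventional TCAG codon ordering.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame):
    """Translate seq from offset frame (0, 1, or 2); stops are written as '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def six_frame_orfs(seq, min_codons=100):
    """Return (strand, frame, protein) for ORFs with at least min_codons residues.

    frame is reported as 1, 2, or 3; the stop codon is not part of the protein.
    """
    seq = seq.upper()
    hits = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(template, frame)
            start = None
            for idx, aa in enumerate(protein):
                if aa == "M" and start is None:
                    start = idx          # first start codon after the previous stop
                elif aa == "*":
                    if start is not None and idx - start >= min_codons:
                        hits.append((strand, frame + 1, protein[start:idx]))
                    start = None         # a stop closes the current candidate
    return hits


if __name__ == "__main__":
    for hit in six_frame_orfs("ATGAAATAGCATGCCCTGA", min_codons=2):
        print(hit)
```

The toy call uses a very low threshold so that the short example sequence yields hits; for real prokaryotic sequence a threshold near 100 codons, as discussed in the text, would be more typical.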

Advanced experimental and sequencing-based approaches

Advanced experimental and sequencing-based approaches have transformed the identification of open reading frames (ORFs) by providing direct evidence of translation, which purely computational predictions cannot supply. These methods use high-throughput sequencing and biochemical assays to map ribosome activity and translation products, enabling the detection of non-canonical ORFs such as short ORFs (sORFs) and upstream ORFs (uORFs) that may be overlooked in standard annotations.

Ribosome profiling, also known as Ribo-seq, is a cornerstone technique that sequences ribosome-protected mRNA fragments to pinpoint actively translated regions with nucleotide resolution. Developed by Ingolia et al. in 2009, the method isolates mRNA fragments shielded by ribosomes during translation, revealing the precise positions of translating ribosomes across the transcriptome and identifying translated ORFs, including sORFs and uORFs, that encode previously unannotated peptides. Subsequent advances have refined Ribo-seq protocols, incorporating multiplexed sequencing and improved nuclease digestion to enhance sensitivity for low-abundance transcripts.

Proteogenomics complements sequencing-based detection by integrating mass spectrometry-based proteomics with genomic data to confirm ORF translation empirically at the protein level. This approach searches mass spectrometry spectra against custom ORF databases derived from genomic sequences, identifying peptides from novel or alternative ORFs that align with translated regions. A 2014 review by Nesvizhskii highlighted the role of proteogenomics in discovering coding regions missed by annotation pipelines, while recent screens have identified thousands of novel proteins from the "dark proteome."

Additional techniques provide orthogonal validation of ORF translation. Polysome profiling fractionates mRNA by the number of associated ribosomes, enriching for actively translated transcripts and enabling the detection of sORFs through subsequent sequencing or analysis. Reporter assays, such as luciferase-based constructs fused to putative ORF sequences, assess functional translation by measuring expression in cellular systems, confirming the regulatory potential of alternative start sites.

These approaches offer key advantages over six-frame translation, which serves as a preliminary computational step but cannot distinguish potential ORFs from actively translated ones. In contrast, Ribo-seq and proteogenomics provide direct evidence of ribosome occupancy and protein production, respectively, and also reveal the use of non-AUG start codons such as CUG.

Recent advances through 2025 have expanded Ribo-seq applications for sORF annotation, addressing gaps in traditional annotations. In humans, enhanced Ribo-seq datasets have annotated thousands of non-canonical ORFs, improving proteome predictions through integration with resources like GENCODE. Studies in viruses using massively parallel Ribo-seq have uncovered thousands of novel protein-coding sequences, highlighting viral sORFs that evade host detection. These developments underscore the incompleteness of prior annotations and pave the way for more accurate ORF catalogs.
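The three-nucleotide periodicity that Ribo-seq analyses look for can be illustrated with a small Python sketch. It assumes that footprint reads have already been aligned and reduced to P-site positions (0-based transcript coordinates), which in practice requires read-length-specific offset calibration not shown here; the function name `frame_fractions` is hypothetical. Dedicated tools apply formal statistical tests rather than a simple fraction.

```python
# Minimal sketch: fraction of ribosome footprint P-sites in each frame of a candidate ORF.
# Assumptions (not from the article): P-site positions are precomputed, coordinates are
# 0-based, and orf_end is exclusive.
from collections import Counter


def frame_fractions(psite_positions, orf_start, orf_end):
    """Return the fraction of P-sites falling in frame 0, 1, and 2 relative to orf_start.

    A strong excess in frame 0 is the signature of active in-frame translation;
    roughly equal fractions are more consistent with background signal.
    """
    counts = Counter((pos - orf_start) % 3
                     for pos in psite_positions
                     if orf_start <= pos < orf_end)
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(counts[frame] / total for frame in range(3))


if __name__ == "__main__":
    psites = [10, 13, 16, 17, 19, 22, 5, 40]   # toy data; 5 and 40 lie outside the ORF
    print(frame_fractions(psites, orf_start=10, orf_end=31))
```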

Biological Significance

Role in gene prediction and annotation

Open reading frames (ORFs) play a central role in gene prediction algorithms by serving as the primary signals for identifying potential protein-coding regions, especially in prokaryotic genomes, where the longest-ORF hypothesis assumes that the longest continuous stretch without stop codons in a given region is the most likely to represent a functional gene. This approach uses statistical properties of ORFs, such as length and sequence composition, to distinguish coding sequences from non-coding sequence, with tools training models on known long ORFs to score candidates.

In genome annotation pipelines, predicted ORFs are combined with homology-based searches, using BLAST to align sequences against protein databases and confirm functional similarities, and are supplemented by evidence from expressed sequence tags (ESTs) that indicate transcriptional activity. For instance, the Glimmer system incorporates ORF statistics, including length distributions, periodicities, and GC content correlations, to refine predictions; higher GC content often correlates with increased coding potential in certain bacterial lineages. These metrics help validate ORFs by comparing their density (coding regions typically cover 85–90% of a prokaryotic genome) and compositional biases against the distributions expected for random sequence.

ORFs are particularly important in metagenomics for annotating genes from uncultured or unknown organisms, where sequence fragments lack genomic context and ORF detection is used to infer functional elements without prior homology. In eukaryotes, however, ORF-based prediction faces significant challenges because coding sequences are split across introns and subject to alternative splicing, necessitating hybrid approaches that combine ab initio ORF signals with transcript evidence to delineate exons accurately.

The impact of ORF-driven methods has been profound, enabling the annotation of over 95% of protein-coding genes in bacterial genomes, as demonstrated by the high sensitivity of tools like Glimmer on benchmark datasets. A historical milestone occurred in 1995 with the sequencing of the Haemophilus influenzae genome, where 1,717 ORFs longer than 100 codons were identified as putative genes, marking the first complete genome sequence of a free-living organism and paving the way for the genomic era.
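The rationale for length thresholds such as 100 codons can be made concrete with a back-of-the-envelope calculation in Python. The sketch below assumes an idealized random sequence in which bases are independent, A and T are equally frequent, and G and C are equally frequent; real genomes deviate from this model, and the function names `stop_probability` and `prob_open` are hypothetical.

```python
# Minimal sketch: how likely is a stop-free run of codons in random sequence?
# Assumptions (not from the article): independent bases, p(A) = p(T), p(G) = p(C).


def stop_probability(gc_content=0.5):
    """Probability that a random codon is TAA, TAG, or TGA at the given GC fraction."""
    p_at = (1.0 - gc_content) / 2.0   # probability of A (and of T)
    p_gc = gc_content / 2.0           # probability of G (and of C)
    # TAA = T*A*A, TAG = T*A*G, TGA = T*G*A
    return p_at * p_at * (p_at + 2.0 * p_gc)


def prob_open(n_codons, gc_content=0.5):
    """Probability that n_codons consecutive random codons contain no stop codon."""
    return (1.0 - stop_probability(gc_content)) ** n_codons


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(f"GC={gc:.1f}  P(stop)={stop_probability(gc):.4f}  "
              f"P(100 codons open)={prob_open(100, gc):.4f}")
```

At 50% GC content, 3 of the 64 codons are stops, so a run of 100 stop-free codons arises by chance with probability (61/64)^100, a little under 1%. Because all three stop codons are AT-rich, the same calculation gives longer chance ORFs in GC-rich sequence, which is one reason fixed length cutoffs are usually combined with other evidence.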

Regulatory functions and recent discoveries

Upstream open reading frames (uORFs) serve as potent cis-regulatory elements that inhibit translation of downstream main coding sequences (CDSs) by stalling ribosomes, thereby fine-tuning gene expression under various conditions. In stress scenarios, such as oxidative or nutritional stress, uORFs buffer the translational variability of CDSs, reducing fluctuations in protein output by 10–25% as uORF translation efficiency rises, which helps maintain cellular homeostasis during development and evolution. For instance, the bicoid uORF in Drosophila melanogaster buffers over threefold variation in translation efficiency compared to its CDS, and its knockout via CRISPR-Cas9 elevates CDS translation, disrupting embryogenesis. Similarly, small ORFs (sORFs) contribute to regulatory fine-tuning by modulating mRNA stability and translation rates, often acting as enhancers or repressors of nearby coding sequences in a context-dependent manner.

Recent proteogenomics studies have unveiled thousands of functional sORFs previously overlooked in genome annotations, encoding micropeptides under 100 amino acids that play roles in cancer and development. In humans, proteogenomics identified 365 novel sORFs across tissues, with 53% validated by ribosome profiling, including examples like the HNRNPUL2 uORF in lymphocytes, which influences immune cell function during development. In cancer contexts, sORF-encoded micropeptides such as the HOXB-AS3 peptide from the lncRNA HOXB-AS3 suppress colorectal cancer proliferation and invasion by downregulating PKM2, while the SHPRH-146aa peptide from circ-SHPRH inhibits glioblastoma growth via protein stabilization. These discoveries highlight the contributions of sORFs to tumor progression and therapeutic resistance through pathways like MAPK and PI3K/AKT.

ORFs within non-coding RNAs, particularly long non-coding RNAs (lncRNAs), have emerged as sources of regulatory micropeptides that were historically dismissed because the transcripts lack long CDSs. Ribosome profiling reveals sORFs in approximately 30% of fly lncRNAs, producing peptides of around 22 amino acids on average, which often regulate canonical proteins before evolving broader functions. In human cancers, lncRNA-derived peptides like CRNDEP from CRNDE promote ovarian tumor progression, underscoring their oncogenic potential.

In viral biology, alternative ORFs facilitate immune evasion by suppressing host defenses. For example, SARS-CoV-2 ORF10 downregulates innate immune genes like IFIT1 and OAS1 and promotes mitophagy-mediated degradation of MAVS, enhancing viral persistence and reducing interferon responses.

Evolutionarily, alternative ORFs and sORFs drive gene birth from non-coding sequences, transitioning from cis-regulatory roles (e.g., effects on RNA stability) to peptides that modulate proteins, with conservation spanning 500 million years in metazoans. This evolutionary remodeling expands the proteome, as seen in non-canonical ORFs that buffer translational stress across species.

These advances from the 2020s, including 2025 proteogenomics efforts, reveal the abundance of sORFs across all transcript types (coding, non-coding, and intergenic), challenging early annotations that missed thousands of functional sORFs. Such findings show that the regulatory landscape of ORFs extends well beyond traditional protein-coding annotation.

Computational Tools

Traditional ORF-finding software

Traditional ORF-finding software relies primarily on sequence-based algorithms that identify potential coding regions in nucleotide sequences through straightforward scanning methods, such as six-frame translation. One of the earliest and most widely used tools is NCBI's ORFfinder, introduced in the late 1990s as part of the National Center for Biotechnology Information's suite of bioinformatics resources. This web-based program performs a six-frame translation of input DNA sequences and locates open reading frames (ORFs) defined by start and stop codons within a user-specified size range, reporting the positions, sequences, and corresponding protein translations. It accepts FASTA-formatted sequences of up to 50 kb in the web interface and supports various genetic codes, with options for alternative start codons beyond ATG, such as GTG and TTG in prokaryotes. A key parameter is the minimum ORF length, which defaults to 75 nucleotides for bacterial analyses and helps filter out spurious short sequences while capturing most prokaryotic genes.

Another established tool is getorf from EMBOSS (European Molecular Biology Open Software Suite), a command-line program designed for extracting ORFs from nucleotide sequences. Getorf identifies ORFs as regions of a specified minimum size (default 30 bases, though often set higher in practice) bounded by start and stop codons, with support for circular genomes and reverse complements, making it suitable for prokaryotic sequence analysis such as bacterial genomes. It outputs protein sequences by default, though output can be written in GFF or other formats for downstream annotation, and it integrates with other EMBOSS tools for broader workflow support. Both ORFfinder and getorf emphasize basic, rule-based detection without advanced statistical modeling, prioritizing simplicity for initial screening of microbial datasets.

In the early 2000s, tools like FrameD emerged to address frame-specific challenges in prokaryotic gene prediction, using a probabilistic approach based on Markov models to detect coding regions while accounting for frameshifts and ambiguous bases in GC-rich bacterial genomes. FrameD, published in 2003, is well suited to quality-checking noisy sequences and predicting genes in intronless eukaryotic contexts such as expressed sequence tags (ESTs), but remains rooted in traditional scanning approaches. ORFfinder further extends its utility through integration with BLAST, allowing users to verify predicted proteins via SMART BLAST or standard BLASTP searches against non-redundant databases directly from the output.

Despite their reliability for prokaryotic applications, traditional ORF finders are limited in handling the complexity of eukaryotic genomes, such as introns and alternative splicing, and require mature or processed sequences for accurate detection. These tools are also poorly suited to identifying short ORFs (sORFs) under 100–300 nucleotides, which historical annotation pipelines often exclude because of high false-positive rates and the lack of statistical support against random sequence. For instance, ORFfinder's default filters prioritize longer ORFs, potentially overlooking regulatory sORFs prevalent in higher organisms.

Specialized tools for advanced analysis

Specialized tools for advanced ORF analysis use high-throughput sequencing data, such as ribosome profiling (Ribo-seq), to identify and quantify translated open reading frames (ORFs) that traditional sequence-based methods often overlook, particularly alternative, short, and overlapping variants. These tools integrate ribosome footprint data to detect active translation, enabling the discovery of non-canonical ORFs with greater accuracy in complex transcriptomes.

Among Ribo-seq-focused tools, RibORF (versions 0.1 and 1.0) uses the periodicity and coverage of ribosome-protected fragments to predict translated ORFs, with version 1.0 improving sensitivity for short ORFs by incorporating frame-specific read counting. Similarly, RiboCode, introduced in 2018, maps ribosome footprints to ORFs and relies on three-nucleotide periodicity to distinguish translated regions from noise, supporting genome-wide identification in eukaryotes. Both tools take Ribo-seq data as input and output candidate ORFs with associated translation scores.

For small ORFs (sORFs), sORF finder, released in 2010, scans nucleotide sequences in all six frames to identify sORFs (10–100 amino acids) with high coding potential, using a hexamer scoring system that compares the frequencies of codon-like motifs against non-coding controls to prioritize likely translated candidates. In viral genomics, OrfViralScan 3.0, updated in 2025, supports ORF identification and tracking in viral genomes by searching for ATG-initiated ORFs, detecting overlaps and alternative starts, and visualizing evolutionary changes across sequences, making it suitable for monitoring viral diversity.

Quantification-focused tools include ORFquant, an R package from 2020 that annotates individual ORFs and measures their translation using Ribo-seq coverage, accounting for alternative splicing and isoform-specific expression to estimate the relative contributions of overlapping ORFs. Ribo-TISH, developed in 2017, predicts translation initiation sites (TIS) and ORFs via statistical tests on footprint distributions (negative binomial for TIS and Wilcoxon rank-sum for frame bias); it also provides differential analysis across conditions and outperformed earlier methods in novel ORF detection from standard Ribo-seq data. These tools output metrics such as translation initiation efficiency and ribosome density scores, which aid functional prioritization.

A 2024 comparative study evaluated the accuracy of these tools (RibORF v0.1, RibORF v1.0, RiboCode, ORFquant, and Ribo-TISH) on simulated and real Ribo-seq datasets. RibORF v1.0 and RiboCode excelled in sensitivity for unannotated ORFs (F1 scores >0.85), while ORFquant and Ribo-TISH provided superior quantification for overlapping features, though all showed trade-offs in false-positive rates for short ORFs under 50 codons.

Advances in 2024–2025 incorporate proteogenomics, combining Ribo-seq with mass spectrometry to validate novel sORFs in workflows that cross-reference peptide evidence with genomic predictions. Tools like ORFik, a comprehensive R package updated after 2021, extend this by processing multi-omics data (Ribo-seq, RNA-seq, CAGE) for ORF annotation, handling alternative starts and overlaps efficiently through C++-accelerated computations.

References

  1. [1]
    Small Open Reading Frames, How to Find Them and Determine ...
    Jan 28, 2022 · An open reading frame (ORF) is defined as a start codon followed by a downstream in-frame stop codon. ORFs occur randomly and abundantly across ...
  2. [2]
    Open Reading Frame - National Human Genome Research Institute
    An open reading frame is a portion of a DNA molecule that, when translated into amino acids, contains no stop codons.
  3. [3]
    Small Open Reading Frames: How Important Are They for Molecular ...
    Aug 25, 2020 · Small Open Reading Frames (small ORFs/sORFs/smORFs) are important sources of putative peptides previously dismissed as being non-functional or junk DNA.Missing: definition | Show results with:definition
  4. [4]
    The Definition of Open Reading Frame Revisited - PubMed
    The term open reading frame (ORF) is of central importance to gene finding. Surprisingly, at least three definitions are in use.
  5. [5]
    Finding ORF of a Given Sequence (Theory) - Amrita Virtual Lab
    The region of a nucleotide that starts from an initiation codon and ends with a stop codon is called an Open Reading Frame(ORF). Proteins are formed from ORF.
  6. [6]
    Selecting Open Reading Frames From DNA - PMC - NIH
    We describe a method to select DNA encoding functional open reading frames (ORFs) from noncoding DNA within the context of a specific vector.Missing: importance | Show results with:importance
  7. [7]
    Functions of viral small open reading frames | Carolina Arias Lab
    These putative open reading frames (ORFs) are characterized into two categories: independently localized small ORFS (smORFs), of if they precede a primary ORF, ...Missing: definition | Show results with:definition
  8. [8]
    Filtering "genic" open reading frames from genomic DNA samples ...
    Jun 15, 2011 · In this paper we discuss the preparation of diverse genic ORF libraries from randomly fragmented genomic DNA using ß-lactamase to filter out the open reading ...
  9. [9]
    Upstream open reading frame translation enhances immunogenic ...
    Aug 27, 2025 · This mitotic induction of uORF/uoORF enriches human leukocyte antigen (HLA) presentation of non-canonical peptides on the surface of cancer ...
  10. [10]
    Open Reading Frame - an overview | ScienceDirect Topics
    An Open Reading Frame (ORF) is a sequence of adjacent codons that starts with a start codon, followed by a series of codons for amino acids, and ends with a ...
  11. [11]
    ORFfinder Home - NCBI - NIH
    ORF finder searches for open reading frames (ORFs) in the DNA sequence you enter. The program returns the range of each ORF, along with its protein translation.
  12. [12]
    Overlapping genes in bacteriophage φX174 - Nature
    Nov 4, 1976 · Cite this article. Barrell, B., Air, G. & Hutchison, C. Overlapping genes in bacteriophage φX174. Nature 264, 34–41 (1976). https://doi.org ...Missing: phiX174 | Show results with:phiX174
  13. [13]
    Small open reading frames: a comparative genetics approach to ...
    May 1, 2023 · Open reading frames (ORFs) with fewer than 100 codons are generally not annotated in genomes, although bona fide genes of that size are ...
  14. [14]
    Module 5. Translation: The need for an Open Reading Frame
    Dec 31, 2022 · An Open Reading Frame is a string of consecutive codons that is uninterrupted by stop codons. Every mRNA contains one ORF that is translated by the ribosome ...
  15. [15]
    How do Cells Read Genes? - Learn Genetics Utah
    The DNA sequence is interpreted in groups of three nucleotide bases, called codons. Each codon specifies a single amino acid in a protein.
  16. [16]
    Reference Sequence Track - Rat Genome Database
    Zooming in even further yields a display which shows both the DNA sequence and the putative amino acid sequences in all three reading frames for both directions ...
  17. [17]
    Translation: DNA to mRNA to Protein | Learn Science at Scitable
  18. [18]
    THE TRI-FRAME MODEL - PMC - NIH
    The tri-frame model gives mathematical expression to the transcription and translation processes, and considers all three reading frames.
  19. [19]
    Where to Start? Activity-dependent Alternative Translation Initiation ...
    Oct 17, 2024 · The monocistronic structure of the eukaryotic mRNA, whereby a single 'canonical' open-reading frame (ORF) can be identified has reinforced ...
  20. [20]
    What can Ribo-seq and proteomics tell us about the non-canonical ...
    Throughout this text the term “non-canonical open reading frame” is therefore defined as any open reading frame that is not an annotated CDS, an in-frame ...
  21. [21]
    Small Open Reading Frames, How to Find Them and Determine ...
    Jan 27, 2022 · An open reading frame (ORF) is defined as a start codon followed by a downstream in-frame stop codon. ORFs occur randomly and abundantly ...
  22. [22]
    Shining a light on the dark proteome: Non‐canonical open reading ...
    Jun 23, 2023 · The origins of ncORF research date back to early observations of small open reading frames. In 1987, Marilyn Kozak explored “nonfunctional” ...
  23. [23]
    Cross-evaluation of E. coli's operon structures via a whole-cell ...
    Some E. coli operons are “canonical” in the sense that they contain a single promoter and a single transcription terminator, and thus always express all genes ...
  24. [24]
  25. [25]
    Short open reading frames (sORFs) and microproteins
    Mar 17, 2022 · A short open reading frame (sORFs) constitutes ≤ 300 bases, encoding a microprotein or sORF-encoded protein (SEP) which comprises ≤ 100 amino acids.
  26. [26]
    Upstream ORFs are prevalent translational repressors in vertebrates | The EMBO Journal
    https://www.embopress.org/doi/full/10.15252/embj.201592759
  27. [27]
    Translational regulation by uORFs and start codon selection ...
    Jul 11, 2023 · In this review, we summarize the mechanisms by which uORFs can repress or stimulate mRNA translation, highlight uORF-mediated translational repression ...
  28. [28]
    Widespread stable noncanonical peptides identified by integrated ...
    Mar 2, 2024 · Our study presents a statistical framework to identify stable noncanonical peptides in the genome and provides a valuable resource for functional ...
  29. [29]
  30. [30]
  31. [31]
  32. [32]
    Extensive translation of small Open Reading Frames revealed ... - NIH
    In this study, we present a genome-wide assessment of smORF translation by ribosomal profiling of polysomal fractions in Drosophila.
  33. [33]
    Genome-wide analysis in vivo of translation with nucleotide ...
    Apr 10, 2009 · We present a ribosome-profiling strategy that is based on the deep sequencing of ribosome-protected mRNA fragments and enables genome-wide investigation of ...
  34. [34]
    [PDF] Empowering multiplexed ultra-throughout ribosome profiling with ...
    Oct 17, 2025 · Abstract. Ribosome profiling (RiboSeq) improved the understanding of mRNA translation, enabling the precise mapping of ribosome positioning ...
  35. [35]
    Proteogenomic analysis and global discovery of posttranslational ...
    Dec 15, 2014 · We describe an integrated workflow for proteogenomic analysis and global profiling of posttranslational modifications (PTMs) in prokaryotes.
  36. [36]
    Proteogenomics from a bioinformatics angle: A growing field
    Dec 15, 2015 · Proteogenomics integrates two different research fields, mass spectrometry (MS) based proteomics and next-generation sequencing ...
  37. [37]
    A Massive Proteogenomic Screen Identifies Thousands of Novel ...
    To provide a resource of high-confidence novel proteins from the dark proteome, we screened 50,000 mass spectrometry runs from over 900 projects to identify ...
  38. [38]
    Unraveling the influences of sequence and position on yeast uORF ...
    May 25, 2023 · Functional evaluations of nine S. cerevisiae uORFs showed six altered the expression in a luciferase reporter assay. More recently, genome-wide ...
  39. [39]
    Ribosome profiling reveals hidden world of small proteins
    The main advantage of Ribo-seq versus earlier methods, such as polysome profiling, is that it allows one to monitor ribosome positioning with single-codon ...
  40. [40]
    What Can Ribo-Seq, Immunopeptidomics, and Proteomics Tell Us ...
    Ribo-Seq does not directly detect proteins but rather provides evidence of ongoing nucleotide translation. By contrast, LC–MS/MS evidence for noncanonical ORFs ...
  41. [41]
    Ribo-seq ORFs - GENCODE
    Ribosome Profiling (Ribo-seq) has been used to detect thousands of non-canonical – ie unannotated – translated open reading frames (ORFs) in the human genome.
  42. [42]
    Pan-viral ORFs discovery using Massively Parallel Ribosome Profiling
    Jun 12, 2025 · The development of ribosome profiling (also termed Ribo-seq) has transformed our ability to detect translated regions across genomes (12).
  43. [43]
    Improvements in human genome databases offer a promising future ...
    Sep 21, 2023 · Researchers expand the use of ribosome profiling, also known as Ribo-seq, to understand protein production in cells.
  44. [44]
    StartLink and StartLink+: Prediction of Gene Starts in Prokaryotic ...
    The task is to identify the start codon of a prokaryotic gene within its longest open-reading frame (LORF) embedded in a nucleotide sequence Q (query). The ...
  45. [45]
    [PDF] Microbial gene identification using interpolated Markov models
    This paper describes a new system, GLIMMER, for finding genes in microbial genomes. In a series of tests on Haemophilus influenzae, Helicobacter pylori and ...
  46. [46]
    The NCBI Eukaryotic Genome Annotation Pipeline - NIH
    Apr 4, 2024 · The NCBI Eukaryotic Genome Annotation Pipeline provides content for various NCBI resources including Nucleotide, Protein, BLAST, Gene and the Genome Data ...
  47. [47]
    Identifying bacterial genes and endosymbiont DNA with Glimmer
    Results: The new methods dramatically reduce the rate of false-positive predictions, while maintaining Glimmer's 99% sensitivity rate at detecting genes in most ...
  48. [48]
    Predicting Statistical Properties of Open Reading Frames in ...
    Throughout this work, reading frame +1 is defined as the frame in which an annotated gene is located. An open reading frame is defined as the region between a ...
  49. [49]
    Gene prediction with Glimmer for metagenomic sequences ...
    A dynamic programming algorithm finds the set of ORFs with maximum score subject to the constraint that genes cannot overlap for more than a certain threshold, ...
  50. [50]
    Integrating alternative splicing detection into gene prediction
    Feb 10, 2005 · Each additional or alternative gene structure prediction has to be supported by some biological evidence.
  51. [51]
    Combining diverse evidence for gene recognition in completely ...
    In a test on Bacillus subtilis and Escherichia coli genomes, the program correctly identified 93.3% (resp. 96.3%) of experimentally annotated genes longer than ...
  52. [52]
    Whole-Genome Random Sequencing and Assembly of ... - Science
    The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism ...
  53. [53]
    Upstream open reading frames buffer translational variability during ...
    Jun 6, 2025 · This study reveals the important role of upstream open reading frames (uORFs) in limiting the translational variability of downstream coding sequences.
  54. [54]
    Proteogenomics-enabled discovery of novel small open reading ...
    Jul 26, 2025 · Small open reading frames (sORFs) are typically 100 codons or fewer in length. The mammalian genome contains a substantial number of sORFs ...
  55. [55]
    Noncoding RNA-encoded peptides in cancer: biological functions ...
    Feb 19, 2025 · The identification of ncRNA-encoded peptides in human cancers is increasing. These peptides regulate cancer progression through multiple molecular mechanisms.
  56. [56]
    Unveiling the functional implications of long noncoding RNA ...
    Unlike traditional coding RNAs, lncRNAs lack long, conserved open reading frames (ORFs), hence were classified as non-coding. However, ribosome profiling (Ribo- ...
  57. [57]
  58. [58]
    The ORF10 Strain-Restricted Orphan Gene of SARS-CoV-2 ...
    Oct 15, 2025 · Such ORF10 transcripts might aid the virus in evading an innate immune response for an extended period, resulting in increased viral persistence ...
  59. [59]
    Evolution of translational control and the emergence of genes and ...
    Sep 24, 2024 · Evolutionary innovations can be driven by changes in the rates of RNA translation and the emergence of new genes and small open reading frames (sORFs).
  60. [60]
    August 1997 - NCBI - NIH
    August 1997. PubMed Launched. Using Sequin. Structure Neighbors. ORF Finder. Electronic PCR. CGAP Revolutionizes Research. Frequently Asked Questions.
  61. [61]
    getorf - EMBOSS
    This program finds and outputs the sequences of open reading frames (ORFs) in one or more nucleotide sequences. An ORF may be defined as a region of a specified ...
  62. [62]
    getorf - EMBOSS - Bioinformatics
    getorf. Find and extract open reading frames (ORFs).
  63. [63]
    FrameD: a flexible program for quality check and gene prediction in ...
    We describe FrameD, a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in bacterial ...
  64. [64]
    Viral Short ORFs and Their Possible Functions - PMC
    Traditionally, putative short open reading frames (sORFs) coding for less than 100 amino acids were disregarded due to computational and experimental ...
  65. [65]
    Comparison of software packages for detecting unannotated ...
    Jun 6, 2024 · Analyzing Ribo-seq coverage revealed that smORFs detected by more than one tool tend to have higher translation levels and higher fractions of ...
  66. [66]
    xryanglab/RiboCode: release version - GitHub
    RiboCode is a very simple but high-quality computational algorithm to identify genome-wide translated ORFs using ribosome-profiling data.
  67. [67]
    sORF finder: a program package to identify small open reading ...
    sORF finder is a program package used to identify sORFs (10–100 amino acids) with high-coding potential in all six frames of a given nucleotide sequence. sORF ...
  68. [68]
    Quantification of translation uncovers the functions of the alternative ...
    Jun 29, 2020 · Here, we describe ORFquant, a method to annotate and quantify translation at the level of single open reading frames (ORFs), using information from Ribo-seq ...
  69. [69]
    Genome-wide identification and differential analysis of translational ...
    Nov 23, 2017 · Ribo-TISH can also predict novel open reading frames (ORFs) from regular ribosome profiling (rRibo-seq) data and outperform several established ...
  70. [70]
    ORFik: a comprehensive R toolkit for the analysis of translation
    Jun 19, 2021 · ORFik streamlines the steps to process, analyze, and visualize the different steps of translation with a particular focus on initiation and elongation.