A fusion protein is a hybrid polypeptide engineered by genetically linking two or more distinct protein domains, typically from separate genes, to form a single chain with combined or enhanced functionalities.[1] These chimeric molecules can arise naturally through genetic rearrangements, such as the BCR-ABL fusion in chronic myelogenous leukemia, or be artificially constructed via recombinant DNA technology for research and therapeutic applications.[2] The design often incorporates flexible or rigid linkers between domains to optimize folding, stability, and activity, enabling outcomes such as improved solubility, targeted binding, or prolonged half-life in vivo.[3]
In biotechnology, fusion proteins have revolutionized protein production and analysis by serving as affinity-tagged constructs for purification and detection.[1] Common tags include glutathione S-transferase (GST), which allows facile isolation by affinity chromatography, and the hexahistidine tag (His-tag), which allows purification by metal chelation. Both support high-yield expression in systems such as E. coli or mammalian cells.[4] Fluorescent fusions, such as those with green fluorescent protein (GFP), enable real-time visualization of protein localization and dynamics in living cells, advancing fields such as cell biology and drug discovery.[5]
Therapeutically, fusion proteins represent a major class of biopharmaceuticals, with several approved drugs leveraging their multifunctionality to treat diverse conditions.[6] Fc-fusion proteins pair bioactive domains with the Fc region of immunoglobulin G. They extend serum half-life through FcRn-mediated recycling and can engage effector functions such as antibody-dependent cellular cytotoxicity.[7] Key examples include etanercept (Enbrel®), a tumor necrosis factor receptor-Fc fusion for rheumatoid arthritis and psoriasis; abatacept (Orencia®), a CTLA-4-Fc fusion for autoimmune disorders; and aflibercept (Eylea®), a VEGF receptor-Fc fusion (VEGF Trap) for neovascular age-related macular degeneration.[6] Emerging applications extend to immunotoxins for cancer targeting and enzyme fusions for biocatalysis, underscoring their versatility in precision medicine.[8]
Introduction
Definition
A fusion protein is a hybrid protein formed by covalently joining two or more polypeptide sequences originally encoded by separate genes, resulting in a single polypeptide with combined functionalities.[4] This structure allows the protein to exhibit properties derived from each component, such as enhanced stability, targeting capabilities, or novel enzymatic activities.[9]
The structural basis of artificial fusion proteins is the in-frame joining of coding sequences at the DNA level, so that translation proceeds seamlessly into one continuous polypeptide chain.[10] In natural cases, fusion proteins emerge from gene fusion events, in which chromosomal rearrangements juxtapose distinct genes and produce a chimeric transcript and protein.[11]
Non-covalent protein complexes consist of multiple independent polypeptides held together by weak intermolecular forces such as hydrogen bonds or hydrophobic interactions. Fusion proteins, by contrast, hold their components together through peptide bonds within a single chain.[12] This covalent linkage provides greater structural stability and prevents the components from dissociating under physiological conditions.[4]
A representative artificial fusion protein is green fluorescent protein (GFP) fused to a target protein as a visualization tag. The tag typically does not significantly alter the target's core function.[5]
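The in-frame requirement can be illustrated at the DNA level. The Python sketch below is a hypothetical helper, not part of any cloning package. It joins an upstream open reading frame, a linker, and a downstream open reading frame, assuming plain DNA strings and the three stop codons of the standard genetic code. It removes the upstream stop codon and rejects inputs that would shift the frame or end translation early.

```python
# Hypothetical helper for illustration: checks reading frame and stop codons
# when joining coding sequences into one fusion ORF (standard genetic code assumed).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(seq: str) -> list[str]:
    """Split a DNA string into consecutive triplets, rejecting frame-shifting lengths."""
    if len(seq) % 3 != 0:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def fuse_in_frame(upstream: str, linker: str, downstream: str) -> str:
    """Join upstream ORF + linker + downstream ORF into one continuous reading frame."""
    up = split_codons(upstream.upper())
    if up and up[-1] in STOP_CODONS:
        up = up[:-1]  # remove the upstream stop codon so translation reads through
    fused = up + split_codons(linker.upper()) + split_codons(downstream.upper())
    # Only the last codon may be a stop; an earlier one would truncate the fusion.
    for index, codon in enumerate(fused[:-1]):
        if codon in STOP_CODONS:
            raise ValueError(f"premature stop codon {codon} at codon {index}")
    return "".join(fused)


if __name__ == "__main__":
    # Toy sequences; the linker GGT GGC GGT GGC TCT encodes one Gly-Gly-Gly-Gly-Ser unit.
    print(fuse_in_frame("ATGGCTAAATAA", "GGTGGCGGTGGCTCT", "ATGCATCACTGA"))
```

With these toy inputs the helper prints ATGGCTAAAGGTGGCGGTGGCTCTATGCATCACTGA. The upstream TAA has been dropped, and the only remaining stop codon (TGA) closes the downstream frame.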
Historical Development
The concept of fusion proteins first gained attention in the 1970s. Protein sequencing and structural analyses revealed natural occurrences such as the fused variable and constant domains of immunoglobulin chains, as demonstrated in early X-ray crystallography studies of antibody fragments.[13] These observations showed how evolutionary processes had linked distinct protein domains into single polypeptides, providing a foundation for understanding modular protein architecture.[14]
The advent of recombinant DNA technology in the 1970s and 1980s marked a pivotal shift toward artificial fusion proteins. It began with Paul Berg's 1972 construction of the first recombinant DNA molecule, made by joining SV40 viral DNA to lambda phage DNA.[15] This breakthrough, together with the 1975 Asilomar Conference organized by Berg and others to establish safety guidelines for genetic engineering, enabled the cloning and expression of fused genes in host cells.[16] By the late 1970s, restriction enzyme digestion and ligation facilitated the creation of chimeric constructs such as early beta-galactosidase fusions for protein expression, laying the groundwork for engineered biologics.[17]
A major milestone came in 1994 with the development of enhanced green fluorescent protein (GFP) variants by Roger Tsien and collaborators. These variants allowed GFP to be fused to target proteins for real-time visualization, often without disrupting their function.[18] This innovation spurred widespread use of fusion tags in cell biology.
Therapeutic applications advanced with the approval of etanercept in 1998, the first approved Fc-fusion protein. Etanercept combines the tumor necrosis factor receptor with an immunoglobulin Fc domain to treat rheumatoid arthritis, and it demonstrated the potential of Fc fusion for extending protein half-life and efficacy.[19]
In the 2010s and 2020s, CRISPR-Cas9 genome editing, pioneered in 2012, enabled precise insertion of fusion constructs into genomes for stable expression, as seen in applications for engineering therapeutic proteins in mammalian cells.[20] Concurrently, site-specific incorporation of unnatural amino acids into fusion proteins advanced, with 2023 studies optimizing engineered Fc-fusions for enhanced conjugation and stability in drug development.[21]
Types
Artificial Fusion Proteins
Artificial fusion proteins are engineered constructs created in the laboratory through recombinant DNA techniques to combine distinct protein domains or peptides for targeted functionalities. They are designed to address challenges in protein expression, purification, and application, such as improving solubility or stability or enabling specific detection methods. For instance, affinity tags like the polyhistidine tag (His-tag) are fused to proteins of interest to enable purification by immobilized metal affinity chromatography (IMAC). This achieves over 80% purity in a single step without significantly disrupting the target's native structure.[22] Similarly, larger tags such as glutathione S-transferase (GST) and maltose-binding protein (MBP) enhance the solubility of recombinant proteins expressed in bacterial systems such as Escherichia coli. By preventing aggregation, they often yield 10-40 mg/L of soluble product.[22]
Common categories of artificial fusion proteins include tag fusions and reporter fusions. Tag fusions primarily serve expression and purification roles. His-tags, typically 6-10 histidine residues, are the most prevalent because of their small size and versatility in detection and immobilization.[22] GST (26 kDa) and MBP (42.5 kDa) act as solubility enhancers, particularly for eukaryotic proteins expressed in prokaryotic hosts, and also support pull-down assays.[23] Reporter fusions, on the other hand, integrate enzymes or fluorescent proteins to monitor biological processes. Green fluorescent protein (GFP) fusions enable real-time visualization of protein localization in living cells by microscopy, and quantification of expression by flow cytometry.[24] Luciferase fusions, such as those with firefly luciferase, provide sensitive bioluminescent readouts in assays for protein-protein interactions or gene expression, including split-luciferase complementation assays.[25]
The engineering of artificial fusion proteins relies on principles that ensure domain compatibility and functional integrity. Linker sequences play a critical role in preventing misfolding or steric interference. Flexible linkers, such as glycine-serine repeats (e.g., (GGGGS)₃), provide rotational freedom that allows fused domains to fold independently. In cases such as cytokine fusions, they have improved expression yields up to 11-fold and bioactivity by 10-fold.[3] Rigid linkers, such as the α-helical (EAAAK)₃, maintain a fixed spacing between domains to avoid interference. Outcomes depend strongly on this choice: an unsuitable linker can reduce solubility or activity, whereas an optimized design improves overall stability and function.[3] As of 2025, fusion tags are used very widely in commercial recombinant protein production, underscoring their essential role in biotechnology workflows.[26]
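The tag and linker elements described above are often tracked by their positions in the final chain. The Python sketch below assembles a His6-tagged tandem construct at the protein level and records each segment's 1-based boundaries. The domain fragments and the `assemble` and `Segment` names are hypothetical placeholders, not sequences or tools from the cited work.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


HIS6 = "HHHHHH"          # hexahistidine affinity tag
FLEXIBLE = "GGGGS" * 3   # (GGGGS)3 flexible linker, 15 residues
RIGID = "EAAAK" * 3      # (EAAAK)3 helical linker, 15 residues


def assemble(parts: list[tuple[str, str]]) -> tuple[str, list[Segment]]:
    """Concatenate named protein segments N- to C-terminus and record their positions."""
    sequence = ""
    segments: list[Segment] = []
    for name, part in parts:
        if not part:
            raise ValueError(f"segment {name!r} is empty")
        start = len(sequence) + 1
        sequence += part
        segments.append(Segment(name, start, len(sequence)))
    return sequence, segments


if __name__ == "__main__":
    # Placeholder domain fragments; real constructs would use full domain sequences.
    seq, segs = assemble([
        ("His6", HIS6),
        ("domain_A", "MKTAYIAK"),
        ("linker", FLEXIBLE),
        ("domain_B", "MSKGEELF"),
    ])
    print(len(seq))
    for seg in segs:
        print(seg)
```

To compare a flexible with a rigid design, swap `FLEXIBLE` for `RIGID`. The boundaries stay the same because both linkers are 15 residues long.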
Natural Fusion Proteins
Natural fusion proteins are multi-domain polypeptides that arise endogenously through evolutionary genetic processes such as gene duplication, exon shuffling, and chromosomal rearrangements. These processes juxtapose distinct protein domains into a single coding sequence.[27][28][29] They allow novel protein architectures to emerge without de novo sequence invention. The resulting proteins often keep the core functions of the individual domains, while the fusion enables coordinated activity. Gene duplication provides redundant copies that can diverge, exon shuffling recombines modular exons encoding functional domains, and chromosomal rearrangements can fuse adjacent genes. Together these processes contribute to the diversity of multi-domain architectures observed across species.[30]
Such fusion proteins are prevalent in both prokaryotic and eukaryotic genomes. Multi-domain proteins, many of which arose from fusion events, constitute approximately 40% of proteins in prokaryotes and 65% in eukaryotes.[31] In bacterial genomes, gene fusion and fission events account for 27-64% of the evolution of multi-domain proteins, highlighting their role as a major driver of proteomic complexity.[32] This prevalence underscores the evolutionary utility of fusions in expanding functional repertoires, particularly in adapting to environmental pressures or metabolic demands.
Functionally, natural fusion proteins often confer advantages by integrating multiple activities into a single polypeptide. Physical proximity of the domains and shorter diffusion times between them improve efficiency. A prominent example is the polyketide synthases (PKSs), large multifunctional enzymes in bacteria and fungi that biosynthesize complex natural products such as antibiotics.[33] In modular PKSs, catalytic domains such as ketosynthases, acyltransferases, and reductases are fused in assembly-line fashion. Substrates are processed sequentially without release of intermediates, which streamlines biosynthesis and minimizes side reactions.[34] This organization improves catalytic throughput relative to separate enzymes and underlies the production of diverse polyketides essential for microbial defense and symbiosis.
Beyond metabolic enzymes, natural fusions are evident in signaling and regulatory proteins. Transcription factors of the Signal Transducer and Activator of Transcription (STAT) family exemplify this, combining in one chain domains for receptor docking, tyrosine phosphorylation, DNA binding, and transcriptional activation.[35] In STATs, the SH2 domain binds phosphotyrosine motifs on activated cytokine receptors. After STAT phosphorylation, it mediates reciprocal SH2-phosphotyrosine dimerization, which enables rapid nuclear translocation, DNA binding, and gene regulation that coordinate immune and developmental responses. Viral polyproteins represent another class of natural fusions. Viruses such as HIV encode long precursor chains of conjoined functional units, such as proteases, polymerases, and structural proteins, which viral proteases cleave post-translationally into mature components.[36] This strategy lets viruses encode multiple functions compactly in their genomes while regulating protein maturation temporally during the replication cycle, optimizing infectivity in host cells.
Production Methods
Recombinant DNA Technology
Recombinant DNA technology enables the production of artificial fusion proteins by genetically engineering host cells to express chimeric polypeptides composed of two or more protein domains. The process begins with cloning the target genes into expression vectors that incorporate fusion partners. The ligation must be in frame to preserve the reading frame and ensure correct translation. A common approach is restriction enzyme digestion of both the insert DNA (amplified by polymerase chain reaction, PCR) and the vector, followed by ligation with T4 DNA ligase. Seamless methods such as In-Fusion cloning instead join PCR products to a linearized vector through short (typically 15-bp) homologous overlaps at their ends.[37][38][39]
Once constructed, the recombinant plasmids are introduced into host cells by transformation or transfection, and expression is induced under controlled conditions. Common expression systems include:
bacterial hosts such as Escherichia coli, which offer rapid growth and high yields but lack most eukaryotic post-translational modifications;
yeast such as Saccharomyces cerevisiae, which provides glycosylation capabilities at moderate scale;
mammalian cells such as CHO or HEK293, which are ideal for complex folding and modifications but more costly; and
insect cells such as Sf9 or High Five with baculovirus expression vector systems (BEVS), which support many eukaryotic post-translational modifications, including glycosylation (although insect N-glycans are generally simpler than mammalian ones), with typical yields of 10-500 mg/L.[40]
To enhance solubility, particularly in E. coli, co-expression with molecular chaperones (e.g., GroEL/ES) is used to mitigate inclusion body formation and improve functional yields. Strong promoters, such as T7 or tac in E. coli systems, further drive high-level transcription.[41][42][43]
Following expression, fusion proteins are purified using affinity tags engineered into the construct. Common choices are polyhistidine (His-tag) for immobilized metal affinity chromatography (IMAC) and glutathione S-transferase (GST) for binding to glutathione-Sepharose. Both allow one-step isolation from cell lysates under native conditions, and His-tags also work under denaturing conditions. Fusion integrity is verified by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), which assesses molecular weight and purity. Western blotting with tag-specific antibodies then confirms the presence and connectivity of the domains. Typical yields range from 1 to 100 mg/L of culture. They depend strongly on codon optimization to match host tRNA abundances, which can increase expression by up to 50%, and on the choice of inducible promoters for controlled accumulation.[44][45][46][47][48]
Linker sequences between domains are often incorporated during cloning to provide flexibility and prevent steric interference (see Linker Optimization). Overall, these techniques have made fusion protein production scalable for research and therapeutic manufacturing, and the precise genetic control they provide can help minimize immunogenicity.[49]
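When checking an SDS-PAGE band or planning downstream work, two quick estimates are often useful: the fusion's approximate mass and the total amount recovered from a culture. The Python sketch below uses the common rule of thumb of about 110 Da per residue, which is an approximation; apparent mobility on a gel can also deviate from true mass. The 250-residue construct and the 20 mg/L titer in 0.5 L are hypothetical values within the 1-100 mg/L range stated above.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average; true mass depends on composition


def approx_mass_kda(n_residues: int) -> float:
    """Rough molecular weight of a polypeptide (kDa) for comparison with SDS-PAGE bands."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


def total_yield_mg(titer_mg_per_l: float, volume_l: float) -> float:
    """Total protein recovered from a culture of the given volume at the given titer."""
    return titer_mg_per_l * volume_l


def amount_umol(mass_mg: float, mass_kda: float) -> float:
    """Convert mass to amount: mg divided by kDa (kg/mol) gives micromoles."""
    return mass_mg / mass_kda


if __name__ == "__main__":
    mass = approx_mass_kda(250)          # hypothetical 250-residue fusion
    total = total_yield_mg(20.0, 0.5)    # hypothetical 20 mg/L titer, 0.5 L culture
    print(f"~{mass:.1f} kDa, {total:.1f} mg, {amount_umol(total, mass):.3f} umol")
```

For a precise theoretical mass, sum the actual residue masses of the translated sequence instead of using the 110 Da average.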
Chemical Conjugation Techniques
Chemical conjugation techniques covalently link proteins after translation to form fusion constructs, allowing assembly without co-expression from a single genetic construct. The proteins often require prior modification, either by genetic engineering (e.g., to add tags) or by chemical means. These methods mainly use chemical cross-linkers that target reactive amino acid residues, such as the primary amines of lysine or the thiols of cysteine, to form stable bonds between proteins.
Enzymatic approaches such as sortase A-mediated ligation provide site-specific conjugation. The enzyme recognizes a pentapeptide motif (LPXTG, often introduced genetically) on one protein and cleaves between the threonine and glycine. It then couples the threonine to an N-terminal oligoglycine on another protein, enabling precise protein-protein fusions under mild aqueous conditions. Another method, native chemical ligation (NCL), joins a C-terminal thioester of one protein segment (often generated recombinantly by intein cleavage) to an N-terminal cysteine of another. This forms a native peptide bond under mild conditions and enables semi-synthesis of fusion proteins from recombinant and synthetic components.[50][51][52]
Cross-linkers are classified by structure and reactivity. Zero-length cross-linkers, such as 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC), promote direct amide bond formation between carboxylic acids (e.g., aspartate or glutamate side chains) and amines without introducing spacer atoms. This minimizes structural perturbation but risks non-specific reactions. Homobifunctional linkers, such as glutaraldehyde for amines or bismaleimides for thiols, react with identical functional groups on both proteins. Heterobifunctional linkers, such as succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate (SMCC), target different residues sequentially (NHS ester for lysine, maleimide for cysteine), enabling controlled, stepwise assembly of conjugates such as antibody-drug conjugates.[50][53]
These techniques can combine proteins from diverse sources, including those incompatible with co-expression in recombinant systems, and allow modular assembly for applications such as bispecific antibodies. However, they often introduce heterogeneity because each protein carries multiple reactive sites, producing mixtures of mono-, di-, and multi-conjugated products. Traditional chemical methods typically achieve yields of only 10-50%, and purification challenges reduce efficiency further. Enzymatic variants such as sortase improve specificity, with yields up to 90%, but require engineered tags that may alter protein function.[50][54][55]
Since the early 2000s, bioorthogonal click chemistry has emerged as a precise modern variant for protein bioconjugation. Copper-catalyzed azide-alkyne cycloaddition (CuAAC) forms stable 1,4-disubstituted 1,2,3-triazole linkages between azide- and alkyne-modified proteins under mild conditions with high efficiency (e.g., 70-90% yields). Copper-free alternatives such as strain-promoted azide-alkyne cycloaddition (SPAAC) with cyclooctynes avoid metal toxicity and enable in vivo applications, as demonstrated in the site-specific fusion of antibodies and peptides. These methods have expanded fusion protein design by providing rapid, selective coupling with minimal side reactions.[50][56][57]
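The heterogeneity of random residue-directed conjugation follows from simple probability. The Python sketch below treats each accessible site as independent and equally reactive, which is an idealization because real lysines differ in accessibility and reactivity. Under that assumption, the number of conjugates per protein follows a binomial distribution. The four-site, p = 0.25 case is hypothetical.

```python
from math import comb


def conjugation_distribution(n_sites: int, p: float) -> list[float]:
    """Probability of k modified sites (k = 0..n_sites), assuming independent, equally reactive sites."""
    if n_sites < 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need n_sites >= 0 and 0 <= p <= 1")
    return [comb(n_sites, k) * p**k * (1 - p) ** (n_sites - k) for k in range(n_sites + 1)]


if __name__ == "__main__":
    # Hypothetical protein with 4 accessible lysines, each modified with probability 0.25.
    for k, prob in enumerate(conjugation_distribution(4, 0.25)):
        print(f"{k} conjugates: {prob:.3f}")
```

Under this model, the mono-conjugate fraction for four equivalent sites peaks at 27/64 (about 42%) when p = 1/4. In other words, even well-tuned random chemistry leaves most molecules unmodified or over-modified. This is consistent with the modest yields noted above and with the appeal of site-specific methods such as sortase ligation.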
Design Strategies
Tandem Fusion Constructs
Tandem fusion constructs are a fundamental protein engineering strategy in which entire protein domains are joined end-to-end, with either domain placed at the N-terminus. The goal is a multifunctional protein in which each domain keeps its independent folding and activity. The approach relies on a linker sequence inserted between the domains to provide spatial separation and reduce interference, allowing the fused protein to exhibit additive or synergistic functions. Flexible linkers, often composed of glycine-serine repeats such as (Gly₄Ser)ₙ, are commonly used to provide conformational flexibility and preserve the native structures of the individual domains.[3]
In design applications, tandem fusions are particularly valuable for generating bifunctional enzymes and reporter systems that combine enzymatic activity with visualization. A prominent example is the luciferase-GFP fusion, in which firefly or Renilla luciferase is fused in tandem to green fluorescent protein (GFP) through a short linker. Combining bioluminescence with fluorescence in this way allows real-time, non-invasive monitoring of cellular processes in vivo. This construct has been widely used in reporter gene assays to track promoter activation, protein secretion, and viral infection, and its dual readout improves sensitivity and specificity in biological studies.[58][59]
Another key application is therapeutic constructs such as recombinant immunotoxins, in which an antibody Fab fragment is fused in tandem to a toxin domain to direct cytotoxic activity to specific cell-surface targets. For instance, the B3(Fab)-PE38 immunotoxin fuses the Fab portion of antibody B3, which recognizes the Lewis Y carbohydrate antigen on tumor cells, to a truncated Pseudomonas exotoxin A (PE38). This design enables targeted killing of cancer cells while limiting off-target effects. Such fusions use the antigen-binding specificity of the Fab to deliver the toxin's enzymatic activity, which inhibits protein synthesis after internalization.[60]
Despite their utility, tandem fusion constructs face challenges related to domain interactions and linker design. Steric hindrance can arise if the fused domains are too close or interact unfavorably, potentially disrupting the folding, stability, or bioactivity of one or both components. Success depends heavily on linker length, typically 5 to 20 amino acids, which must balance flexibility against overcrowding. Shorter linkers may cause steric clashes, whereas excessively long ones can reduce efficiency or increase susceptibility to proteolysis.[3]
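To put the 5-20 residue range in physical terms, the Python sketch below gives crude geometric bounds. It uses the textbook 1.5 Å axial rise per residue of an ideal α-helix (relevant to rigid linkers such as (EAAAK)ₙ, assuming they stay fully helical) and the roughly 3.8 Å Cα-Cα distance as an upper bound per residue for a fully extended chain. Flexible linkers such as (Gly₄Ser)ₙ typically sample end-to-end distances well below this maximum, so treat these numbers as order-of-magnitude guides only.

```python
HELIX_RISE_A = 1.5  # axial rise per residue in an ideal alpha-helix (angstroms)
CA_CA_A = 3.8       # distance between consecutive C-alpha atoms (angstroms)


def helical_span_a(n_residues: int) -> float:
    """Approximate length of a linker that stays fully helical, e.g. (EAAAK)n."""
    return n_residues * HELIX_RISE_A


def max_span_a(n_residues: int) -> float:
    """Rough upper bound on the end-to-end distance of a fully extended linker."""
    return n_residues * CA_CA_A


if __name__ == "__main__":
    for n in (5, 10, 15, 20):  # the typical 5-20 residue range
        print(f"{n:2d} residues: helix ~{helical_span_a(n):.1f} Å, "
              f"extended <= {max_span_a(n):.1f} Å")
```

By this estimate, a 15-residue helical linker spans about 22.5 Å, while a 15-residue flexible linker could in principle stretch to roughly 57 Å. This difference is why rigid linkers are chosen when a defined inter-domain distance matters.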
Domain Insertion Approaches
Domain insertion approaches embed a functional protein domain within a loop or other surface-exposed region of a host protein, integrating disparate biological activities into a single compact chimeric unit. Permissive insertion sites are identified by structural modeling of Protein Data Bank (PDB) structures. The chosen locations preserve the host domain's native fold and minimize steric clashes or functional interference. The inserted domain is connected by short, optimized loops rather than extended linkers so that the overall structure remains stable and functional.[61][62]
The main benefit of domain insertion is a more rigid and compact architecture than end-to-end fusion. This can reduce proteolytic susceptibility and enhance thermodynamic stability, an advantage for therapeutic applications. By promoting intimate inter-domain contacts, the approach can also introduce allosteric regulation, in which binding or activity at one domain modulates the other. This enables conditional functionality such as ligand-dependent activation. In therapeutic contexts, such fusions can show improved pharmacokinetics. For example, albumin-binding insertions extend serum half-life by exploiting albumin's FcRn-mediated recycling.[63][64]
Representative examples include the insertion of serum albumin-binding knob domains into the framework III loop of antibody heavy-chain variable domains. The resulting bispecific formats are used for targeted delivery in inflammatory diseases, achieving nanomolar affinities for both antigens while extending half-life from hours to days. Unlike tandem fusion constructs, domain insertion creates structural coupling between the domains that can enhance specificity and efficacy in such designs.[64]
Computational tools play a crucial role in designing viable insertions. The Rosetta software suite offers protocols such as Loop-Directed Domain Insertion (LooDo) that sample linker conformations, evaluate interface energies, and predict successful domain pairings. These tools use fragment libraries and energy minimization to rank potential sites and have reportedly reproduced near-native structures in benchmarks across enzyme families.[65]
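Once a permissive loop has been chosen by structural modeling, building the chimera is mostly sequence bookkeeping. The Python sketch below does not reproduce Rosetta or LooDo. It only splices a domain into a host sequence at a position that must lie inside a loop assumed to tolerate insertion, flanked by short connectors. The host sequence, loop boundaries, inserted fragment, and default "GS" connectors are all hypothetical.

```python
def insert_domain(host: str, position: int, domain: str, loop: tuple[int, int],
                  n_connector: str = "GS", c_connector: str = "GS") -> str:
    """Insert `domain` between host residues `position` and `position + 1` (1-based).

    `loop` is the 1-based, inclusive residue range of a surface loop assumed to
    tolerate insertion; the insertion point must fall between two loop residues.
    """
    start, end = loop
    if not 1 <= start < end <= len(host):
        raise ValueError("loop must be a valid range of at least two host residues")
    if not start <= position < end:
        raise ValueError("insertion point must lie inside the loop")
    return host[:position] + n_connector + domain + c_connector + host[position:]


if __name__ == "__main__":
    host = "MKVLDGSGNPETLAV"  # hypothetical host; residues 6-9 (G-S-G-N) form the loop
    print(insert_domain(host, 7, "WYW", loop=(6, 9)))
```

The call above yields MKVLDGSGSWYWGSGNPETLAV. The host residues on both sides of the loop are unchanged, which mirrors the aim of preserving the host fold.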
Linker Optimization
In fusion protein engineering, linkers are short peptide sequences that connect individual domains, ensuring proper spatial separation, independent folding, and minimal interference between the fused moieties. They are crucial for maintaining the bioactivity and stability of the construct, since poor design can lead to steric hindrance, aggregation, or reduced functionality.[3]
Linkers are broadly classified by their structural properties into three categories: flexible, rigid, and cleavable. Flexible linkers, often composed of glycine-serine repeats such as (GGGGS)ₙ with n typically 1 to 4, provide conformational freedom that allows domains to fold independently. Rigid linkers, exemplified by α-helical sequences such as (EAAAK)ₙ, enforce a fixed distance and orientation between domains, which is useful when precise positioning is needed to avoid unwanted interactions. Cleavable linkers incorporate protease-sensitive motifs that enable controlled separation of domains in response to specific physiological cues for targeted activation. One example is VSQTSKLTR↓AETVFPDV, a sequence derived from the activation site of coagulation factor IX that is cleaved by activated factor XI.[3][66]
Key design criteria for linkers include length and hydrophobicity. Linker lengths generally span 3 to 50 amino acids, and natural inter-domain linkers average 6.5 to 10 residues. Shorter linkers (under 10 aa) risk domain interference, while longer ones (over 30 aa) add flexibility but increase susceptibility to proteolysis. Hydrophobicity is kept low to prevent aggregation. Hydrophilic compositions, such as glycine- and serine-rich sequences, improve solubility and reduce non-specific interactions in aqueous environments.[3][67]
Optimization of linkers combines empirical and computational methods to identify sequences that maximize fusion protein efficacy. Library screening generates diverse linker variants, such as randomized glycine-serine or helical motifs, for high-throughput evaluation of expression, folding, and activity in hosts such as E. coli or mammalian cells. More recently, artificial intelligence-based prediction has accelerated design by forecasting outcomes without extensive experimentation. For example, AlphaFold models can predict how a linker affects domain orientation and stability. Diffusion-based tools such as RFdiffusion can generate linker backbones de novo for a specific fusion architecture, and their sequences are then designed separately (for example with ProteinMPNN).[67][68]
The choice and refinement of linkers strongly influence fusion protein production and performance. Poorly designed linkers that cause domain misfolding or aggregation can substantially reduce yields, and direct fusions without linkers often lose 50% or more of soluble expression compared with optimized constructs. Conversely, well-optimized linkers improve folding efficiency and bioactivity. For example, a rigid helical linker increased the activity of a human serum albumin-interferon alpha-2b fusion by 115%, and rigid linker variants raised expression yields up to 1.44-fold in granulocyte colony-stimulating factor-transferrin fusions.[3]
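The length and hydrophobicity criteria above lend themselves to a first-pass computational filter before experimental library screening. The Python sketch below scores hydrophobicity as the grand average of hydropathy (GRAVY) on the Kyte-Doolittle scale. It applies the 3-50, under-10, and over-30 residue thresholds quoted in this section. The `screen_linker` name and the GRAVY cutoff of 0.0 are assumptions for illustration, not established standards.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over the sequence."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq.upper()) / len(seq)


def screen_linker(seq: str, max_gravy: float = 0.0) -> list[str]:
    """Flag a linker against rule-of-thumb length and hydrophobicity criteria."""
    warnings = []
    n = len(seq)
    if n < 3 or n > 50:
        warnings.append("outside typical 3-50 aa range")
    elif n < 10:
        warnings.append("short: possible domain interference")
    elif n > 30:
        warnings.append("long: possible proteolysis")
    if seq and gravy(seq) > max_gravy:  # assumed cutoff: net hydrophobic linkers flagged
        warnings.append("hydrophobic: aggregation risk")
    return warnings


if __name__ == "__main__":
    # Small combinatorial library of flexible and rigid repeat linkers, n = 1..4.
    for unit in ("GGGGS", "EAAAK"):
        for n in range(1, 5):
            linker = unit * n
            print(f"({unit}){n}", len(linker), round(gravy(linker), 2), screen_linker(linker))
```

Both repeat families score as hydrophilic: GRAVY is -0.48 for (GGGGS)ₙ and -0.40 for (EAAAK)ₙ. Only the single-repeat variants are flagged, as short. A filter like this only narrows a library. Expression, folding, and activity still have to be measured.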
Applications
Research and Biotechnology Tools
Fusion proteins serve as versatile tools in research and biotechnology, particularly through fluorescent tags that enable real-time visualization of cellular processes. Green fluorescent protein (GFP), originally isolated from the jellyfish Aequorea victoria and cloned in 1992, has been widely fused to target proteins for live-cell imaging without the need for exogenous substrates.[69] Variants such as enhanced GFP (EGFP) and mCherry, a monomeric red fluorescent protein developed in 2004, allow multicolor labeling and reduce aggregation. This improves resolution in dynamic studies of protein localization and trafficking.[70] Since the late 1990s these fusions have been central to Förster resonance energy transfer (FRET) assays. In these assays, spectral overlap between donor (e.g., GFP) and acceptor (e.g., mCherry) fluorophores allows protein-protein interactions to be measured at nanometer distances in living cells.[71]
Affinity and solubility tags are commonly incorporated into fusion proteins to streamline production and purification in bacterial systems such as Escherichia coli. The hexahistidine (His6) tag, a run of six histidine residues, binds nickel or cobalt ions immobilized on IMAC resins. This enables rapid, high-yield purification of recombinant proteins under native conditions.[72] For proteins prone to insolubility, the small ubiquitin-like modifier (SUMO) tag fused at the N-terminus enhances folding and solubility. It often increases expression 5- to 10-fold in E. coli compared with untagged constructs, as demonstrated in studies on cytokines and enzymes.[73] SUMO fusions are particularly useful because SUMO proteases can remove the tag specifically after purification, yielding the native protein without residual tag sequence.[74]
Enzyme fusions extend the utility of fusion proteins to biosensors for environmental or biological analytes. A notable example is the beta-lactamase-GFP fusion, in which the enzyme's hydrolysis of beta-lactam antibiotics alters the local pH. The pH change modulates GFP fluorescence, signaling antibiotic presence with micromolar sensitivity.[75] This construct has been used in whole-cell biosensors for rapid, non-invasive detection of beta-lactam antibiotics such as penicillin, exploiting the enzyme's substrate specificity and GFP's pH-sensitive emission.[76]
Advances in split-protein systems have transformed proximity labeling for proteomics, allowing protein interactomes to be mapped in vivo. In split-BioID and split-TurboID, a biotin ligase is divided into two inactive halves that reassemble when the bait and prey proteins come into proximity. This enables conditional biotinylation of endogenous neighbors within a labeling radius of about 10 nm, which are then identified by mass spectrometry.[77] Refinements reported in 2023-2025 studies reduce background labeling compared with full-length enzymes, improving resolution in complex cellular environments such as organelles and synapses.[78]
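The nanometer sensitivity of the FRET assays described earlier in this section comes from the steep sixth-power distance dependence of energy transfer, E = 1 / (1 + (r/R₀)⁶). Here r is the donor-acceptor distance and R₀ is the Förster radius at which E = 0.5. The Python sketch below evaluates and inverts this relation. The R₀ of 5 nm is an assumed round number. For a real pair such as GFP-mCherry, the published Förster radius should be used, and measured efficiencies need corrections not modeled here.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Förster efficiency E = 1 / (1 + (r / R0)**6) for donor-acceptor distance r."""
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("need r >= 0 and R0 > 0")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def distance_from_efficiency(e: float, r0_nm: float) -> float:
    """Invert the Förster relation: r = R0 * (1/E - 1)**(1/6), for 0 < E <= 1."""
    if not 0.0 < e <= 1.0 or r0_nm <= 0:
        raise ValueError("need 0 < E <= 1 and R0 > 0")
    return r0_nm * (1.0 / e - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    R0 = 5.0  # assumed Förster radius in nm; use the published value for a real pair
    for r in (2.5, 5.0, 7.5, 10.0):
        print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r, R0):.3f}")
```

Doubling the distance from R₀ to 2R₀ drops the efficiency from 0.5 to 1/65 (about 0.015). This steep falloff is why FRET mainly reports close molecular contacts rather than loose co-localization.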
Therapeutic and Pharmaceutical Uses
Fusion proteins have revolutionized therapeutic interventions by combining functional domains to enhance efficacy, stability, and targeting in pharmaceutical applications. One prominent class is Fc-fusion proteins, which link bioactive molecules to the Fc region of immunoglobulins to extend serum half-life through FcRn-mediated recycling and improve pharmacokinetics. For instance, etanercept, approved by the FDA in 1998, is a dimeric fusion of the extracellular ligand-binding domain of the human p75 TNF receptor with the Fc portion of human IgG1. It acts as a TNF inhibitor in rheumatoid arthritis, psoriatic arthritis, and other autoimmune conditions by neutralizing soluble TNF-alpha.[79][80]
Bispecific antibodies enable dual targeting that can redirect immune cells against diseased tissue, particularly in oncology. Some formats, such as bispecific T-cell engagers, are built as single-chain fusions of two single-chain variable fragments (scFvs). Blinatumomab, approved by the FDA in 2014, exemplifies this approach. It fuses an anti-CD19 scFv to an anti-CD3 scFv, which brings T cells into contact with CD19-positive B-cell precursor acute lymphoblastic leukemia cells and triggers T-cell-mediated cytotoxicity.[81][82] By 2025, more than 20 fusion protein therapeutics, including Fc-fusions and bispecifics, had received FDA approval, spanning indications from autoimmune diseases to cancers and hematologic disorders.[83][84]
These therapeutics have driven substantial market growth. The global fusion proteins market was valued at approximately $36.3 billion in 2025, reflecting their commercial success and expanding clinical utility.[85] In the 2020s, emerging strategies have integrated fusion proteins into chimeric antigen receptor (CAR) T-cell therapies. One example is engineering CAR-T cells to secrete bifunctional fusions of cytokines such as IL-12 with PD-L1-targeting domains, which enhances antitumor responses in solid tumors while limiting systemic toxicity.[86] Such designs show how fusion proteins can advance immunotherapy by localizing cytokine activity to the tumor microenvironment.
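A tandem-scFv bispecific can be sketched as a list of domains joined by peptide linkers into one chain. The minimal Python sketch below makes three illustrative assumptions:
- each scFv uses the widely used (Gly4Ser)3 linker between VH and VL;
- a single Gly4Ser linker joins the two scFvs;
- the domain order, linker choices, and helper names (`scfv`, `tandem_scfv`, `describe`) are hypothetical.

This is not the actual sequence or layout of blinatumomab.

```python
from dataclasses import dataclass

# Assumed, illustrative linkers: (Gly4Ser)3 inside each scFv,
# a single Gly4Ser between the two scFvs.
INTRA_SCFV_LINKER = "GGGGS" * 3
INTER_SCFV_LINKER = "GGGGS"


@dataclass(frozen=True)
class Segment:
    """One element of the fused polypeptide: a domain or a linker."""
    name: str
    is_linker: bool = False


def scfv(target: str, order: str = "VH-VL") -> list[Segment]:
    """Hypothetical helper: one scFv, a VH and a VL joined by a flexible linker."""
    first, second = order.split("-")
    return [
        Segment(f"{first}(anti-{target})"),
        Segment(INTRA_SCFV_LINKER, is_linker=True),
        Segment(f"{second}(anti-{target})"),
    ]


def tandem_scfv(target_a: str, target_b: str) -> list[Segment]:
    """Hypothetical helper: two scFvs fused head-to-tail into a single chain."""
    return scfv(target_a) + [Segment(INTER_SCFV_LINKER, is_linker=True)] + scfv(target_b)


def describe(chain: list[Segment]) -> str:
    """Render the chain from N- to C-terminus, abbreviating linkers by length."""
    parts = [f"[{len(s.name)} aa linker]" if s.is_linker else s.name for s in chain]
    return "N-" + " - ".join(parts) + "-C"


if __name__ == "__main__":
    print(describe(tandem_scfv("CD19", "CD3")))
```

The linkers are kept as explicit segments. This makes it simple to swap in longer or more rigid linkers when exploring how domain spacing affects folding and binding.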
Natural Occurrence and Evolution
Examples in Biological Systems
In prokaryotes, fusion proteins are exemplified by the type I fatty acid synthases (FAS I) of certain bacteria, notably members of the suborder Corynebacterineae such as Mycobacterium tuberculosis. These multidomain enzymes integrate multiple catalytic units into a single polypeptide chain: β-ketoacyl synthase (KS), malonyl/acetyl transferase (MAT), dehydratase (DH), enoyl reductase (ER), β-ketoacyl reductase (KR), and acyl carrier protein (ACP). This arrangement enables iterative fatty acid synthesis without the diffusible intermediates characteristic of the dissociated type II systems found in most bacteria.[87] The fused architecture supports efficient de novo production of long-chain acyl-CoAs, particularly C16–C18 and C24–C26 species. These are essential precursors for mycolic acid biosynthesis in the mycobacterial cell wall.[88]
In eukaryotes, the human Notch receptor is a prominent example of a natural fusion protein critical for cell signaling. Notch-1 is a single-pass transmembrane protein with several parts:
- a large extracellular domain of 36 epidermal growth factor-like (EGF-like) repeats, many of which bind calcium and mediate ligand interactions;
- a negative regulatory region;
- a transmembrane helix;
- an intracellular domain containing a RAM domain and ankyrin repeats for transcriptional regulation.[89]

When Notch binds a ligand presented by a neighboring cell, proteolytic cleavage releases the intracellular domain. This domain translocates to the nucleus and activates target genes involved in cell fate determination, for example during embryonic development and tissue homeostasis.[90]
Viral fusion proteins are illustrated by the HIV-1 envelope glycoprotein precursor gp160, a polyprotein that gives rise to the envelope spikes displayed on infected cells and virions. Gp160 consists of the gp120 and gp41 subunits joined at a furin cleavage site. Cleavage by furin-like cellular proteases in the Golgi separates them into a non-covalently associated complex. Within this complex, gp120 binds CD4 and a coreceptor to initiate entry, while gp41 mediates membrane fusion through its fusion peptide and heptad repeat regions.[91] This precursor strategy coordinates assembly and activation, enabling the virus to fuse with host cell membranes during infection.[92]
Pathological fusion proteins also arise naturally through genetic errors. A key example is the BCR-ABL oncoprotein, produced by the t(9;22) chromosomal translocation (the Philadelphia chromosome) in chronic myelogenous leukemia (CML). Identified at the molecular level in the 1980s, this fusion joins two genes:
- the breakpoint cluster region (BCR) gene on chromosome 22;
- the Abelson murine leukemia viral oncogene homolog 1 (ABL1) gene on chromosome 9.

The result is a chimeric tyrosine kinase that constitutively activates downstream signaling pathways such as RAS/MAPK and PI3K, driving uncontrolled proliferation of hematopoietic cells. The p210 BCR-ABL isoform predominates in CML, occurring in over 95% of cases, and its aberrant activity disrupts normal myeloid differentiation.[93]
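The furin processing of gp160 described above can be illustrated with a short motif scan. The canonical furin recognition motif is R-X-(K/R)-R, with cleavage after the final arginine; the minimal site is R-X-X-R. The Python sketch below reports every canonical motif in a sequence and splits the sequence at each predicted scissile bond. Two things are assumptions: the example peptide is an illustrative fragment loosely modeled on a gp120/gp41 junction, not a verified reference sequence, and the helper names are hypothetical. Real furin specificity also depends on flanking residues and structural accessibility, so motif matches are only candidate sites.

```python
import re

# Canonical furin motif R-X-(K/R)-R; cleavage follows the final Arg.
# The zero-width lookahead lets overlapping candidate sites be reported.
FURIN_SITE = re.compile(r"(?=(R.[KR]R))")


def furin_sites(seq: str) -> list[int]:
    """Return 0-based indices of the first residue after each predicted cut."""
    return [m.start() + 4 for m in FURIN_SITE.finditer(seq)]


def cleave(seq: str) -> list[str]:
    """Split a precursor at every predicted furin site, N- to C-terminal order."""
    fragments, prev = [], 0
    for cut in furin_sites(seq):
        fragments.append(seq[prev:cut])
        prev = cut
    fragments.append(seq[prev:])
    return fragments


if __name__ == "__main__":
    # Illustrative (assumed) fragment: a gp120-like C-terminal stretch ending
    # in REKR, followed by a gp41-like fusion-peptide stretch.
    precursor = "KAKRRVVQREKRAVGIGALFLG"
    print(furin_sites(precursor))
    print(cleave(precursor))
```

For the example fragment, the scan finds a single REKR site. The C-terminal piece then begins with the hydrophobic stretch that plays the role of a fusion peptide.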
Evolutionary Mechanisms
Fusion proteins arise through several evolutionary mechanisms that juxtapose coding sequences from distinct genes, producing chimeric polypeptides with novel functionalities.
- In prokaryotes, gene fusions often occur via unequal crossing-over during recombination, which can duplicate and juxtapose adjacent genes.
- They can also occur through retrotransposition, in which a reverse-transcribed mRNA integrates upstream or downstream of an existing gene to create a fused transcript.
- Horizontal gene transfer further spreads fusions by introducing pre-fused genes from distantly related organisms. Evidence for this comes from the patchy phylogenetic distribution of certain multidomain architectures across bacterial lineages.
- In eukaryotes, exon shuffling is a prominent mechanism. Facilitated by introns and transposable elements, it allows exons encoding protein domains to be rearranged or inserted into unrelated genes, generating modular proteins.

Once they occur, fusions are retained by selection that favors functional efficiency and regulatory simplicity. Linking genes that act in sequential steps of a pathway places them under a shared promoter and in a single reading frame. This guarantees coordinated expression, reduces the need for complex transcriptional control, and minimizes diffusional loss of intermediates. In metabolic pathways such as those for amino acid biosynthesis, fusions promote substrate channeling: the product of one enzymatic domain passes directly to the adjacent domain. Channeling enhances reaction rates and cellular fitness under nutrient-limited conditions. The convergent evolution of similar fusions in independent lineages underscores the adaptive value of multifunctionality across diverse environments.
Phylogenetic analyses using comparative genomics have uncovered ancient fusion events predating the divergence of the major domains of life. For instance, ribosomal proteins traced to the last universal common ancestor (LUCA) contain fused domains, such as the tandem SH3 and OB folds of the universal protein uL2. This indicates that domain fusion contributed to the core translational machinery early in cellular evolution. These conserved fusions point to vertical inheritance as a key mode of propagation. Sequence alignments across archaea, bacteria, and eukaryotes show minimal fission events after LUCA.
Phylogenomic studies in the 2020s, using tools such as GriffinDetector to detect chimeric genes, have traced fusion histories across eukaryotic lineages. They estimate that fusions become fixed at a rate of approximately 16 per species per million years in plants. These analyses indicate that ancient fusions account for a substantial fraction of the eukaryotic proteome, estimated at 10–20%, particularly among signaling and metabolic proteins. This underscores their role in proteome diversification without reliance on de novo domain invention.
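Comparative-genomics screens for fusion events commonly look for a protein in one genome whose non-overlapping regions align to two separate proteins in another genome. This is the reasoning behind the "Rosetta Stone" approach to inferring functional links. The Python sketch below applies that rule to precomputed local-alignment coordinates. The `Hit` structure, the `max_overlap` tolerance, and the gene names are illustrative assumptions. This is not the algorithm implemented by GriffinDetector or any other specific tool.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hit:
    """Local alignment of one query protein to a subject protein (1-based, inclusive)."""
    subject: str  # protein in the comparison genome
    q_start: int  # alignment start on the query
    q_end: int    # alignment end on the query


def overlap(a: Hit, b: Hit) -> int:
    """Number of query residues covered by both alignments."""
    return max(0, min(a.q_end, b.q_end) - max(a.q_start, b.q_start) + 1)


def candidate_fusions(hits: list[Hit], max_overlap: int = 10) -> list[tuple[str, str]]:
    """Ordered pairs of distinct subjects that align to largely disjoint query regions.

    Such a pattern suggests the query is a fusion of two genes that remain
    separate in the comparison genome. max_overlap is an assumed tolerance.
    Each pair is ordered by where its alignment starts on the query (N to C).
    """
    pairs = set()
    for a, b in combinations(hits, 2):
        if a.subject != b.subject and overlap(a, b) <= max_overlap:
            first, second = sorted((a, b), key=lambda h: h.q_start)
            pairs.add((first.subject, second.subject))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical alignments of one query protein against a second genome.
    hits = [
        Hit("geneA", 1, 240),
        Hit("geneB", 235, 510),
        Hit("geneA_paralog", 5, 230),
    ]
    print(candidate_fusions(hits))
```

A realistic screen would add two further checks. The alignments should pass significance thresholds, and the two subject proteins should not be homologous to each other, so that internal domain repeats are not mistaken for fusions.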