
Drug design

Drug design, also referred to as rational drug design, is a pharmaceutical discipline focused on the invention and optimization of new medications by leveraging knowledge of biological targets, such as proteins or enzymes, and their interactions with drug molecules to achieve therapeutic efficacy while minimizing toxicity and side effects. This process integrates computational modeling, medicinal chemistry, and structural biology to create compounds that selectively bind to disease-related targets, addressing unmet medical needs through targeted interventions rather than broad-spectrum treatments. The field has evolved significantly since its foundational concepts in the late 19th century, beginning with Emil Fischer's "lock-and-key" model of drug-receptor interactions in the 1890s, which posited that drugs fit specific biological sites like keys into locks. Key milestones include the development of quantitative structure-activity relationship (QSAR) analysis by Corwin Hansch in 1964, enabling predictive modeling of molecular properties, and the establishment of the Protein Data Bank (PDB) in the 1970s, which now hosts over 220,000 three-dimensional protein structures essential for modern design efforts. These advancements shifted drug discovery from empirical trial-and-error methods, rooted in natural product screening, to systematic, knowledge-driven approaches that have accelerated the development of blockbuster drugs like selective serotonin reuptake inhibitors (SSRIs) for depression.

At its core, drug design encompasses two primary strategies: structure-based drug design (SBDD), which utilizes the three-dimensional structure of a target protein—often determined by X-ray crystallography or cryo-electron microscopy—to model and refine ligand binding, and ligand-based drug design (LBDD), which relies on the known structures and activities of existing ligands to infer pharmacophores and optimize new candidates when target structures are unavailable. Tools such as molecular docking simulations, virtual screening of vast chemical libraries, and platforms like AlphaFold for protein structure prediction have become integral, reducing the time and cost associated with lead identification and optimization. For instance, in silico methods have dramatically lowered pharmacokinetic failure rates from 39% to as low as 1% in some pipelines by predicting absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties early in development.

The overall drug design process is embedded within a broader drug discovery and development pipeline that typically spans 12–15 years and costs approximately $1–2.8 billion per approved drug, reflecting high attrition rates where only 1 in 5,000–10,000 compounds progresses to market. It begins with target identification and validation through genomic and proteomic studies, followed by hit identification via high-throughput screening or computational methods, lead optimization to enhance potency and selectivity, and preclinical testing for safety and efficacy. Regulatory milestones, such as submitting an Investigational New Drug (IND) application to agencies like the U.S. Food and Drug Administration (FDA), ensure that preclinical data on manufacturing, stability, and pharmacology support safe entry into human clinical trials (Phases I–III). Post-approval Phase IV monitoring further evaluates long-term effects, underscoring the emphasis on both innovation and rigorous validation in producing safe, effective therapeutics.

Emerging trends, driven by artificial intelligence and machine learning, promise to revolutionize drug design by enabling rapid, patient-tailored therapies—potentially generating viable candidates in hours rather than years—as demonstrated by AI-assisted discoveries during the COVID-19 pandemic.
Despite challenges such as off-target effects and the complexity of biological systems, these innovations, spanning biologics such as monoclonal antibodies as well as traditional small molecules, continue to expand the druggable genome and address previously intractable diseases.

Fundamentals

Definition and Principles

Drug design is the inventive process of finding new medications based on the knowledge of a biological target, involving the identification, synthesis, and optimization of small molecules or biologics to interact specifically with these targets for therapeutic effect. This process designs candidate compounds that are complementary in shape and charge to the target, enabling effective binding while aiming to modulate disease-related biological pathways. Central to drug design are principles such as selectivity, which focuses on targeting specific biological entities to reduce side effects; potency, the strength of a drug's effect at a given concentration; and efficacy, the maximum therapeutic response a drug can produce. A key challenge is balancing high binding affinity—the strength of the drug-target interaction—with minimization of off-target effects, which are unintended bindings that can lead to toxicity or adverse reactions. These principles guide the development of agents that not only achieve desired outcomes but also maintain safety profiles suitable for clinical use. The drug discovery pipeline provides a structured framework for this process, comprising high-level stages including target identification to select relevant biological entities; hit finding through screening for initial active compounds; lead optimization to enhance properties like potency and selectivity; and preclinical and clinical testing to assess safety, efficacy, and pharmacokinetics in models and humans. Essential metrics underpinning these efforts include bioavailability, the fraction of an administered dose that reaches systemic circulation to exert its effect; half-life, the duration over which drug levels halve in the body, influencing dosing frequency; and the therapeutic index, the ratio of a drug's toxic dose to its effective dose, serving as a measure of its safety margin. These concepts ensure that designed drugs are not only effective but also practical for therapeutic application.
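To make the metrics just defined concrete, the following minimal Python sketch computes bioavailability from oral and intravenous exposure, half-life from a first-order elimination rate constant, and the therapeutic index; all numerical values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical exposure (area under the concentration-time curve) after
# equal oral and intravenous doses; bioavailability F = AUC_oral / AUC_iv.
auc_oral, auc_iv = 42.0, 105.0          # mg*h/L
bioavailability = auc_oral / auc_iv     # 0.4, i.e. 40% reaches circulation

# Half-life from a first-order elimination rate constant ke (1/h):
ke = 0.1
half_life = math.log(2) / ke            # ~6.9 h between successive halvings

# Therapeutic index as the ratio of median toxic to median effective dose:
td50, ed50 = 500.0, 20.0                # mg/kg, hypothetical
therapeutic_index = td50 / ed50         # 25: a 25-fold safety margin

print(f"F = {bioavailability:.0%}, t1/2 = {half_life:.1f} h, TI = {therapeutic_index:.0f}")
```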

Drug Targets

Drug targets are the biological molecules or pathways that therapeutic agents are designed to interact with in order to elicit a desired pharmacological response. The majority of these targets—over 95%—are proteins, which mediate about 93% of known drug-target interactions. Protein targets primarily fall into classes such as enzymes, which catalyze biochemical reactions; receptors, which transmit signals across cell membranes; and ion channels, which regulate ion flow to influence cellular excitability. Less common targets include nucleic acids like DNA and RNA, which can be modulated to interfere with gene expression or replication, and cellular pathways, such as signaling cascades, where drugs indirectly alter flux through multi-step processes. This focus on proteins reflects their central role in disease pathology, though non-protein targets have gained attention for addressing previously undruggable mechanisms.

The identification of drug targets has evolved significantly, particularly since the 1990s, when drug design shifted from empirical, phenotype-based screening to a target-driven paradigm fueled by advances in molecular biology and the Human Genome Project. Modern methods leverage genomics to map genetic variations associated with diseases; for instance, genome-wide association studies (GWAS) scan populations to link single nucleotide polymorphisms (SNPs) with traits, prioritizing genes such as IL6R for coronary heart disease. Proteomics complements this by profiling protein expression and interactions, using techniques like mass spectrometry to identify differentially abundant proteins in diseased states, such as HER2 overexpression in breast cancer. Disease association studies further integrate these data to nominate candidates, ensuring targets align with therapeutic relevance.

Once identified, targets undergo rigorous validation to confirm their causal role in disease and suitability for pharmacological intervention. Techniques include gene knockout models, where CRISPR-Cas9 permanently disables genes to assess phenotypic consequences, as in evaluating a target's essentiality for cell survival. Knockdown via siRNA transiently silences gene expression, allowing observation of effects like reduced tumor growth upon knockdown of oncogenes. Functional assays, ranging from enzymatic readouts to disease models, quantify how target perturbation alters biology, ensuring the intervention yields a beneficial outcome without redundancy. These orthogonal approaches minimize false positives and establish confidence in the target.

Druggability assessment evaluates a target's potential for safe, effective modulation by small molecules or biologics. Key criteria encompass the presence of suitable binding pockets—hydrophobic cavities capable of accommodating ligands with high affinity, as analyzed in over 22,000 protein-ligand complexes. Differential expression levels between diseased and healthy tissues are examined to confirm accessibility, with tools tracking variations across thousands of cell types to prioritize targets like those upregulated in tumors. Finally, the potential for selective modulation is gauged to avoid toxicity, incorporating predictions of off-target effects and interactions that could exacerbate adverse outcomes. Targets meeting these thresholds, such as those with well-defined binding pockets, proceed to lead optimization, balancing efficacy with a favorable safety profile.

History

Early Developments

The origins of drug design trace back to ancient and medieval medicine, where empirical observations guided the use of natural substances for therapeutic purposes. Civilizations such as the Sumerians, Egyptians, Greeks, and Romans documented the medicinal properties of plants, minerals, and animal-derived substances through trial-and-error experimentation, forming the basis of early pharmacopeias. For instance, opium, derived from the opium poppy, was used for pain relief and sedation as early as 3400 BCE in Mesopotamia, with its active alkaloid morphine isolated in 1804 by Friedrich Sertürner, marking the first purification of a plant-derived alkaloid. Similarly, willow bark (Salix spp.) was employed by ancient cultures for fever and pain reduction, leading to the isolation of salicin in 1828 by Johann Andreas Buchner, which served as a precursor to salicylic acid and later aspirin. These practices relied heavily on herbal remedies, with medieval European and Islamic scholars compiling texts that preserved and expanded knowledge of medicinal applications.

In the 19th and early 20th centuries, drug discovery began transitioning from purely empirical methods to include serendipitous discoveries and initial systematic chemistry, though still limited by incomplete understanding of disease mechanisms. A foundational theoretical advance came in 1894 with Emil Fischer's "lock-and-key" model, which proposed that enzymes and substrates interact specifically, like a key fitting a lock, laying the groundwork for understanding drug-receptor binding. Chloral hydrate, synthesized in 1832 by Justus von Liebig through chlorination of ethanol, became the first synthetic sedative, introduced clinically in 1869 for insomnia and sedation. A landmark serendipitous find occurred in 1928 when Alexander Fleming observed that a mold contaminant inhibited bacterial growth in a culture plate, leading to the identification of penicillin as an antibacterial agent. These advances highlighted the potential of both natural extracts and laboratory synthesis, yet progress remained haphazard, often dependent on accidental observations rather than targeted design.

The emergence of pharmaceutical chemistry in the early 1900s introduced more systematic approaches, exemplified by Paul Ehrlich's "magic bullet" concept, which envisioned selective agents that target pathogens without harming the host. This idea culminated in the development of Salvarsan (arsphenamine) in 1910 by Ehrlich and Sahachiro Hata, the first effective chemical treatment for syphilis through targeted arsenic-based therapy after testing over 600 compounds. Key milestones followed, including the isolation of insulin in 1921 by Frederick Banting and Charles Best, which revolutionized diabetes treatment by extracting the hormone from canine pancreases. The 1930s saw the advent of sulfa drugs, with Gerhard Domagk discovering Prontosil's antibacterial effects in 1932, the first synthetic agent to combat streptococcal infections in mice and humans. World War II accelerated antibiotic development, notably with Selman Waksman's isolation of streptomycin from Streptomyces griseus soil bacteria in 1943, providing the first effective treatment for tuberculosis. Despite these breakthroughs, early drug development was constrained by heavy reliance on trial-and-error screening of natural sources and crude synthetics, lacking molecular insights into drug-target interactions or pharmacokinetics, which often resulted in inconsistent efficacy and unforeseen toxicities. This empirical tradition laid the groundwork for modern rational design but underscored the inefficiencies of non-targeted approaches, with many remedies failing due to poor reproducibility and limited mechanistic knowledge.

Modern Evolution

The post-World War II period marked a transformative boom in biochemistry, fueled by wartime advancements in instrumentation and funding, which accelerated the elucidation of biomolecular structures and laid the groundwork for rational drug design. The 1953 discovery of DNA's double-helix structure by Watson and Crick provided critical insights into genetic mechanisms, indirectly supporting the evolution of receptor theory by emphasizing molecular interactions at the atomic level. Further progress in the 1960s included Corwin Hansch's development of quantitative structure-activity relationship (QSAR) analysis in 1964, which integrated physicochemical parameters and statistics to predict biological activity from molecular structure, enabling more rational lead optimization. The establishment of the Protein Data Bank (PDB) in 1971 centralized three-dimensional protein structures, growing to over 180,000 entries by the 2020s and becoming indispensable for structure-based drug design. This era enabled the first targeted rational designs, exemplified by cimetidine, developed in the 1970s at Smith, Kline & French Laboratories through systematic modification of histamine analogs to block H2-receptors and treat peptic ulcers. Cimetidine, approved in 1976 as Tagamet, represented a milestone in structure-activity relationship studies, reducing the need for ulcer surgeries by inhibiting gastric acid secretion without the toxicity of earlier candidates like metiamide.

In the 1980s and 1990s, drug design shifted toward industrialized processes with the introduction of high-throughput screening (HTS), which originated in 1986 as companies adapted natural products assays to synthetic libraries in 96-well plates, scaling from hundreds to thousands of compounds screened weekly. Recombinant protein technology, advanced through systems like E. coli expression vectors and baculovirus in insect cells, facilitated large-scale production of drug targets for biochemical assays, enabling the validation of novel proteins as therapeutic candidates. Structural biology progressed rapidly, culminating in the 2000 determination of the first G-protein-coupled receptor (GPCR) structure—bovine rhodopsin—which provided a template for modeling the superfamily, targeted by roughly 30% of marketed drugs, and spurred structure-based optimization despite initial challenges in crystallization.

The 2000s witnessed the influence of genomics on target identification, with the Human Genome Project's completion in 2003 cataloging ~20,000 protein-coding genes and enabling genome-wide association studies (GWAS) to link variants to diseases, increasing the success rate of genetically validated targets by up to twofold in clinical trials. This era also saw the ascent of biologics, highlighted by the 1997 FDA approval of rituximab, the first monoclonal antibody for cancer (non-Hodgkin lymphoma), which targeted CD20 on B-cells and improved survival rates, paving the way for over a dozen similar therapies like trastuzumab by the decade's end.

From the 2010s to the 2020s, cryo-electron microscopy (cryo-EM) revolutionized structural biology, earning the 2017 Nobel Prize in Chemistry for its developers and enabling over 30,000 atomic models as of 2025, particularly for challenging targets like ion channels and complexes previously intractable by X-ray methods. Artificial intelligence and machine learning integrated deeply, with DeepMind's AlphaFold, announced in 2020 and detailed in 2021, achieving near-atomic accuracy in protein structure prediction (median 0.96 Å RMSD) and accelerating discovery by providing structures for ~200 million proteins without experimental effort. The mRNA vaccine platform exemplified rapid design principles during the COVID-19 pandemic, with the Pfizer-BioNTech and Moderna vaccines authorized in 2020 based on viral spike protein sequences, leveraging lipid nanoparticle delivery for unprecedented speed from sequence to deployment.
Post-2020 advancements include Insilico Medicine's INS018_055, a generative AI-designed TNIK inhibitor for idiopathic pulmonary fibrosis, which entered Phase II trials in 2023 after target-to-candidate identification in 18 months and reported positive Phase IIa results in November 2024, demonstrating improvements in lung function and end-to-end efficiency in AI-driven small-molecule development.

Discovery Strategies

Phenotypic Approaches

Phenotypic drug discovery (PDD) involves screening compound libraries against biological systems to identify molecules that induce a desired phenotypic change, such as inhibition of cell proliferation or restoration of normal function in disease models, without prior knowledge of the molecular target. This approach is particularly suited for complex diseases like cancer, where multiple pathways may contribute to pathology, as it focuses on holistic therapeutic outcomes rather than isolated targets. By observing emergent biological responses, PDD can uncover drugs with polypharmacological effects that might be missed in more reductionist strategies.

Key techniques in PDD include cell-based assays, which measure phenotypic endpoints like growth inhibition in cancer models or neuroprotection in neurodegeneration assays, often using high-content imaging for multiparametric readouts. Organism-level screens employ model organisms such as Caenorhabditis elegans (worms) or Drosophila melanogaster (flies) for neurodegeneration studies, where compounds are tested for effects on motility or lifespan, and zebrafish (Danio rerio) models for whole-animal phenotypes in developmental or cardiovascular disorders. These systems enable in vivo validation early in discovery, bridging cellular and organismal biology.

Advantages of PDD include its ability to capture beneficial off-target interactions and polypharmacology, which can enhance efficacy in multifactorial diseases, and it has contributed to approximately 30% of small-molecule drugs approved by the US Food and Drug Administration between 1999 and 2008. Representative successes include ivacaftor for cystic fibrosis, identified through airway epithelial cell assays showing corrected chloride transport.

The typical workflow begins with assay development to establish a robust disease-relevant readout, followed by screening of diverse compound libraries to identify hits, which are then validated through dose-response curves and secondary assays; a sketch of such a dose-response fit follows below. Hit-to-lead progression involves target deconvolution using techniques like CRISPR-based genetic screens or chemical proteomics to elucidate mechanisms post-identification. In contrast to target-based approaches, PDD emphasizes functional outcomes from the outset. Historically, PDD dominated before the 1990s, yielding classics like aspirin through observational screening in animal models, but declined with the rise of genomics-driven target identification. It has resurged since the 2010s, fueled by advances in high-content imaging, stem cell-derived models, and model systems, proving effective for "undruggable" targets in areas like neurodegeneration and rare diseases.
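As a concrete illustration of the hit-validation step, the sketch below fits a four-parameter Hill (log-logistic) dose-response model to screening data using SciPy; the concentrations and responses are invented solely for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def hill(conc, top, bottom, ec50, n):
    """Four-parameter Hill model commonly used to validate phenotypic hits:
    response rises from `bottom` to `top` around the midpoint EC50 with slope n."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** n)

# Hypothetical % effect readouts across an 8-point dilution series (µM).
conc = np.array([0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0])
resp = np.array([2.0, 5.0, 12.0, 30.0, 55.0, 78.0, 90.0, 95.0])

params, _ = curve_fit(hill, conc, resp, p0=[100.0, 0.0, 0.1, 1.0])
top, bottom, ec50, n = params
print(f"EC50 ≈ {ec50:.3f} µM, Hill slope ≈ {n:.2f}")
```

A hit whose fitted curve shows a clean sigmoidal response and a plausible EC50 advances to secondary assays; flat or bell-shaped curves flag artifacts or cytotoxicity.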

Target-Based Approaches

Target-based approaches in drug design center on the rational selection of a molecular target validated for its role in disease pathogenesis, followed by the development of small-molecule modulators that interact with this target to elicit a therapeutic response. This strategy emphasizes a hypothesis-driven paradigm, where the target—typically a protein such as an enzyme, receptor, or signaling molecule—is chosen based on evidence from genetics, genomics, or functional studies demonstrating its causal involvement in the pathology. Validation ensures the target's druggability, meaning it possesses suitable binding pockets for small molecules without undue off-target effects.

Once a target is selected and validated, candidates are designed to function as agonists, which activate the target to enhance its activity; antagonists, which block endogenous ligands to inhibit signaling; or inhibitors, which directly impair enzymatic or catalytic functions. These modulators can engage the target through orthosteric binding, where they occupy the primary active site to compete with natural substrates or ligands, or allosteric binding, where they interact with secondary sites to induce conformational changes that indirectly regulate activity, often providing enhanced selectivity and reduced toxicity compared to orthosteric agents. For instance, allosteric modulators can fine-tune receptor responses without fully ablating function, which is particularly useful for G protein-coupled receptors (GPCRs) or ion channels.

The design process incorporates virtual screening to computationally evaluate vast libraries of compounds for potential binding to the target, identifying initial hits for experimental follow-up. These hits undergo rational optimization, guided by structure-activity relationship (SAR) studies that map how chemical modifications influence potency, selectivity, and pharmacokinetic behavior. Iterative testing in biophysical assays, cellular models, and early in vivo studies refines the candidates, optimizing for efficacy while minimizing adverse effects through cycles of design, synthesis, and redesign. This iterative framework ensures progressive improvement in candidate quality.

Key to success in target-based approaches is the concept of reverse translation, which bridges molecular-level modulation back to phenotypic outcomes by confirming that altering the target produces the desired disease-relevant effects in model systems. Modern validation increasingly employs CRISPR-Cas9 screens to systematically knock out or edit genes, assessing impacts on cellular phenotypes or drug sensitivity in high-throughput formats, thereby strengthening causal links between target engagement and therapeutic benefit. Binding efficiency is quantitatively gauged using ligand efficiency (LE), a metric that normalizes binding affinity by molecular size to prioritize compact, efficient binders during optimization; it is defined as
\text{LE} = \frac{-\Delta G}{N}
where \Delta G is the Gibbs free energy of binding and N is the number of heavy atoms, with a practical approximation of
\text{LE} \approx \frac{1.37 \times \mathrm{p}K_i}{N}
for Ki-based affinities at standard conditions, guiding the selection of leads with high efficiency per atom.
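A minimal Python sketch of this calculation follows, using the approximation above at 298 K (where 2.303RT ≈ 1.37 kcal/mol); the Ki value and atom count are hypothetical.

```python
import math

def ligand_efficiency(ki_molar, n_heavy_atoms, temp_k=298.15):
    """Ligand efficiency LE = -dG / N in kcal/mol per heavy atom,
    with dG = RT * ln(Ki) for a Ki-based binding affinity."""
    R = 1.987e-3  # gas constant in kcal/(mol*K)
    delta_g = R * temp_k * math.log(ki_molar)  # negative for sub-molar Ki
    return -delta_g / n_heavy_atoms

# Hypothetical hit: Ki = 10 nM, 25 heavy atoms.
le = ligand_efficiency(1e-8, 25)
print(f"LE = {le:.2f} kcal/mol per heavy atom")  # ~0.44, above the ~0.3 guideline
```

The result agrees with the pKi shortcut (1.37 × 8 / 25 ≈ 0.44), illustrating why the two forms are interchangeable at standard conditions.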
These approaches have driven a majority of small-molecule drugs through the discovery pipeline to regulatory approval, reflecting their prevalence in modern pharma portfolios despite challenges like target validation complexity. A seminal example is imatinib (Gleevec), approved in 2001, which exemplifies target-based design as a selective inhibitor of the BCR-ABL tyrosine kinase in chronic myeloid leukemia, transforming treatment outcomes by precisely blocking the oncogenic driver. While effective, target-based methods complement phenotypic approaches, which prioritize observable effects over predefined targets for discovering novel mechanisms.

Computational Methods

Ligand-Based Design

Ligand-based drug design leverages datasets of known ligands and their biological activities to predict and generate new drug candidates without requiring the three-dimensional structure of the target protein. This approach fundamentally relies on quantitative structure-activity relationship (QSAR) modeling, which establishes mathematical correlations between a ligand's structural descriptors—such as physicochemical properties, topological features, or molecular fingerprints—and its observed activity, expressed as activity = f(structural descriptors). Developed over 50 years ago, QSAR enables the extrapolation of activity trends from a training set of active and inactive compounds to novel structures, facilitating rational optimization in the absence of target structural data.

Key techniques in ligand-based design include pharmacophore modeling, which identifies essential spatial arrangements of molecular features—such as hydrogen-bond donors, acceptors, hydrophobic regions, or charged groups—that are common among active ligands and necessary for biological recognition. Three-dimensional QSAR methods, such as Comparative Molecular Field Analysis (CoMFA) and Comparative Molecular Similarity Indices Analysis (CoMSIA), extend this by aligning ligands in 3D space and computing grid-based fields to quantify steric, electrostatic, hydrophobic, and hydrogen-bonding interactions influencing activity. Additionally, machine learning-driven similarity searching employs molecular fingerprints, like Extended Connectivity Fingerprints (ECFPs), to encode structural patterns into bit vectors for rapid comparison and virtual screening of large compound libraries, prioritizing molecules structurally akin to known actives (see the sketch below).

The typical workflow begins with curating a dataset of ligands with measured activities, followed by descriptor calculation and model training using regression or machine learning algorithms to predict activity for unseen compounds. These predictions guide the prioritization of hits for synthesis and testing, iteratively refining the model with new data. A seminal example is the Hansch-Fujita QSAR equation, which pioneered this paradigm by correlating hydrophobic and electronic effects with potency:
\log\left(\frac{1}{\text{IC}_{50}}\right) = a(\log P)^2 + b(\log P) + c\sigma + d
Here, \log P represents lipophilicity (the octanol-water partition coefficient), \sigma is the Hammett constant for electronic effects, and a, b, c, d are fitted coefficients.

In applications, ligand-based methods are particularly valuable for lead optimization, where they predict structural modifications to enhance potency by analyzing activity cliffs in QSAR models or pharmacophore fits. These techniques support analog design by generating diverse candidates that maintain key features while improving selectivity or reducing off-target effects. Recent advances incorporate generative models for ligand generation, such as the REINVENT model, which uses reinforcement learning on recurrent neural networks to propose novel SMILES strings optimized for desired properties like binding affinity. Hybrid strategies briefly integrate ligand-based predictions with structure-based methods to refine candidates when partial target data becomes available.
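The following minimal sketch illustrates fingerprint-based similarity searching with the open-source RDKit toolkit, computing the Tanimoto similarity between ECFP4-style Morgan fingerprints; the two query molecules are arbitrary examples, not compounds from the text.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Encode two molecules as 2048-bit Morgan fingerprints (radius 2 ~ ECFP4).
aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
salicylic_acid = Chem.MolFromSmiles("Oc1ccccc1C(=O)O")
fp1 = AllChem.GetMorganFingerprintAsBitVect(aspirin, 2, nBits=2048)
fp2 = AllChem.GetMorganFingerprintAsBitVect(salicylic_acid, 2, nBits=2048)

# Tanimoto similarity = |A & B| / |A | B|, in [0, 1]; higher means more alike.
sim = DataStructs.TanimotoSimilarity(fp1, fp2)
print(f"Tanimoto similarity: {sim:.2f}")
```

In a real screening campaign this comparison is repeated over millions of pre-computed library fingerprints, and the top-ranked molecules are sent to assay follow-up.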

Structure-Based Design

Structure-based drug design leverages the three-dimensional (3D) atomic coordinates of a target, typically a protein, to rationally design and optimize small-molecule ligands that bind specifically to its active site. These coordinates are primarily obtained through experimental techniques such as X-ray crystallography, which provides high-resolution structures of protein-ligand complexes, nuclear magnetic resonance (NMR) spectroscopy for solution-state dynamics, or computational methods such as traditional homology modeling or AI-driven structure prediction (e.g., AlphaFold) when experimental data is unavailable, using known structures of related proteins as templates or sequence-based models to predict the target's geometry. The approach emphasizes the geometry of the target's binding pocket, including its shape, size, and physicochemical properties, to guide ligand placement and interaction modeling.

A core method in structure-based design is molecular docking, which computationally predicts the preferred orientation (pose) of a ligand within the target's binding site by simulating non-covalent interactions. Seminal tools like AutoDock employ an empirical scoring function to estimate the binding free energy (ΔG), approximated as the sum of van der Waals (vdW), hydrogen bonding (H-bond), desolvation, and electrostatic terms:
\Delta G \approx \sum \left( E_{\text{vdW}} + E_{\text{H-bond}} + E_{\text{desolvation}} + E_{\text{electrostatic}} \right)
where each term is calculated from pairwise atom interactions weighted by empirical coefficients derived from known complexes (a toy pairwise-scoring sketch appears below). To account for target flexibility and solvent effects, molecular dynamics (MD) simulations are integrated, using force fields like AMBER to model atomic motions over time. The AMBER potential energy function decomposes the system's energy as:
E = E_{\text{bond}} + E_{\text{angle}} + E_{\text{dihedral}} + E_{\text{vdW}} + E_{\text{electrostatic}}
enabling refinement of docked poses by exploring conformational ensembles and binding dynamics.

Identifying suitable binding sites is a prerequisite, often achieved through cavity detection algorithms that analyze the target's surface for pockets. CASTp (Computed Atlas of Surface Topography of Proteins) computes pocket volumes and areas by triangulating solvent-accessible surfaces from atomic coordinates, quantifying geometrically feasible binding regions. Complementary hot spot mapping with FTMap probes the site's energetics by docking small organic fragments and ranking consensus clusters based on their energetic contributions, highlighting key interaction regions for design.

Scoring functions rank poses and predict affinities, categorized as empirical (knowledge-based from experimental data), force-field based (physics-derived interactions), or consensus approaches combining multiple functions for improved accuracy. For instance, X-Score is an empirical function that linearly combines H-bond, vdW, and hydrophobic terms fitted to known affinities, enhancing virtual screening enrichment. A persistent challenge is the underestimation of entropic contributions, such as conformational flexibility and desolvation, which force-field methods often approximate poorly, leading to biases in pose selection.

Recent advances have accelerated these methods, with GPU-accelerated simulations in the 2020s enabling longer timescales (microseconds to milliseconds) for flexible targets, as implemented in modern MD packages with GPU support, facilitating allosteric site exploration. AI integration, such as DiffDock (2022), refines pose prediction with diffusion models that generate ligand poses directly, outperforming traditional samplers in pose accuracy on diverse targets.
Cryo-electron microscopy (cryo-EM) has further expanded structural inputs, providing near-atomic resolution for large complexes intractable to crystallography, directly informing design for membrane proteins and multi-subunit assemblies in the 2020s.
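To make the pairwise-scoring idea concrete, the sketch below evaluates a toy force-field-style interaction energy (Lennard-Jones van der Waals plus Coulomb electrostatics) between two atom sets; the functional forms and constants are deliberate simplifications for illustration, not the actual AutoDock or AMBER parameterizations.

```python
import numpy as np

def interaction_energy(lig_xyz, lig_q, prot_xyz, prot_q,
                       epsilon=0.2, sigma=3.4, coulomb_k=332.06):
    """Toy pairwise score in kcal/mol: Lennard-Jones vdW + Coulomb terms.
    Coordinates in angstroms, charges in elementary units; using one
    epsilon/sigma for all atom pairs is a simplifying assumption."""
    diff = lig_xyz[:, None, :] - prot_xyz[None, :, :]
    r = np.linalg.norm(diff, axis=-1)                       # pairwise distances
    lj = 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = coulomb_k * np.outer(lig_q, prot_q) / r
    return float(lj.sum() + coulomb.sum())

# Hypothetical three-atom ligand near a two-atom pocket fragment.
lig_xyz = np.array([[0.0, 0.0, 0.0], [1.4, 0.0, 0.0], [2.8, 0.0, 0.0]])
lig_q = np.array([-0.3, 0.1, 0.2])
prot_xyz = np.array([[0.0, 3.8, 0.0], [1.4, 3.8, 0.0]])
prot_q = np.array([0.4, -0.4])
print(f"E = {interaction_energy(lig_xyz, lig_q, prot_xyz, prot_q):.2f} kcal/mol")
```

Docking engines evaluate expressions of this kind millions of times while searching over ligand positions and torsions, which is why the scoring function's speed and accuracy trade-off dominates their design.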

Experimental Techniques

High-Throughput Screening

High-throughput screening (HTS) is an automated process in drug discovery that evaluates millions of chemical compounds from libraries exceeding 10^6 members against biological targets or phenotypic endpoints to identify initial active hits. This approach relies on robotic systems for liquid handling, plate movement, and detection, typically using multi-well formats such as 384- or 1536-well microplates with volumes as low as 1-5 µL per well to enable efficient testing. Common detection methods include fluorescence, luminescence, and absorbance readouts, which allow for rapid signal measurement across thousands of samples per day.

HTS assays are categorized into biochemical, cell-based, and absorption, distribution, metabolism, and excretion (ADME)-focused types. Biochemical assays, such as those measuring enzyme inhibition through fluorescence resonance energy transfer (FRET), directly assess target-ligand interactions in purified systems, providing high specificity for mechanism-based validation. Cell-based assays, like reporter systems where luciferase expression indicates pathway activation, incorporate physiological context to detect functional responses in living cells. ADME-focused screens, including solubility assessments via turbidimetric methods, evaluate physicochemical properties to prioritize compounds with favorable pharmacokinetic profiles early in discovery.

Hit identification in HTS involves statistical validation and confirmation to ensure reliability. The Z'-factor, a dimensionless statistic calculated from positive and negative control variability, assesses assay quality, with values greater than 0.5 indicating robust performance suitable for screening; a short calculation sketch follows below. Potential hits are confirmed using orthogonal assays, such as switching from fluorescence to mass spectrometry-based readouts, while counterscreens filter false positives by testing against unrelated targets or known interferents like reactive compounds.

The evolution of HTS traces back to the 1990s, when ultra-HTS (uHTS) platforms emerged, capable of processing over 10^5 compounds per day through integrated robotics and miniaturization from 96-well to higher-density plates. By the early 2000s, incorporation of ADME endpoints expanded its scope beyond efficacy to safety profiling. In the 2010s, advancements included microfluidic technologies, such as droplet-based systems that encapsulate reactions in picoliter volumes for throughputs exceeding 10^6 assays per day, reducing reagent use and enabling further miniaturization. As of 2025, integrations with next-generation sequencing (NGS) and advanced automation have further boosted capabilities, enabling genomic-focused screens and throughputs up to 10^8 assays per day in optimized droplet systems. Additionally, AI-assisted screening has become prevalent, using machine learning to prioritize screening subsets and analyze hit patterns, improving efficiency in large-scale campaigns. Typical HTS campaigns yield hit rates of 0.1-1%, depending on library diversity and assay stringency, often generating hundreds to thousands of initial actives from million-compound screens for further triage.
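A minimal sketch of the Z'-factor calculation (Zhang et al., 1999) is shown below, using hypothetical control-well readouts.

```python
import numpy as np

def z_prime(pos, neg):
    """Z'-factor assay quality statistic:
    Z' = 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|.
    Values above 0.5 are conventionally taken as excellent for HTS."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    return 1.0 - 3.0 * (pos.std(ddof=1) + neg.std(ddof=1)) / abs(pos.mean() - neg.mean())

# Hypothetical plate controls: fully inhibited (positive) vs. untreated (negative).
positive = [95, 97, 94, 96, 98, 95]   # % signal
negative = [5, 7, 4, 6, 5, 6]
print(f"Z' = {z_prime(positive, negative):.2f}")  # ~0.9, well above the 0.5 cutoff
```

Because the statistic penalizes both a narrow control separation and noisy controls, plates with low Z' are typically rerun before any hit calling proceeds.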

Fragment-Based Screening

Fragment-based screening, a key experimental technique in drug design, involves identifying low-molecular-weight chemical fragments that bind weakly to a target, serving as starting points for developing potent drug candidates. These fragments typically have molecular weights under 300 Da and are screened from libraries comprising 1,000 to 10,000 compounds, enabling efficient exploration of a vast chemical space with minimal synthetic effort; a sketch of a typical fragment-library filter appears after this section's text. Hits exhibit weak binding affinities, often in the millimolar range, but are prioritized based on ligand efficiency (LE), defined as the binding free energy per non-hydrogen atom, with target values exceeding 0.3 kcal/mol per heavy atom to ensure efficient optimization potential.

Detection of fragment binding relies on sensitive biophysical methods, including X-ray crystallography through direct soaking of fragments into pre-formed protein crystals to visualize binding sites, ligand-observed nuclear magnetic resonance (NMR) techniques such as saturation transfer difference (STD)-NMR for indirect detection of interactions, surface plasmon resonance (SPR) for real-time affinity measurements, and differential scanning fluorimetry (DSF) to assess thermal stability shifts induced by binding. Once identified, hits are elaborated via strategies like linking multiple fragments, merging overlapping scaffolds, or growing individual fragments, often informed by structure-activity relationships derived from catalog sourcing ("SAR by catalog") to rapidly test analogs without extensive synthesis. These post-2010 biophysical advances, such as widespread adoption of DSF for high-throughput triage, have enhanced the reliability of hit validation.

The approach offers distinct advantages over traditional high-throughput screening (HTS), including broader chemical diversity coverage from smaller libraries and a lower synthetic burden, as fragments require fewer modifications to achieve drug-like properties. It excels for "undruggable" targets like bromodomains and protein-protein interfaces, where flat or featureless binding sites challenge conventional methods; for instance, the 2016 FDA approval of venetoclax, a BCL-2 inhibitor for chronic lymphocytic leukemia, originated from fragment hits that enabled structure-guided optimization of this previously intractable target class.

In a typical workflow, validated fragment hits are expanded iteratively through structure-guided design, integrating crystallographic or NMR-derived binding modes with medicinal chemistry to improve potency while maintaining high ligand efficiency, often complemented by HTS for orthogonal validation of elaborated leads. This process has yielded several FDA-approved drugs in the 2010s and 2020s, including erdafitinib (2019) for urothelial carcinoma, asciminib (2021) for chronic myeloid leukemia, and sotorasib (2021) for non-small cell lung cancer, underscoring FBDD's growing role in FDA approvals of small-molecule drugs from novel scaffolds.
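Fragment libraries are often assembled with a "rule of three" filter (molecular weight ≤ 300 Da, cLogP ≤ 3, and at most 3 hydrogen-bond donors, acceptors, and rotatable bonds); a minimal RDKit sketch of such a filter follows, with the thresholds treated as guidelines rather than hard rules.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, Lipinski

def passes_rule_of_three(smiles):
    """Rule-of-three style filter used when assembling fragment libraries.
    Returns False for unparseable SMILES; all cutoffs are conventional
    guidelines, not strict requirements."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False
    return (Descriptors.MolWt(mol) <= 300
            and Crippen.MolLogP(mol) <= 3
            and Lipinski.NumHDonors(mol) <= 3
            and Lipinski.NumHAcceptors(mol) <= 3
            and Descriptors.NumRotatableBonds(mol) <= 3)

print(passes_rule_of_three("c1ccc2[nH]ccc2c1"))  # indole, a classic fragment -> True
```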

Optimization and Development

Lead Optimization

Lead optimization represents the iterative refinement stage in drug design, where initial hit compounds from screening methods are systematically modified to enhance their therapeutic potential while addressing limitations in efficacy, pharmacokinetics, and safety. This process focuses on transforming promising leads into viable preclinical candidates by balancing multiple molecular properties, often requiring several cycles of synthesis and evaluation to achieve optimal profiles.

The core of lead optimization involves structure-activity relationship (SAR) analysis, which maps how structural changes influence biological activity to guide targeted modifications of hit scaffolds. Common strategies include bioisostere replacement, where functional groups are substituted with analogs of similar electronic and steric properties to retain potency while improving metabolic stability or reducing toxicity. Additionally, parallel synthesis techniques enable the rapid production of focused analog libraries, allowing medicinal chemists to explore SAR trends efficiently and accelerate iteration.

Key performance metrics in lead optimization emphasize potency, with a target half-maximal inhibitory concentration (IC50) below 100 nM for the primary target to ensure sufficient activity at achievable doses. Selectivity is quantified by the selectivity index, typically aiming for a value greater than 100 relative to off-target proteins to minimize adverse effects. To holistically evaluate candidates, multiparameter optimization (MPO) scores are employed, calculated as the sum of normalized desirability functions for physicochemical properties such as logP (ideally 1-3 for balanced solubility and permeability), hydrogen bond donors (≤3), and rotatable bonds (<5), providing a composite measure to prioritize leads with favorable drug-like characteristics; a sketch of such a score appears at the end of this section.

Optimization techniques rely on iterative medicinal chemistry cycles, integrating design, synthesis, biological assays, and computational predictions to refine leads progressively. Early absorption, distribution, metabolism, excretion, and toxicity (ADMET) profiling is crucial, including cytochrome P450 (CYP) inhibition assays to identify and mitigate drug-drug interaction risks. Compliance with Lipinski's rule of five—molecular weight <500 Da, calculated logP <5, hydrogen bond donors <5, and hydrogen bond acceptors <10—serves as a predictive guideline for oral bioavailability, helping to filter candidates likely to succeed in vivo.

Despite these approaches, lead optimization faces significant challenges, including an attrition rate of approximately 50% for projects advancing from this phase, primarily due to difficulties in simultaneously optimizing efficacy, selectivity, and ADMET properties. Multi-objective scoring systems are essential to navigate these trade-offs, but incomplete integration can lead to stalled progress or suboptimal candidates. Recent advances incorporate artificial intelligence-driven generative models, developed post-2020, for de novo design and optimization, enabling the exploration of vast chemical spaces to propose novel scaffolds with balanced profiles more rapidly than traditional methods; as of 2025, AI tools are further accelerating progress through optimized design-synthesis-testing cycles.
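The following RDKit sketch implements a rule-of-five check and a toy MPO score built from the desirability windows named above; the falloff widths and equal weighting are invented for illustration and do not reproduce any published MPO scheme.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, Lipinski

def lipinski_violations(mol):
    """Count violations of Lipinski's rule of five:
    MW < 500 Da, cLogP < 5, H-bond donors <= 5, H-bond acceptors <= 10."""
    return sum([
        Descriptors.MolWt(mol) >= 500,
        Crippen.MolLogP(mol) >= 5,
        Lipinski.NumHDonors(mol) > 5,
        Lipinski.NumHAcceptors(mol) > 10,
    ])

def toy_mpo_score(mol):
    """Sum of simple desirability terms (1.0 inside the preferred window,
    linear falloff outside). Windows follow the text: logP 1-3, HBD <= 3,
    rotatable bonds < 5. Falloff widths are arbitrary illustrations."""
    def desirability(value, low, high, falloff):
        if low <= value <= high:
            return 1.0
        dist = (low - value) if value < low else (value - high)
        return max(0.0, 1.0 - dist / falloff)
    return (desirability(Crippen.MolLogP(mol), 1, 3, 2)
            + desirability(Lipinski.NumHDonors(mol), 0, 3, 2)
            + desirability(Descriptors.NumRotatableBonds(mol), 0, 4, 3))

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, as an arbitrary example
print(lipinski_violations(mol), f"{toy_mpo_score(mol):.2f}")
```

Scores like this are used to rank whole analog series at once, so that chemists spend synthesis effort only on candidates that balance all properties rather than excelling at one.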

Pharmacokinetics and Safety

Pharmacokinetics and safety evaluation are integral to drug design, focusing on the absorption, distribution, metabolism, and excretion (ADME) properties of candidate compounds to ensure effective delivery and minimal adverse effects in vivo. During development, leads from prior optimization stages are assessed for their pharmacokinetic profiles to predict human exposure and duration of action. Poor ADME characteristics contribute significantly to attrition, with approximately 40% of drug candidates failing due to suboptimal pharmacokinetics or toxicity issues in preclinical and early clinical phases.

ADME principles guide the assessment of drug behavior in biological systems. Absorption is evaluated using models like Caco-2 cell permeability assays, which simulate intestinal epithelial transport to estimate oral bioavailability. Distribution is influenced by factors such as plasma protein binding, where high binding can limit free drug availability at target sites. Metabolism primarily involves cytochrome P450 (CYP450) enzymes in the liver, with substrates screened to avoid rapid clearance or drug-drug interactions. Excretion occurs mainly via renal clearance, assessed through glomerular filtration rates and transporter activity to ensure adequate elimination without accumulation. These properties are iteratively refined to achieve desirable half-lives and tissue penetration; a simple concentration-time sketch follows below.

Safety assessment encompasses toxicity profiling to establish a therapeutic window, calculated as the therapeutic index (TI = TD50/ED50), where TD50 is the median toxic dose and ED50 is the median effective dose. Cardiotoxicity assays include hERG channel inhibition, which can lead to QT prolongation and arrhythmias. Genotoxicity is tested via the Ames assay, detecting mutagenic potential in bacterial strains. Acute and repeat-dose toxicity studies determine tolerated exposure limits through testing in animal models, providing data on dose-dependent harm. These evaluations ensure candidates have a wide safety margin before advancing.

Optimization strategies address pharmacokinetic and safety shortcomings. Prodrug design enhances absorption for poorly permeable compounds by masking polar groups, as seen in ester prodrugs that undergo enzymatic conversion post-absorption. To evade metabolism, fluorination at susceptible sites blocks CYP450 oxidation, improving metabolic stability without altering potency. Physiologically based pharmacokinetic (PBPK) modeling, using software like Simcyp, predicts PK profiles from in vitro and preclinical data, aiding dose selection and reducing animal use. Regulatory compliance follows ICH M3(R2) guidelines, which outline nonclinical safety studies to support human clinical trials, emphasizing duration and species relevance. Recent advances include human-on-a-chip models, integrating microfluidic organ systems to better predict pharmacokinetics and toxicity in the 2020s, offering alternatives to traditional animal testing.
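As a concrete illustration of these pharmacokinetic quantities, the sketch below evaluates the standard one-compartment oral-dosing model with first-order absorption and elimination (the Bateman equation); all parameter values are hypothetical.

```python
import numpy as np

def one_compartment_oral(dose_mg, f_bio, vd_l, cl_l_per_h, ka_per_h, t_h):
    """Bateman equation for a one-compartment model with first-order
    absorption (ka) and elimination (ke = CL/Vd):
    C(t) = F*Dose*ka / (Vd*(ka - ke)) * (exp(-ke*t) - exp(-ka*t))."""
    ke = cl_l_per_h / vd_l
    return (f_bio * dose_mg * ka_per_h / (vd_l * (ka_per_h - ke))
            * (np.exp(-ke * t_h) - np.exp(-ka_per_h * t_h)))

t = np.linspace(0, 24, 97)  # hours post-dose
# Hypothetical candidate: 100 mg dose, F = 0.5, Vd = 50 L, CL = 5 L/h, ka = 1/h.
conc = one_compartment_oral(100, 0.5, 50, 5, 1.0, t)
ke = 5 / 50
print(f"t1/2 = {np.log(2) / ke:.1f} h, Cmax ≈ {conc.max():.2f} mg/L")
```

Profiles like this connect the assay-level parameters (clearance, volume of distribution, absorption rate) to the clinically relevant outputs of half-life, peak concentration, and dosing interval.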

Applications and Challenges

Case Studies

One prominent example of structure-based drug design is the development of imatinib (Gleevec), approved in 2001 for chronic myeloid leukemia (CML). Imatinib was rationally designed to inhibit the BCR-ABL tyrosine kinase, a fusion oncoprotein driving CML, by exploiting the structure of the Abl kinase domain to target the inactive conformation and block ATP binding. This approach enabled high selectivity, minimizing off-target effects on other kinases. Clinical trials demonstrated that imatinib transformed CML from a fatal disease to a manageable chronic condition, reducing annual mortality rates from 10-20% to 1-2%, an approximately 80-90% decline.

In the realm of biologics, trastuzumab (Herceptin), approved in 1998, exemplifies target-based design for HER2-positive breast cancer. Developed from the murine 4D5 antibody generated via hybridoma technology against the HER2 receptor, trastuzumab was humanized to reduce immunogenicity while retaining binding affinity to the extracellular domain of HER2, thereby inhibiting downstream signaling and promoting immune-mediated tumor cell killing. Its efficacy was validated phenotypically in HER2-overexpressing xenograft models, showing tumor growth inhibition and prolonged survival, which led to its approval as a targeted therapy improving overall survival by about 30% in combination regimens.

A recent case of computational drug repurposing is baricitinib, a JAK inhibitor originally developed for rheumatoid arthritis, which was rapidly redeployed for COVID-19 in 2020 using AI-driven methods. BenevolentAI's platform analyzed existing kinase inhibitors to predict baricitinib's inhibition of AP2-associated protein kinase 1 (AAK1), a regulator of viral endocytosis, thereby impeding viral entry while also countering inflammation via JAK-STAT pathway blockade. This ligand-similarity reasoning across kinase libraries accelerated validation, culminating in successful Phase III trials (e.g., ACTT-2) that showed a reduced median recovery time by 1 day and lower mortality in hospitalized patients.

Zanamivir, approved in 1999 for influenza treatment, illustrates mixed-method optimization starting from structure-based mimics of sialic acid, the natural substrate of viral neuraminidase. Using X-ray crystallography of neuraminidase-sialic acid complexes, researchers designed zanamivir as a transition-state analogue with a guanidino group at the C4 position to enhance binding affinity in the active site, preventing viral release. Subsequent structure-activity relationship (SAR) studies refined substituents for potency and selectivity, yielding an orally inhaled drug that shortens symptom duration by 1-2 days in uncomplicated influenza.

These cases highlight how integrating computational, structural, and experimental methods can streamline drug design, often compressing the typical 10-15 year development timeline to 5-10 years for targeted therapies. For instance, AI platforms have enabled faster lead optimization, as seen in Exscientia's DSP-1181, a 5-HT1A receptor agonist for obsessive-compulsive disorder that reached Phase I trials in 2020 after AI-optimized design from chemical libraries, demonstrating reduced iteration cycles compared to traditional approaches.

Limitations and Advances

Drug design faces significant limitations, including a high failure rate where over 92% of candidates fail to translate from preclinical testing to approval, primarily due to poor predictive models of human efficacy and toxicity. The average cost to develop and bring a new drug to market is estimated at $2.6 billion, encompassing research, clinical trials, and regulatory hurdles, which strains resources and discourages investment in high-risk areas. Many therapeutically relevant targets, such as protein-protein interactions (PPIs), remain "undruggable" due to their flat interfaces lacking defined binding pockets for small molecules, limiting the scope of viable candidates. Additionally, polypharmacology—unintended interactions with multiple targets—poses risks of off-target effects, toxicity, and reduced selectivity, complicating lead optimization.

Criticisms of current practices highlight an overreliance on reductionist approaches that isolate single targets, often overlooking the interconnected nature of biological systems and leading to failures in complex diseases. Ethical concerns arise from extensive animal testing, which not only raises welfare issues but also contributes to translational failures, as animal models poorly recapitulate human disease. Intellectual property barriers, including prolonged patent protections and exclusivity periods, can stifle innovation by restricting access to foundational technologies and delaying generic competition, ultimately hindering collaborative progress.

Recent advances address these challenges through artificial intelligence and machine learning for predictive toxicology, leveraging datasets like Tox21 to forecast adverse effects with greater accuracy and reducing reliance on animal models. Organoids, three-dimensional cell cultures mimicking human tissues, have emerged post-2015 as more relevant platforms for drug testing, improving predictions of human responses and efficacy in diseases like cancer. Multi-omics integration—combining genomics, transcriptomics, and proteomics—enables precision drug design by revealing holistic disease mechanisms and tailoring therapies to individual profiles.

Emerging technologies promise further breakthroughs, such as quantum computing for simulating molecular dynamics, exemplified by IBM's 2024 advancements in scaling quantum simulations to model complex protein interactions beyond classical limits. DNA-encoded libraries facilitate screening of over 10^9 compounds in the 2020s, accelerating hit identification for undruggable targets while minimizing costs and physical library synthesis. Looking ahead, integrated platforms could shorten development timelines dramatically, as demonstrated by COVID-19 vaccines approved in under one year compared to the traditional 10-15 years, signaling potential for faster responses to unmet needs through accelerated regulatory and technological synergies.