
Copy-and-paste programming

Copy-and-paste programming is a widespread practice in software development in which programmers duplicate existing code snippets, often from the same project, documentation, external sources, or prior work, and adapt them with minor modifications rather than writing new code or implementing a generalized solution.^[1] The resulting duplicates are known as code clones. Copying is a quick way to reuse logic and templates, but it frequently leaves duplicated code blocks that complicate maintenance.^[2] Studies indicate that copy-and-paste is highly prevalent among developers: a 2004 observational study of experienced developers recorded an average of 16 copy-and-paste instances per hour, ranging from trivial edits such as variable names to larger blocks or whole methods, the latter making up about 25% of cases.^[1] Usage data from over 20,000 Eclipse IDE users collected over 20 months (2009–2010) show an overall average of 2.72 copy-and-paste incidents per hour, with roughly 24% involving external sources and 61% occurring within the same file for structural reuse.^[3] More recent analyses report code cloning rising to 12.3% of changed lines in 2024, up from 8.3% in 2021, with projections of fourfold growth in 2025 linked to AI tool adoption.^[4]

Developers often use the technique to capture design decisions, such as crosscutting concerns, or to work around language limitations, such as constructs that Java lacked at the time of the early studies.^[1] While copy-and-paste can accelerate initial development by reusing proven patterns and reducing repetitive typing, its time savings are largely illusory over the long term because of accumulated technical debt.^[5] Key drawbacks include duplicated code that propagates bugs across clones, exemplified by a Mozilla case in which a single error affected 12 locations, and heightened security risks from unvetted external code that may embed vulnerabilities or licensing conflicts.^[1]^[6] It also hinders refactoring and overall software quality, as integrated development environments (IDEs) such as Eclipse historically offered limited support for detecting or managing clones until tools such as CnP emerged to track and refactor them.^[2]

Definition and Overview

Core Concept

Copy-and-paste programming refers to the practice of duplicating source code snippets from one location within a program—or from external sources such as documentation or other projects—and inserting them into another area, typically with minimal or no modifications, resulting in replicated logic and structure across the codebase.^[1] This approach often arises during rapid development or under tight deadlines, where developers prioritize immediate productivity over long-term maintainability.^[5] Key characteristics of copy-and-paste programming include a lack of abstraction, where code is replicated verbatim in terms of both syntax and semantics, rather than being generalized into reusable components like functions or classes.^[1] It commonly occurs in prototyping phases to save typing effort and capture specific design decisions quickly, but it introduces dependencies that complicate program comprehension and evolution.^[1] Unlike parameterized mechanisms such as macros or templates, which allow for variable substitution and reuse without full duplication, copy-and-paste involves direct replication that bypasses such flexibility.^[7] This practice stands in direct contrast to the DRY (Don't Repeat Yourself) principle, which advocates avoiding duplication of knowledge or logic in software systems to enhance maintainability and reduce errors. For instance, a developer might copy a loop that processes a list of user data into another section handling similar employee records, rather than extracting the logic into a shared function, leading to identical code blocks that must be updated separately if requirements change.^[8]
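The user/employee example above can be sketched in Python. The record fields (`name`, `active`) and the function name `active_names` are hypothetical; the sketch only contrasts two pasted loops with one extracted function.

```python
# Hypothetical records: user and employee data share the same shape.
users = [{"name": "ada", "active": True}, {"name": "bob", "active": False}]
employees = [{"name": "cy", "active": True}, {"name": "di", "active": True}]

# Copy-and-paste version: the same loop appears twice, so any change
# (for example, a new formatting rule) must be made in both places.
active_user_names = []
for record in users:
    if record["active"]:
        active_user_names.append(record["name"].title())

active_employee_names = []
for record in employees:
    if record["active"]:
        active_employee_names.append(record["name"].title())


# DRY version: the shared logic lives in one function.
def active_names(records):
    """Return the title-cased names of active records."""
    return [r["name"].title() for r in records if r["active"]]


print(active_names(users))      # ['Ada']
print(active_names(employees))  # ['Cy', 'Di']
```

If the formatting rule later changes, only `active_names` needs editing, whereas the pasted loops would each have to be found and updated.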

Prevalence in Software Development

Copy-and-paste programming remains widespread in software development, with empirical studies finding that duplicated code often accounts for 5-20% of lines in typical codebases. An analysis of 153 Apache open-source projects from 2023, for instance, found that an average of 18.5% of code lines were duplicates, highlighting the scale of the issue even in mature repositories.^[9]^[10] Static analysis tools such as PMD's Copy/Paste Detector (CPD) and SonarQube routinely identify such duplication in open-source projects, using configurable thresholds (such as a minimum of 100 successive tokens) to flag significant overlaps.^[11] These metrics underscore the persistence of the practice across diverse repositories, where clones inflate maintenance effort without adding unique value. The prevalence of code duplication varies notably by programming language and project scale. In scripting languages such as JavaScript and Python, rates tend to be higher because of the emphasis on rapid prototyping and quick scripts; for example, a 2017 large-scale study of GitHub projects found 94% of JavaScript files to be duplicates of other files in the corpus, compared with 40% for Java.^[12] Duplication is particularly prevalent in legacy systems, where incremental modifications often lead to ad-hoc copying to avoid refactoring complex structures, increasing risks during modernization. Small-scale development can exacerbate this through limited oversight, while large teams with code reviews and modular standards tend to mitigate it. Time pressure, developer inexperience, and insufficient emphasis on modular design further drive the practice: under tight deadlines, developers frequently prioritize speed over abstraction and copy snippets as a shortcut.
Observational data indicate that copying occurs routinely, with one 2015 study of Eclipse IDE users showing that over 60% of copy-and-paste incidents occur within the same file.^[13] Domains also differ: web development often replicates UI components to speed up front-end assembly, while embedded systems emphasize compact code because of resource constraints such as limited memory, favoring reuse through functions or macros over duplication. As of 2024, AI-generated code has been associated with markedly higher duplication, including an eightfold rise in duplicated code blocks of five or more lines.^[14]
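Token-threshold detectors such as CPD report a duplicate when a run of at least N consecutive tokens reappears elsewhere. The sketch below illustrates that idea under simplifying assumptions: a regex tokenizer and an exact index of every window of `min_tokens` tokens. It is not CPD's actual algorithm, and the usage example sets the threshold far below 100 tokens so that a toy input triggers it; the file names and sources are hypothetical.

```python
import re
from collections import defaultdict

# Naive tokenizer (an assumption): words/numbers, or single punctuation characters.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def split_tokens(source):
    return TOKEN_RE.findall(source)


def duplicate_windows(files, min_tokens):
    """Find every run of `min_tokens` consecutive tokens that occurs more than once.

    files: mapping of file name -> source text.
    Returns a dict mapping each repeated token window to the (file, token_index)
    positions where it starts.
    """
    seen = defaultdict(list)
    for name, source in files.items():
        tokens = split_tokens(source)
        for i in range(len(tokens) - min_tokens + 1):
            seen[tuple(tokens[i:i + min_tokens])].append((name, i))
    return {window: locs for window, locs in seen.items() if len(locs) > 1}


files = {
    "a.py": "total = price * qty + tax\nprint(total)",
    "b.py": "total = price * qty + tax\nlog(total)",
}
for window, locations in duplicate_windows(files, min_tokens=7).items():
    print(" ".join(window), "->", locations)
```

A longer clone shows up here as many overlapping windows; production tools typically merge such overlaps into a single reported range and use rolling hashes rather than storing every window.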

Historical Development

Early Origins

Copy-and-paste programming emerged in the 1950s and 1960s amid the constraints of early computing, when programmers working with punch cards and assembly language reused code by copying it manually, in the absence of high-level compilers and standardized libraries. In this era, code was typically written on coding sheets, punched into cards by hand or with keypunch machines, and fed into computers such as the IBM 701 or UNIVAC I, which made abstraction and modular reuse difficult without digital editing tools. Programmers often duplicated code segments manually to avoid rewriting repetitive logic, since recompiling or reassembling from scratch was time-intensive and error-prone on resource-limited hardware.^[15]^[16] A notable example of this practice occurred in early subroutine handling, where developers physically copied routines from notebooks or prior card decks into new programs. Grace Hopper, working on the UNIVAC at Remington Rand, recounted requesting subroutines such as a sine function from colleagues: "If I needed a sine subroutine, angle less than π/4, I’d whistle at Dick and say ‘can I have your sine subroutine?’ and I’d copy it out of his notebook." This manual transcription frequently led to errors, such as miscalculated memory addresses or mistyped symbols (e.g., confusing "4" with "Δ" or "A"), prompting Hopper to develop the A-0 compiling system in 1952 to automate subroutine linkage and reduce copying mistakes. By 1958, FORTRAN II for the IBM 704 had added separately compiled subroutines (the SUBROUTINE and FUNCTION statements), allowing reuse without verbatim copying.^[15] Technological limitations, including the lack of integrated development environments and reliance on physical media such as magnetic tapes and punch cards, further entrenched these methods: tapes had to be rewound and read sequentially to make edits, and cards could not be changed once punched, which discouraged iterative abstraction.
In resource-scarce environments, such as government and corporate labs with limited machine time, duplication was viewed pragmatically as a necessary expedient rather than a flaw, reflecting a broader absence of software engineering principles that would later identify it as an anti-pattern. Programmers prioritized functionality over maintainability, sharing code informally without formal attribution, as existing programs were often opaque and tailored to specific machines.^[16]

Evolution in Modern Programming

In the 1980s and 1990s, the proliferation of personal computers democratized programming, with languages like BASIC and early C gaining widespread adoption among hobbyists and professionals.^[17] This era saw the rise of integrated development environments (IDEs) such as Turbo Pascal, released in 1983, which provided rapid compilation and intuitive editing features that facilitated copy-and-paste operations for code reuse.^[18] However, as object-oriented programming (OOP) paradigms emerged in languages like C++ and later Java, these practices began introducing duplication challenges, as developers copied procedural code snippets without adapting them to modular, inheritance-based designs. The 2000s marked a surge in copy-and-paste programming driven by the Web 2.0 era, where developers frequently reused HTML, CSS, and JavaScript snippets from online forums and tutorials to accelerate front-end development. The launch of Stack Overflow in 2008 further normalized this approach by providing a vast repository of raw, executable code examples, enabling quick integration but contributing to widespread code duplication across projects.^[19] Open-source culture amplified these trends, as shared repositories encouraged direct copying of boilerplate code, often without full comprehension of underlying logic. 
From the 2010s to 2025, AI-assisted coding tools such as GitHub Copilot, introduced in 2021, have at times generated duplicated code by suggesting near-identical snippets drawn from common patterns in their training data.^[20] Empirical analyses of over 153 million lines of code indicate that AI tools put downward pressure on refactoring, leading to higher rates of duplication than in human-written code.^[21] Meanwhile, microservices architectures aim to limit duplication through service isolation and shared libraries, yet duplication persists because services are deployed independently and often use heterogeneous technologies.^[22] Culturally, copy-and-paste programming shifted from an accepted norm to a recognized code smell around the turn of the millennium: Martin Fowler's Refactoring (1999) listed Duplicated Code as a code smell, and the 2001 Agile Manifesto emphasized sustainable development and continuous attention to technical excellence and good design, values that discourage the duplication that hinders maintainability.^[23] In agile methodologies, practices such as test-driven development and continuous integration have promoted abstraction over repetition, treating duplicated code as a symptom of deeper design flaws.^[24]

Motivations for Use

Unintentional Duplication

Unintentional duplication in copy-and-paste programming arises when developers replicate code without a deliberate decision to do so, often through oversight during time-pressured tasks or when handling similar but not identical requirements. For instance, programmers may copy code to implement similar features, such as API handlers that process requests from different endpoints, without abstracting the common logic into reusable functions or classes. Such oversight also occurs frequently during debugging, when developers patch a similar bug in several locations by pasting modified snippets instead of creating a centralized fix. Psychological factors contribute significantly: high cognitive load in complex projects encourages "quick fixes" that maintain momentum. Studies indicate that these practices intensify under deadlines, as increased stress and multitasking reduce developers' attention to refactoring opportunities. Research on developer behavior in open-source projects has shown that code duplication rates can rise around sprint deadlines, correlating with hurried copy-and-paste actions rather than deliberate design. A key indicator of unintentional duplication is the same bug appearing in multiple code sections, which suggests that a flaw in one pasted block propagated unchanged. Another is near-identical functions or blocks that differ only in variable names, constants, or minor tweaks; static analysis tools can often detect these by flagging fragments whose similarity exceeds a threshold such as 70-80%. Such patterns reveal a missing abstraction: developers did not recognize an opportunity for generalization during the initial implementation.
A notable case involves monolithic applications developed through trial-and-error prototyping, where developers iteratively copy and tweak code blocks to test variations, leading to widespread unintentional duplication. In large-scale enterprise software such as legacy banking systems, for example, prototyping phases have left duplicated authentication modules scattered across services, complicating maintenance and increasing vulnerability to synchronized failures. Analyses of such systems report that 15-25% of the codebase can stem from these unrefactored copies, accumulated over months of ad-hoc development.^[25]
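The similarity thresholds mentioned above can be illustrated with Python's standard `difflib.SequenceMatcher`, comparing token sequences after identifiers and literals have been replaced by placeholders. The normalization scheme, the 0.8 cut-off, and the handler functions are assumptions made for this sketch; real clone detectors normalize more carefully.

```python
import difflib
import io
import keyword
import tokenize


def normalized_tokens(source):
    """Tokenize Python source, abstracting names and literals.

    Non-keyword identifiers become "ID" and numbers/strings become "LIT", so
    copies that differ only in variable names or constants look identical.
    Layout tokens (NEWLINE, INDENT, ...) and comments are dropped.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        elif tok.type == tokenize.OP:
            out.append(tok.string)
    return out


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized token sequences."""
    return difflib.SequenceMatcher(None, normalized_tokens(a), normalized_tokens(b)).ratio()


handler_a = '''
def handle_orders(req):
    if not req.user:
        raise PermissionError("login required")
    return fetch("orders", req.user)
'''

handler_b = '''
def handle_invoices(request):
    if not request.user:
        raise PermissionError("login required")
    return fetch("invoices", request.user)
'''

# A Type-3 variant: one extra statement was added after pasting.
handler_c = '''
def handle_refunds(request):
    if not request.user:
        raise PermissionError("login required")
    audit(request)
    return fetch("refunds", request.user)
'''

THRESHOLD = 0.8  # assumed cut-off, at the top of the 70-80% range discussed above
for name, other in (("b", handler_b), ("c", handler_c)):
    score = similarity(handler_a, other)
    print(name, round(score, 2), "flag" if score >= THRESHOLD else "ok")
```

The renamed copy (`handler_b`) normalizes to exactly the same sequence as `handler_a`, while the variant with an added statement scores somewhat below 1.0 but still above the cut-off.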

Intentional Design Decisions

Copy-and-paste programming can be an intentional design decision where the overhead of abstraction outweighs its benefits, such as in performance-critical code. Developers may deliberately duplicate code to improve execution speed, for instance by manually unrolling loops to reduce branch overhead and improve instruction-level parallelism, a technique commonly applied in hot paths of applications such as web servers and game engines. In the Apache HTTP Server, cloning the worker multithreading model into threadpool and leader variants allowed targeted performance enhancements without risking the stability of the original implementation. Similarly, hardware-specific code often replicates core driver logic, as in the Linux SCSI subsystem, where code for controllers such as the NCR5380 was forked to accommodate platform variations while keeping abstraction complexity low.^[26] This approach speeds development by avoiding the time and testing costs of creating reusable functions or classes, particularly in throwaway prototypes or experimental features where maintainability is not a primary concern. For example, programmers copy structural templates, such as logging statements or design pattern skeletons, to iterate quickly and defer refactoring until the appropriate abstraction becomes clear. In data science workflows, duplication within Jupyter notebooks supports rapid hypothesis testing and data exploration, with reported average self-duplication of 7.6% across code cells.^[27] The benefits include reduced cognitive load in early stages and clearer code that avoids premature generalizations, which can introduce unnecessary dependencies.
Intentional duplication proves appropriate in small-scale projects, one-off scripts, or domains prioritizing speed over long-term evolution, such as exploratory data analysis where readability through explicit repetition enhances understanding for non-developer collaborators. However, even deliberate use carries limitations; as projects scale, duplicated code can amplify maintenance burdens if changes propagate inconsistently, underscoring the need for eventual refactoring in growing systems like evolving embedded firmware. While this contrasts with unintentional duplication from oversight, strategic cloning remains a valid tactic when risks are assessed and contained.^[26]
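Manual loop unrolling, mentioned above, is a deliberate duplication: the loop body is pasted several times so that fewer iterations and branches are needed. The sketch below shows only the shape of the transformation; it is written in Python for consistency with the other examples, and in CPython the speed-up is not guaranteed. The technique pays off mainly in compiled hot paths.

```python
def sum_simple(values):
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled4(values):
    """Same result, with the loop body deliberately duplicated four times."""
    total = 0
    n = len(values)
    i = 0
    # Main unrolled loop: four copies of the body per iteration.
    while i + 4 <= n:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    # Remainder loop handles the last n % 4 elements.
    while i < n:
        total += values[i]
        i += 1
    return total


print(sum_unrolled4(list(range(10))))  # 45
```

The remainder loop is the usual cost of unrolling: the duplicated body must still handle input lengths that are not a multiple of the unroll factor.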

Forms of Code Duplication

Reuse from External Libraries

Copy-and-paste programming often involves pasting code snippets directly from external library documentation or examples into a project, bypassing proper import mechanisms or dependency declarations. The practice is prevalent in languages lacking robust package management, such as Solidity, where developers frequently copy third-party library (TPL) code to integrate functionality without formal dependency tools.^[28] It also occurs in ecosystems with mature package managers: in Java projects, developers may clone entire classes from Maven libraries such as edu.ucar:netcdf into their codebase, embedding them as local implementations rather than depending on the original package.^[29] While this accelerates initial development by allowing quick adaptation of proven code, it introduces significant maintenance and integrity drawbacks. Copied code is disconnected from the original library's updates, so it can fall behind on security patches and performance improvements, contrary to standard dependency management practice.^[30] In Solidity smart contracts, such copies affect approximately 8.87% of analyzed repositories, propagating vulnerabilities such as those in unpatched TPLs.^[28] More broadly, this form of reuse heightens the risk of error propagation, as fragments cloned from unverified external sources such as websites or forums may contain hidden flaws like SQL injection vulnerabilities.^[31] These practices complicate long-term maintenance and also expose projects to legal risk, such as unintended license violations from unlicensed or improperly attributed external code.^[29] Detecting such external reuse relies on specialized tools that identify code clones through similarity metrics, flagging fragments with high overlap (e.g., an 80% or greater structural match) with known library sources.
In Solidity, the SPADE tool infers fine-grained TPL dependencies by analyzing metadata and code patterns, achieving precise identification of copied elements across repositories.^[28] For Java, the JC-Finder employs class-level Abstract Syntax Tree analysis against a reference dataset of over 9,000 libraries, detecting clone-based TPLs with an F1-score of 0.818 and revealing their prevalence in 10% of GitHub projects.^[29] These methods, rooted in static analysis techniques like token-based or AST-based comparison, enable early remediation by recommending proper imports over embedded copies.^[31]
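The AST-based comparison mentioned above can be sketched with Python's `ast` module: each function is reduced to a hash of its syntax tree with names erased, and project functions are looked up in an index built from a reference library. This is a hypothetical illustration of the general technique, not how SPADE or JC-Finder work; the helper names and the sample "library" and "project" sources are invented for the example.

```python
import ast
import copy
import hashlib


class _EraseNames(ast.NodeTransformer):
    """Replace function, argument, and variable names with fixed placeholders."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.name = "_f"
        return node

    def visit_arg(self, node):
        self.generic_visit(node)
        node.arg = "_a"
        return node

    def visit_Name(self, node):
        node.id = "_v"
        return node


def function_fingerprints(source):
    """Yield (function_name, fingerprint) for every (non-async) function in `source`.

    The fingerprint hashes the function's AST after names are erased, so a
    renamed copy produces the same fingerprint as the original.
    """
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            erased = _EraseNames().visit(copy.deepcopy(node))
            digest = hashlib.sha256(ast.dump(erased).encode()).hexdigest()
            yield node.name, digest


def build_index(library_sources):
    """Map fingerprint -> 'library.function' for a reference set of libraries."""
    index = {}
    for lib_name, source in library_sources.items():
        for func_name, digest in function_fingerprints(source):
            index[digest] = f"{lib_name}.{func_name}"
    return index


def find_copied(project_source, index):
    """Return (project_function, library_function) pairs with identical fingerprints."""
    return [
        (name, index[digest])
        for name, digest in function_fingerprints(project_source)
        if digest in index
    ]


LIBRARY = {"mathlib": "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n"}
PROJECT = (
    "def limit(value, low, high):\n    return max(low, min(value, high))\n\n"
    "def greet(name):\n    return 'hi ' + name\n"
)
print(find_copied(PROJECT, build_index(LIBRARY)))  # [('limit', 'mathlib.clamp')]
```

Exact hashing only catches renamed copies (Type-2 clones); detecting the 80%-overlap matches described above would require comparing subtrees with a similarity score instead of a hash lookup.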

Branching and Conditional Logic

In copy-and-paste programming, duplication in branching and conditional logic occurs when developers replicate code segments across different execution paths within the same application, such as the arms of if-else statements or switch cases, to handle varying conditions without refactoring for reuse. A common example is duplicated validation logic for distinct user roles: error-handling routines for standard user input might be copied and slightly modified for administrative paths, producing near-identical blocks that perform similar checks but diverge in minor ways.^[32]^[33] Such duplication often stems from evolving requirements, where initially shared logic is forked into separate branches to accommodate new conditions, or from collaboration, where several programmers independently copy existing code to adapt it quickly under time constraints. It is particularly prevalent in event-driven programming such as GUI event handlers, where boilerplate for tasks like logging or data binding is pasted into handlers for different user interactions, such as button clicks and menu selections, instead of being redesigned as a unified approach.^[8]^[34] A typical issue arises in switch statements whose cases contain copied boilerplate, such as identical setup or cleanup operations, leading to overlooked updates: if a common validation step is duplicated across cases that process different input types, a modification to that step in one case does not apply to the others, introducing inconsistencies.^[35]^[36] Unlike reuse from external libraries, which brings outside code into a project, this internal duplication fragments logic within the control flow itself, amplifying maintenance challenges when requirements shift.^[8] Alternatives such as polymorphism in object-oriented designs can centralize shared behavior, though copy-pasting persists when quick adaptations are prioritized over refactoring.^[32]
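A small Python sketch of the switch-style problem described above: an `if`/`elif` chain in which each branch repeats the same validation, followed by a version where the shared step runs once and per-type behavior is looked up in a dictionary (a lightweight stand-in for polymorphism). The input kinds, length limits, and handlers are hypothetical.

```python
def process_copied(kind, payload):
    """Copy-and-paste version: every branch repeats the validation step."""
    if kind == "email":
        if not payload or len(payload) > 254:
            raise ValueError("invalid payload")
        return payload.lower()
    elif kind == "phone":
        if not payload or len(payload) > 254:
            raise ValueError("invalid payload")
        return "".join(ch for ch in payload if ch.isdigit())
    elif kind == "name":
        # This copy has drifted: someone changed the limit here only.
        if not payload or len(payload) > 100:
            raise ValueError("invalid payload")
        return payload.strip().title()
    raise KeyError(kind)


def _validate(payload):
    """Shared validation, written once."""
    if not payload or len(payload) > 254:
        raise ValueError("invalid payload")


# Per-type behavior is looked up instead of being spelled out in each branch.
HANDLERS = {
    "email": str.lower,
    "phone": lambda p: "".join(ch for ch in p if ch.isdigit()),
    "name": lambda p: p.strip().title(),
}


def process(kind, payload):
    handler = HANDLERS[kind]  # unknown kinds still raise KeyError
    _validate(payload)
    return handler(payload)


print(process("phone", "+1 (555) 010-9999"))  # 15550109999
```

Consolidating the check also surfaces the drifted 100-character limit in the `name` branch: the refactored version applies one limit to every kind, and whether that limit is the intended rule becomes a single, visible decision.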

Repetitive or Variant Implementations

Repetitive or variant implementations in copy-and-paste programming occur when developers duplicate code fragments to handle similar tasks with minor differences, often producing near-miss clones in which identifiers, parameters, or structures are altered slightly. In the clone literature these correspond to Type-2 clones (renamed identifiers or changed literals) and Type-3 clones (statements added, removed, or modified); they arise from iterative development needs, such as adapting algorithms or interfaces to subtle variations without using abstraction mechanisms. A study of large open-source systems, including Apache HTTP Server and Gnumeric, identified templating and customization as the primary patterns, in which boilerplate code for repetitive operations is copied and tweaked; these patterns accounted for up to 71% of detected clones in systems exceeding 300,000 lines of code (LOC).^[37] Common examples include duplicating file input/output (I/O) logic to handle multiple data formats, such as CSV and JSON parsers in data processing pipelines. In these cases the core reading, parsing, and error-handling routines are copied, with changes limited to format-specific delimiters or schema mappings, which fragments maintenance. Similarly, in user interface (UI) development, code for elements such as buttons is often replicated with minor tweaks to styling, event handlers, or accessibility attributes; for instance, Gnumeric's GUI code duplicated button creation sequences across dialog modules to accommodate locale-specific variations. Another frequent case is database access, where similar SQL queries are copied and modified only in table or column names, as with embedded SQL in Java applications, which exacerbates duplication in query-heavy systems.^[37]^[38] A key driver of these duplications is the lack of robust generics or templates in some languages, which forces developers to replicate type-specific implementations rather than parameterizing common logic.
For example, in Java before generics (introduced in Java 5) or in C, which has no templates, data structure operations such as sorting or serialization often had to be duplicated for each type, a practice noted in refactoring analyses of C++ codebases where templates explicitly reduce such redundancy. The issue is particularly prevalent in data processing pipelines and web applications, where variant requirements for formats or validations (e.g., repeated form input checks that differ by field type) amplify the pattern.^[39]^[37]
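The CSV/JSON case above can be sketched as follows: rather than pasting the read, parse, and error-handling routine once per format, a single function takes the format-specific parsing step as a parameter. The function names and the `id`/`name` record shape are hypothetical.

```python
import csv
import io
import json


def _parse_csv(text):
    return list(csv.DictReader(io.StringIO(text)))


def _parse_json(text):
    return json.loads(text)


# Only the format-specific step varies; everything else is shared.
PARSERS = {"csv": _parse_csv, "json": _parse_json}


def load_records(text, fmt):
    """Parse with the format-specific function, then apply the validation and
    normalization that would otherwise be copied into each parser."""
    try:
        rows = PARSERS[fmt](text)
    except (csv.Error, json.JSONDecodeError) as exc:
        raise ValueError(f"could not parse {fmt} input") from exc
    records = []
    for row in rows:
        if "id" not in row or "name" not in row:
            raise ValueError(f"missing field in {row!r}")
        records.append({"id": int(row["id"]), "name": row["name"].strip()})
    return records


csv_text = "id,name\n1, Ada\n2,Bob\n"
json_text = '[{"id": 1, "name": " Ada"}, {"id": 2, "name": "Bob"}]'
print(load_records(csv_text, "csv") == load_records(json_text, "json"))  # True
```

Adding a new format means registering one parsing function rather than pasting and tweaking the whole pipeline.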

Impacts and Drawbacks

Maintenance Challenges

Copy-and-paste programming introduces significant maintenance challenges by creating multiple identical or similar code fragments that must be updated in sync across the codebase. When one instance needs a modification, developers must locate and alter all corresponding copies, a process that is error-prone and time-consuming without automated tools. This often results in inconsistencies, such as unevenly applied security patches, where a vulnerability fixed in one duplicate persists in others and compromises system integrity.^[40] Empirical studies show that duplicated code substantially raises maintenance costs compared with unique code. For instance, an analysis of six open-source systems found that cloned code required more modification effort in 61.11% of examined cases, with Type 2 clones (syntactically identical except for differences in whitespace, comments, or identifiers) and Type 3 clones (similar, with further modifications such as added or deleted statements) showing the largest increases. In one example from the QMail Admin system, a Type 2 clone required 36,993.6 effort units versus 3,260.8 for non-cloned code, roughly an order of magnitude more.^[41] Over time, pervasive duplication bloats codebases, as redundant fragments inflate the overall size and complexity of the software. This redundancy complicates onboarding, because new developers must read and understand repetitive sections that add no proportional value, slowing knowledge transfer and team productivity. Duplication also raises the risk of errors propagating during maintenance.^[42] Clone detection tools such as NiCad quantify these issues through metrics like clone density, typically reporting 5–30% of a system's lines as duplicated in large software projects, with averages around 7–23% in empirical surveys of open-source repositories.
These figures underscore the scale of maintenance overhead in duplicated systems.^[43]
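Clone density, as reported by detectors such as NiCad, is essentially the share of lines covered by at least one clone. A minimal sketch of that calculation, assuming the detector's output has already been reduced to hypothetical `(file, first_line, last_line)` tuples; overlapping or adjacent ranges in the same file are merged so no line is counted twice.

```python
from collections import defaultdict


def clone_density(clone_ranges, total_lines):
    """Percentage of lines that lie inside at least one reported clone.

    clone_ranges: iterable of (file, first_line, last_line), 1-based, inclusive.
    total_lines: total number of lines in the system.
    """
    by_file = defaultdict(list)
    for path, first, last in clone_ranges:
        by_file[path].append((first, last))

    cloned = 0
    for ranges in by_file.values():
        ranges.sort()
        cur_start, cur_end = ranges[0]
        for start, end in ranges[1:]:
            if start <= cur_end + 1:  # overlapping or adjacent: extend current run
                cur_end = max(cur_end, end)
            else:  # gap: count the finished run and start a new one
                cloned += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        cloned += cur_end - cur_start + 1
    return 100.0 * cloned / total_lines


reported = [("a.c", 10, 29), ("a.c", 20, 39), ("b.c", 1, 10)]
print(clone_density(reported, total_lines=1000))  # 4.0
```

Without the merge step, the two overlapping ranges in `a.c` would be counted as 40 lines instead of 30, overstating the density.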

Risk of Error Propagation

Copy-and-paste programming facilitates the replication of flaws across multiple code instances, as any defect in the source code—such as a buffer overflow or null pointer dereference—is duplicated without alteration unless explicitly modified.^[44] This mechanism amplifies error propagation because subsequent adaptations often fail to address all instances consistently, leading to incomplete fixes where a correction in one copy leaves vulnerabilities in others.^[44] For example, in operating system kernels, copy-pasted routines in device drivers have introduced type mismatches or inconsistent variable usages, causing runtime failures like segmentation faults when the duplicated logic interacts with varying contexts.^[44] Real-world incidents illustrate the severity of such propagation in critical systems. In the Linux kernel's SCSI subsystem, developers copy-pasted error-handling code between driver functions, but failed to update a variable name in one instance, resulting in a null pointer dereference that crashed the system during disk operations.^[44] Similarly, in network drivers, duplicated packet-processing logic omitted boundary checks in one variant, enabling buffer overflows that exposed the kernel to exploitation.^[44] These cases, drawn from large-scale software like the Linux kernel, demonstrate how copy-paste in custom implementations—particularly in low-level or cryptographic modules—exacerbates impacts by spreading identical flaws across interdependent components.^[44] Code duplication elevates the overall attack surface by multiplying the locations where a single flaw can be triggered, with empirical studies indicating that the likelihood of errors scales with the number of copies due to inconsistent maintenance. 
Quantitative analyses of open-source projects reveal that approximately 18.42% of buggy code clones participate in bug propagation, where a defect in one clone affects related instances, increasing system-wide failure rates compared to non-duplicated code. This scaling effect is particularly pronounced in security-critical domains, where duplicated cryptographic primitives heighten vulnerability exposure without proportional benefits in robustness.^[45] Detecting propagated errors in duplicated code presents significant challenges, as subtle modifications—such as renamed variables or adjusted parameters—can mask identical underlying bugs, evading standard static analysis tools.^[46] These inconsistencies, common in Type-3 clones (near-miss duplicates), require specialized mining techniques to identify related bugs, yet even advanced detectors like CP-Miner struggle with large codebases where variants diverge just enough to obscure shared flaws.^[44] Consequently, such hidden propagations often persist until runtime failures or security audits reveal them, complicating proactive mitigation.^[46]
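CP-Miner looks for copy-pasted segments in which an identifier was renamed in most places but not all. The sketch below illustrates that idea in simplified form and is not CP-Miner's algorithm: it aligns two token sequences from a renamed copy and reports any identifier that maps to more than one name. The regex tokenizer and the C-like snippets are assumptions for the example.

```python
import re
from collections import defaultdict

# Identifiers, integer literals, or single non-space characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def inconsistent_renames(original, pasted):
    """Report identifiers that were renamed in some places but not others.

    Assumes the two fragments align token for token (a renamed copy), so each
    identifier in `original` should map to exactly one identifier in `pasted`.
    """
    a, b = TOKEN_RE.findall(original), TOKEN_RE.findall(pasted)
    if len(a) != len(b):
        raise ValueError("fragments do not align token for token")
    mapping = defaultdict(set)
    for old, new in zip(a, b):
        if old[0].isalpha() or old[0] == "_":
            mapping[old].add(new)
    return {old: news for old, news in mapping.items() if len(news) > 1}


original = "if (rx_len > rx_max) rx_len = rx_max; copy(rx_buf, rx_len);"
# The pasted copy renames rx_max to tx_max in the comparison but not in the assignment.
pasted = "if (tx_len > tx_max) tx_len = rx_max; copy(tx_buf, tx_len);"
print(inconsistent_renames(original, pasted))
```

A fully consistent rename returns an empty result, so only the mixed mapping, the likely forgotten change, is flagged.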

Strategies for Avoidance

Refactoring Approaches

Refactoring approaches for copy-and-paste programming restructure existing code to eliminate duplicates while preserving functionality. One primary method is the Extract Method refactoring, which moves a repeated code fragment into a reusable method. In IDEs such as Eclipse, developers select the duplicated block and use Refactor > Extract Method (Alt+Shift+M on Windows and Linux) to generate a new method with a descriptive name, replacing the original code with a call to it. This reduces redundancy and improves readability, as demonstrated in Java projects where long methods are broken into focused units.^[47]^[48] For variants of duplicated code, such as similar logic with minor differences, the Form Template Method refactoring or higher-order functions can abstract the common structure. Higher-order functions such as Python decorators wrap repetitive boilerplate around core logic, avoiding inline repetition: a decorator can encapsulate logging or validation that appears in multiple functions and be applied with @decorator syntax without changing the functions' bodies. This approach is particularly effective for repetitive implementations, since parameters can capture the variations.^[49] The process typically begins with clone detection to identify duplicates systematically. Tools such as CCFinder, a token-based detector, transform source code into normalized tokens and compare them to find exact or near-exact clones in languages such as C, Java, and COBOL. Once clones are located, developers abstract them into shared modules, such as utility classes or libraries, for example by moving common elements into a superclass with refactorings like Pull Up Method in Eclipse. This step ensures that the refactored code is centralized and testable.^[50]^[47] Language-specific techniques further tailor refactoring to each language's idioms.
In Java, interfaces define contracts for duplicated behaviors and enable polymorphism; for example, moving shared validation logic into an interface default method (available since Java 8) or an abstract base class shared by multiple classes eliminates inline copies. In Python, decorators handle repetitive cross-cutting concerns such as error handling, turning scattered try-except blocks into a reusable wrapper applied at the function level. These methods use each language's strengths to create maintainable abstractions.^[8]^[49] Best practices use thresholds to prioritize effort, such as refactoring clones longer than six statements, so that the benefit justifies the cost and trivial changes are avoided. This proactive approach limits the accumulation of duplicates over time.^[51]
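The decorator refactoring described above can be sketched in Python: a try/except-and-log block that would otherwise be pasted into every function is written once and applied with `@` syntax. The decorator name `log_errors`, its `default` parameter, the logger name, and the two parsing functions are hypothetical.

```python
import functools
import logging

logger = logging.getLogger("app")


def log_errors(default=None):
    """Decorator factory: log any exception raised by the wrapped function
    and return `default` instead of re-raising."""
    def decorator(func):
        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                logger.exception("error in %s", func.__name__)
                return default
        return wrapper
    return decorator


@log_errors(default=0)
def parse_quantity(text):
    return int(text)


@log_errors()
def parse_setting(text):
    key, value = text.split("=")
    return {key.strip(): value.strip()}


print(parse_quantity("42"), parse_quantity("abc"))  # 42 0
```

Catching `Exception` this broadly is a deliberate simplification; in practice the decorator would usually take the exception types to handle as a parameter so that unexpected errors still surface.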

Tooling and Best Practices

Static analyzers such as Simian detect code duplication across multiple programming languages, including Java, C#, C++, and Ruby, by identifying similar blocks of code, helping developers uphold the DRY principle before duplicates accumulate.^[52] Similarly, CloneDR scans source code for duplicated fragments, enabling intervention before the code is integrated.^[53] Linters such as ESLint include rules that flag specific forms of duplication in JavaScript projects, such as no-duplicate-case for repeated case clauses in switch statements and no-dupe-keys for repeated keys in object literals. Both catch common copy-and-paste errors during development.^[54]^[55] Refactoring tools in integrated development environments (IDEs) also support prevention. IntelliJ IDEA's Extract Method refactoring, for instance, lets developers select a duplicated block and automatically turn it into a reusable method, encouraging abstraction at the point of writing instead of manual copying.

Best practices for enforcing the DRY principle include rigorous code reviews, in which peers scrutinize pull requests for redundant implementations and suggest abstractions or existing libraries in place of copies.^[56] Version control systems such as Git can aid detection: analyzing the diffs in a commit history can reveal near-identical changes made across several files, a sign of potential duplication, before they are merged.^[57] Another key practice is preferring external libraries to ad hoc pasting. In JavaScript ecosystems, developers are encouraged to use npm packages for common functionality, such as utility functions, rather than reimplementing and duplicating the same logic across modules.^[58] Organizational strategies such as pair programming also reduce accidental duplication, because one developer (the navigator) actively recalls and points to existing code, fostering reuse and higher cohesion.^[59]

Code quality gates in continuous integration pipelines enforce thresholds, such as keeping duplication in new code under 10%, and block merges that violate them so that the codebase stays maintainable as it grows. In the 2020s, AI-powered tools have begun to extend these efforts. GitHub Copilot, for example, includes a duplication detection filter that suppresses suggestions matching public code on GitHub, which limits verbatim copying from external sources. SonarQube issues automated warnings for repeated patterns and, as of 2025, augments its code quality analysis, including duplication detection, with AI.^[60]^[61]
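
Tools such as Simian and CloneDR use their own, more sophisticated matching (CloneDR, for instance, compares syntax trees rather than raw text). The following is therefore only a minimal sketch of the general idea behind text-based clone detection, written in Python. It assumes that a clone is a run of `min_lines` consecutive non-blank lines that are identical after whitespace is collapsed. The function name `find_clones`, its default window size, and the sample files are illustrative choices, not any tool's API.

```python
import hashlib
from collections import defaultdict

# Hypothetical sample input: the same guard clause pasted with different indentation.
SAMPLE_FILES = {
    "a.py": "def area(w, h):\n    if w < 0 or h < 0:\n        raise ValueError('negative')\n    return w * h\n",
    "b.py": "def scaled_area(w, h):\n  if w < 0 or h < 0:\n      raise ValueError('negative')\n  return w * h\n",
}


def normalize(line: str) -> str:
    """Collapse runs of whitespace so re-indented copies still match."""
    return " ".join(line.split())


def find_clones(files: dict[str, str], min_lines: int = 4) -> list[list[tuple[str, int]]]:
    """Group locations whose next `min_lines` non-blank lines are identical.

    Each group holds (filename, 1-based line number) pairs marking where a
    duplicated window of lines starts.
    """
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for name, text in files.items():
        # Keep the original line numbers for reporting, but skip blank lines.
        lines = [
            (number, normalize(raw))
            for number, raw in enumerate(text.splitlines(), start=1)
            if raw.strip()
        ]
        # Slide a fixed-size window over the file and hash its contents.
        for i in range(len(lines) - min_lines + 1):
            window = "\n".join(content for _, content in lines[i:i + min_lines])
            digest = hashlib.sha1(window.encode("utf-8")).hexdigest()
            index[digest].append((name, lines[i][0]))
    # A window seen at two or more locations is a candidate clone.
    return [locations for locations in index.values() if len(locations) > 1]


if __name__ == "__main__":
    print(find_clones(SAMPLE_FILES, min_lines=3))  # [[('a.py', 2), ('b.py', 2)]]
```

Because windows overlap, a copied block longer than `min_lines` is reported as several overlapping groups. Production detectors typically merge these into maximal clones, and some, such as CCFinder, compare normalized token sequences so that copies with renamed identifiers still match.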
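
ESLint's no-dupe-keys rule targets a silent failure: JavaScript accepts an object literal that repeats a key and simply keeps the last value. Python dictionary displays behave the same way, so the sketch below reproduces the idea of that rule for Python source using the standard `ast` module. It is an analogue written for illustration, not a port of ESLint's implementation, and `duplicate_dict_keys` is a hypothetical name.

```python
import ast

# Hypothetical config where a pasted entry silently overrides an earlier one.
SAMPLE_SOURCE = """config = {
    "host": "localhost",
    "port": 8080,
    "host": "example.org",
}
"""


def duplicate_dict_keys(source: str) -> list[tuple[int, object]]:
    """Return (line, key) for each constant key repeated inside one dict literal."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # Keys are None for `**mapping` entries; only literal constants
            # (strings, numbers, ...) can be compared without running the code.
            if isinstance(key, ast.Constant):
                if key.value in seen:
                    problems.append((key.lineno, key.value))
                seen.add(key.value)
    return problems


if __name__ == "__main__":
    print(duplicate_dict_keys(SAMPLE_SOURCE))  # [(4, 'host')]
```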
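
Extract Method, as offered by IntelliJ IDEA and other IDEs, moves a duplicated fragment into a single named method and replaces each copy with a call to it. The before-and-after sketch below shows the effect on two hypothetical formatting functions whose validation and formatting lines had been pasted from one into the other. All function names are invented for the example.

```python
# Before: the validation and formatting lines were pasted into both functions,
# so any fix to one copy must be repeated by hand in the other.
def invoice_total_before(amount: float) -> str:
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return f"${amount:,.2f}"


def refund_total_before(amount: float) -> str:
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return f"-${amount:,.2f}"


# After Extract Method: the shared fragment lives in one helper...
def format_money(amount: float) -> str:
    """Validate a non-negative amount and format it as dollars and cents."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return f"${amount:,.2f}"


# ...and each former copy becomes a call to it.
def invoice_total(amount: float) -> str:
    return format_money(amount)


def refund_total(amount: float) -> str:
    return "-" + format_money(amount)


if __name__ == "__main__":
    # The refactoring preserves behaviour: old and new versions agree.
    for value in (0, 9.5, 1234.5):
        assert invoice_total(value) == invoice_total_before(value)
        assert refund_total(value) == refund_total_before(value)
    print(invoice_total(1234.5), refund_total(1234.5))  # $1,234.50 -$1,234.50
```

After the extraction, a change such as a new currency format is made once in `format_money` instead of in every pasted copy.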
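
Git has built-in options related to copy detection, such as `git diff -C` (which detects copied files) and `git blame -C` (which attributes lines copied from other files modified in the same commit). The sketch below takes a simpler route under stated assumptions: it reads the text of a unified diff, for example the output of `git diff main...HEAD` (the branch name `main` is an assumption), and reports runs of added lines that appear, after whitespace normalization, in more than one file. The function names, the `min_lines` cutoff, and the sample diff are hypothetical.

```python
from collections import defaultdict

# Hypothetical diff: the same guard clause added to two files in one change.
SAMPLE_DIFF = """\
diff --git a/src/orders.py b/src/orders.py
--- a/src/orders.py
+++ b/src/orders.py
@@ -1 +1,3 @@
 import math
+if qty <= 0:
+    raise ValueError("qty")
diff --git a/src/returns.py b/src/returns.py
--- a/src/returns.py
+++ b/src/returns.py
@@ -1 +1,3 @@
 import math
+if qty <= 0:
+        raise ValueError("qty")
"""


def added_blocks(diff_text: str) -> dict[str, list[str]]:
    """Collect contiguous runs of added lines, per file, from a unified diff."""
    blocks: dict[str, list[str]] = defaultdict(list)
    current_file = None
    run: list[str] = []

    def flush() -> None:
        # Close the current run of '+' lines and store it under its file.
        nonlocal run
        if current_file is not None and run:
            blocks[current_file].append("\n".join(run))
        run = []

    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # File header; strip the conventional "b/" prefix. (Simplification:
            # an added line whose text begins with "++ " would be misread here.)
            flush()
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
        elif line.startswith("+"):
            # Added line: drop the '+' marker and normalize whitespace.
            run.append(" ".join(line[1:].split()))
        else:
            # Context, removed, hunk-header and 'diff --git' lines end a run.
            flush()
    flush()
    return blocks


def cross_file_duplicates(diff_text: str, min_lines: int = 2) -> dict[str, list[str]]:
    """Map each added block of at least `min_lines` lines to the files that
    received it, keeping only blocks added to two or more files."""
    locations: dict[str, set[str]] = defaultdict(set)
    for path, runs in added_blocks(diff_text).items():
        for block in runs:
            if block.count("\n") + 1 >= min_lines:
                locations[block].add(path)
    return {block: sorted(paths) for block, paths in locations.items() if len(paths) > 1}


if __name__ == "__main__":
    for block, paths in cross_file_duplicates(SAMPLE_DIFF).items():
        print(f"added in {paths}:\n{block}")
```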
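
A duplication quality gate ultimately reduces to comparing a measured ratio with a threshold. The sketch below assumes that some analyzer has already counted the duplicated lines among the new lines in a change, and it uses the 10% figure from the text as its default. SonarQube and similar tools define and configure their own gate conditions, so the function names and the exit-code convention here are illustrative only.

```python
import sys


def duplication_gate(duplicated_new_lines: int, new_lines: int,
                     max_percent: float = 10.0) -> tuple[bool, float]:
    """Return (passed, percent) for a 'duplication on new code' condition.

    The gate passes only when duplication stays strictly under `max_percent`.
    """
    if new_lines <= 0:
        return True, 0.0  # nothing new to measure
    percent = 100.0 * duplicated_new_lines / new_lines
    return percent < max_percent, percent


def main(argv: list[str]) -> int:
    """Hypothetical CI entry point: argv holds duplicated and total new-line counts."""
    duplicated, total = int(argv[0]), int(argv[1])
    passed, percent = duplication_gate(duplicated, total)
    print(f"duplication on new code: {percent:.1f}% ({'pass' if passed else 'fail'})")
    # A non-zero exit status typically fails the CI step, which can then block the merge.
    return 0 if passed else 1


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation: python duplication_gate.py 12 200
    sys.exit(main(sys.argv[1:]))
```

With these assumptions, 12 duplicated lines out of 200 new lines (6%) pass, while 30 out of 200 (15%) fail the gate.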

References

[1]
[PDF] An ethnographic study of copy and paste programming practices in ...
Copied text is often reused as a template and is customized in the pasted context. Current software engineering tools have poor support for identifying reusable ...
[2]
Managing the copy-and-paste programming practice in modern IDEs
Abstract. Copy-and-paste is a common practice in industrial software development and maintenance, which results in code clones. Prior research has focused on ...
[3]
[PDF] An Empirical Study of the Copy and Paste Behavior during ...
Abstract—Developers frequently employ Copy and Paste. However, little is known about the copy and paste behavior during development.
[4]
The Dangers of Copy and Paste - GrammaTech
Aug 8, 2018 · Poor reuse: The real cost of developing software is not in the typing of the code, so simply duplicating code does little to increase ...
[5]
Why You Should Avoid Copy & Paste Code - Mend.io
Jul 5, 2023 · Copying and pasting code from unknown sources poses substantial security risks. Malicious actors can intentionally embed vulnerabilities or ...
[6]
Copy and paste programming - Semantic Scholar
Copy-and-paste programming is the production of highly repetitive computer programming code, as produced by copy and paste operations.
[7]
Duplicate Code - Refactoring.Guru
Duplication usually occurs when multiple programmers are working on different parts of the same program at the same time.
[8]
What percentage of code is copied and pasted? - /src - Software.com
Sep 14, 2022 · The team at Stack Overflow reported that "depending on who you ask, as little as 5-10% or as much as 7-23% of code is cloned from ...
[9]
Who Made This Copy? An Empirical Analysis of Code Clone Authorship
Summary of Statistic and Context on Code Clone Authorship
[10]
Finding duplicated code with CPD | PMD Source Code Analyzer
Duplicate code can be hard to find, especially in a large project. But PMD's Copy/Paste Detector (CPD) can find it for you! CPD works with Java, JSP, C/C++, C#, ...
[11]
[PDF] DéjàVu: A Map of Code Duplicates on GitHub - Jan Vitek
Lastly, a project-level analysis shows that between 9% and 31% of the projects contain at least 80% of files that can be found elsewhere. These rates of ...
[12]
Look Before You Leap! Duplicate Code Increases Risk During ...
Oct 17, 2017 · Look Before You Leap! Duplicate Code Increases Risk During Legacy Modernization and Inhibits the Journey towards Digital Transformation.
[13]
(PDF) An Empirical Study of the Copy and Paste Behavior during ...
Our objective is to identify the role of copy and paste programming or code clone in current development practices. A Systematic Mapping Study (SMS) has ...
[14]
In embedded development, is it better to duplicate code or create ...
Oct 3, 2019 · It is never a good idea to manually duplicate source code. If you use C++, and to a lesser degree plain C , you can write your code once and ...
[15]
How can you reduce code size in embedded systems programming?
Feb 26, 2024 · 1. Choose the right language and compiler ; 2. Use data compression and encoding ; 3. Avoid unnecessary or duplicated code ; 4. Use code ...
[16]
[PDF] Early programming languages - Stanford University
“We were using subroutines. We were copying routines from one program into an other. There were two things wrong with that technique: one was that the ...
[17]
[PDF] Programming in America in the 1950s -- Some Personal Impressions
In contrast, programming in the early 1950s was a black art, a private arcane matter involving only a programmer, a problem, a computer, and perhaps a small ...
[18]
(PDF) Forty years of software reuse - ResearchGate
Aug 5, 2025 · Forty years of software reuse This paper is an overview of software reuse, its origins, research areas and main historical contributions.
[19]
Software & Languages | Timeline of Computer History
An IBM team led by John Backus develops FORTRAN, a powerful scientific computing language that uses English-like statements.
[20]
30 Years Ago: Turbo Pascal, BASIC Turn PCs Into Programming ...
Sep 5, 2013 · Turbo Pascal included a compiler and an IDE for the Pascal programming language running on CP/M, CP/M-86 and DOS, developed by Borland under co ...
[21]
[2002.01275] Code Duplication on Stack Overflow - arXiv
Feb 4, 2020 · ... impact of code duplication on software maintainability, the prevalence and implications of code clones on SO have not yet received the ...
[22]
An Empirical Study of Code Clones from Commercial AI Code ...
Jun 19, 2025 · Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT.
[23]
Humans do it better: GitClear analyzes 153M lines of code, finds ...
Apr 17, 2024 · Highlighting key shifts in code churn, duplication, and age, it explores the impact of AI tools like GitHub Copilot on programming practices.
[24]
Enhancing Reusability in Microservice Architecture - IEEE Xplore
However, reusability in microservices remains a critical concern due to several challenges, including Code Duplication, Technology Heterogeneity, Service ...
[25]
On Technical Debt And Code Smells: Surprising insights from ...
Dec 23, 2021 · Code smells are signs of low-quality code. In this post, we explored scientific insights on code smells. One clear pattern from the studies we ...
[26]
What is Refactoring? - Agile Alliance
To download a free PDF copy of the Agile Manifesto and 12 Principles of ... Intermediate; knows and is able to remedy a broader range of “code smells” ...
[27]
[PDF] “Cloning Considered Harmful” Considered Harmful - PLG
“Cloning Considered Harmful” Considered Harmful. Cory Kapser and Michael W. Godfrey. Software Architecture Group (SWAG). David R. Cheriton School of Computer ...
[28]
Detecting and Analyzing Fine-grained Third-party Library ...
Sep 4, 2025 · As a lightweight language, Solidity does not have a unified way to manage third-party library (TPL) dependencies. Instead, the copy-and-paste ...
[29]
[PDF] JC-Finder: Detecting Java Clone-based Third-Party Library ... - arXiv
Aug 4, 2025 · Oreo [72] utilizes a combination of machine learning, information retrieval, and software metrics to detect clones with high precision and ...
[30]
Surviving Software Dependencies - ACM Queue
Jul 8, 2019 · Software dependencies carry with them serious risks that are too often overlooked. The shift to easy, fine-grained software reuse has happened ...
[31]
A Survey of Software Clone Detection From Security Perspective
Summary of Software Clone Detection Survey (Security Perspective)
[32]
Beyond Dependencies: The Role of Copy-Based Reuse in Open ...
The findings advocate for the development of better tools and infrastructure to manage copy-based reuse, including automated detection of security and legal ...
[33]
Consolidate Duplicate Conditional Fragments - Refactoring.Guru
Duplicate code is found inside all branches of a conditional, often as the result of evolution of the code within the conditional branches.
[34]
Java static code analysis
Summary of RSPEC-1871: Duplicate Code in Conditional Branches
[35]
[PDF] GUI Input and Event-Driven Programming - CS@Cornell
•GUI code responds to (and creates) events. • E.g., mouse button, keyboard ... •Event handlers can be registered with nodes that generate events: Button ...
[36]
Duplicate switch case — CodeQL query help documentation - GitHub
If two cases in a 'switch' statement are identical, the second case will never be executed. This most likely indicates a copy-paste error.
[37]
Code inspection: Duplicated sequential 'if' branches | JetBrains Rider
Mar 24, 2025 · This inspection detects consecutive if statements with identical bodies. Such redundancy negatively impacts code readability and ...
[38]
[PDF] patterns of cloning in software - PLG
An example of experimental variation can be found in the Apache httpd web server. In the multi-process management subsystem, the subsystem worker was.
[39]
[PDF] Web-based Code Clone Detection System using Machine Learning 1
Firstly, removing uninteresting parts use to filter raw codes into a single language that can detect a clone. For example, some of JAVA code has SQL embedded.
[40]
Refactoring Detection in C++ Programs with RefactoringMiner++
Just as inheritance, template metaprogramming has the purpose of minimizing code duplication. Java does not offer templates, but generics instead. While these ...
[41]
Evaluating Code Clone Detection and Management
Jun 6, 2025 · Clone detection finds similar or repeating parts of software code, whether they are directly copied or only slightly altered. By pointing out ...
[42]
[PDF] Does Cloned Code Increase Maintenance Effort? - Chanchal Roy
Focusing on the negative impacts of code clones researchers suspect that code clones can possibly increase software maintenance effort and costs. However ...
[43]
An Empirical Study on the Impact of Duplicate Code - Hotta - 2012
May 28, 2012 · Their work is the first empirical evidence that a part of duplicate code increases the cost of source code modification. Table 1. Summarization ...
[44]
[PDF] The Role of Duplicated Code in Software Readability ... - DiVA portal
[23] did a comprehensive study on duplicated code, clone refactoring and clone tracking, which shows that clone codes have positive impact on software.
[45]
[PDF] Clones in Deep Learning Code: What, Where, and Why? - arXiv
For example, deep learning developers can clone models' architectures and model. (hyper)parameters settings or initialization for similar model implementations.
[46]
[PDF] CP-Miner: A Tool for Finding Copy-paste and Related Bugs in ...
In this paper we propose a tool, CP-Miner, that uses data mining techniques to efficiently identify copy-pasted code in large software including operating ...
[47]
https://www.baeldung.com/eclipse-refactoring
[48]
Finding Copy-Paste and Related Bugs in Large-Scale Software Code
In this paper, we propose a tool, CP-Miner, that uses data mining techniques to efficiently identify copy-pasted code in large software suites and detects copy ...
[49]
Refactoring in Eclipse | Baeldung
Jun 1, 2019 · Select the lines of code we want to extract; Right-click the selected area; Click the Refactor > Extract Method option. The ...
[50]
Extract Method - Refactoring.Guru
How to Refactor · Create a new method and name it in a way that makes its purpose self-evident. · Copy the relevant code fragment to your new method.
[51]
Refactoring Opportunities That Will Boost The Quality Of Your Code
Apr 19, 2020 · Extract repetitive code into helper functions or use a decorator if you need to apply the same functionality to multiple functions or methods.
[52]
CCFinder: a multilinguistic token-based code clone detection system ...
This paper proposes a new clone detection technique, which consists of the transformation of input source text and a token-by-token comparison.
[53]
[PDF] Towards Automated Refactoring of Code Clones in Object-Oriented ...
Jul 10, 2019 · We would argue that going with this "magic number 6" eliminates a lot of harmful clones that should be refactored. For instance, a single 100 ...
[54]
Automate Away Duplicate Code: A Practical Guide
Aug 22, 2025 · Duplication creeps back if you're not watching. Surface metrics where people already look. Pull data from SonarQube every few minutes and push ...
[55]
https://eslint.org/docs/latest/rules/no-dupe-keys
[56]
Code clone detection software
Jul 20, 2015 · CloneDR typically finds 10+% duplicated code in software that is relatively well engineered. These numbers can be significantly larger in sloppy ...
[57]
no-duplicate-case - ESLint - Pluggable JavaScript Linter
The `no-duplicate-case` rule disallows duplicate test expressions in case clauses of switch statements, often caused by copied case clauses.
[58]
no-dupe-keys - ESLint - Pluggable JavaScript Linter
This rule disallows duplicate keys in object literals.
[59]
A Deep Dive Into Clean Code Principles - Codacy | Blog
May 22, 2024 · This article will explore the details of clean code principles, including SOLID, DRY, and KISS, as well as their practical applications, real-world examples, ...
[60]
Duplicate code block detection - GitClear
In order to detect clone blocks without having access to the full repo source code, GitClear generates a one-way hash value to represent each changed line.
[61]
5 Practical Ways To Share Code: From NPM to Lerna And Bit
Feb 12, 2018 · Bit with NPM and Yarn. Bit speeds code sharing by combining the advantages of copy pasting and managed packages. Meaning, you can easily ...
[62]
Pair Programming vs. Code Reviews - Coding Horror
Nov 18, 2007 · ... reduces the likelihood of duplication/deviation and increases the chance of highly cohesive and lowly coupled solutions. I strongly suspect ...