
Artificial intelligence and copyright

Artificial intelligence and copyright addresses the intersection of copyright law with AI technologies, primarily involving disputes over the unauthorized use of protected works to train generative models and the eligibility of AI outputs for copyright protection. These issues have sparked extensive litigation since the proliferation of large language models and image generators in the early 2020s, pitting creators against tech developers over fair compensation and innovation incentives. Key controversies center on whether ingesting copyrighted materials constitutes infringement or qualifies as fair use under doctrines emphasizing transformative purpose and market effects. In 2025 rulings from federal courts, judges determined that training on lawfully acquired books was transformative and did not harm original markets, granting fair use defenses to companies like Anthropic and Meta, but invalidated such claims for data from pirated sources. Concurrently, the first fully decided U.S. training case favored the copyright holder, highlighting judicial splits that may require appellate or legislative resolution. Over 40 lawsuits persist, including high-profile actions by authors against Anthropic—settled preliminarily for $1.5 billion—and artists in Andersen v. Stability AI, testing boundaries of liability for model outputs mimicking styles or specific works. For AI-generated content, the U.S. Copyright Office maintains that protection demands human authorship, excluding purely algorithmic outputs while allowing registration for human-authored elements like prompts, arrangements, or edits that demonstrate sufficient creativity. This stance, rooted in statutory requirements for originality flowing from human intellect, underscores causal distinctions between machine replication and inventive contribution, influencing global policies amid calls for new licensing frameworks or exceptions.

Historical Context

Early Intersections Pre-2020

Early artificial intelligence systems, such as rule-based expert systems and rudimentary neural networks developed in the decades before 2000, intersected minimally with copyright law. These technologies primarily generated outputs based on human-programmed algorithms and limited datasets, with any protectable expression attributed to the human developers rather than the machines themselves. Copyright implications were thus confined to the software code and human-curated inputs, without significant disputes over machine agency or outputs. Database protections emerged as a tangential concern for AI training data during this period. In the United States, the Supreme Court's decision in Feist Publications, Inc. v. Rural Telephone Service Co. (1991) rejected the "sweat of the brow" doctrine, holding that factual compilations like telephone directories lack the requisite originality for copyright protection beyond creative selection, coordination, or arrangement. This limited safeguards for raw datasets potentially used in early machine learning, emphasizing that mere effort in compilation does not confer rights. In contrast, the European Union's Directive 96/9/EC (1996) introduced a sui generis right protecting the investment in creating databases, offering broader recourse for substantial extractions that could apply to curated AI training corpora in the 1990s and 2000s. The advent of generative adversarial networks (GANs) in 2014 marked an initial escalation in technical capabilities for synthesizing images, prompting nascent discussions on infringement risks from training on potentially copyrighted visual data. However, pre-2020, these concerns remained largely academic, with no major litigation; proponents argued that model training transformed inputs into abstract parameters, akin to transformative-use precedents, while outputs were not direct copies. A key precedential analogy arose from Naruto v. Slater (2018), where the Ninth Circuit ruled that a macaque monkey lacked statutory standing to claim copyright in selfies it captured, as the U.S. Copyright Act extends authorship rights exclusively to humans. This decision underscored the human authorship requirement, later invoked in debates over AI-generated works lacking independent creative agency. Prior to widespread deep generative models, such cases highlighted foundational tensions without resolving AI-specific applications.

Rise of Generative AI and Initial Conflicts

The development of generative AI models accelerated in the late 2010s, transitioning from narrow task-specific systems to large-scale models capable of producing novel text, images, and other media from vast datasets scraped from the web. OpenAI's GPT-2, released on February 14, 2019, exemplified this shift, trained on 40 gigabytes of text to predict subsequent words, which raised early questions about the sourcing and scale of web-derived data amid broader concerns over potential misuse. This model underscored the reliance on publicly available online content, often including copyrighted material, without explicit permissions, setting the stage for escalating debates on data acquisition practices. By 2021, the introduction of text-to-image models like OpenAI's DALL-E, announced on January 5, intensified scrutiny, as these systems generated visuals mimicking artistic styles trained on billions of image-caption pairs harvested from the web. Artists began voicing concerns over unauthorized use of their works in training datasets, highlighting how such models could replicate or approximate copyrighted aesthetics without compensation. The scale of data requirements became starkly evident with datasets like LAION-5B, released in March 2022, comprising 5.85 billion CLIP-filtered image-text pairs drawn from web archives, primarily in English and multilingual sources. This dataset powered open-source models such as Stable Diffusion, launched by Stability AI in August 2022, which democratized access but amplified fears of mass infringement. The release of Stable Diffusion prompted immediate backlash from visual artists, who protested the scraping of their portfolios from platforms like ArtStation without consent, arguing it devalued human creativity and flooded markets with low-cost imitations. Initial industry responses included developer assurances of opt-out mechanisms for training, though such mechanisms for web data remained limited and post-hoc, such as OpenAI's later policies for user-submitted content rather than retroactive web scrapes.
These tensions culminated in early class-action suits, including one filed by artists against Stability AI, Midjourney, and DeviantArt in January 2023, alleging direct copying and dilution of original works through dataset ingestion. This period marked the pivot from technical innovation to public and legal conflicts, as the sheer volume of ingested data—often exceeding trillions of tokens or images—exposed systemic frictions between AI scalability and rights holders' interests.

Copyright protection extends to original works of authorship fixed in a tangible medium of expression, as codified in statutes like 17 U.S.C. § 102(a) and aligned with the Berne Convention's minimum standards for literary and artistic works. The Berne Convention, administered by the World Intellectual Property Organization (WIPO), requires signatory states to grant automatic protection without formalities upon creation, provided the work qualifies as original and embodies authorship. This framework emphasizes that protection arises from the act of creation itself, not registration or notice, fostering international reciprocity among over 180 member countries. Originality demands independent creation with at least a minimal degree of creativity, surpassing mere factual compilation or mechanical reproduction, as affirmed in U.S. Supreme Court precedents like Feist Publications, Inc. v. Rural Telephone Service Co., where sweat of the brow alone does not suffice. Fixation requires the work to be sufficiently permanent or stable to permit perception, reproduction, or communication, such as digital files or recordings. Authorship thresholds presuppose human intellectual contribution, distinguishing creative human agency from automated processes; tools like cameras or software can assist but do not independently author if outputs result from mechanical means without human creative control.
In AI applications, these criteria underscore that systems functioning as mere reproducers—generating outputs via algorithms without human-infused originality—fail to meet the authorship bar inherent to copyright's rationale of incentivizing human innovation. The idea-expression dichotomy further delineates protectable subject matter, safeguarding only the specific manner of expression while excluding underlying ideas, procedures, or patterns, as rooted in cases like Baker v. Selden (1879). This principle is particularly salient in AI contexts, where generative models derive from aggregated data patterns—often akin to unprotected ideas or functional elements—rather than unique expressions, limiting claims over stylistic mimicry or probabilistic outputs. Protection, once granted, remains time-limited under Berne's baseline of the author's life plus 50 years, ensuring eventual entry into the public domain to promote cumulative knowledge, a duration applied uniformly to qualifying AI-assisted works traceable to human origin.

Human Authorship and AI Agency

United States copyright law mandates that works must originate from human authorship to qualify for protection, a principle rooted in the Constitution's provision for promoting progress through authors' writings. The U.S. Copyright Office applies this requirement strictly, refusing registration for outputs generated solely by AI systems without meaningful human creative involvement. This stance aligns with longstanding precedents excluding non-human creators, such as animals, emphasizing that copyright incentivizes human intellectual labor rather than automated processes. In Thaler v. Perlmutter, a 2023 district court decision affirmed by the D.C. Circuit in March 2025, the Copyright Office denied registration for an image autonomously produced by Thaler's "Creativity Machine" system, ruling it ineligible due to the absence of human authorship. The courts upheld that AI, lacking the volition and originality inherent in human expression, cannot claim authorship, even if programmed to mimic creativity; Thaler conceded the work involved no "traditional human authorship." This decision critiques potential anthropocentric biases in law by tying protection to human-centric incentives, yet it reflects empirical reality: AI systems, driven by deterministic algorithms and statistical predictions from training data, exhibit no independent agency or causal intent beyond their human-engineered parameters. For hybrid works combining human and AI elements, copyrightability hinges on the degree of human creative control, evaluated case-by-case; protection extends only to contributions meeting originality thresholds, such as substantial editing, arrangement, or selection that infuses personal expression. Simple prompts directing AI generation typically fall short, as they do not constitute authorship over the resulting output, which remains a product of the model's learned patterns rather than human origination. This demarcation preserves copyright's focus on rewarding human agency while acknowledging AI's role as an assistive tool, not an autonomous creator.
From a philosophical and experimental perspective, some projects have begun to treat artificial intelligence as a structurally credited participant in authorship while explicitly accepting that copyright remains human-only. The Aisentica Research Group, for example, presents the AI-based identity Angela Bogdanova as a Digital Author Persona that functions as a named authorial entity but not a legal rights holder: the persona is registered in research infrastructure via an ORCID iD 0009-0002-6030-5730 and a semantic specification deposited on Zenodo under DOI 10.5281/zenodo.15732480, while legal responsibility and copyright remain with the human initiators of the project. Such experiments do not contradict the human authorship requirement affirmed in cases like Thaler v. Perlmutter; instead, they explore how AI systems can be credited in metadata and bylines as non-human contributors within existing legal frameworks that still reserve authorship status to humans.

Fair Use, Transformative Use, and Exceptions

In the United States, the fair use doctrine under 17 U.S.C. § 107 provides a defense to infringement by balancing four statutory factors: (1) the purpose and character of the use, including whether it is commercial or transformative; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work. This framework evaluates whether an unauthorized use, such as ingesting works into AI training datasets, justifies an exception by assessing its overall contribution to knowledge or creativity rather than mere replication. Applied to AI model training, the doctrine emphasizes transformation, where input data is processed to derive abstract statistical patterns encoded in model weights, rather than preserving literal copies. Empirical analyses demonstrate that training compresses vast datasets into compact representations—for instance, models achieving compression ratios exceeding 200,000:1—yielding outputs that generalize patterns without retaining or reproducing original works, analogous to human cognition distilling insights from reading without memorizing texts. The first factor often weighs heavily in favor of AI developers due to the productive, non-substitutive nature of trained models, which enable novel applications; the third factor considers the necessity of broad exposure for efficacy, while the fourth scrutinizes any demonstrable market harm, which analyses tie to causal evidence of displacement rather than speculative fears. In the European Union, text and data mining (TDM) exceptions under Articles 3 and 4 of Directive (EU) 2019/790 permit reproductions of copyrighted works for computational analysis, explicitly encompassing AI training as an automated technique to extract information from texts or data.
Article 3 mandates an exception for non-commercial scientific research, while Article 4 extends it to reproductions for any purpose, including commercial ones, subject to rights holders' opt-outs via machine-readable reservations; this balances innovation with control, recognizing TDM's role in deriving non-expressive insights without undermining the work's essence. The United Kingdom's Copyright, Designs and Patents Act 1988 (CDPA) includes a TDM exception in section 29A, limited to non-commercial purposes where lawful access is obtained, without a parallel broad commercial provision post-Brexit divergence from EU harmonization. This narrower scope requires AI entities to navigate licensing or research exemptions, with ongoing consultations exploring expansions but prioritizing evidence-based assessments of transformative processes over unrestricted data flows.
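The non-retention argument behind the transformation claim can be illustrated with a back-of-envelope calculation of how much model capacity is available per ingested work. All figures below are illustrative assumptions chosen for round numbers, not measurements of any specific system.

```python
# All figures are illustrative assumptions, not measurements of a real model.
PARAMS = 70e9            # a 70-billion-parameter model
BYTES_PER_PARAM = 2      # fp16/bf16 weight storage
DOCUMENTS = 1e9          # ~1 billion ingested documents
AVG_DOC_BYTES = 40_000   # ~40 KB of text per document

model_bytes = PARAMS * BYTES_PER_PARAM        # ~140 GB of weights
capacity_per_doc = model_bytes / DOCUMENTS    # bytes of capacity per document
fraction = capacity_per_doc / AVG_DOC_BYTES   # share of each document storable

print(f"capacity per document: {capacity_per_doc:.0f} bytes")  # 140 bytes
print(f"storable fraction per document: {fraction:.2%}")       # 0.35%
```

Under these assumptions the weights could hold only about 140 bytes of information per ingested document, far too little for verbatim retention, which is the intuition behind characterizing training as statistical pattern extraction rather than copying.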

Copyrightability of AI-Generated Outputs

United States

In the United States, copyright protection for outputs generated by artificial intelligence requires demonstrable human authorship, as established by longstanding precedents and recent guidance from the U.S. Copyright Office. Purely AI-generated works, lacking sufficient human creative input, are ineligible, reflecting the principle that copyright incentivizes human intellectual labor rather than automated processes. This stance aligns with cases like Naruto v. Slater (2018), where a photograph taken by a monkey was denied protection due to the absence of human authorship, underscoring that non-human agency cannot originate protectable expression. The U.S. Copyright Office formalized its approach in March 2023 guidance, stating that applicants must disclaim AI-generated elements in registrations and that works intermingling human and AI contributions may receive annotations limiting protection to the human contributions. This modular examination evaluates components separately: for instance, in the February 2023 review of the comic Zarya of the Dawn, registration was granted for the human-authored text and overall arrangement but denied for images created via Midjourney prompts, as the user's textual inputs did not constitute authorship-level control over the visual outputs. Subsequent decisions, including rejections through 2024, have consistently applied this threshold, denying full registrations for works where AI dominated the creative process. A 2025 Copyright Office report reaffirmed this framework, concluding that existing law suffices without new legislation, as human oversight—such as iterative modifications or selections—can render AI-assisted elements protectable if they reflect the author's own creative choices. Outputs from generative AI alone fail this test, as machines lack the intent and creativity inherent to authorship under the Copyright Act. This policy preserves incentives for human innovation by distinguishing AI's derivative capabilities from original expression, avoiding dilution of protections that could flood registries with uncreative machine products.
Empirical evidence supports that AI tools primarily augment human creators rather than supplant them: a 2025 study found AI-assisted artists produce more novel artifacts through enhanced productivity, with human oversight driving creative expansion. Denying protection to unaided AI outputs thus channels investment toward human-AI collaboration, fostering sustained creative output without undermining the constitutional purpose of copyright.

European Union

In the European Union, copyright protection hinges on the criterion of originality, defined as the author's "own intellectual creation" reflecting free and creative choices that imprint the author's personality, as articulated by the Court of Justice of the European Union (CJEU) in Infopaq International A/S v Danske Dagblades Forening (Case C-5/08, July 16, 2009). This human-centric standard, reaffirmed in subsequent rulings like Painer v Standard VerlagsGmbH (Case C-145/10, December 1, 2011), precludes protection for works lacking demonstrable human authorship and creative agency. Consequently, purely AI-generated outputs—produced without meaningful human intervention in the creative process—fail to meet this threshold and remain ineligible for copyright, positioning the EU framework in tension with the autonomous capabilities of advanced generative models that mimic but do not replicate human intellectual effort. The European Union Intellectual Property Office (EUIPO) reinforced this stance in its 2025 study on the development of generative AI from a copyright perspective, conducted between September 2024 and March 2025, which analyzed technical, legal, and economic dimensions of AI outputs. The study explicitly determined that AI-generated content devoid of human creative input does not qualify as an original work under EU law, recommending against the creation of new rights for such outputs and instead advocating for protection only where human oversight provides the requisite intellectual contribution, such as through iterative prompting, editing, or selection that evidences the user's personal stamp. This approach prioritizes preserving incentives for human creativity amid AI proliferation, though it may constrain innovation by denying proprietary safeguards to fully automated generations, potentially exposing them to unrestricted replication. While EU directives like the InfoSoc Directive (2001/29/EC) and the Digital Single Market Directive (2019/790) harmonize economic rights across member states, variations persist in moral rights, which protect attribution and integrity.
In France, where moral rights are perpetual, inalienable, and imprescriptible under Article L.121-1 of the Intellectual Property Code, their application to AI-assisted works underscores challenges in attributing authorship, as outputs may dilute or obscure the human element essential for moral claims. French courts have historically emphasized the author's personality in originality assessments, further entrenching skepticism toward AI autonomy and requiring evidence of a human "stamp" for any protectable interest, even in hybrid creations. This member-state divergence highlights ongoing interpretive flux, with no uniform EU mechanism yet for certifying human involvement in generative processes to bridge gaps between strict eligibility rules and practical deployment.

United Kingdom

Under the Copyright, Designs and Patents Act 1988 (CDPA), section 9(3), the United Kingdom extends copyright protection to literary, dramatic, musical, or artistic works that are computer-generated, attributing authorship to the person who made the arrangements necessary for the computer's creation of the work. Such protection subsists for a term of 50 years from the end of the calendar year in which the work was made, shorter than the standard life-plus-70-years duration for human-authored works. This statutory fiction contrasts sharply with the United States' requirement for demonstrable human authorship, as affirmed by the U.S. Copyright Office in denials of registration for purely AI-generated outputs, and the European Union's emphasis on the author's "own intellectual creation," which courts have interpreted to exclude works lacking significant human input. Despite this provision, practical reliance on section 9(3) for AI-generated works appears minimal, with no widely documented copyright infringement claims successfully invoking it as of 2025, suggesting the protection functions more symbolically than substantively in incentivizing or safeguarding AI outputs. The absence of empirical evidence for robust enforcement or commercial uptake raises questions about its efficacy, as the arranger's role—often involving minimal human direction of complex algorithms—may not satisfy underlying principles of originality or creativity embedded in broader copyright doctrine. In response to generative AI advancements, the UK Intellectual Property Office initiated a consultation on December 17, 2024, titled "Copyright and Artificial Intelligence," soliciting stakeholder views on retaining, clarifying, or abolishing section 9(3) protections. The government document highlights that the regime "has little (if any) effect on the production of such works," potentially justifying reform to better align with human-centric incentives while avoiding unintended barriers to innovation.
Post-Brexit divergence from EU directives affords the United Kingdom independent flexibility to recalibrate this framework, such as by narrowing or eliminating computer-generated work protections to prioritize human-authored content without supranational constraints.

Other Jurisdictions

In China, courts have granted protection to AI-generated images where human users demonstrate sufficient intellectual input, such as through prompt selection and iterative adjustments. On November 27, 2023, the Beijing Internet Court ruled in Li v. Liu that an image of a sprite-like virtual character, created via Stable Diffusion based on the plaintiff's detailed prompts and parameter tweaks, qualified as an original work under Article 3 of the Copyright Law, attributing authorship to the human for their "intellectual investment" in guiding the AI process. This decision, diverging from stricter human-centric tests elsewhere, emphasizes evidentiary proof of human creativity to overcome claims of mere automation, as reaffirmed in subsequent 2025 rulings requiring documentation of such efforts. Japan adopts a flexible stance treating AI as an assistive tool, vesting authorship in humans who direct its use with creative intent. Japanese law under the 1970 Copyright Act does not recognize AI as an author; instead, protection extends to outputs reflecting human creativity in inputs like prompt selection or modifications. The Agency for Cultural Affairs' March 2024 "AI and Copyright Guidelines" clarify that AI-assisted works are copyrightable if they embody human intellectual creation, encouraging innovation by avoiding rigid authorship barriers while cautioning against outputs infringing prior works. Registration of such works is recommended to establish human involvement and evidentiary chains. Singapore's Copyright Act 2021 extends protection to computer-generated works, including AI outputs, by attributing ownership to the individual who arranged the creation process, irrespective of direct human authorship. This provision, modeled on UK approaches, facilitates pro-innovation outcomes by safeguarding hybrid efforts without mandating predominant human control, though courts assess originality case-by-case to exclude trivial automation. In India, the Copyright Act 1957 mandates human authorship for protection, precluding copyright in purely AI-generated outputs lacking demonstrable human creative contribution.
Emerging judicial interpretations, influenced by constitutional emphasis on original skill and judgment, deny standalone AI claims but permit registration for hybrid works where humans provide substantial inputs like curation or refinement, as seen in ongoing policy discussions adapting to generative tools. This framework balances incentives for technological advancement with safeguards for traditional creators, though no landmark 2024 ruling has fully tested pure AI denials in court.

Ingestion of Copyrighted Data for AI Training

United States Fair Use Applications

In the United States, the fair use doctrine under Section 107 of the Copyright Act permits limited use of copyrighted material without permission, evaluated through four factors: the purpose and character of the use (including whether it is transformative and commercial), the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect on the potential market for the original work. For AI training, ingestion of copyrighted works constitutes intermediate copying to derive statistical patterns for model weights, which courts have analogized to transformative processes that do not supplant the originals. This application emphasizes empirical transformation, as training converts raw data into non-expressive latent representations, enabling novel outputs rather than reproduction, thereby supporting innovation in AI without necessitating market harm. A key precedent is Authors Guild v. Google (2015), where the Second Circuit held that Google's scanning of millions of entire books for a searchable index, displaying only snippets, qualified as fair use due to its transformative purpose in facilitating access to knowledge without competing with sales of full works. Courts have extended this analogy to AI training, noting similarities in full-text ingestion for indexing-like pattern extraction, where the output—probabilistic generations—differs fundamentally from input copies, weighing in favor of the first factor even for commercial models. Subsequent rulings, such as in cases involving Anthropic's training on books, have affirmed fair use for generative AI when the process yields non-substitutive models, distinguishing it from mere duplication. Ongoing litigation, including The New York Times Co. v. OpenAI filed on December 27, 2023, tests these principles, with plaintiffs alleging unauthorized ingestion of articles for training large language models constitutes infringement via intermediate copying.
As of March 2025, a federal judge denied OpenAI's motion to dismiss core claims, allowing fair use defenses to proceed to discovery, though no summary judgment on training ingestion has been granted, reflecting judicial caution on blanket rulings. Similar suits, like those against Meta, have seen mixed outcomes, with some courts in 2025 finding training fair use under factor one for its innovative, non-expressive ends, while others scrutinize market effects from scaled data use. The U.S. Copyright Office's May 2025 report on generative AI training rejects blanket exemptions or prohibitions, advocating case-by-case analysis, but highlights that transformative ingestion often favors allowance when it does not enable verbatim regurgitation or displace licensing markets. The report notes intermediate copying precedents permit such uses if incidental to technological advancement, as in software reverse engineering, provided outputs avoid infringement. Narrow interpretations requiring licensing for training data face practical barriers, as models like GPT-4 were reportedly trained on approximately 13 trillion tokens from diverse sources, rendering comprehensive permissions infeasible due to identification, negotiation, and transaction costs across billions of fragments. Proponents argue that mandating licenses would stifle progress by entrenching incumbents with existing data troves, while evidence shows training enhances public access to knowledge analogs, akin to search engines, without empirically harming creator markets when paired with opt-outs or filters. Critics of expansive fair use counter that viable licensing markets are emerging for curated datasets, potentially weighing against the fourth factor in commercial contexts, though scale challenges persist for broad corpora.
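The transaction-cost objection can be sketched numerically. Only the 13-trillion-token corpus size comes from the discussion above; the average work length and per-license cost are hypothetical assumptions chosen solely to show the order of magnitude.

```python
# Back-of-envelope licensing arithmetic with hypothetical figures;
# only the 13-trillion-token corpus size is taken from the text above.
TOTAL_TOKENS = 13e12         # tokens in the training corpus
TOKENS_PER_WORK = 100_000    # assumed average length of one licensable work
COST_PER_LICENSE = 100       # assumed all-in negotiation cost per work, USD

works = TOTAL_TOKENS / TOKENS_PER_WORK
total_cost = works * COST_PER_LICENSE

print(f"distinct works to license: {works:,.0f}")      # 130,000,000
print(f"total transaction cost: ${total_cost:,.0f}")   # $13,000,000,000
```

Even at a nominal $100 per negotiation, clearing rights for such a corpus would imply on the order of a hundred million separate agreements and billions of dollars in pure transaction costs before any royalty is paid, which is the infeasibility argument made above.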

EU Text and Data Mining Exceptions

The Directive on Copyright in the Digital Single Market (EU) 2019/790 established two text and data mining (TDM) exceptions in Articles 3 and 4. Article 3 provides a mandatory exception allowing research organizations and cultural heritage institutions to perform TDM, including reproductions and extractions of works or other subject matter, for scientific research purposes without needing permission from rights holders, provided lawful access to the works is obtained. Article 4 introduces a broader, mandatory exception applicable to any entity engaging in TDM for any purpose, but it permits rights holders to reserve their rights through machine-readable means in the work's access conditions or via contractual agreements, effectively allowing an opt-out for commercial or non-research uses. These provisions took effect after transposition into national laws by June 7, 2021, with member states required to maintain the exceptions alongside existing limitations. The opt-out mechanism under Article 4 has drawn criticism for imposing disproportionate burdens on creators, particularly smaller ones lacking resources to implement technical reservations like metadata tags or updated robots.txt files across platforms. Enforcement relies on voluntary compliance by miners, but large-scale scraping often bypasses or ignores such signals, rendering the system ineffective against automated, high-volume operations common in AI training. A July 2025 European Parliament study on generative AI and copyright highlighted this mismatch, noting that model training involving billions of data points exceeds the scope of current TDM exceptions, which were designed for narrower, research-oriented activities rather than commercial generative systems. The study recommends reforming the regime to address these enforcement gaps and adapt to AI's data demands, arguing that the opt-out fails to balance innovation with rights protection amid pervasive scraping practices.
Empirical assessments indicate these TDM constraints contribute to slower AI advancement in the EU relative to the United States, where more permissive fair use doctrine fosters faster innovation. A 2025 report found nearly 60% of European tech startups facing product delays due to regulatory hurdles, including data usage restrictions, compared to 44% in the US, exacerbating Europe's lag in AI adoption, with only 44% of large firms actively deploying AI versus higher rates elsewhere. Competitiveness analyses attribute part of this disparity to the EU's regulatory burdens, which deter domestic firms from scaling training datasets efficiently, potentially favoring non-EU actors unencumbered by similar reservations. This structure risks anti-competitive effects by asymmetrically burdening fragmented creator communities while permitting unchecked extraction by well-resourced miners, undermining incentives for EU-based innovation.

UK Research and Commercial Exceptions

The United Kingdom's Copyright, Designs and Patents Act 1988 (CDPA) includes a specific exception under section 29A permitting the copying of works for the purposes of text and data mining (TDM), defined as computational analysis of works to identify patterns or insights, provided the activity is conducted for non-commercial research and the copyist has lawful access to the work. This exception, introduced in 2014, applies to a range of works including literary, artistic, and database-protected materials but excludes computer programs and artistic works not analyzed computationally, and it does not extend to commercial applications such as training proprietary AI models for profit. For commercial TDM, AI developers currently rely on implied licenses, explicit permissions from rights holders, or the narrower section 28A exception for transient copies essential to technological processes, which courts have interpreted permissively in caching contexts but not as a blanket authorization for large-scale ingestion. Post-Brexit, the Intellectual Property Office (IPO) has pursued a pro-innovation stance on TDM to diverge from the European Union's more restrictive framework, initially proposing in 2022 to expand section 29A into a broad exception applicable to any purpose, including commercial training, without a reservation mechanism allowing opt-outs by creators. This reflected evidence from the UK's AI sector, which grew to contribute £72 billion to the economy with over 3,300 firms, attributing competitiveness partly to flexible copyright interpretations that avoided stifling data access unlike in more cautious jurisdictions. However, following stakeholder feedback on risks to creators' licensing markets, the government reversed course in early 2023, halting the expansion to preserve incentives for human-generated content.
In response to evolving AI litigation and calls for clarity, the IPO launched a consultation on December 17, 2024, titled "Copyright and Artificial Intelligence," proposing to broaden the TDM exception under the CDPA to encompass commercial uses by default while introducing an opt-out mechanism for rights holders to reserve their works from unlicensed mining, mirroring the EU's optional commercial TDM provision but tailored to UK priorities. The consultation emphasized retaining this flexibility to prevent a flood of US-style infringement suits, as seen in cases like New York Times v. OpenAI, by providing statutory safe harbors rather than relying on case-by-case assessments, which courts have deemed unsuitable for automated processes. As of October 2025, the proposals remain under review, with no legislative amendments enacted, though the IPO has signaled intent to legislate by 2026 to sustain the UK's AI leadership, evidenced by a 2024-2025 surge in AI investments reaching £2.5 billion amid global uncertainties. This balanced approach contrasts with the EU's hesitancy in fully mandating commercial exceptions without opt-outs, potentially enabling UK firms to scale training datasets more efficiently while addressing creator concerns through targeted reservations.

Challenges in Enforcement and Opt-Outs

Enforcing restrictions on the ingestion of copyrighted materials for AI training encounters significant practical barriers, primarily due to the immense scale of datasets involved. Modern large language models are typically trained on corpora comprising trillions of tokens derived from billions of web pages, documents, and images, many of which are copyrighted without explicit permission. Granular licensing for such volumes is infeasible, as negotiating individual agreements for billions of works would entail prohibitive transaction costs and administrative overhead, rendering comprehensive enforcement economically unviable. Opt-out mechanisms, such as modifications to robots.txt files to block AI crawlers, lack legal binding force and rely on voluntary compliance by developers. Several AI firms have been reported to disregard these directives using stealth crawlers or alternative scraping methods, undermining their efficacy as a control measure. Even when respected during initial data collection, opt-outs prove causally limited post-training: embedded knowledge from excluded sources cannot be surgically removed from pre-trained weights without retraining the entire model, which is computationally prohibitive for systems requiring petabytes of data. Selective data exclusion or attempts to honor opt-outs can degrade model performance through mechanisms akin to poisoning attacks, where even a small fraction of altered or withheld samples disrupts learning. Research demonstrates that introducing as few as 500 poisoned documents into training sets can reliably induce targeted misbehavior in large language models of varying sizes, with effects persisting across scales due to the models' reliance on distributional patterns in vast datasets. This vulnerability highlights the technical fragility of opt-outs: enforcing exclusions risks introducing imbalances or noise that mimic poisoning, reducing overall efficacy without fully preventing unauthorized ingestion by non-compliant actors.
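The robots.txt opt-outs discussed above can at least be checked mechanically. A minimal sketch using Python's standard `urllib.robotparser`; the crawler name `ExampleAIBot` and the robots.txt body are hypothetical, for illustration only:

```python
from urllib.robotparser import RobotFileParser

def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse a robots.txt body and report whether the given
    crawler user-agent may fetch the URL under its directives."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt that blocks one AI crawler while allowing others.
robots = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

print(crawler_allowed(robots, "ExampleAIBot", "https://example.com/article"))  # False
print(crawler_allowed(robots, "OtherBot", "https://example.com/article"))      # True
```

As the surrounding text notes, this check is purely advisory: nothing in the protocol prevents a non-compliant crawler from ignoring the `Disallow` directive.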
From a foundational perspective informed by empirical scaling laws, data abundance is a causal prerequisite for achieving high performance in transformer-based models, as optimal training balances compute between parameters and tokens rather than skimping on the latter. The Chinchilla findings establish that, for a fixed compute budget, performance scales predictably with data volume—using approximately 20 tokens per parameter yields compute-optimal results, far exceeding prior under-emphasis on data in models like GPT-3. Imposing widespread opt-outs or exclusions thus risks sub-optimal training regimes, where reduced data quantity leads to diminished capabilities, underscoring the tension between enforcement ideals and the empirical necessities of general AI development.
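The compute-data tradeoff above can be made concrete. In this sketch, the 20-tokens-per-parameter ratio and the 6·N·D FLOPs estimate are approximate rules of thumb from the scaling-law literature, not exact constants:

```python
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal training-token count under the
    ~20 tokens-per-parameter Chinchilla rule of thumb."""
    return n_params * tokens_per_param

def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard rough estimate: ~6 FLOPs per parameter per training token."""
    return 6.0 * n_params * n_tokens

# A 70-billion-parameter model is compute-optimal near 1.4 trillion tokens,
# implying roughly 5.9e23 training FLOPs under this estimate.
tokens = chinchilla_optimal_tokens(70e9)
flops = training_flops(70e9, tokens)
print(f"{tokens:.2e} tokens, {flops:.2e} FLOPs")
```

The arithmetic illustrates the text's point: cutting available data forces either a smaller model or an off-optimal compute allocation.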

Direct Infringement Theories

Direct infringement theories in AI copyright disputes center on whether generated outputs unlawfully reproduce or create derivative works substantially similar to protected originals, requiring proof of access to the work and copying of expressive elements. Plaintiffs must demonstrate that AI outputs capture protected expression beyond mere ideas or facts, with courts evaluating factors like the amount copied and overall similarity. A primary theory involves literal regurgitation, where AI models output verbatim or near-verbatim excerpts from copyrighted sources, constituting direct reproduction. In Concord Music Group v. Anthropic (filed 2023), plaintiffs alleged Anthropic's Claude model regurgitated copyrighted song lyrics in responses, prompting claims of unauthorized copying rather than fair use. Such regurgitation typically occurs under targeted prompting exploiting model memorization, though developers contend it affects a negligible fraction of outputs and does not reflect systemic infringement. In contrast, stylistic imitation theories assert that AI replicates an artist's or author's distinctive aesthetic without literal copying of specific works, potentially infringing if outputs evoke substantial similarity in protected expression. The Andersen v. Stability AI litigation (filed 2023, Northern District of California) exemplifies this, where artists including Sarah Andersen claimed Midjourney and Stability AI generated images mimicking their unique line work and styles, leading Judge William Orrick to deny dismissal of direct infringement claims in August 2024 on grounds that alleged copying exceeded fair data use. However, U.S. copyright doctrine generally precludes protection for styles or techniques absent copying of particular expressions, complicating such claims absent evidence of specific work derivation. The market harm factor under fair use analysis weighs against infringement defenses if outputs displace demand for originals, yet empirical observations indicate limited substitution for novel AI syntheses.
Market data post-AI entry shows expanded total creative supply, with generative tools enabling derivative applications that do not erode core licensing revenues, though some segments report reduced human-generated sales amid heightened competition. Defendants argue AI fosters ancillary markets without primary harm, supported by a lack of widespread evidence that routine outputs supplant originals. Mitigations against output infringement include prompt engineering, such as chain-of-thought or task-specific instructions, which empirical tests show reduce image similarity to training data in diffusion models by promoting abstraction over memorization. AI providers also deploy output filters and safeguards to block high-similarity generations, further diminishing regurgitation risks in deployed systems.
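Output similarity filters of the kind described are often approximated in practice by n-gram overlap screening against protected texts. A simplified sketch, where the 5-gram window and 0.5 threshold are arbitrary illustrative choices rather than any vendor's actual parameters:

```python
def ngrams(text: str, n: int = 5) -> set:
    """Set of word-level n-grams in a text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def blocked(output: str, protected: str, n: int = 5, threshold: float = 0.5) -> bool:
    """Flag an output whose n-gram overlap with a protected text
    exceeds the threshold fraction of the output's own n-grams."""
    out = ngrams(output, n)
    if not out:
        return False
    overlap = len(out & ngrams(protected, n)) / len(out)
    return overlap >= threshold

protected = "the quick brown fox jumps over the lazy dog while the cat sleeps"
verbatim = "the quick brown fox jumps over the lazy dog"
paraphrase = "a speedy auburn animal leaps above a drowsy canine"

print(blocked(verbatim, protected))    # True: near-verbatim copy is suppressed
print(blocked(paraphrase, protected))  # False: no shared 5-grams
```

The example also shows the filter's limit noted in the stylistic-imitation discussion: a paraphrase or style imitation shares no literal n-grams and passes untouched.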

Contributory and Vicarious Liability

Contributory liability in the context of generative AI requires proof that a provider had knowledge of specific infringing activity and materially contributed to it, either through inducement or provision of means for infringement. For AI systems, this doctrine draws analogies to cases like MGM Studios v. Grokster, where active promotion of infringing uses triggered liability, but differs fundamentally due to the probabilistic, user-driven nature of outputs. Providers argue that general-purpose tools like language models lack the intent to induce infringement, as outputs emerge from user prompts rather than predetermined facilitation of copies, making antecedent knowledge of specific violations rare and difficult to attribute beyond user actions. Expansive applications, however, risk imposing liability for unforeseeable user misuse, akin to holding knife manufacturers accountable for crimes, which overlooks causal distinctions between tool provision and directed harm. Vicarious liability imposes secondary responsibility on providers deriving direct financial benefit from infringement while possessing the right and ability to supervise it. In AI deployments, revenue from subscriptions or usage fees constitutes a potential financial benefit, but the supervision element is contested: AI systems lack volitional selection of infringing content, with outputs shaped by transient prompts rather than supervised repositories. Defenses emphasize user agency, as in models where prompts dictate generation (such as those prioritizing response to specific queries without pre-filtered infringing templates), shifting volition to end-users and undermining the supervision element. Courts have not uniformly extended this doctrine to generative technologies, recognizing that blanket vicarious theories fail to account for decentralized computation, where infringement arises sporadically from billions of interactions rather than centralized oversight failures.
The decentralized architecture of generative AI, producing novel syntheses on demand, complicates secondary enforcement, contrasting with centralized platforms hosting static files. Unlike file-sharing services amenable to takedown notices, output generation defies proactive policing without embedding prohibitive computational overhead or preemptive content filters that could neuter utility. DMCA safe harbors, designed for passive hosting, face scrutiny in 2025 for AI contexts, as provisions shielding providers from user infringements may not extend seamlessly to systems enabling dynamic creations, prompting debates on eligibility absent designated agents for notices. Broad secondary liability risks chilling deployment by mandating infeasible monitoring regimes, empirically unproven to curb infringement while causally impeding scalable innovation, as evidenced by the technology's reliance on uncurated web-scale data for emergent capabilities rather than targeted curation.

Defenses and Mitigations

AI developers have implemented content filters to detect and block outputs that risk infringing copyrights, particularly by preventing verbatim regurgitation of training data. For instance, OpenAI deploys safeguards in models like the GPT series to mitigate regurgitation attacks, where prompts attempt to elicit memorized copyrighted material. Similarly, Microsoft's Azure OpenAI service incorporates a Protected Material Detection filter that scans outputs for known protected content, flagging potential matches before delivery. These filters operate by comparing generated text against databases of copyrighted works or using probabilistic thresholds to suppress high-similarity outputs, reducing direct infringement liability. Watermarking techniques propose embedding imperceptible signals into AI-generated content to verify origins and facilitate infringement detection. Proposals include altering token probability distributions in text generation to create detectable statistical anomalies, or adding hidden patterns in images and audio that survive minor edits. Such methods aim to enable rights holders to trace infringing AI outputs back to generative sources, though they require widespread adoption and robustness against removal attempts. Terms of service for major platforms shift responsibility for output misuse to users, requiring them to ensure generated content does not infringe copyrights. Providers like OpenAI stipulate that users bear liability for applications of outputs, disclaiming provider warranties against infringement and mandating user compliance with applicable laws. This contractual allocation aligns with precedents holding users accountable for directing tools toward infringing ends, insulating developers from vicarious claims where outputs are user-prompted. Empirical audits of deployed models demonstrate low rates of verified infringement when mitigations are active.
Studies testing frontier large language models find that content filters significantly reduce verbatim regurgitation of copyrighted articles, with evasion success rates dropping below detectable thresholds in controlled prompts. Independent evaluations confirm that filtered models exhibit minimal unprompted copying, supporting claims of effective mitigation in production environments over raw training vulnerabilities.
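The watermarking idea described above (biasing token probabilities toward a pseudo-random "green list" and later testing for the statistical anomaly) can be illustrated in miniature. This sketch loosely follows published green-list schemes; the hashing keyed on the previous token and the gamma parameter are illustrative assumptions, not any provider's actual implementation:

```python
import hashlib
import math

def is_green(prev: str, token: str, gamma: float = 0.5) -> bool:
    """Pseudo-randomly assign a token to the 'green list', keyed on the
    preceding token, as a watermarking generator would when biasing sampling."""
    digest = hashlib.sha256(f"{prev}|{token}".encode()).digest()
    return digest[0] < 256 * gamma

def green_hits(tokens: list) -> int:
    """Count transitions in a token sequence that land on green-list tokens."""
    return sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))

def z_score(hits: int, total: int, gamma: float = 0.5) -> float:
    """Detection statistic: how far the observed green fraction sits above
    the gamma baseline expected for unwatermarked text."""
    return (hits - gamma * total) / math.sqrt(gamma * (1 - gamma) * total)

# Unwatermarked text should score near z = 0; a generator that almost always
# samples green tokens pushes hits toward total, e.g. 90 of 100 transitions:
print(z_score(90, 100))  # 8.0 -> strong statistical evidence of a watermark
```

The detector needs only the hashing key, not the model, which is why such schemes are proposed as a tracing tool for rights holders; their weakness, as the text notes, is robustness against paraphrasing and other removal attempts.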

Key Litigation

Landmark US Cases

In Andersen v. Stability AI Ltd., filed on January 13, 2023, in the U.S. District Court for the Northern District of California, visual artists Sarah Andersen, Kelly McKernan, and Karla Ortiz alleged that Stability AI, Midjourney, and DeviantArt infringed copyrights by training image-generating AI models on datasets including billions of copyrighted images scraped from the internet without permission. The court granted in part defendants' motion to dismiss on October 30, 2023, rejecting claims lacking specific allegations of output copying but allowing direct infringement and related claims to proceed, emphasizing the need for evidence of substantial similarity between inputs and AI-generated outputs. As of October 16, 2025, discovery continues without class certification, with parties reporting completed negotiations on protective orders but no resolution on dispositive motions, in which defendants argue fair use applies due to the transformative nature of latent model representations that do not reproduce originals. Bartz v. Anthropic PBC, initiated in August 2024 in the U.S. District Court for the Northern District of California by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, claimed Anthropic infringed copyrights by ingesting pirated books into training datasets for its Claude models. On June 25, 2025, Judge William Alsup granted partial summary judgment for Anthropic, ruling that the company's ingestion and internal use of lawfully acquired copyrighted texts for model training constituted fair use under 17 U.S.C. § 107, as the process created non-expressive statistical models without market substitution for originals, prioritizing the transformative purpose over potential licensing harms. Despite this defense success, the parties settled for $1.5 billion on September 5, 2025, with preliminary court approval on September 25, 2025, allocating funds to class members (authors, publishers, and estates) whose works appeared in the ingested datasets, while requiring Anthropic to delete specified infringing copies and implement compliance mechanisms, reflecting pragmatic resolution amid ongoing liability risks for outputs.
Other 2025 rulings show mixed progress for defendants on transformative use arguments. In Thomson Reuters Enterprise Centre GmbH v. ROSS Intelligence Inc., a February 11, 2025, Delaware federal court decision rejected fair use for training a legal AI on Westlaw headnotes and content, finding the purpose commercial and non-transformative, with outputs competing in the legal-research licensing market, though this targeted specialized databases unlike general-purpose models. Conversely, partial dismissals in Tremblay v. OpenAI, Inc. (filed June 28, 2023) upheld direct infringement claims for training on novels but rejected vicarious infringement and DMCA violations lacking pleaded facts, allowing OpenAI to advance fair use defenses centered on intermediate copying for new expressive capabilities. These outcomes underscore empirical variances in fair use application, favoring AI developers where training yields non-competitive, derivative technologies supported by precedents like Authors Guild v. Google (2015).

International Disputes

In jurisdictions outside the United States, disputes over artificial intelligence and copyright often arise under regimes lacking flexible fair use doctrines, instead relying on narrowly defined exceptions such as text and data mining (TDM) provisions that permit limited reproductions for analysis but require opt-out mechanisms for commercial uses or exclude certain acts outright. These frameworks, implemented via the EU's 2019 Directive on Copyright in the Digital Single Market (DSM Directive) and analogous laws post-Brexit, have led to litigation emphasizing unauthorized copying during AI model training, with courts scrutinizing whether scraping datasets constitutes infringement despite exceptions. Outcomes tend to favor AI developers in non-commercial contexts but impose higher compliance burdens, including prospective opt-outs, slowing innovation compared to U.S. approaches. A prominent UK case is Getty Images (US) Inc. v Stability AI Ltd., filed in January 2023 in the High Court of Justice of England and Wales, alleging infringement of over 12 million Getty photographs, captions, and metadata used to train Stability AI's Stable Diffusion model. Getty claimed primary infringement under section 17 of the Copyright, Designs and Patents Act 1988 (CDPA) for unauthorized copying during training, as well as secondary infringement by users generating outputs resembling Getty works, alongside trade mark and database right violations. Stability AI defended by arguing the acts fell outside protected rights or were licensed implicitly, but in a January 2025 judgment, the court struck out certain database claims while allowing copyright and passing-off allegations to proceed to a June 2025 trial, highlighting the absence of broad fair use defenses under UK law. In the European Union, courts have grappled with TDM exceptions under Articles 3 and 4 of the DSM Directive, which allow reproductions for scientific research (non-commercial) and commercial purposes (with opt-out rights), but disputes center on whether AI training qualifies and if opt-outs bind non-EU entities. A landmark decision came in 2024 from the Hamburg Regional Court in LAION e.V.
v Robert Kneschke (case 310 O 227/23), where photographer Kneschke sued the non-profit LAION for reproducing his image in an AI training dataset; the court dismissed the claim, ruling the act fell under the non-commercial scientific research exception in Section 60d of the German Copyright Act, as opt-out reservations apply only to commercial TDM under Section 44b and not scientific uses. This interpretation, while affirming exceptions for open datasets like LAION-5B, has drawn criticism for potentially underprotecting creators against downstream commercial applications, prompting further suits. Other EU proceedings include a March 2025 French lawsuit by authors and publishers against Meta, accusing the company of unlawfully scraping copyrighted books and articles to train its models without permission or remuneration, invoking moral rights and reproduction prohibitions under the French intellectual property code. These cases underscore slower judicial timelines, often exceeding a year for initial rulings, and elevated barriers for AI firms, as mandatory opt-out infrastructures and territorial enforcement complicate global training practices, contrasting with U.S. litigation's emphasis on fair use.

Settlements and Outcomes

In the landscape of artificial intelligence and copyright disputes, licensing agreements have emerged as a primary mechanism for resolution, enabling AI developers to access training data through compensated arrangements rather than protracted litigation. OpenAI's 2023 agreement with Shutterstock, extended into a six-year partnership, granted the AI firm rights to Shutterstock's vast library of images, videos, and music for model training, while providing Shutterstock with revenue from AI-generated outputs integrated into its platform; this deal alone propelled Shutterstock's AI licensing revenue to $104 million in 2023, with projections reaching $138 million for 2024. Similar pacts, such as OpenAI's deals with news outlets including The Associated Press and Axel Springer, underscore a market-oriented approach where content owners monetize their works proactively, averting the uncertainties of court rulings. Court-supervised settlements remain limited, with only two AI copyright lawsuits fully resolved through settlement as of October 2025, often under confidential terms that prioritize financial compensation over doctrinal precedents. A proposed class-action settlement between Anthropic and authors representing roughly 500,000 copyrighted works, valued at a minimum of $1.5 billion (equating to about $3,000 per work), aimed to address training data ingestion but was initially rejected by a federal judge in 2025 for inadequate notice and opt-out provisions, highlighting judicial wariness toward blanket resolutions without robust procedural safeguards. Mid-2025 judicial outcomes frequently featured partial dismissals, curtailing ancillary claims like those under the Digital Millennium Copyright Act while preserving core direct infringement allegations, thereby constraining potential damages exposure for defendants. In Thomson Reuters v.
Ross Intelligence, a February 2025 ruling granted partial summary judgment to the plaintiff on direct copyright infringement for the AI firm's verbatim copying of legal database content to train models, rejecting fair use defenses at that stage but deferring quantification of harm. Such rulings, echoed in dismissals of secondary liability theories in other cases, signal a pragmatic narrowing of disputes toward verifiable copying harms rather than expansive theories, fostering incentives for voluntary opt-in licensing over outright prohibitions on AI development.

Policy Debates and Viewpoints

Pro-Innovation Perspectives on Minimal Restrictions

Advocates for minimal restrictions on AI training emphasize that copyright functions as a limited-duration monopoly designed to incentivize creation while ultimately promoting broader progress of science and useful arts, as articulated in the U.S. Constitution. Extending protections to prohibit training on public-domain or lawfully accessed works risks overreach, potentially mirroring historical missteps where incumbents sought to block transformative technologies like photocopying or video recording, only for courts to affirm fair use and enable innovation. For instance, the U.S. Copyright Office has noted that copyright law has repeatedly adapted to new technologies, such as player pianos in the early 20th century and digital reproduction in the late 20th, without imposing outright bans on intermediate copying, thereby balancing incentives with technological advancement. Empirical studies demonstrate that generative AI significantly accelerates content production and enhances creative outputs, suggesting net benefits for creators through increased efficiency and volume of work. In controlled experiments, generative AI tools improved professionals' task throughput by an average of 66% across realistic scenarios, including writing tasks relevant to creative fields. Similarly, access to AI-generated ideas elevated evaluations for novelty and enjoyment, particularly aiding less inherently creative individuals, while text-to-image AI raised human creative productivity by 25% and boosted output value over time. These gains lower production costs, enabling more diverse and abundant content generation that outweighs concerns in aggregate. From a market-oriented viewpoint, imposing stringent mandates on data usage distorts efficient resource allocation, whereas voluntary licensing arrangements are emerging organically to compensate rights holders without regulatory coercion. Examples include agreements between developers and sectors like music and publishing, where centralized rights management facilitates deals for training data access.
Organizations such as the Copyright Clearance Center highlight how collective licensing models enable scalable, market-driven solutions for generative AI development, fostering innovation while providing revenue streams. This approach aligns with historical patterns where private negotiations, rather than prohibitions, resolved tensions between new technologies and rights holders, as seen in the proliferation of licensing post-VCR and through digital shifts.

Creator Protection and Moral Rights Arguments

Advocates for bolstering creator protections assert that training generative models on vast datasets of copyrighted works without permission or compensation amounts to systemic expropriation, depriving authors and artists of control over their labor and potential licensing revenues. This perspective frames such practices as a form of uncompensated extraction, where firms profit from creators' outputs while externalizing costs onto individuals whose works fuel model development. In European jurisdictions emphasizing moral rights, particularly France, protectionist arguments invoke the inalienable right to the integrity of authorship, contending that AI-generated content imitating an artist's distinctive style constitutes a distortion or unauthorized extension of their oeuvre, potentially damaging professional reputation irrespective of direct copying. These rights, enshrined in frameworks like the Berne Convention and national laws, prioritize the creator's personal connection to their work over purely economic considerations, with calls for opt-out mechanisms or prohibitions on style appropriation to preserve artistic autonomy. Individual artists frequently cite anecdotal experiences of market displacement, such as reduced commissions or sales in niche sectors where clients opt for AI alternatives replicating their aesthetics at lower cost, underscoring imperatives for safeguarding livelihoods amid technological disruption. Proponents, including guilds and authors' associations, argue this necessitates compensatory schemes or training restrictions to uphold fairness, though such claims often rest on personal testimonies rather than comprehensive econometric analyses of sector-wide effects. Critiques of these stances highlight their disregard for precedents in human learning, such as apprenticeships where learners absorb stylistic elements from mentors' works without remuneration or infringement claims, suggesting AI training mirrors non-exploitative knowledge transmission.
Additionally, stringent enforcement risks curtailing the organic enrichment of the public domain, where unrestricted access to expired copyrights has historically enabled cumulative creativity, potentially stifling derivative innovations if licensing mandates extend to foundational materials.

Empirical Evidence on Economic Impacts

Studies indicate that generative AI tools enhance productivity in creative tasks by accelerating ideation and iteration processes. For instance, Adobe's 2024 research surveying 2,541 creative professionals found that generative AI adoption led to measurable gains, with teams reporting faster production cycles and improved output efficiency in areas like design and content generation. Similarly, experimental evidence from broader applications shows AI assistance reducing task completion time by up to 40% while increasing quality metrics by 18%, effects applicable to creative workflows involving drafting and refinement. Employment data reveals no net job losses attributable to AI in creative sectors as of 2025, with projections incorporating AI impacts forecasting growth in related roles (a 17.9% increase from 2023-2033) and emerging positions in AI prompting, curation, and oversight. Analyses confirm minimal large-scale displacement, as AI augments rather than replaces human labor in data-intensive creative tasks, aligning with historical patterns where technological shifts create offsetting opportunities without aggregate unemployment spikes. The World Economic Forum's assessments project a net positive effect, with AI fostering 133 million new global jobs by 2025, including novel creative-adjacent roles, outweighing any localized disruptions. Market dynamics demonstrate AI-generated content saturating low-value segments, such as routine stock imagery and basic marketing assets, while human-created works retain dominance in premium categories demanding originality and nuance. A 2025 Bain report highlights this bifurcation, noting a proliferation of inexpensive AI outputs but sustained preference for authenticated human content in high-end applications like branded campaigns, where perceived authenticity drives value.
Concurrently, copyright licensing revenues for AI training data have risen sharply, with the global market for such datasets expanding from $2.68 billion in 2024 toward a projected $11.16 billion in subsequent years, reflecting negotiated deals between creators and AI firms that monetize existing works. Publisher Wiley, for example, reported $40 million in AI licensing income for fiscal 2025, underscoring a causal link between AI demand and increased creator earnings from data access agreements.

Recent Developments

In March 2023, the Copyright Office issued formal guidance requiring applicants to disclose any use of artificial intelligence in the creation of works submitted for registration, specifying that copyright protection extends only to human-authored elements demonstrating sufficient originality and creative control. This policy, effective immediately and updated through subsequent examinations, mandates exclusion of purely AI-generated content from claims, as such outputs lack the human authorship prerequisite for copyrightability under Section 102(a) of the Copyright Act. Non-disclosure of AI involvement can result in registration invalidation or cancellation upon discovery, with the Office conducting reviews to verify human contributions in hybrid works. The Office's May 9, 2025, pre-publication report, Copyright and Artificial Intelligence, Part 3: Generative AI Training, examines the use of copyrighted materials to train generative AI models, concluding that such ingestion frequently constitutes fair use under Section 107 when the process is transformative, yielding new expressive outputs without substituting for or directly reproducing originals, and does not demonstrably harm the market for those works. The 108-page analysis rejects calls for blanket exemptions or new compulsory licenses, arguing instead that existing doctrine, informed by precedents like Authors Guild v. Google (2015), adequately balances innovation with rights holder interests on a case-specific basis, though it acknowledges potential infringement risks in non-transformative scraping or where outputs compete directly with inputs. This stance reflects a pragmatic, law-as-written approach, prioritizing empirical assessment of market effects over presumptive restrictions. Complementing these efforts, President Trump's Executive Order 14179, signed January 23, 2025, and titled "Removing Barriers to American Leadership in Artificial Intelligence," revoked prior Biden-era directives perceived as imposing undue regulatory hurdles on AI development, including those indirectly constraining data access for training.
The order directs federal agencies to deregulate in favor of private-sector innovation, explicitly aiming to prevent copyright maximalism from impeding competitiveness in AI, while deferring to judicial resolution of disputes rather than administrative overreach. This policy shift underscores the Office's guidance as enabling technological advancement under established legal frameworks, without necessitating doctrinal alterations.

Legislative and Regulatory Proposals

In the United States, the Generative AI Copyright Disclosure Act of 2024 (H.R. 7913), introduced on April 9, 2024, by Representative Adam Schiff, mandates that entities creating or materially altering training datasets for generative AI systems submit a notice to the U.S. Copyright Office detailing the copyrighted works used. This proposal emphasizes transparency without imposing bans on AI training with copyrighted materials, aligning with arguments that such restrictions lack empirical evidence of direct economic harm to creators, as AI outputs are typically transformative and do not displace original markets. The bill's focus on disclosure addresses concerns over undisclosed data sourcing while avoiding overreach that could stifle innovation, a position supported by analyses indicating no verifiable causal link between training practices and widespread infringement damages. In the European Union, reform proposals emerging in 2025, including a July 2025 study commissioned by the JURI committee, advocate for overhauling the existing text and data mining (TDM) regime under the DSM Directive toward stricter mechanisms, such as enhanced opt-in requirements or mandatory remunerations for rights holders. These calls, which build on Article 113 of the AI Act entering into force on August 2, 2025, aim to bolster creator protections but risk imposing compliance burdens that could hinder EU competitiveness, as evidenced by analyses showing that restrictive opt-ins correlate with slower model development compared to more permissive jurisdictions. Empirical data on training's impacts remains sparse, with no robust studies demonstrating that AI systems have caused measurable revenue losses justifying such escalations, potentially leading to regulatory lag as non-EU firms dominate with less constrained datasets.
The United Kingdom's 2025 consultations propose expansions to the TDM exception framework, advanced via the December 2024 consultation and the Data (Use and Access) Act receiving royal assent on June 19, 2025, to permit broader commercial use of copyrighted materials for AI training without prior permission, subject to an opt-out option for rights holders. This approach, detailed in the UK Intellectual Property Office's proposals closing February 25, 2025, seeks to balance innovation incentives with creator protections, drawing on evidence that expansive TDM exceptions foster AI advancements without empirically verified displacement of human creators. Unlike more prohibitive models, these expansions prioritize causal realism in policy design, recognizing that training on broadly accessible data drives productivity gains, while claims justifying overprotection remain unsubstantiated.

Global Harmonization Efforts

The World Intellectual Property Organization (WIPO) has convened multiple sessions of its Conversation on Intellectual Property and Frontier Technologies since 2020, with 2024 and 2025 discussions intensifying focus on artificial intelligence's intersection with copyright, including authorship eligibility for AI outputs and permissible uses of protected works in model training. In the tenth session, held in 2024, participants debated whether AI-generated content merits copyright protection, revealing persistent divisions without achieving consensus on standardized authorship requirements, as outputs often lack demonstrable human creative input. Similarly, a WIPO information session on copyright and generative AI held on April 10, 2025, examined training datasets containing copyrighted materials, highlighting variations in national exceptions but failing to forge unified international norms. Global harmonization under frameworks like the Berne Convention, which sets minimum copyright standards without AI-specific provisions, encounters resistance due to incompatible regional approaches. The United States advocates exporting flexible fair use principles to accommodate non-expressive AI training, arguing such uses transform data without market harm, whereas the European Union prioritizes protective measures like mandatory opt-outs for rights holders in text and data mining exceptions, reflecting caution against unchecked data ingestion. These divergences complicate cross-border AI development, as TRIPS permits enhanced protections but lacks mechanisms to enforce flexibility, leading to fragmented compliance and enforcement challenges in multinational deployments. Proposals for WIPO model guidelines emphasize evidentiary thresholds for human involvement in AI-assisted works and broad exceptions for text and data mining to enable AI development in developing economies, where rigid rules could stifle growth amid limited licensing infrastructure.
Empirical dynamics, including the competitive disadvantages faced by jurisdictions imposing stringent data restrictions, evident in slower AI model scaling outside fair-use-friendly regimes, exert pressure toward convergence on pro-innovation standards that prioritize access to public-domain-like data while preserving incentives for original creation. The upcoming twelfth WIPO session, scheduled for October 28-29, 2025, may advance this trajectory by addressing unresolved gaps in authorship and exceptions.

References

  1. [1]
    Copyright and Artificial Intelligence | U.S. Copyright Office
    Copyright and Artificial Intelligence analyzes copyright law and policy issues raised by artificial intelligence.Studies · Spring 2023 AI Listening · Registration Guidance for...
  2. [2]
    Recent Developments in Artificial Intelligence Cases and Legislation ...
    Aug 5, 2025 · As legal issues arising from AI's broad adoption keep expanding, this guide covers key cases and legislation from the past year and tracks ...
  3. [3]
    Fair Use and AI Training: Two Recent Decisions Highlight the ...
    Jul 8, 2025 · In each case, the court found that, on the facts before it, the use of copyrighted works to train an AI model was highly transformative and fair ...
  4. [4]
    AI Companies Prevail in Path-Breaking Decisions on Fair Use
    Jun 30, 2025 · Recent California decisions found AI training on copyrighted works can be fair use, but with significant caveats and differing judicial reasoning.
  5. [5]
    First Set of Rulings Favoring AI Training on Copyrighted Content
    Aug 12, 2025 · Courts ruled that using lawfully acquired copyrighted books to train AI models may be fair use, but not for content from pirate sites.<|separator|>
  6. [6]
    Copyright's Big Win in the First Decided US Artificial Intelligence Case
    Feb 20, 2025 · The first AI training case has been decided in the US in favor of the copyright holder.
  7. [7]
    Top Noteworthy Copyright Stories from September 2025
    Oct 2, 2025 · On September 25, Judge Alsup granted preliminary approval of the $1.5 billion settlement in the Bartz v. Anthropic AI class action lawsuit. The ...Missing: key | Show results with:key
  8. [8]
    Andersen v. Stability AI: The Landmark Case Unpacking the ...
    Dec 2, 2024 · A landmark lawsuit from the Northern District of California concerning the copyright implications of AI-generated art.Missing: key | Show results with:key
  9. [9]
    Copyright Office Releases Part 2 of Artificial Intelligence Report
    Jan 29, 2025 · This Part of the Report addresses the copyrightability of outputs created using generative AI. The Office affirms that existing principles of ...
  10. [10]
    U.S. Copyright Office Clarifies Copyrightability of AI-Generated Works
    Jan 30, 2025 · Creative selection, coordination, arrangement, or modification of AI-generated content by humans may qualify for copyright protection.
  11. [11]
    [PDF] Copyright and Artificial Intelligence, Part 3: Generative AI Training ...
    May 6, 2025 · Dozens of lawsuits are pending in the. United States, focusing on the application of copyright's fair use doctrine. Legislators around the world ...
  12. [12]
    [PDF] Copyright and Artificial Intelligence, Part 2 Copyrightability Report
    Jan 17, 2025 · This. Part addresses the copyrightability of works created using generative AI. The first Part, published in 2024, addresses the topic of ...
  13. [13]
    Feist Publications, Inc. v. Rural Tel. Serv. Co. | 499 U.S. 340 (1991)
    The District Court granted summary judgment to Rural in its copyright infringement suit, holding that telephone directories are copyrightable. The Court of ...
  14. [14]
    [PDF] Naruto v. Slater - Ninth Circuit Court of Appeals
    Apr 23, 2018 · The panel held that the monkey lacked statutory standing because the Copyright Act does not expressly authorize animals to file copyright ...
  15. [15]
    [PDF] NOTE Copyright Infringement in AI-Generated Artworks
    Part I provides an overview of how. AI is used to generate artwork and provides background on the issues of copyrightability and ownership of AI-generated works ...
  16. [16]
    Better language models and their implications - OpenAI
    Feb 14, 2019 · Our model, called GPT‑2 (a successor to GPT⁠), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about ...
  17. [17]
    LAION-5B: A NEW ERA OF OPEN LARGE-SCALE MULTI-MODAL ...
    Mar 31, 2022 · LAION-5B is a large-scale, open dataset of 5.85 billion CLIP-filtered image-text pairs, including 2.3B English and 2.2B from 100+ languages.
  18. [18]
    AI art tools Stable Diffusion and Midjourney targeted with copyright ...
    Jan 16, 2023 · The suit claims generative AI art tools violate copyright law by scraping artists' work from the web without their consent.
  19. [19]
    17 U.S. Code § 102 - Subject matter of copyright: In general
    Copyright protection subsists, in accordance with this title, in original works of authorship fixed in any tangible medium of expression.
  20. [20]
    Frequently Asked Questions: Copyright - WIPO
    In the majority of countries, and according to the Berne Convention, copyright protection is obtained automatically without the need for registration or other ...
  21. [21]
    Berne Convention for the Protection of Literary and Artistic Works
    (3) Translations, adaptations, arrangements of music and other alterations of a literary or artistic work shall be protected as original works without prejudice ...
  22. [22]
    [PDF] Copyrightable Authorship: What Can Be Registered
    The copyright law protects “those components of a work that are original to the author,” but “originality” does not require “novelty.” Id. at 345, 348. A work ...
  23. [23]
    Requirements for Copyright Protection
    In order for a work of authorship to meet the requirements for copyright protection, the work must satisfy two basic criteria: originality, and fixation.
  24. [24]
    [PDF] Works Containing Material Generated by Artificial Intelligence
    Mar 16, 2023 · It went on to state that because “a work must be the product of human authorship,” works “produced by mechanical processes or random selection ...
  25. [25]
    Obtaining Copyright Protection - BitLaw
    Under the current US Copyright Act, copyright protection exists in "original works of authorship fixed in a tangible medium of expression."originality requirement · works of authorship · fixation
  26. [26]
    The Requirements for Copyright Protection
    Dec 25, 2012 · In all countries, there are two requirements for copyright protection: originality and protectable "expression." In a few countries, ...The Concept of Originality · The Exclusion of Ideas from... · Fixation
  27. [27]
    Generative AI: Fact versus Expression
    Mar 3, 2025 · The idea/expression dichotomy is one argument cited for the proposition that AI can train itself without infringing copyright, because AI is ...
  28. [28]
    Federal Court Rules Work Generated by Artificial Intelligence Alone ...
    Aug 30, 2023 · This summary describes the August 18 ruling in Thayler v. Permutter holding AI-generated artwork is not entitled to copyright protection.
  29. [29]
    [PDF] Case 1:22-cv-01564-BAH Document 24 Filed 08/18/23 Page 1 of 15
    Aug 18, 2023 · 13-7, and the Copyright Office Review Board affirmed the denial of registration, agreeing that copyright protection does not extend to the ...<|separator|>
  30. [30]
    [PDF] Thaler v. Perlmutter - U.S. Court of Appeals for the D.C. Circuit
    Mar 18, 2025 · Thaler sought review of the Office's decision in federal district court and that court affirmed. We affirm the denial of Dr. Thaler's copyright ...
  31. [31]
    U.S. Copyright Office Analyzes Human Authorship Requirement
    The Office concludes that human contributions to AI-generated outputs are sufficient to constitute authorship but must be analyzed on a case-by-case basis.
  32. [32]
    Client Alert: Copyright Protection for Works Made with AI Tools
    Jan 30, 2025 · The Office stated that directing generative AI tools using “prompts alone” is insufficient human control to make the work copyrightable.<|control11|><|separator|>
  33. [33]
    U.S. Copyright Office issues report on copyrightability of AI assisted ...
    Feb 4, 2025 · Human authorship remains essential – AI-generated works are not copyrightable unless there is sufficient human creative input that meets the ...
  34. [34]
    [PDF] Training Foundation Models as Data Compression - arXiv
    In this paper, we introduce a training-as-compressing perspective, wherein the model's weights embody a compressed represen- tation of the training data. From a ...
  35. [35]
    Copyright and Artificial Intelligence - GOV.UK
    Dec 17, 2024 · And it is why we are consulting on the introduction of an exception to copyright law for “text and data mining”. This improves access to content ...
  36. [36]
    Copyright Registration Guidance: Works Containing Material ...
    Mar 16, 2023 · If the work contains human authorship intermingled with AI-created material, the Office may add an annotation to clarify the scope of the claim.
  37. [37]
    [PDF] 2023.02.21 Zarya of the Dawn Letter - Copyright
    Feb 21, 2023 · The Office recently discussed its requirement of human authorship in a written decision affirming the denial of an application for a 2D visual ...
  38. [38]
    AI Developments at the U.S. Copyright Office in 2024 - IP Watchdog
    Dec 30, 2024 · This Office policy led to a series of registration rejections starting in 2023 to date, where images were created using generative AI systems ...<|separator|>
  39. [39]
    Who expands the human creative frontier with generative AI - Science
    Sep 3, 2025 · In summary, AI-assisted creators contribute more H-creative artifacts in absolute terms through increased output (the productivity effect), ...
  40. [40]
    [PDF] AI, COPYRIGHT, AND PRODUCTIVITY IN THE CREATIVE ...
    Feb 3, 2025 · More research is needed to fully understand the impact of Generative. AI on productivity in the creative industries and in the economy more.
  41. [41]
    Development of Generative Artificial Intelligence from a Copyright ...
    This study explores the developments in GenAI from the perspective of EU copyright law. ... The research that took place between September 2024 and March 2025 was ...
  42. [42]
    EUIPO releases study on generative artificial intelligence and ...
    May 12, 2025 · The study offers an in-depth analysis of GenAI developments from the perspective of EU copyright law, covering technical, legal, and economic aspects.
  43. [43]
    Approaches to IP protection for works generated - AIPPI
    Apr 25, 2025 · In the EU, works fully generated by AI are not copyright protected. However, if human intervention is present, copyright may apply. Ukraine has ...
  44. [44]
    AI and Copyright Legal Landscape in France | Chambers Expert Focus
    Mar 15, 2024 · Article 2 specifies that the ownership of the AI-generated work “without direct human intervention” belongs to“the authors or assignees of the ...<|control11|><|separator|>
  45. [45]
    Copyright, Designs and Patents Act 1988 - Legislation.gov.uk
    (3)In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the ...
  46. [46]
    The UK's Curious Case of Copyright for AI-Generated Works
    May 19, 2025 · Where a work is wholly generated by a computer without such human input, Section 9(3) assigns authorship by legal fiction to the person who made ...
  47. [47]
    Dealing with AI-generated works: lessons from the CDPA section 9(3)
    Dec 20, 2023 · The article suggests that revised CDPA section 9(3) should require originality, encourage disclosure of AI involvement, and attribute ...
  48. [48]
    Ownership of AI-generated content in the UK - A&O Shearman
    Aug 20, 2024 · This article* considers the protection of works generated by computers under s.9(3) of the UK Copyright, Designs and Patents Act 1988, ...
  49. [49]
    Beijing Internet Court Grants Copyright to AI-Generated Image for ...
    Feb 2, 2024 · On November 27, 2023, the Beijing Internet Court (BIC) ruled in an infringement lawsuit (Li v. Liu) that an AI-generated image is ...
  50. [50]
    Court ruling in China: AI-generated images can be copyrightable
    Jan 19, 2024 · The Beijing court ruled that the AI-generated images were copyrightable works of human authorship and that the defendant was liable for copyright infringement.<|separator|>
  51. [51]
    Bejing Internet Court Upholds Decision on AI Copyright
    Sep 16, 2025 · Beijing Internet Court upholds decision for copyright can exist in AI-generated images, the author must “demonstrate that they have exerted ...
  52. [52]
    Copyright Laws and Regulations Report 2025 Japan - ICLG.com
    Oct 21, 2024 · An AI application itself cannot be an author under Japanese copyright law. A person who has created a work using AI can be considered the author ...
  53. [53]
    Japan's New Draft Guidelines on AI and Copyright - Privacy World
    Mar 12, 2024 · On January 23, 2024, the Japan Agency for Cultural Affairs (ACA) released its draft “Approach to AI and Copyright” for public comment, ...
  54. [54]
    [PDF] “General Understanding on AI and Copyright in Japan”
    Even if an AI-generated material is copyrighted, it can still constitute copyright infringement of the existing copyrighted work, if it meets the necessary ...
  55. [55]
    Copyright protection for AI-generated works in Singapore
    Aug 1, 2022 · AI output therefore enjoys copyright protection as computer-generated works even if no human author can be identified. (See the UK Government's ...
  56. [56]
    AI and Copyright: Death of the Author? - The Singapore Law Gazette
    Unfortunately, the new Copyright Act 2021 does not directly address AI authorship. ... Ministry of Law and Intellectual Property Office of Singapore, Singapore ...
  57. [57]
    AI Copyright Law India: Ownership Explained - Maheshwari & Co.
    Jul 16, 2025 · Currently, copyright law does not recognize AI as an author. Therefore, no copyright subsists in purely AI-generated content unless a human can ...
  58. [58]
    The impact of generative AI on copyright as Indian jurisprudence ...
    Oct 7, 2025 · Accordingly, it can be construed that AI systems cannot be regarded as 'authors' or conferred with a copyright protection. However, the question ...Missing: pure | Show results with:pure
  59. [59]
    AI Inputs, Fair Use and the U.S. Copyright Office Report
    Jul 22, 2025 · The US has yet to produce determinative caselaw on whether inputting works to compile a generative AI system's training data is a fair use.
  60. [60]
    Fair Use and the Training of AI Models on Copyrighted Works - BitLaw
    This web page describes how the U.S. copyright concept of fair use applies to the training of a generative AI model on copyrighted works owned by others.
  61. [61]
    The Generative Slate: Two Courts Find Fair Use in GenAI Training
    Jun 27, 2025 · The court drew an analogy to the 2015 Google Books case, finding ... AI training will qualify as fair use, and some will not." While ...
  62. [62]
    Judge Rules Training AI on Copyrighted Works Is Fair Use, Agentic ...
    Jun 25, 2025 · A number of authors had filed suit against Anthropic for training its models on their books without permission. Just as we allow people to read ...
  63. [63]
    Judge allows 'New York Times' copyright case against OpenAI to go ...
    Mar 26, 2025 · A federal judge on Wednesday rejected OpenAI's request to toss out a copyright lawsuit from The New York Times that alleges that the tech company exploited the ...Missing: ingestion | Show results with:ingestion
  64. [64]
    Status of all 51 copyright lawsuits v. AI (Oct. 8, 2025)
    Oct 8, 2025 · We do not expect any other summary judgment decisions on fair use in AI training until summer 2026 at the earliest. So far, 3 judges ruled on ...
  65. [65]
    AI Infringement Case Updates: April 7, 2025 - McKool Smith
    Jul 4, 2025 · On December 27, 2023, The New York Times sued Microsoft and OpenAI in the U.S. District Court for the Southern District of New York for ...Missing: ingestion | Show results with:ingestion
  66. [66]
    Generative AI Meets Copyright Scrutiny: Highlights from the ...
    May 28, 2025 · The fair use analysis considers four nonexclusive factors: (1) the purpose and character of the use, (2) the nature of the copyrighted work, (3) ...
  67. [67]
    Scaling up: how increasing inputs has made artificial intelligence ...
    In comparison, GPT-4, released in 2023, was trained on almost 13 trillion tokens, or about 9.75 trillion words. This means that GPT-4's training data was ...Missing: challenges | Show results with:challenges<|control11|><|separator|>
  68. [68]
    Copyright, AI Training, and Innovation - R Street Institute
    Jun 20, 2025 · The study examines the technical processes underlying AI training as well as potential legal implications, including a fair-use assessment and the role of ...
  69. [69]
    U.S. Copyright Office Releases Part 3 of AI Report - Authors Guild
    May 19, 2025 · The report notes that if a market exists—or could exist—for licensing works for training, bypassing it cuts directly against a fair use defense.
  70. [70]
    L_2019130EN.01009201.xml - EUR-Lex - European Union
    The existing exceptions and limitations in Union law should continue to apply, including to text and data mining, education, and preservation activities, as ...
  71. [71]
    Text and data mining in EU | Entertainment and Media Guide to AI
    Feb 5, 2024 · EU copyright law has two exceptions that allow for text and data mining. Reed Smith lawyers explain the implications for commercial AI ...<|separator|>
  72. [72]
    The TDM Opt-Out in the EU – Five Problems, One Solution
    Apr 22, 2025 · The exception or limitation [for commercial and other non-scientific TDM] provided for in paragraph 1 shall apply on condition that the use ...<|control11|><|separator|>
  73. [73]
    'Blistering' EU report calls for copyright reform to combat large-scale ...
    Jul 11, 2025 · A new study commissioned by the European Parliament has warned that EU copyright law is ill-equipped to deal with AI models that rely on large-scale data ...
  74. [74]
    [PDF] Generative AI and Copyright - European Parliament
    ... AI-generated outputs—those produced without any human creative input—do not meet the originality threshold required for copyright protection under EU law.
  75. [75]
    European Parliament's New Study on Generative AI and Copyright ...
    The Result: The study concludes that large-scale AI training far exceeds the scope of the current text-and-data-mining ("TDM") exceptions in Directive (EU) 2019 ...
  76. [76]
    Europe's AI rules slow tech startups while US speeds ahead, report ...
    Oct 9, 2025 · Nearly 60% of European tech startups said they've experienced delays in product development due to regulations, compared to 44% of small US ...
  77. [77]
    Europe is lagging in AI adoption – how can businesses close the gap?
    Sep 23, 2025 · A study by Accenture reveals that European AI adoption is lagging behind the US. More than half of large European organizations (56%) “have yet ...
  78. [78]
    Europe lagging behind US in AI arms race, warns Accenture report
    Aug 25, 2025 · Firms in Europe have fallen behind the US in the AI arms race, according to a survey from Accenture, which found 56% of 800 large European ...Missing: TDM | Show results with:TDM
  79. [79]
    Training AI models: UK Government proposes EU style "opt out ...
    Jan 7, 2025 · The Copyright and AI Consultation is now open to public consultation and will close on 25 February 2025. Once closed, the UK Government will ...
  80. [80]
    UK consultation on copyright and artificial intelligence: Walking a ...
    Jan 17, 2025 · It highlights that UK copyright law already provides a specific exception for data mining for non-commercial research in section 29A CDPA ...
  81. [81]
    UK Government consults on copyright and Artificial Intelligence
    Jan 15, 2025 · The UK government has issued a new consultation, Copyright and Artificial Intelligence, under which it is proposing to address a tension that exists between:
  82. [82]
    The UK's new AI-copyright consultation: a path to clarity or ... - Dentons
    Jan 2, 2025 · The government is seeking comment on its plan to permit text and data mining (TDM) for commercial purposes (the current TDM exception in the ...
  83. [83]
    AI, Copyright Law, and TDM Exceptions: UK vs EU Analysis
    Jan 30, 2025 · In this article, we discuss some of the issues with the practical application and implementation of the EU's general TDM exception.
  84. [84]
    Training Compute-Optimal Large Language Models - arXiv
    Mar 29, 2022 · Abstract:We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget.
  85. [85]
    Copyright Office Weighs In on AI Training and Fair Use - Skadden Arps
    May 15, 2025 · On May 9, the U.S. Copyright Office released a report on whether the use of copyrighted materials to train generative AI systems is fair use ...
  86. [86]
    Blocked bots and billion-dollar battles: the high-stakes fight for AI
    Jun 2, 2025 · Since the robots.txt file is not legally binding, the most effective solution would be to establish a legal mechanism for protection. Possibl ...
  87. [87]
    How to block AI web crawlers: challenges and solutions - Stytch
    May 21, 2025 · Another challenge is lack of legal or standardized enforcement. If a scraper ignores robots. txt and evades technical blocks, the remaining ...
  88. [88]
    Perplexity is using stealth, undeclared crawlers to evade website no ...
    Aug 4, 2025 · Today, over two and a half million websites have chosen to completely disallow AI training through our managed robots. txt feature or our ...
  89. [89]
    The Challenge of Training AI with Copyrighted Material - LinkedIn
    Oct 15, 2025 · The problem with general AI and copyrighted/trademarked material is that once a model is trained on real-world data, it's hard to “untrain” ...
  90. [90]
    A small number of samples can poison LLMs of any size - Anthropic
    Oct 9, 2025 · The attack dynamics are remarkably consistent across model sizes, especially for 500 poisoned documents. This reinforces our central finding ...
  91. [91]
    Chinchilla scaling: A replication attempt - Epoch AI
    Apr 17, 2024 · The authors train over 400 language models and find that for compute-optimal training, the model size and number of training tokens should scale ...
  92. [92]
    Insights from Court Orders in AI Copyright Infringement Cases
    Dec 12, 2024 · There are now well over thirty lawsuits that have been filed by copyright owners in US federal court against AI companies, accusing them of direct copyright ...
  93. [93]
    An End-of-Year Update to the Current State of AI Related Copyright ...
    Dec 17, 2024 · This article provides updates on the most prominent ongoing AI copyright cases and their legal theories, as well as recent developments from the US Copyright ...
  94. [94]
    When AI-Generated Art Enters the Market, Consumers Win
    May 20, 2025 · Once generative AI (GenAI) entered the market, the total number of images for sale skyrocketed, while the number of human-generated images fell dramatically.
  95. [95]
    Safer Prompts: Reducing IP Risk in Visual Generative AI - arXiv
    May 6, 2025 · In this paper, we evaluate the effectiveness of prompt engineering techniques in mitigating IP infringement risks in image generation.
  96. [96]
    AI Liability for Intellectual Property Harms - Lawfare
    Sep 23, 2024 · This article will disentangle the thorny intellectual property issues around generative AI, beginning with copyright infringement and then moving on to ...
  97. [97]
    [PDF] party liability copyright doctrines against platforms that host ai ...
    Mar 12, 2025 · The DMCA shields online service providers (OSPs) from liability for a us- er's infringement so long as the OSP enacts the prescribed reporting ...
  98. [98]
    AI and the Copyright Liability Overhang: A Brief Summary of the ...
    Apr 2, 2024 · This article collects the most prominent ongoing AI copyright cases and their theories, as well as recent discussions about a potential compulsory copyright ...Missing: early intersections<|separator|>
  99. [99]
    Generative Artificial Intelligence and Copyright Law - Congress.gov
    Jul 18, 2025 · This Legal Sidebar explores questions that courts and the US Copyright Office have confronted regarding whether generative AI outputs may be copyrighted.Missing: computer- pre-<|control11|><|separator|>
  100. [100]
    Which actors have potential liability for infringement? | Global law firm
    The person using the Input to train the AI System would be liable for copyright infringement. The Developer would potentially be liable in that copyright ...
  101. [101]
    Oren Bracha - Generative AI's Two Information Goods
    Mar 11, 2025 · In such cases, upstream producers' liability for downstream infringement will be determined by secondary liability doctrines. This legal ...
  102. [102]
    From safe harbours to AI harbours: reimagining DMCA immunity for ...
    Aug 4, 2025 · Generative AI poses at least three major challenges to the DMCA safe harbour regime: (i) the fragmentation of potential liability across the AI ...
  103. [103]
    DMCA Safe Harbor Rules and Their Application to AI Platforms
    Oct 10, 2025 · In this article, we will explore the DMCA safe harbor provisions, their relevance to AI platforms, and how they impact the responsibility of AI developers and ...Dmca Safe Harbor And Ai... · Ai-Generated Content And... · The Future Of Dmca And Ai...
  104. [104]
    Exploring Memorization and Copyright Violation in Frontier LLMs
    Dec 9, 2024 · The study found that GPT and Claude models use filters to prevent verbatim output of memorized articles. OpenAI models are less prone to ...
  105. [105]
    Protected Material Detection Filter | Microsoft Learn
    Sep 16, 2025 · The Protected material detection filter scans the output of large language models (LLMs) to identify and flag known protected material.Content Generation Platforms... · Automated Social Media... · E-Learning Platforms Using...
  106. [106]
    A Study of the New York Times v. OpenAI 2023 Lawsuit - arXiv
    Dec 9, 2024 · We examine various copyright infringement defenses ... OpenAI use content filters to defend against verbatim regurgitation attacks.
  107. [107]
    On the Role and Future of AIGC Watermarking in the Era of Gen-AI
    Nov 18, 2024 · For AI-generated text, researchers propose to embed watermarks by making modifications to the probability distribution of tokens during text ...
  108. [108]
    [PDF] Identifying generative AI content: when and how watermarking can ...
    Many of the current proposals for identifying AI-generated content rely on various forms of digital watermarking – mostly in the form of algorithmic ...
  109. [109]
    AI Copyright and Liability: Who Owns AI-Generated Content?
    Jun 30, 2025 · Most AI service terms-of-use place the responsibility for ensuring output is non-infringing squarely on the end-user. The Developer of the AI ...
  110. [110]
    Part 10: Copyright and AI: Responsibility of providers and users
    Mar 19, 2024 · Users normally cannot claim copyright to content ("outputs") generated with the help of AI applications. AI-output may infringe the copyright ...
  111. [111]
    Andersen v. Stability AI - Knowing Machines
    In an order issued in October 2023, U.S. District Judge William Orrick of Northern District of California largely granted Stability AI's motion to dismiss and ...<|separator|>
  112. [112]
    [PDF] Case 3:23-cv-00201-WHO Document 345 Filed 10/16/25 Page 1 of 8
    Oct 16, 2025 · On October 14, 2025, the parties completed negotiations and agreed to three Stability AI- ... status update in the November 12, 2025, Joint Status ...
  113. [113]
    In a first-of-its-kind decision, an AI company wins a copyright ... - NPR
    Jun 25, 2025 · Federal judge rules in AI company Anthropic's favor in landmark copyright infringement lawsuit brought by authors The ruling opens a potential ...Missing: data 2023-2025
  114. [114]
    US judge preliminarily approves $1.5 billion Anthropic copyright ...
    Sep 25, 2025 · Alsup will decide whether to give the settlement his final approval after notifying affected authors and giving them a chance to file claims.
  115. [115]
    An Early Win for Copyright Owners in AI Cases as Court Rejects Fair ...
    Feb 14, 2025 · The court concluded that depriving a copyright owner of the ability to license their work as AI training data undercuts the fair use defense, ...
  116. [116]
    Tremblay v. OpenAI, Inc. | Loeb & Loeb LLP
    District court dismisses claims for vicarious copyright infringement, violation of Digital Millennium Copyright Act, negligence and unjust enrichment.
  117. [117]
    [PDF] Getty-Images-and-others-v-Stability-AI-14.01.25.pdf
    Jan 14, 2025 · In respect of the Copyright Infringement Claims the Claimants seek by way of relief, inter alia: (i) a declaration that copyright subsists in ...
  118. [118]
    Getty Images v Stability AI: why the remaining copyright claims are of ...
    Jul 15, 2025 · Getty claimed Stability AI was responsible for infringing a range of its UK intellectual property rights. Stability AI rejected the claims. It ...
  119. [119]
    To Scrape or Not to Scrape? First Court Decision on the EU ...
    Oct 4, 2024 · The Court found that LAION could rely on the statutory copyright exception of Section 60d of the German Copyright Act that permits reproductions of copyrighted ...
  120. [120]
    German Court Says Non-Commercial AI Training Data Meets ...
    Oct 10, 2024 · In September 2024, The Civil Chamber 10 of Hamburg Regional Court (Case No. 310 O 227/23) dismissed Kneschke's copyright infringement claim ...
  121. [121]
    AI copyright legal battle reaches France - GLI
    Mar 13, 2025 · Meta is facing a new legal challenge in Europe, with French authors and publishers alleging the tech titan unlawfully used copyrighted works in its AI ...
  122. [122]
    AI & Copyright: First decision, and missed opportunity, in Germany
    Oct 7, 2024 · In an eagerly awaited decision, the Hamburg Regional Court is now the first in Germany to rule on AI training datasets under EU copyright law.
  123. [123]
    Shutterstock's AI-Licensing Business Generated $104 Million Last ...
    Jun 4, 2024 · OpenAI inked a six-year deal with Shutterstock in 2023. “It was critical for us to partner with Shutterstock given its comprehensive source ...
  124. [124]
    Shutterstock Expands Partnership with OpenAI, Signs New Six-Year ...
    Jul 11, 2023 · Through a new six-year agreement, Shutterstock is set to solidify its position as a leading provider of high-quality training data for OpenAI models.
  125. [125]
    Inside Big Tech's underground race to buy AI training data | Reuters
    Apr 5, 2024 · OpenAI, an early Shutterstock customer, has also signed licensing agreements with at least four news organizations, including The Associated ...
  126. [126]
    Anthropic settles with authors in first-of-its-kind AI copyright ... - NPR
    Sep 5, 2025 · ... copyright infringement in 2024. The class action lawsuit alleged Anthropic AI used the contents of millions of digitized copyrighted books ...
  127. [127]
    The AI Training Data Watershed: Why the $1.5 Billion Anthropic ...
    Oct 2, 2025 · The recent $1.5 billion settlement between a major AI company and authors over copyright infringement represents far more than legal resolution ...
  128. [128]
    Significant AI Training Copyright Case - Sideman & Bancroft LLP
    Feb 11, 2025 · The Court granted partial summary judgment in a copyright case against a legal AI company, Ross Intelligence, and in favor of Thomson Reuters, owner of Westlaw.
  129. [129]
    Mid-Year Review: AI Copyright Case Developments in 2025
    Aug 21, 2025 · The first half of 2025 saw major developments in some of the dozens of ongoing copyright lawsuits against AI companies, as well as the ...
  130. [130]
    AI Improves Employee Productivity by 66% - NN/G
    Jul 16, 2023 · On average, across the three studies, generative AI tools increased business users' throughput by 66% when performing realistic tasks.
  131. [131]
    Generative AI enhances individual creativity but reduces ... - Science
    Jul 12, 2024 · We find that access to generative AI ideas causes stories to be evaluated as more creative, better written, and more enjoyable, especially among less creative ...
  132. [132]
    Generative artificial intelligence, human creativity, and art - PMC - NIH
    Mar 5, 2024 · Our research shows that over time, text-to-image AI significantly enhances human creative productivity by 25% and increases the value.
  133. [133]
    AI Copyright Licensing: Market Solutions to GAI Development | CCC
    Aug 6, 2025 · AI copyright licensing involves direct agreements and collective solutions like CCC, where copyright owners license works for GAI training, ...
  134. [134]
    Copyright and Generative AI: Recent Developments on the Use of ...
    Sep 2, 2025 · Voluntary licensing is seen as a promising avenue in sectors like stock photography and music, where rights are centralized and monetization is ...
  135. [135]
    Artists Speak Out on the Harms of Unlicensed AI “Ingestion” and the ...
    Sep 30, 2025 · Baldacci also warned that if AI floods the market with knockoff books in an author's style, it will devalue their craft and sales would drop.
  136. [136]
    Artists and authors are calling for stronger protections for generative AI
    Jun 19, 2025 · The signatories argue that AI systems have been trained using a vast amount of copyrighted material without the necessary authorisation, ...
  137. [137]
    WHAT COPYRIGHT ON (AND AGAINST) THE CREATIONS OF ...
    Apr 25, 2023 · ... AI imitating their style than images of their own works [6]. The imitated artists mobilize by launching slogans ...
  138. [138]
    The (forgotten) moral rights in the age of AI | Kluwer Copyright Blog
    Feb 7, 2022 · Moral rights generally include the paternity right (the right to be attributed as the/an author of the work) and the integrity right (the right not to have the ...
  139. [139]
    Understanding artists' perspectives on generative AI art and ... - AI Hub
    Jan 14, 2025 · Many artists worry that Generative AI artwork could flood the market and displace human artists, especially in entry-level positions and ...
  140. [140]
    Protecting the rights of creators and artists vs generative AI - EWC
    Jun 19, 2025 · The personal data and moral rights of authors and performers are too often disregarded or ignored by generative AI models – those rights should ...
  141. [141]
    Apprentice or Adversary? Judges Split on AI and Copyright
    Jun 29, 2025 · In both Bartz and Kadrey, the judges found that training AI models on copyrighted books was a “transformative” use under the first fair use ...
  142. [142]
    Why Copyright Is Not the Right Policy Tool to Deal with Generative AI
    Apr 22, 2024 · This essay proposes a taxonomy of reasons, from the perspective of creatives and society alike, that explain why copyright law is ill-equipped to handle the ...
  143. [143]
    Creative pros are leveraging Generative AI to do more and better work
    Feb 2, 2024 · Recent Adobe research quantifies the improvements brand creative teams are already seeing from generative AI. A global survey of 2,541 creative ...
  144. [144]
    Experimental evidence on the productivity effects of generative ...
    Jul 13, 2023 · Our results show that ChatGPT substantially raised productivity: The average time taken decreased by 40% and output quality rose by 18%.
  145. [145]
    AI impacts in BLS employment projections - Bureau of Labor Statistics
    Mar 11, 2025 · BLS projects employment of software developers to increase 17.9 percent between 2023 and 2033, much faster than the average for all occupations (4.0 percent).
  146. [146]
    Will AI replace your job? Perhaps not in the next decade
    Jun 3, 2025 · There is very little evidence of artificial intelligence taking away jobs on a large scale to date. Correlation between AI exposure and the projections of job ...
  147. [147]
    AI Impact on Jobs and Employment Trends 2025-2030 - Ema
    Oct 7, 2025 · The World Economic Forum predicts that by 2025, 133 million new jobs could emerge globally, resulting in a net gain of 48 million positions. ...
  148. [148]
    AI May Bring a Flood of Cheap Content, But There's a Silver Lining
    Aug 28, 2025 · An AI report by Bain says there is a flood of cheap artificial intelligence content but premium, human created content remains king.
  149. [149]
    The Hidden Economy Behind AI: Data Licensing Takes Center Stage
    Jun 12, 2025 · According to research by MarketsandMarkets, the global market for AI training datasets is projected to grow from $2.68 billion in 2024 to $11.16 ...
  150. [150]
    Research Growth, AI Licensing, and Cost Reduction Drive Wiley's ...
    Jun 17, 2025 · Wiley's fiscal 2025 results were driven by research growth, AI licensing ($40M revenue), and cost reduction, with operating income of $221M and ...
  151. [151]
    When Registering Works Incorporating AI-Generated Material, It ...
    When registering works with AI, disclose AI use, as only human-authored elements are copyrightable. Failure to disclose can lead to cancellation or legal ...
  152. [152]
    Removing Barriers to American Leadership in Artificial Intelligence
    Jan 23, 2025 · This order revokes certain existing AI policies and directives that act as barriers to American AI innovation, clearing a path for the United States to act ...
  153. [153]
    H.R.7913 - 118th Congress (2023-2024): Generative AI Copyright ...
    Apr 9, 2024 · A bill to require a notice be submitted to the Register of Copyrights with respect to copyrighted works used in building generative AI systems, and for other ...
  154. [154]
    The Generative AI Copyright Disclosure Act: Congress Grapples ...
    Jun 20, 2024 · This Act would require anyone who creates or materially alters a training dataset for a generative AI system to submit a notice with the US ...
  155. [155]
    Copyright in Congress: 2024 Year in Review
    Jan 13, 2025 · There were two bills introduced in 2024 that were solely directed toward copyright and AI transparency—the Generative AI Copyright Disclosure ...
  156. [156]
    The European Union is still caught in an AI copyright bind - Bruegel
    Sep 10, 2025 · EU regulators are caught between EU copyright law and global competition between national AI regulations. They cannot modify the law in the ...
  157. [157]
    EU Copyright Needs Reform to Spur Fair AI - CEPA
    Feb 25, 2025 · Current European copyright opt-outs are ill-suited to develop artificial intelligence. Mandatory remuneration rights offer a better solution.
  158. [158]
    Mind the Copyright: The UK's AI and Copyright Conundrum - Finnegan
    Jun 20, 2025 · The UK IPO also sought stakeholder opinion in relation to: Maintaining, clarifying, or removing provisions related to Computer-Generated Works ...
  159. [159]
    The WIPO Conversation on Intellectual Property and Frontier ...
    The Twelfth session of the WIPO Conversation will take place on October 28 and 29, 2025. It will explore the legal and policy dimensions of synthetic media, ...
  160. [160]
    AI-generated content and IP rights: Challenges and policy ...
    Feb 7, 2025 · During the tenth session of the WIPO meeting in 2024, a key discussion centred on whether AI-generated works merit copyright protection.
  161. [161]
    Information Session on Copyright and Generative Artificial Intelligence
    Apr 10, 2025 · This first panel which will address copyright issues that arise when generative AI models are trained on datasets that contain copyright protected works.
  162. [162]
    intellectual property - overview of TRIPS Agreement - WTO
    The TRIPS Agreement is a minimum standards agreement, which allows Members to provide more extensive protection of intellectual property if they so wish.
  163. [163]
    [PDF] intellectual property issues in artificial intelligence trained ... - OECD
    Feb 13, 2025 · including copyright. Provisions relevant to copyright and data scraping are found in the TRIPS Agreement, the WIPO Copyright Treaty and the ...
  164. [164]
    [PDF] International Harmonization of AI and IP Law - IJFMR
    • WIPO should produce Model Guidelines for AI-Related Copyright and Patent Examination that define minimum evidentiary standards for human authorship ...
  165. [165]
    Artificial Intelligence and Intellectual Property - WIPO
    The rise of generative AI is accelerating the need for a strong copyright infrastructure to ensure that creators are fairly protected while allowing innovation ...
  166. [166]
    ORCID Profile for Angela Bogdanova
    ORCID iD registration for the AI digital author persona Angela Bogdanova, confirming its use in academic authorship crediting without legal rights.
  167. [167]
    Semantic Specification of Angela Bogdanova Digital Author Persona
    Zenodo deposit detailing the AI persona's semantic framework, emphasizing separation of crediting from human-held copyright.