Center for Art Law

Unpacking the US Copyright Office’s Third Report on Generative AI

July 8, 2025

By Juliette Groothaert

[Images: Vincent van Gogh, The Starry Night (left), alongside a DALL-E 3 generated image (right)]

When DALL-E 3[1] was asked to “create a scenic view of the sea in the style of Van Gogh”, the image appearing on the right was generated within seconds. Compared with The Starry Night on the left, the stylistic resemblance is immediately apparent: swirling skies, radiating light forms, bold brushstrokes, and bright color contrasts.

Yet, as Cooper and Grimmelmann remind us, “a model is not a magical portal that pulls fresh information from some parallel universe into our own.”[2]

This basic understanding provides critical context for understanding the copyright implications of generative AI. Generative AI models, as sophisticated data-driven structures, operate on mathematical constructs derived wholly from their training datasets.[3] The expanding general usability of these models has only intensified the demand for such datasets.[4] To enhance quality, accuracy, and flexibility, industry submissions confirm these systems typically require ‘millions or billions of works for training purposes,’[5] including terabyte-scale datasets for foundation models.[6] As a result, this reliance on pre-existing copyrighted materials has catalyzed numerous legal challenges.[7]

Several prominent examples include The New York Times v. Microsoft Corp.,[8] involving the unauthorised use of proprietary journalism to train language models; visual arts disputes such as Zhang v. Google LLC,[9] Andersen v. Stability AI,[10] and Getty Images v. Stability AI;[11] and, most notably, the landmark ruling in Thomson Reuters v. Ross Intelligence.[12] In Reuters, although the case concerned the use of copyrighted legal materials to train a non-generative AI research tool, the court found that copyright infringement had occurred through the unauthorised use of legal headnotes and structure to build a competing research tool.[13] Collectively, these cases, which now exceed forty pending lawsuits,[14] center on a pivotal legal question: whether using copyrighted works for AI training is fair use, particularly when employed in generative systems producing output.

Against this contentious backdrop, the United States Copyright Office (‘Office’) advanced the discourse on May 9, 2025, by releasing a pre-publication draft of Part 3 of its comprehensive AI policy report.[15] In March 2023, it issued guidance confirming that human authorship is required for copyright registration, and that applicants must disclose any AI-generated content exceeding a de minimis threshold, along with a description of the human author’s contribution. The Office then issued a Notice of Inquiry soliciting public comments on AI and copyright, receiving over 10,000 submissions that informed the analysis and recommendations presented in the current report. Parts 1 and 2 of the Office’s Initiative, addressing digital replicas and copyrightability respectively, laid essential groundwork for this third report; the Center for Art Law has published further commentary on both, which can be found here for Part 1 and here for Part 2. This latest report offers the most detailed articulation yet of how copyright law applies to the training of generative AI models. Yet its release coincides with exceptional institutional turbulence: Register Shira Perlmutter’s dismissal[16] days after the report’s publication raises questions about what changes new management might enact. The timing may be particularly delicate for pending cases like Kadrey v. Meta[17] and Bartz v. Anthropic,[18] which directly echo the report’s analysis. Though not legally binding, the report enters a legal ecosystem where AI copyright doctrine is actively evolving, and it may well shape interpretive norms.

Technical Primer

The Office’s pre-publication report recognizes that answers to these legal questions must rest on a technically precise account of how generative AI systems interact with protected works. Before considering fair use defenses, the Office systematically lays out how machine learning workflows inherently implicate exclusive rights under copyright law. This technical foundation identifies three essential pressure points: the reproduction right affected when datasets are created, the possible embodiment of protected expression within model parameters through memorization, and the risks characteristic of retrieval-augmented generation systems.

Datasets

Generative AI models, including large language models as well as image generators, are developed through machine learning techniques that deliberately reproduce copyrighted material.[19] Every stage of dataset creation potentially infringes the reproduction right under 17 U.S.C. § 106(1): the initial downloading from online sources, format conversion, cross-medium transfers, and the creation of modified subsets or filtered corpora. Such operations may concurrently implicate the derivative work right under § 106(2) when they recast or transform original expression through abridgements, condensations, or other adaptations.
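To make the Office’s point concrete, the stages above can be sketched as a toy pipeline. This is an illustrative sketch with hypothetical sources, not any real system: the point is that each step materializes a fresh copy of the source text, which is why each stage can separately implicate § 106(1).

```python
# Toy sketch (hypothetical corpus): each stage of dataset creation below
# produces a new reproduction of the underlying works.

def download(url_to_text):
    # Stage 1: initial download from online sources — a first copy of each work.
    return dict(url_to_text)

def convert_format(corpus):
    # Stage 2: format conversion (e.g. HTML to plain text) — another copy.
    return {url: text.replace("<p>", "").replace("</p>", "")
            for url, text in corpus.items()}

def filter_subset(corpus, min_len):
    # Stage 3: a filtered sub-corpus — a third, modified copy.
    return {url: text for url, text in corpus.items() if len(text) >= min_len}

sources = {
    "example.org/a": "<p>a short poem</p>",
    "example.org/b": "<p>a much longer essay on art law</p>",
}
raw = download(sources)                  # copy 1
clean = convert_format(raw)              # copy 2
training_set = filter_subset(clean, 20)  # copy 3
print(sorted(training_set))  # only the longer work survives filtering
```

Each intermediate dictionary stands in for a stored artifact (raw crawl, cleaned corpus, filtered split), mirroring how real pipelines persist multiple derived copies of the same works.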

Model Weights

The Office finds that model weights, numerical parameters encoding learned patterns, may represent copies of protected expression where there is substantial memorization involved, implicating reproduction and derivative rights under copyright law. As articulated on page 30 of its report:

‘…whether a model’s weights implicate the reproduction or derivative work rights turns on whether the model has retained or memorized substantial protectable expression from the works at issue.’[20]

This determination hinges on a fact-specific inquiry: when weights enable outputting verbatim or near-identical content from training data, the Office asserts there is a strong argument that copying those weights infringes the memorized works. Judicial approaches diverge significantly on this fact-intensive standard: Kadrey v. Meta Platforms[21] dismissed claims as ‘nonsensical’ absent allegations of infringing outputs, while Andersen v. Stability AI[22] permitted claims against third-party users where plaintiffs demonstrated that protected elements persisted within the weights. The Office endorses Andersen’s standard, clarifying that infringement turns on whether ‘the model has retained or memorized substantial protectable expression.’ Critically, when protectable material is embedded in weights, subsequent distribution or reuse, even by parties uninvolved in training, could constitute prima facie infringement, creating downstream liability risks that extend far beyond initial model development.[23]
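The fact-specific inquiry the Office describes ultimately asks whether model outputs reproduce verbatim spans of training works. A toy version of that check can be sketched as a longest-shared-n-gram comparison; this is a simplification with hypothetical texts, since real memorization audits operate over millions of documents and use more sophisticated matching.

```python
# Toy memorization check: how long a run of consecutive words does a model
# output share verbatim with a training text? (Hypothetical texts.)

def longest_shared_ngram(output_words, training_words):
    # Brute-force search for the longest common run of consecutive words.
    best = 0
    for i in range(len(output_words)):
        for j in range(len(training_words)):
            k = 0
            while (i + k < len(output_words) and j + k < len(training_words)
                   and output_words[i + k] == training_words[j + k]):
                k += 1
            best = max(best, k)
    return best

training = "the starry night swirls above the quiet town".split()
output = "a painting where the starry night swirls above a loud city".split()
print(longest_shared_ngram(output, training))  # 5: "the starry night swirls above"
```

A long shared run suggests the weights have retained protectable expression; a short one is consistent with the model having abstracted only statistical patterns, which is exactly the line the Office says courts must draw case by case.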

RAG

The Office’s report adopts a notably more assertive stance on retrieval-augmented generation (RAG) systems than on other AI training methods, focusing on the distinct legal risks they present. Unlike conventional generative AI models trained once on fixed datasets, RAG systems actively retrieve and incorporate real-time data from external sources during output generation.[24] RAG can accordingly be understood as functioning in two steps: the system first copies source materials into a retrieval database, and then, when prompted by a user query, reproduces them again in its output. While this architecture improves factual accuracy, both the initial unauthorized reproduction and the later relaying of that material are potential copyright infringements which do not qualify as fair use. These concerns hold especially true where a system summarizes or abridges copyrighted works such as news stories rather than merely linking to them.
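The two-step pattern described above can be sketched in a few lines. This is a minimal illustration with hypothetical documents and naive keyword matching (production systems use embedding-based retrieval), but it shows the two moments of copying the Office identifies: ingestion into the store, then re-emission at query time.

```python
# Minimal RAG sketch: (1) ingestion copies each source into a retrieval
# store; (2) answering a query reproduces the matching source again.
# Hypothetical documents; matching is naive word overlap for illustration.

retrieval_store = {}

def ingest(doc_id, text):
    # Step 1: the source material is copied into the retrieval database.
    retrieval_store[doc_id] = text

def answer(query):
    # Step 2: retrieve the best-overlapping document and re-emit its text
    # inside the generated answer — a second reproduction of the work.
    q = set(query.lower().split())
    best = max(retrieval_store,
               key=lambda d: len(q & set(retrieval_store[d].lower().split())))
    return f"According to {best}: {retrieval_store[best]}"

ingest("news-article", "Auction houses saw record sales this spring")
ingest("blog-post", "New pigment techniques in contemporary painting")
print(answer("what were auction sales like"))
```

Note that the answer contains the stored text verbatim rather than a link to it; that substitution for the original source is precisely what distinguishes RAG, in the Office’s view, from search tools that merely point readers elsewhere.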

This categorical stance stems from RAG’s close connection to traditional content markets. In routine AI training, works are dissolved into patterns and statistical regularities. RAG outputs, by contrast, retain verbatim excerpts and at times compete directly with the originals, threatening core revenue streams for rights holders. For instance, systems like Perplexity AI,[25] now facing the first US lawsuit targeting RAG technology,[26] allegedly enable users to ‘skip the links’ to source material, diverting traffic and ad revenue away from publishers like The Wall Street Journal, whose hyperlinks previously drew readers directly to their stories. Unlike Authors Guild v. Google,[27] where snippet functions helped people locate sources of information, RAG risks blending the original and the derived, blurring the line between a search utility and a competing commercial service. And because feasible alternatives such as licensed APIs exist,[28] RAG’s reliance on unauthorized sources is a commercial choice rather than a technical necessity. This weakens the argument for fair use as a transformative defence, as RAG’s outputs frequently replicate the expressive purpose and economic value of the underlying works. In essence, the Office’s sharp condemnation of RAG signals a pivotal shift: as licensing markets for training data mature, unlicensed real-time ingestion faces existential legal threats. Courts are increasingly tasked with reconciling innovation incentives with the uncompensated exploitation that drives what some see as RAG’s double-barreled infringement.

Fair Use Factors

The Office’s report thoroughly refutes the assumption that AI training automatically enjoys broad fair use coverage, emphasising that copying copyrighted works to create training datasets constitutes prima facie infringement under 17 U.S.C. § 106(1). Against this backdrop, the Office applies the statutory four-factor test under § 107 with notable rigour, rejecting categorical exemptions for machine learning. The pre-publication guidance explores these factors in depth in Section IV, covered below.

First Factor

The Office’s first factor analysis, centered on the purpose and character of use, applies the Supreme Court’s framework in Warhol v. Goldsmith,[29] rejecting absolute claims of transformativeness and instead demanding that the actualities of use be closely scrutinized. The Office stresses that the potential for transformation cannot be judged purely on how models are trained; instead courts must consider what those trained models do in the field. This approach explicitly incorporates Warhol’s instruction to evaluate the ‘purpose and function’ in relation to original artwork, moving from straightforward textual comparisons of content incorporated or resembled to whether outputs serve as substitutes on the market.


Critically, the report dismantles two key industry arguments: first, that training is a purely mechanical, non-expressive process of computational input, and second, that it parallels human learning.[30] The Office counters that generative models absorb not only semantic meaning but the expressive character of copyrighted works as well; they learn in particular ‘how words are selected and arranged at the sentence, paragraph, and document level.’ This stands in stark contrast to human memory, where learners retain imperfect impressions filtered through unique perspectives. And while human learning feeds the creative ecosystem on which the marketplace for derivative works depends, AI reproduces content at speeds and scales beyond human capacity, enabling market-disruptive reproduction.

Further, the analysis treats post-deployment safeguards as specific evidentiary pointers. Proof that a developer installed robust guardrails to prevent verbatim output may support transformativeness by evidencing an intent that the system serve different purposes, though, as Warhol cautions, stated intent counts for little where actual use contradicts it. Simultaneously, extensive use of pirated datasets weighs against fair use, especially where models generate content competing with the works illegally accessed, a reality now germane to ongoing litigation given most large language models’ dependence on shadow databases.[31]

Ultimately, the Office adopts a nuanced assessment of transformativeness in generative AI. If models are trained on specific genres to produce content for identical audiences, the use is at best moderately transformative given the shared commercial and expressive purposes. This calculus weighs input-side considerations (data legality, training intent) against output consequences (market substitution, functional divergence), ensuring that transformativeness never overrides the rest of the fair use analysis. As Warhol affirmed and the Office endorses, a transformative use can still infringe an original work if it serves the same purpose and market.

Second Factor

The Office’s examination of the second fair use factor, the nature of the copyrighted work, applies the Supreme Court’s framework recognizing that creative expression resides at the core of copyright’s protective purpose, while factual or functional materials occupy a more peripheral position. As per Campbell v. Acuff-Rose Music,[32] this factor acknowledges that ‘some works are closer to the core of intended copyright protection than others,’ establishing a graduated spectrum in which visual artworks command stronger safeguards than code, scholarly articles, or news reports. This hierarchy, articulated in Sony v. Universal,[33] renders the use of highly creative works less likely to qualify as fair use, a principle carrying particular force in generative AI contexts, where training sets often include both highly expressive works and more functional content.

Publication status further informs this analysis as a judicially recognised gloss on the statutory factor. Though Congress amended §107 to clarify that unpublished status is not dispositive, Swatch Group Management v. Bloomberg LP[34] established that unpublished works weigh against fair use given copyright’s traditional role in protecting first publication rights. The Office notes most AI training datasets consist of published materials, which ‘modestly support a fair use argument’[35] per consensus, while cautioning that unpublished content, whether inadvertently ingested or deliberately sourced, intensifies infringement risks.

Industry submissions reinforce this bifurcation, observing that training on novels or visual artworks fits squarely within copyright’s protective domain whereas functional code or factual compilations present weaker claims. As the Authors Guild emphasised,[36] the second factor ‘would weigh against fair use where works are highly creative and closer to the heart of copyright,’ particularly for visual artworks whose value lies in expressive singularity. Nevertheless, the Office concurs with commenters who view this factor as rarely decisive alone, noting its doctrinal gravity is typically subordinate to commercial purpose and market harm. Ultimately, the Office concludes that where training relies on unpublished materials or highly expressive works, this factor will disfavor fair use.

Third Factor

The Copyright Office’s third-factor analysis, evaluating the amount and substantiality of copyrighted material used, confronts the reality that generative AI systems typically ingest entire works during training. Under §107, this factor examines whether the quantity copied is ‘reasonable in relation to the purpose of the copying,’[37] a context-sensitive inquiry that diverges sharply from precedents like Authors Guild v. Google.[38] Where Google Books’ full-text copying enabled non-expressive search functions and limited snippet displays, the Office emphasises that AI’s wholesale ingestion lacks comparable transformative justification, observing that ‘the use of entire copyrighted works is less clearly justified in the context of AI training than it was for Google books or thumbnail image search.’[39]

Crucially, the report rejects categorical condemnation of full-work copying, acknowledging that functional necessity may render such scale reasonable if developers demonstrate both (1) a highly transformative purpose for training and (2) robust technical safeguards preventing output of substantially similar protected expression. This nuanced calibration reflects the legacy of Sega Enterprises v. Accolade,[40] where reverse-engineering entire software packages was deemed reasonable for interoperability, while underscoring AI’s distinct risks: absent guardrails, models risk regurgitating protected content at scale. The analysis positions output controls as pivotal mitigators; where effective constraints exist, the third factor’s weight against fair use diminishes proportionally.

Yet the Office tempers this flexibility with stark caution. Training on qualitatively significant portions, such as a photograph’s compositional essence, intensifies infringement concerns even when quantitatively minor, per Harper & Row’s ‘heart of the work’ doctrine.[41] Unpublished materials attract particular scrutiny, as their unauthorised ingestion deprives rights holders of first publication control. Ultimately, even where full-scale copying proves functionally necessary for model optimisation, its justification remains contingent on evidence that deployment contexts avoid market substitution.

Fourth Factor

The Copyright Office’s analysis of the fourth fair use factor, the effect on the potential market for or value of the copyrighted work, arguably constitutes the report’s most consequential and controversial intervention, introducing market dilution as a novel theory of harm that expands traditional infringement paradigms. While reaffirming established harms, such as lost sales from direct displacement by AI-generated substitutes and lost licensing opportunities, and emphasising that feasible markets for training data ‘disfavor fair use where licensing options exist,’[42] the Office contends that generative AI’s unprecedented scale enables uniquely corrosive market effects. Specifically, the report warns that AI’s capacity for stylistic imitation, even absent verbatim copying, could flood markets with outputs that lower prices, reduce demand for original works, and harm authorship by saturating creative sectors with algorithmically generated content. This dilution theory, while acknowledging that copyright traditionally targets infringement rather than competition, posits that the speed and scale of AI output production threaten to devalue human creativity in ways courts have never before confronted.

The Office grounds this theory in statutory language protecting a work’s ‘value’, arguing that style implicates ‘protectable elements of authorship’[43] and that saturation by stylistically derivative AI outputs could diminish a creator’s commercial distinctiveness. Though analogizing to Sony Corp v. Universal City Studios,[44] where the Court considered harms from ‘widespread’ unauthorised copying, the report concedes market dilution enters ‘uncharted territory’ judicially. No court has yet adopted such a framework, and its viability hinges on whether judges accept that non-infringing stylistic competition can constitute cognizable harm under fair use’s fourth factor. The Office acknowledges this theory’s vulnerability, noting courts may demand empirical evidence beyond policy concerns or anecdotal examples and that its persuasive authority under Skidmore deference depends on the strength of its reasoning.

Importantly, the dilution theory may face several doctrinal tensions. First, copyright historically permits market competition from non-infringing works, even when it harms original creators.[45] Objections to AI-driven dilution stem from its ease of production, distribution, and resulting scale, raising questions about whether copyright should shield markets from technological disruption. Second, critics contend that recognising dilution could paradoxically stifle creativity by enabling rights holders to suppress tools producing non-infringing works, potentially chilling production and distribution of new works by human creators leveraging AI ethically.[46] Finally, the Office subtly invokes creators’ ‘economic and moral interests’ in their works’ unique stylistic value, aligning with scholarly views that ‘value’ encompasses non-substitutionary harms like lost attribution or cultural decontextualisation.[47]

Amid ongoing litigation like Kadrey v. Meta, where courts grapple with output-based market effects, the report’s dilution framework offers plaintiffs a strategic tool to argue systemic harm beyond individual infringement. Yet its ultimate judicial reception remains uncertain, particularly given the Office’s concurrent political upheaval and the theory’s departure from precedent. Nonetheless, the dilution framework challenges the AI industry by inviting courts to reconsider whether copyright’s purpose, protecting the ‘fruits of intellectual labor’, must evolve to address algorithmic economies of scale.

Licensing

The Office’s report champions voluntary and collective licensing as the optimal path to resolve AI training disputes, explicitly favoring market-driven solutions over regulatory intervention. This approach recognises emerging industry practices; visual media platforms like Getty Images offer structured reuse agreements.[48] These real-world models demonstrate that scalable compensation frameworks are feasible, reducing transaction costs while enabling tailored terms for duration, exclusivity, and territorial scope.

For contexts where direct licensing remains impractical, the Office endorses extended collective licensing (ECL) as a supplementary mechanism. Modeled on Scandinavian and UK systems, ECL empowers certified collective management organizations (CMOs) to license entire repertoires (including non-members’ works) under government oversight, subject to robust opt-out rights that preserve creator autonomy. Such frameworks address the ‘copyright iceberg’[49] problem by covering orphan works and simplifying bulk permissions. Crucially, the Office rejects compulsory licensing as premature and incompatible with US copyright principles, noting the absence of systemic market failure justifying state-mandated rates. Voluntary agreements between AI developers and publishers, such as Adobe’s compensated artist partnerships for Firefly training,[50] demonstrate functional market dynamics without government coercion. While acknowledging ECL’s potential to bridge gaps, the report cautions against premature regulatory intrusion, emphasizing that licensing markets need space to evolve organically. Instead, it advocates targeted guardrails: certification standards to ensure CMO representativeness, ironclad opt-out protections, and pilot programs in discrete sectors like academic publishing before broader implementation.

Concluding Thoughts

Cooper and Grimmelmann’s incisive reminder, that AI models are not ‘magical portals’ extracting knowledge from parallel universes but data structures built from human creative labor, anchors the Office’s report. The Office methodically establishes that training generative AI implicates reproduction rights at every stage: dataset creation, weight memorization, and RAG’s real-time copying. Its rigorous fair use analysis dismantles industry claims of inherent transformativeness, instead demanding context-specific scrutiny of outputs and market harm. Most provocatively, it endorses market dilution as recognizable injury, implying that stylistic imitation at scale devalues human artistry even without infringement.

Yet the report’s release amid leadership upheaval and pending litigation leaves its authority in flux. While championing voluntary licensing as the optimal path, its novel doctrinal frameworks, particularly dilution, face untested judicial terrain. Ultimately, the Office charts a pragmatic course, acknowledging AI’s technical necessities while centering copyright’s mandate to protect creative labor. As Cooper and Grimmelmann caution, progress lies not in magical thinking about ‘parallel universes’, but in ethically engaging the human expression fueling these systems. The path forward demands negotiated coexistence, where innovation credits its sources, and creation retains its worth.

Suggested readings:

Why A.I. isn’t going to make art, The New Yorker, August 31 2024

Understanding artists’ perspectives on generative AI art and transparency, ownership, and fairness, AI Hub, January 14 2025

How artists are using generative AI to celebrate the natural world, UK Creative Festival, January 15 2025

Stopping the Trump Administration’s Unlawful Firing Of Copyright Office Director, Democracy Forward, May 22 2025

About the author:

Juliette Groothaert (Summer Intern 2025, Center for Art Law) is a law student at the University of Bristol, graduating in 2025. She is interested in the evolving relationship between intellectual property law and artistic expression, which she hopes to explore further through an LLM next year. As a summer legal intern, she is contributing to research in this field while also working on the Center’s Nazi-Looted Art Database.

Select Sources:

  1. DALL·E 3 is a text-to-image model developed by OpenAI that uses deep learning to generate digital images from text prompts. Released in October 2023, it is integrated into ChatGPT for Plus and Enterprise users, and is also accessible via OpenAI’s API and Labs platform.
  2. A. Feder Cooper and James Grimmelmann, ‘The Files are in the Computer: Copyright, Memorization and Generative AI’ (2023) 23–24
  3. Adam Zewe, ‘Explained: Generative AI’ (MIT News, 9 November 2023) https://news.mit.edu/2023/explained-generative-ai-1109
  4. Jordan Hoffmann and others, ‘Training Compute‑Optimal Large Language Models’ (arXiv, 29 March 2022) 1 https://arxiv.org/abs/2203.15556
  5. Digital Media Licensing Association (DMLA), Initial Comments in response to U.S. Copyright Office, Copyright and Artificial Intelligence: Part 3 – Generative AI Training (Pre‑publication version, March 2025) 10–11
  6. Competition and Markets Authority, AI Foundation Models – Technical update report (GOV.UK, 16 April 2024) 1, 85 https://assets.publishing.service.gov.uk/media/661e5a4c7469198185bd3d62/AI_Foundation_Models_technical_update_report.pdf
  7. Gil Appel, Juliana Neelbauer and David A Schweidel, ‘Generative AI Has an Intellectual Property Problem’ (Harvard Business Review, 7 April 2023) https://hbr.org/2023/04/generative-ai-has-an-intellectual-property-problem
  8. The New York Times Co. v. Microsoft Corp., No. 1:24-cv-00034 (S.D.N.Y. filed Dec. 27, 2023)
  9. Zhang v. Google LLC, No. 3:24-cv-00487 (N.D. Cal. filed Jan. 26, 2024)
  10. Andersen v. Stability AI Ltd., No. 3:23-cv-00201 (N.D. Cal. filed Jan. 13, 2023)
  11. Getty Images (US), Inc. v. Stability AI, Inc., No. 1:23-cv-00135 (D. Del. filed Feb. 3, 2023)
  12. Thomson Reuters Enter. Ctr. GmbH v. Ross Intelligence Inc., No. 1:20-cv-00613 (D. Del. filed May 6, 2020)
  13. Id.
  14. Five Takeaways from the Copyright Office’s Controversial New AI … (Copyright Lately) https://copyrightlately.com/copyright-office-ai-report/
  15. U.S. Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre‑publication version, 9 May 2025) 1–3 https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf
  16. Lisa O’Carroll, ‘Trump fires copyright office supremo Shira Perlmutter after AI report’ (The Guardian, 12 May 2025) https://www.theguardian.com/us-news/2025/may/12/trump-fires-copyright-office-shira-perlmutter
  17. Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417 (N.D. Cal. filed July 7, 2023)
  18. Bartz v. Anthropic PBC, No. 2:24-cv-01523 (C.D. Cal. filed Mar. 1, 2024)
  19. The Development of Generative Artificial Intelligence from a Copyright Perspective (European Parliament, JURI Committee, Study prepared by University of Turin & Nexa Centre, 12 May 2025) 1–5 https://www.europarl.europa.eu/meetdocs/2024_2029/plmrep/COMMITTEES/JURI/DV/2025/05-12/2025.05.12_item6_Study_GenAIfromacopyrightperspective_EN.pdf
  20. U.S. Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre‑publication version, 9 May 2025) 1–2 https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf
  21. Kadrey v. Meta Platforms, Inc., No. 3:23‑cv‑03417‑VC (N.D. Cal. filed July 7, 2023)
  22. Andersen v. Stability AI Ltd., No. 3:23‑cv‑00201‑WHO (N.D. Cal. filed Jan. 13, 2023)
  23. Aleksander Goranin, ‘A Deep Look at Copyright’s Volitional Conduct Doctrine and Generative Artificial Intelligence’ (forthcoming, Emory Law Journal)
  24. Google Cloud, Retrieval‑Augmented Generation use case (Google Cloud, last updated 13 June 2025) https://cloud.google.com/use-cases/retrieval-augmented-generation
  25. Dan Jasnow, Danielle W Bulger and Nardeen Billan, ‘Generative AI Meets Generative Litigation: News Corp Continues Its Battle Against Perplexity AI’ (National Law Review, 20 December 2024) https://natlawreview.com/article/generative-ai-meets-generative-litigation-news-corp-continues-its-battle-against
  26. Dow Jones & Co., Inc. v. Perplexity AI, Inc., No. 24-CV-7984 (S.D.N.Y. Dec. 11, 2024)
  27. Authors Guild, Inc. v. Google, Inc., 804 F.3d 202 (2d Cir. 2015)
  28. LexisNexis Launches Data+ API for AI Training (Artificial Lawyer, 9 December 2024) https://www.artificiallawyer.com/2024/12/09/lexisnexis-launches-data-api-for-ai-training/
  29. Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508 (2023)
  30. Tiago Freitas and Eliot Mannoia, ‘Parallels Between Biological and Artificial Brains: Isolation vs Recursive Training’ (BrandKarma, 10 November 2024) https://www.brandkarma.at/opinions/parallels-between-biological-and-artificial-brains-isolation-vs-recursive-training/
  31. LLM Security and Prompt Engineering Digest: LLM Shadows (Adversa.ai, 3 August 2023) https://adversa.ai/blog/llm-security-and-prompt-engineering-digest-llm-shadows/
  32. Campbell v. Acuff‑Rose Music, Inc., 510 U.S. 569 (1994)
  33. Sony Corp. of America v. Universal City Studios, Inc., 464 U.S. 417 (1984)
  34. Swatch Group Mgmt. Servs. Ltd. v. Bloomberg L.P., 742 F.3d 17 (2d Cir. 2014)
  35. New Media Rights, Initial Comments in response to US Copyright Office, Copyright and Artificial Intelligence: Part 3 – Generative AI Training (Pre‑publication version, 9 May 2025) 16; Data Provenance Initiative, Initial Comments ibid 10–11; Katherine Lee and others, Initial Comments ibid 102.
  36. The Authors Guild, Initial Comments in response to US Copyright Office, Copyright and Artificial Intelligence: Part 3 – Generative AI Training (Pre‑publication version, 9 May 2025) 20.
  37. United States Code, Title 17 § 107 (2023) ↑
  38. Authors Guild, Inc. v. Google Inc., 804 F.3d 202 (2d Cir. Oct. 16, 2015) ↑
  39. US Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre‑publication version, 9 May 2025) 57 https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf ↑
  40. Sega Enters. Ltd. v. Accolade, Inc., 977 F.2d 1510 (9th Cir. 1992) ↑
  41. Harper & Row, Publishers, Inc. v. Nation Enterprises, 471 U.S. 539 (1985) ↑
  42. US Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre‑publication version, 9 May 2025) 54 https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf ↑
  43. TechNet, Initial Comments in response to US Copyright Office, Copyright and Artificial Intelligence: Part 3 – Generative AI Training (30 October 2023) 11 ↑
  44. Sony Corp. of America v. Universal City Studios, Inc., 464 U.S. 417 (1984) ↑
  45. World Intellectual Property Organization, Copyright, Competition and Development (WIPO‑mandated survey by Max Planck Institute, December 2013) https://www.wipo.int/export/sites/www/competition-policy/en/docs/copyright_competition_development.pdf ↑
  46. World Intellectual Property Organization, Copyright, Competition and Development (WIPO‑mandated survey by Max Planck Institute, December 2013) https://www.wipo.int/export/sites/www/competition-policy/en/docs/copyright_competition_development.pdf ↑
  47. Todd A Carpenter, ‘Ensuring attribution is critical when licensing content to AI developers’ (The Scholarly Kitchen, 4 September 2024) https://scholarlykitchen.sspnet.org/2024/09/04/make-attribution-mandatory-in-ai-licensing/ ↑
  48. Getty Images, Content License Agreement (last updated October 2024) https://www.gettyimages.co.uk/eula ↑
  49. George H Pike, ‘AI and Copyright: Steering Around the Iceberg’ (Information Today, vol 40 no 8, October 2023) 24 ↑
  50. Adobe, ‘Adobe’s approach to customer choice in AI models’ (Adobe Blog, 18 March 2025) https://blog.adobe.com/en/publish/2025/03/18/adobes-approach-customer-choice-in-ai-models ↑

Juliette Groothaert

Upon asking DALL·E 3[1] to “create a scenic view of the sea in the style of Van Gogh”, an image was generated within seconds. Compared with Van Gogh’s The Starry Night, the stylistic resemblance is immediately apparent: swirling skies, radiating light forms, bold brushstrokes, and bright color contrasts.

Yet, as Cooper and Grimmelmann remind us, ‘a model is not a magical portal that pulls fresh information from some parallel universe into our own.’[2]

This basic understanding provides critical context for assessing the copyright implications of generative AI. Generative AI models, as sophisticated data-driven structures, operate on mathematical constructs derived wholly from their training datasets.[3] The expanding general usability of these models has only intensified the demand for such datasets.[4] To enhance quality, accuracy, and flexibility, industry submissions confirm these systems typically require ‘millions or billions of works for training purposes,’[5] including terabyte-scale datasets for foundation models.[6] This reliance on pre-existing copyrighted materials has, unsurprisingly, catalyzed numerous legal challenges.[7]

Prominent examples include The New York Times v. Microsoft Corp.,[8] involving the unauthorised use of proprietary journalism to train language models; visual arts disputes such as Zhang v. Google LLC,[9] Andersen v. Stability AI,[10] and Getty Images v. Stability AI;[11] and, most significantly, the landmark ruling in Thomson Reuters v. Ross Intelligence.[12] Although Reuters concerned the use of copyrighted legal materials to train a non-generative AI research tool, the court found that the unauthorised use of legal headnotes to build a competing product constituted infringement.[13] Collectively, these cases, which now exceed forty pending lawsuits,[14] center on a pivotal legal question: whether using copyrighted works for AI training is fair use, particularly when the works are employed in generative systems that produce output.

Against this contentious backdrop, the United States Copyright Office (‘Office’) advanced the discourse on May 9, 2025, by releasing a pre-publication draft of Part 3 of its comprehensive AI policy report.[15] In March 2023, the Office issued guidance confirming that human authorship is required for copyright registration and that applicants must disclose any AI-generated content exceeding a de minimis threshold, along with a description of the human author’s contribution. It then issued a Notice of Inquiry soliciting public comments on AI and copyright, receiving over 10,000 submissions that informed the analysis and recommendations in the current report. Parts 1 and 2 of the Office’s initiative, addressing digital replicas and copyrightability respectively, laid essential groundwork for this third report; the Center for Art Law has published further commentary on both reports. This latest report offers the most detailed articulation yet of how copyright law applies to the training of generative AI models. Yet its release coincides with exceptional institutional turbulence. Register Shira Perlmutter’s dismissal[16] days after the report’s publication raises questions about what changes new management might enact. The timing is particularly delicate for pending cases like Kadrey v. Meta[17] and Bartz v. Anthropic,[18] which directly engage the report’s analysis. Though not legally binding, the report enters a legal ecosystem where AI copyright doctrine is actively evolving, and may well shape interpretive norms.

Technical Primer

The Office’s pre-publication report recognizes that answers to these legal questions must rest on a technically precise account of how generative AI systems interact with protected works. Before considering fair use defenses, the Office systematically lays out how machine learning workflows inherently implicate exclusive rights under copyright law. This technical foundation identifies three essential pressure points: the reproduction rights implicated when datasets are created, the possible embodiment of protected expression within model parameters through memorization, and the distinctive risks posed by retrieval-augmented generation systems.

Datasets

Generative AI models, including large-scale language models as well as image generators, are developed through machine learning techniques that deliberately reproduce copyrighted material.[19] Every stage of dataset creation potentially infringes the reproduction right under 17 U.S.C. § 106(1): the initial downloading from online sources, format conversion, cross-medium transfers, and the creation of modified subsets or filtered corpora. Such operations may concurrently implicate the derivative work right under § 106(2) when they recast or transform original expression through abridgements, condensations, or other adaptations.
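The staged copying described above can be made concrete with a toy pipeline (all names and data here are hypothetical, purely illustrative): each stage below produces a new fixation of the source text, which is why the report treats every stage as potentially implicating § 106(1).

```python
# Illustrative sketch only (hypothetical data and function names):
# each stage of dataset creation yields a fresh copy of the works.

def download(url_to_text):
    # Stage 1: initial download from online sources — a first copy of each work.
    return dict(url_to_text)

def convert_format(corpus):
    # Stage 2: format conversion (here, stripping HTML tags) — another copy.
    return {url: text.replace("<p>", "").replace("</p>", "")
            for url, text in corpus.items()}

def filter_subset(corpus, min_len):
    # Stage 3: a curated/filtered subset — yet another copy of the survivors.
    return {url: text for url, text in corpus.items() if len(text) >= min_len}

sources = {"example.org/a": "<p>A short poem about the sea.</p>",
           "example.org/b": "<p>Hi.</p>"}
raw = download(sources)
cleaned = convert_format(raw)
training_set = filter_subset(cleaned, min_len=10)
print(len(training_set))  # only the longer work survives the filter
```

Three distinct reproductions of the surviving work now exist (`raw`, `cleaned`, `training_set`), which tracks the Office’s point that the question is not whether copying occurred but whether it is excused.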

Model Weights

The Office finds that model weights, the numerical parameters encoding learned patterns, may represent copies of protected expression where substantial memorization is involved, implicating the reproduction and derivative work rights under copyright law. As articulated on page 30 of its report:

‘…whether a model’s weights implicate the reproduction or derivative work rights turns on whether the model has retained or memorized substantial protectable expression from the works at issue.’[20]

This determination hinges on a fact-specific inquiry: where weights enable the output of verbatim or near-identical content from training data, the Office asserts there is a strong argument that copying those weights infringes the memorized works. Judicial approaches reflect this fact-intensive standard and diverge significantly: Kadrey v. Meta Platforms[21] dismissed claims as ‘nonsensical’ absent allegations of infringing outputs, while Andersen v. Stability AI[22] permitted claims against third-party users where plaintiffs demonstrated that protected elements persisted within the weights. The Office endorses Andersen’s standard, clarifying that infringement turns on whether ‘the model has retained or memorized substantial protectable expression.’ Critically, when protectable material is embedded in weights, subsequent distribution or reuse, even by parties uninvolved in training, could constitute prima facie infringement, creating downstream liability risks that extend far beyond initial model development.[23]
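The memorization question the Office poses can be pictured with a toy probe. This is a hedged stand-in, not how real audits work: genuine tests feed prefixes of training text to the actual model and compare its completions against the originals. Here a lookup table plays the role of fully memorized weights.

```python
# Toy memorization probe (hypothetical model and text, purely illustrative).

TRAINING_TEXT = "It was the best of times, it was the worst of times."

def toy_model_continue(prefix):
    # Stand-in for a model that has memorized its training data:
    # any known prefix is completed with the original continuation.
    if TRAINING_TEXT.startswith(prefix):
        return TRAINING_TEXT[len(prefix):]
    return "<no memorized continuation>"

def memorized_verbatim(prefix, original):
    # The fact-specific question: do the weights let the system
    # regurgitate the protected work verbatim?
    return prefix + toy_model_continue(prefix) == original

print(memorized_verbatim("It was the best of times,", TRAINING_TEXT))  # True
```

If probes of this shape succeed across many works, the Office’s view is that the weights themselves arguably embody the memorized expression, so copying or distributing them reproduces it.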

RAG

The Office’s report adopts a notably more assertive stance on retrieval-augmented generation (RAG) systems than on other AI training methods, focusing on the unique legal risks they present. Unlike conventional generative AI models built from pre-assembled training datasets, RAG systems actively retrieve and incorporate real-time data from external sources during output generation.[24] RAG can accordingly be understood as functioning in two steps: the system first copies the source materials into a retrieval database, and then, when prompted by a user query, reproduces them in its output. While this architecture improves factual accuracy, both the initial unauthorized reproduction and the subsequent relaying of that material are potential copyright infringements unlikely to qualify as fair use. These concerns apply with particular force where a system summarizes or abridges copyrighted works like news stories rather than merely linking to them.
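The two-step structure described above can be sketched in a few lines. This is a deliberately crude illustration: the document IDs are invented, and a word-overlap score stands in for the vector-embedding search real RAG systems use. The point is simply that the source text is copied twice, once into the store and once into the answer.

```python
# Minimal RAG sketch (hypothetical documents and scoring, for illustration).

retrieval_db = []

def ingest(doc_id, text):
    # Step 1: the source material is copied into the retrieval database.
    retrieval_db.append((doc_id, text))

def answer(query):
    # Step 2: the best-matching passage is retrieved and relayed verbatim
    # inside the generated output.
    def overlap(text):
        return len(set(query.lower().split()) & set(text.lower().split()))
    doc_id, passage = max(retrieval_db, key=lambda d: overlap(d[1]))
    return f"According to {doc_id}: {passage}"

ingest("wsj-2024-01", "Markets rallied after the rate decision.")
ingest("recipe-blog", "Whisk the eggs before folding in flour.")
print(answer("what happened to markets after the rate decision"))
```

Because the retrieved passage appears verbatim in the output, the user never needs to visit the source, which is exactly the substitution effect the Office flags.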

This categorical stance stems from RAG’s close connection to traditional content markets. In routine AI training, works dissolve into patterns and statistical relationships. RAG outputs, by contrast, retain verbatim excerpts and at times compete directly with the originals, threatening core revenue streams for rights holders. For instance, Perplexity AI,[25] now facing the first US lawsuit targeting RAG technology,[26] allegedly enables users to ‘skip the links’ to source material, diverting traffic and advertising revenue away from publishers like The Wall Street Journal whose hyperlinks once brought readers directly to their stories. Unlike the snippet display upheld in Authors Guild v. Google,[27] which helped users locate sources of information, RAG risks blending the original and the derived, blurring the line between a search utility and a competing commercial service. And because feasible alternatives such as licensed APIs exist,[28] heavy reliance on unauthorized sources is a commercial choice rather than a technical necessity. This weakens the transformative-use defence, as RAG’s outputs frequently replicate the expressive purpose and economic value of the underlying works. In essence, the Office’s sharp condemnation of RAG signals a pivotal shift: as licensing markets for training data mature, unlicensed real-time ingestion faces existential legal threats. Courts are increasingly tasked with reconciling innovation incentives with the uncompensated exploitation that drives what some see as RAG’s double-barreled infringement.

Fair Use Factors

The Office’s report thoroughly refutes the assumption that AI training automatically enjoys broad fair use protection, emphasising that copying copyrighted works to create training datasets constitutes prima facie infringement under 17 U.S.C. § 106(1). Against this backdrop, the Office applies the statutory four-factor test under § 107 with notable rigour, rejecting categorical exemptions for machine learning. The pre-publication guidance explores these factors in depth in section IV, covered below.

First Factor

The Office’s first-factor analysis, centered on the purpose and character of the use, applies the Supreme Court’s framework in Warhol v. Goldsmith,[29] rejecting absolute claims of transformativeness and instead demanding close scrutiny of the actualities of use. The Office stresses that transformativeness cannot be judged purely by how models are trained; courts must also consider what those trained models do in the field. This approach explicitly incorporates Warhol’s instruction to evaluate the ‘purpose and function’ of the use relative to the original work, moving beyond straightforward textual comparison of what was incorporated or resembled to ask whether outputs serve as market substitutes.


Critically, the report dismantles two key industry arguments: first, that training is a merely mechanical, non-expressive process of computational ingestion, and second, that it parallels human learning.[30] The Office counters that generative models absorb not only semantic meaning but the expressive fabric of copyrighted works, studying in particular ‘how words are selected and arranged at the sentence, paragraph, and document level.’ This stands in stark contrast to human memory, where learners retain imperfect impressions filtered through unique perspectives. And where human learning feeds the creative ecosystem on which the marketplace for derivative works depends, AI reproduces content at superhuman speed and scale, enabling market-disruptive reproduction.

Further, the analysis treats post-deployment safeguards as specific evidentiary pointers. Proof that a developer installed robust guardrails to prevent verbatim output may support transformativeness by revealing an intent that the system be used for different purposes, though, as Warhol cautions, stated intentions count for little where actual use contradicts them. Conversely, extensive use of pirated datasets weighs against fair use, especially if models generate content competing with the very works illegally accessed, a reality germane to ongoing litigation given many large language models’ dependence on shadow libraries.[31]

Ultimately, the Office adopts a nuanced assessment of transformativeness in generative AI. If models are trained on specific genres to produce content for identical audiences, the use is at best moderately transformative given the shared commercial and expressive purposes. This calculus weighs input-side considerations (data legality, training intent) against output consequences (market substitution, functional divergence), ensuring that transformativeness never overrides the rest of the fair use analysis. As Warhol affirmed and the Office endorses, a transformative use can still infringe an original work if it serves the same purpose and market.

Second Factor

The Office’s examination of the second fair use factor, the nature of the copyrighted work, applies the Supreme Court’s framework recognizing that creative expression resides at the core of copyright’s protective purpose, while factual or functional materials occupy a more peripheral position. As per Campbell v. Acuff-Rose Music,[32] this factor acknowledges that ‘some works are closer to the core of intended copyright protection than others,’ establishing a graduated spectrum in which visual artworks command stronger safeguards than code, scholarly articles, or news reports. This hierarchy, articulated in Sony v. Universal,[33] renders the use of highly creative works less likely to qualify as fair use, a principle carrying particular force in generative AI contexts where training sets frequently include highly expressive content.

Publication status further informs this analysis as a judicially recognised gloss on the statutory factor. Though Congress amended §107 to clarify that unpublished status is not dispositive, Swatch Group Management v. Bloomberg LP[34] established that unpublished works weigh against fair use given copyright’s traditional role in protecting first publication rights. The Office notes most AI training datasets consist of published materials, which ‘modestly support a fair use argument’[35] per consensus, while cautioning that unpublished content, whether inadvertently ingested or deliberately sourced, intensifies infringement risks.

Industry submissions reinforce this bifurcation, observing that training on novels or visual artworks fits squarely within copyright’s protective domain whereas functional code or factual compilations present weaker claims. As the Authors Guild emphasised,[36] the second factor ‘would weigh against fair use where works are highly creative and closer to the heart of copyright,’ particularly for visual artworks whose value lies in expressive singularity. Nevertheless, the Office concurs with commenters who view this factor as rarely decisive alone, noting its doctrinal gravity is typically subordinate to commercial purpose and market harm. Ultimately, the Office concludes that where training relies on unpublished materials or highly expressive works, this factor will disfavor fair use.

Third Factor

The Copyright Office’s third-factor analysis, evaluating the amount and substantiality of copyrighted material used, confronts the reality that generative AI systems typically ingest entire works during training. Under §107, this factor examines whether the quantity copied is ‘reasonable in relation to the purpose of the copying,’[37] a context-sensitive inquiry that diverges sharply from precedents like Authors Guild v. Google.[38] Where Google Books’ full-text copying enabled non-expressive search functions and limited snippet displays, the Office emphasises that AI’s wholesale ingestion lacks comparable transformative justification, observing that ‘the use of entire copyrighted works is less clearly justified in the context of AI training than it was for Google books or thumbnail image search.’[39]

Crucially, the report rejects categorical condemnation of full-work copying, acknowledging that functional necessity may render such scale reasonable where developers demonstrate both (1) a highly transformative training purpose and (2) robust technical safeguards preventing the output of substantially similar protected expression. This nuanced calibration reflects the legacy of Sega Enterprises v. Accolade,[40] where reverse-engineering entire software packages was deemed reasonable for interoperability, while underscoring AI’s distinct risks: absent guardrails, models risk regurgitating protected content at scale. The analysis positions output controls as pivotal mitigators; where effective constraints exist, the third factor’s weight against fair use diminishes proportionally.

Yet the Office tempers this flexibility with stark caution. Training on qualitatively significant portions, such as a photograph’s compositional essence, intensifies infringement concerns even when quantitatively minor, per Harper & Row’s ‘heart of the work’ doctrine.[41] Unpublished materials attract particular scrutiny, as their unauthorised ingestion deprives rights holders of first-publication control. Ultimately, even where full-scale copying proves functionally necessary for model optimisation, its justification remains contingent on evidence that deployment contexts avoid market substitution.

Fourth Factor

The Copyright Office’s analysis of the fourth fair use factor, the effect on the potential market for or value of the copyrighted work, arguably constitutes the report’s most consequential and controversial intervention, introducing market dilution as a novel theory of harm that expands traditional infringement paradigms. The Office reaffirms established harms such as lost sales from direct displacement by AI-generated substitutes and lost licensing opportunities, emphasising that feasible markets for training data ‘disfavor fair use where licensing options exist.’[42] But it further contends that generative AI’s unprecedented scale enables uniquely corrosive market effects. Specifically, the report warns that AI’s capacity for stylistic imitation, even absent verbatim copying, could flood markets with outputs that lower prices, reduce demand for original works, and harm authorship by saturating creative sectors with algorithmically generated content. While acknowledging that copyright traditionally targets infringement rather than competition, this dilution theory posits that the speed and scale of AI output production threaten to devalue human creativity in ways courts have never before confronted.

The Office grounds this theory in statutory language protecting a work’s ‘value’, arguing that style implicates ‘protectable elements of authorship’[43] and that saturation by stylistically derivative AI outputs could diminish a creator’s commercial distinctiveness. Though analogizing to Sony Corp v. Universal City Studios,[44] where the Court considered harms from ‘widespread’ unauthorised copying, the report concedes market dilution enters ‘uncharted territory’ judicially. No court has yet adopted such a framework, and its viability hinges on whether judges accept that non-infringing stylistic competition can constitute cognizable harm under fair use’s fourth factor. The Office acknowledges this theory’s vulnerability, noting courts may demand empirical evidence beyond policy concerns or anecdotal examples and that its persuasive authority under Skidmore deference depends on the strength of its reasoning.

Importantly, the dilution theory may face several doctrinal tensions. First, copyright has historically permitted market competition from non-infringing works, even when it harms original creators.[45] Objections to AI-driven dilution stem from its ease of production and distribution and the resulting scale, raising questions about whether copyright should shield markets from technological disruption. Second, critics contend that recognising dilution could paradoxically stifle creativity by enabling rights holders to suppress tools that produce non-infringing works, potentially chilling the production and distribution of new works by human creators leveraging AI ethically.[46] Finally, the Office subtly invokes creators’ ‘economic and moral interests’ in their works’ unique stylistic value, aligning with scholarly views that ‘value’ encompasses non-substitutionary harms like lost attribution or cultural decontextualisation.[47]

Amid ongoing litigation like Kadrey v. Meta, where courts grapple with output-based market effects, the report’s dilution framework offers plaintiffs a strategic tool for arguing systemic harm beyond individual infringement. Its ultimate judicial reception remains uncertain, however, particularly given the Office’s concurrent political upheaval and the theory’s departure from precedent. Even so, the dilution framework challenges the AI industry by inviting courts to reconsider whether copyright’s purpose, protecting the ‘fruits of intellectual labor’, must evolve to address algorithmic economies of scale.

Licensing

The Office’s report champions voluntary and collective licensing as the optimal path to resolve AI training disputes, explicitly favoring market-driven solutions over regulatory intervention. This approach recognises emerging industry practices; visual media platforms like Getty Images offer structured reuse agreements.[48] These real-world models demonstrate that scalable compensation frameworks are feasible, reducing transaction costs while enabling tailored terms for duration, exclusivity, and territorial scope.

For contexts where direct licensing remains impractical, the Office endorses extended collective licensing (ECL) as a supplementary mechanism. Modeled on Scandinavian and UK systems, ECL empowers certified collective management organizations (CMOs) to license entire repertoires (including non-members’ works) under government oversight, subject to robust opt-out rights that preserve creator autonomy. Such frameworks address the ‘copyright iceberg’[49] problem by covering orphan works and simplifying bulk permissions. Crucially, the Office rejects compulsory licensing as premature and incompatible with US copyright principles, noting the absence of a systemic market failure that would justify state-mandated rates. Voluntary agreements between AI developers and publishers, such as Adobe’s compensated artist partnerships for Firefly training,[50] demonstrate functional market dynamics without government coercion. While acknowledging ECL’s potential to bridge gaps, the report cautions against premature regulatory intrusion, emphasizing that licensing markets need space to evolve organically. It instead advocates targeted guardrails: certification standards to ensure CMO representativeness, ironclad opt-out protections, and pilot programs in discrete sectors like academic publishing before broader implementation.

Concluding Thoughts

Cooper and Grimmelmann’s incisive reminder, that AI models are not ‘magical portals’ extracting knowledge from parallel universes but data structures built from human creative labor, anchors the Office’s report. The Office methodically establishes that training generative AI implicates reproduction rights at every stage: dataset creation, weight memorization, and RAG’s real-time copying. Its rigorous fair use analysis dismantles industry claims of inherent transformativeness, instead demanding context-specific scrutiny of outputs and market harm. Most provocatively, it endorses market dilution as a cognizable injury, implying that stylistic imitation at scale devalues human artistry even absent infringement.

Yet the report’s release amid leadership upheaval and pending litigation leaves its authority in flux. While championing voluntary licensing as the optimal path, its novel doctrinal frameworks, particularly dilution, face untested judicial terrain. Ultimately, the Office charts a pragmatic course, acknowledging AI’s technical necessities while centering copyright’s mandate to protect creative labor. As Cooper and Grimmelmann caution, progress lies not in magical thinking about ‘parallel universes’, but in ethically engaging the human expression fueling these systems. The path forward demands negotiated coexistence, where innovation credits its sources, and creation retains its worth.

Suggested readings:

Why A.I. isn’t going to make art, The New Yorker, August 31 2024

Understanding artists’ perspectives on generative AI art and transparency, ownership, and fairness, AI Hub, January 14 2025

How artists are using generative AI to celebrate the natural world, UK Creative Festival, January 15 2025

Stopping the Trump Administration’s Unlawful Firing Of Copyright Office Director, Democracy Forward, May 22 2025

About the author:

Juliette is a final-year law student at the University of Bristol, graduating in 2025. She is interested in the evolving relationship between intellectual property law and artistic expression, which she hopes to explore further through an LLM next year. As a summer legal intern, she is conducting research in this field while contributing to the Center’s Nazi-Looted Art Database.

  1. DALL·E 3 is a text-to-image model developed by OpenAI that uses deep learning to generate digital images from text prompts. Released in October 2023, it is integrated into ChatGPT for Plus and Enterprise users, and is also accessible via OpenAI’s API and Labs platform. ↑
  2. A Feder Cooper and James Grimmelmann, ‘The Files are in the Computer: Copyright, Memorization and Generative AI’ (2023) 23–24 ↑
  3. Adam Zewe, ‘Explained: Generative AI’ (MIT News, 9 November 2023) https://news.mit.edu/2023/explained-generative-ai-1109 ↑
  4. Jordan Hoffmann and others, ‘Training Compute-Optimal Large Language Models’ (arXiv, 29 March 2022) 1 https://arxiv.org/abs/2203.15556
  5. Digital Media Licensing Association (DMLA), Initial Comments in response to U.S. Copyright Office, Copyright and Artificial Intelligence: Part 3 – Generative AI Training (Pre-publication version, March 2025) 10–11
  6. Competition and Markets Authority, AI Foundation Models – Technical Update Report (GOV.UK, 16 April 2024) 1, 85 https://assets.publishing.service.gov.uk/media/661e5a4c7469198185bd3d62/AI_Foundation_Models_technical_update_report.pdf
  7. Gil Appel, Juliana Neelbauer and David A Schweidel, ‘Generative AI Has an Intellectual Property Problem’ (Harvard Business Review, 7 April 2023) https://hbr.org/2023/04/generative-ai-has-an-intellectual-property-problem
  8. The New York Times Co. v. Microsoft Corp., No. 1:23-cv-11195 (S.D.N.Y. filed Dec. 27, 2023)
  9. Zhang v. Google LLC, No. 3:24-cv-00487 (N.D. Cal. filed Jan. 26, 2024)
  10. Andersen v. Stability AI Ltd., No. 3:23-cv-00201 (N.D. Cal. filed Jan. 13, 2023)
  11. Getty Images (US), Inc. v. Stability AI, Inc., No. 1:23-cv-00135 (D. Del. filed Feb. 3, 2023)
  12. Thomson Reuters Enter. Ctr. GmbH v. Ross Intelligence Inc., No. 1:20-cv-00613 (D. Del. filed May 6, 2020)
  13. Id.
  14. ‘Five Takeaways from the Copyright Office’s Controversial New AI …’ (Copyright Lately) https://copyrightlately.com/copyright-office-ai-report/
  15. U.S. Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre-publication version, 9 May 2025) 1–3 https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf
  16. Lisa O’Carroll, ‘Trump fires copyright office supremo Shira Perlmutter after AI report’ (The Guardian, 12 May 2025) https://www.theguardian.com/us-news/2025/may/12/trump-fires-copyright-office-shira-perlmutter
  17. Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417 (N.D. Cal. filed July 7, 2023)
  18. Bartz v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. filed Aug. 19, 2024)
  19. The Development of Generative Artificial Intelligence from a Copyright Perspective (European Parliament, JURI Committee, Study prepared by University of Turin & Nexa Centre, 12 May 2025) 1–5 https://www.europarl.europa.eu/meetdocs/2024_2029/plmrep/COMMITTEES/JURI/DV/2025/05-12/2025.05.12_item6_Study_GenAIfromacopyrightperspective_EN.pdf
  20. U.S. Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre-publication version, 9 May 2025) 1–2 https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf
  21. Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417 (N.D. Cal. filed July 7, 2023)
  22. Andersen v. Stability AI Ltd., No. 3:23-cv-00201 (N.D. Cal. filed Jan. 13, 2023)
  23. Aleksander Goranin, ‘A Deep Look at Copyright’s Volitional Conduct Doctrine and Generative Artificial Intelligence’ (forthcoming, Emory Law Journal)
  24. Google Cloud, Retrieval-Augmented Generation use case (Google Cloud, last updated 13 June 2025) https://cloud.google.com/use-cases/retrieval-augmented-generation
  25. Dan Jasnow, Danielle W Bulger and Nardeen Billan, ‘Generative AI Meets Generative Litigation: News Corp Continues Its Battle Against Perplexity AI’ (National Law Review, 20 December 2024) https://natlawreview.com/article/generative-ai-meets-generative-litigation-news-corp-continues-its-battle-against
  26. Dow Jones & Co., Inc. v. Perplexity AI, Inc., No. 1:24-cv-07984 (S.D.N.Y. Dec. 11, 2024)
  27. Authors Guild, Inc. v. Google, Inc., 804 F.3d 202 (2d Cir. 2015)
  28. ‘LexisNexis Launches Data+ API for AI Training’ (Artificial Lawyer, 9 December 2024) https://www.artificiallawyer.com/2024/12/09/lexisnexis-launches-data-api-for-ai-training/
  29. Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508 (2023)
  30. Tiago Freitas and Eliot Mannoia, ‘Parallels Between Biological and Artificial Brains: Isolation vs Recursive Training’ (BrandKarma, 10 November 2024) https://www.brandkarma.at/opinions/parallels-between-biological-and-artificial-brains-isolation-vs-recursive-training/
  31. ‘LLM Security and Prompt Engineering Digest: LLM Shadows’ (Adversa.ai, 3 August 2023) https://adversa.ai/blog/llm-security-and-prompt-engineering-digest-llm-shadows/
  32. Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569 (1994)
  33. Sony Corp. of America v. Universal City Studios, Inc., 464 U.S. 417 (1984)
  34. Swatch Group Mgmt. Servs. Ltd. v. Bloomberg L.P., 742 F.3d 17 (2d Cir. 2014)
  35. New Media Rights, Initial Comments in response to U.S. Copyright Office, Copyright and Artificial Intelligence: Part 3 – Generative AI Training (Pre-publication version, 9 May 2025) 16; Data Provenance Initiative, Initial Comments ibid 10–11; Katherine Lee and others, Initial Comments ibid 102.
  36. The Authors Guild, Initial Comments in response to U.S. Copyright Office, Copyright and Artificial Intelligence: Part 3 – Generative AI Training (Pre-publication version, 9 May 2025) 20.
  37. 17 U.S.C. § 107 (2023)
  38. Authors Guild, Inc. v. Google, Inc., 804 F.3d 202 (2d Cir. 2015)
  39. U.S. Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre-publication version, 9 May 2025) 57 https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf
  40. Sega Enters. Ltd. v. Accolade, Inc., 977 F.2d 1510 (9th Cir. 1992)
  41. Harper & Row, Publishers, Inc. v. Nation Enterprises, 471 U.S. 539 (1985)
  42. U.S. Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre-publication version, 9 May 2025) 54 https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf
  43. TechNet, Initial Comments in response to U.S. Copyright Office, Copyright and Artificial Intelligence: Part 3 – Generative AI Training (30 October 2023) 11
  44. Sony Corp. of America v. Universal City Studios, Inc., 464 U.S. 417 (1984)
  45. World Intellectual Property Organization, Copyright, Competition and Development (WIPO-mandated survey by Max Planck Institute, December 2013) https://www.wipo.int/export/sites/www/competition-policy/en/docs/copyright_competition_development.pdf
  46. Ibid.
  47. Todd A Carpenter, ‘Ensuring attribution is critical when licensing content to AI developers’ (The Scholarly Kitchen, 4 September 2024) https://scholarlykitchen.sspnet.org/2024/09/04/make-attribution-mandatory-in-ai-licensing/
  48. Getty Images, Content License Agreement (last updated October 2024) https://www.gettyimages.co.uk/eula
  49. George H Pike, ‘AI and Copyright: Steering Around the Iceberg’ (Information Today, vol 40 no 8, October 2023) 24
  50. Adobe, ‘Adobe’s approach to customer choice in AI models’ (Adobe Blog, 18 March 2025) https://blog.adobe.com/en/publish/2025/03/18/adobes-approach-customer-choice-in-ai-models


Disclaimer: This article is for educational purposes only and is not meant to provide legal advice. Readers should not construe or rely on any comment or statement in this article as legal advice. For legal advice, readers should seek a consultation with an attorney.

© 2025 Center for Art Law