Center for Art Law

Unpacking the US Copyright Office’s Third Report on Generative AI

July 8, 2025

By Juliette Groothaert

[Images: Vincent van Gogh, The Starry Night (left); DALL·E 3 output for “a scenic view of the sea in the style of Van Gogh” (right)]

When DALL·E 3[1] was asked to “create a scenic view of the sea in the style of Van Gogh”, the image appearing on the right was generated within seconds. Compared with The Starry Night on the left, the stylistic resemblance is immediately apparent: swirling skies, radiating light forms, bold brushstrokes, and bright color contrasts.

Yet, as Cooper and Grimmelmann remind us, “a model is not a magical portal that pulls fresh information from some parallel universe into our own.”[2]

This basic point provides critical context for the copyright implications of generative AI. Generative AI models, as sophisticated data-driven structures, operate on mathematical constructs derived wholly from their training datasets.[3] The expanding general usability of these models has only intensified the demand for such datasets.[4] Industry submissions confirm that, to achieve quality, accuracy, and flexibility, these systems typically require ‘millions or billions of works for training purposes,’[5] including terabyte-scale datasets for foundation models.[6] This reliance on pre-existing copyrighted materials has catalyzed numerous legal challenges.[7]
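
To make the point concrete, here is a toy sketch of my own (not anything from the report): a character-level bigram “model” whose parameters are nothing more than counts derived from its training text, so every prediction it makes traces back entirely to that text.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count character-pair frequencies. These counts are the model's
    only parameters; nothing comes from outside the training data."""
    weights = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        weights[a][b] += 1
    return weights

def most_likely_next(weights, ch):
    """Predict the successor of `ch` seen most often in training."""
    return weights[ch].most_common(1)[0][0] if weights[ch] else None

corpus = "the starry night over the rhone"
model = train_bigram(corpus)
print(most_likely_next(model, "t"))  # "h", learned solely from the corpus
```

The same is true, at vastly greater scale, of a neural network: its weights are a function of its training corpus and nothing else.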

Prominent examples include The New York Times v. Microsoft Corp.,[8] involving the unauthorised use of proprietary journalism to train language models; visual arts disputes such as Zhang v. Google LLC,[9] Andersen v. Stability AI,[10] and Getty Images v. Stability AI;[11] and, most notably, the landmark ruling in Thomson Reuters v. Ross Intelligence.[12] Although Reuters concerned the use of copyrighted legal materials to train a non-generative AI research tool, the court found that copyright infringement had occurred through the unauthorised use of legal headnotes and structure to train a competing research tool.[13] Collectively, these cases, which now exceed forty pending lawsuits,[14] center on a pivotal legal question: whether using copyrighted works for AI training is fair use, particularly when the works are employed in generative systems producing output.

Against this contentious backdrop, the United States Copyright Office (‘Office’) advanced the discourse on May 9, 2025, by releasing a pre-publication draft of Part 3 of its comprehensive AI policy report.[15] In March 2023, it issued guidance confirming that human authorship is required for copyright registration, and that applicants must disclose any AI-generated content exceeding a de minimis threshold, along with a description of the human author’s contribution. The Office then issued a Notice of Inquiry soliciting public comments on AI and copyright, receiving over 10,000 submissions that informed the analysis and recommendations in the current report. Part 1 and Part 2 of the Office’s Initiative, addressing digital replicas and copyrightability respectively, laid essential groundwork for this third report; the Center for Art Law has published further commentary on both parts. This latest report offers the most detailed articulation yet of how copyright law applies to the training of generative AI models. Yet its release coincides with exceptional institutional turbulence: Register Shira Perlmutter’s dismissal[16] days after the report’s publication raises questions about what changes new management might enact. The timing is particularly delicate for pending cases like Kadrey v. Meta[17] and Bartz v. Anthropic,[18] which directly echo the report’s analysis. Though the report is not legally binding, it enters a legal ecosystem in which AI copyright doctrine is actively evolving, and it may well shape interpretive norms.

Technical Primer

The Office’s pre-publication report recognizes that answers to these legal questions must rest on a technically precise account of how generative AI systems interact with protected works. Before considering fair use defenses, the Office systematically lays out how machine learning workflows inherently implicate exclusive rights under copyright law. This technical foundation identifies three essential pressure points: the reproduction rights implicated when datasets are created, the possible embodiment of protected expression within model parameters through memorization, and the distinctive risks of retrieval-augmented generation systems.

Datasets

Generative AI models, including large-scale language models and image generators, are developed through machine learning techniques that deliberately reproduce copyrighted material.[19] Every stage of dataset creation potentially infringes the reproduction right under 17 U.S.C. § 106(1): the initial downloading from online sources, format conversion, cross-medium transfers, and the creation of modified subsets or filtered corpora. Such operations may concurrently implicate the derivative work right under § 106(2) when they recast or transform original expression through abridgements, condensations, or other adaptations.
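
The stages the Office enumerates can be sketched as a pipeline in which each step produces a fresh copy of the work. All names and contents below are hypothetical, for illustration only:

```python
# Hypothetical dataset-creation pipeline, annotating where each step
# described by the Office makes a new copy of the underlying work.

def fetch(url):
    # Stage 1: initial download from an online source (first reproduction).
    # Stubbed here; a real pipeline would perform an HTTP request.
    return f"<html><body>Contents of {url}</body></html>"

def to_plain_text(html):
    # Stage 2: format conversion (HTML to text), a further reproduction,
    # and arguably a derivative work if expression is recast (§ 106(2)).
    return html.replace("<html><body>", "").replace("</body></html>", "")

def build_subset(docs, keyword):
    # Stage 3: a filtered corpus, yet another copy of the retained works.
    return [d for d in docs if keyword in d]

raw = [fetch(u) for u in ["example.org/a", "example.org/b"]]
texts = [to_plain_text(d) for d in raw]
corpus = build_subset(texts, "example.org/a")
print(len(corpus))  # 1
```

The legal significance is that each intermediate artifact (raw download, converted text, filtered subset) is itself a reproduction, not just the final training set.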

Model Weights

The Office finds that model weights, the numerical parameters encoding learned patterns, may constitute copies of protected expression where substantial memorization is involved, implicating the reproduction and derivative work rights under copyright law. As articulated on page 30 of the report:

‘…whether a model’s weights implicate the reproduction or derivative work rights turns on whether the model has retained or memorized substantial protectable expression from the works at issue.’[20]

This determination hinges on a fact-specific inquiry: where weights enable a model to output verbatim or near-identical content from its training data, the Office asserts there is a strong argument that copying those weights infringes the memorized works. Judicial approaches reflect this fact-intensive standard and diverge significantly: Kadrey v. Meta Platforms[21] dismissed claims as ‘nonsensical’ absent allegations of infringing outputs, while Andersen v. Stability AI[22] permitted claims against third-party users where plaintiffs demonstrated that protected elements persisted within the weights. The Office endorses Andersen’s standard, clarifying that infringement turns on whether ‘the model has retained or memorized substantial protectable expression.’ Critically, when protectable material is embedded in weights, subsequent distribution or reuse, even by parties uninvolved in training, could constitute prima facie infringement, creating downstream liability risks that extend far beyond initial model development.[23]
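
A deliberately overfit toy model illustrates the memorization inquiry: when the parameters retain enough of a work’s expression, a short prompt elicits the work verbatim. This is my own minimal sketch, not code from any actual system:

```python
# Toy "model" that memorizes its single training work: its parameters
# map each 8-character context to the character that followed it.

def train(text, order=8):
    # The resulting dict IS the model's weights, and it plainly
    # embodies the training text itself.
    return {text[i:i + order]: text[i + order]
            for i in range(len(text) - order)}

def generate(weights, prefix, n, order=8):
    out = prefix
    for _ in range(n):
        nxt = weights.get(out[-order:])
        if nxt is None:
            break
        out += nxt
    return out

work = "swirling skies, radiating light forms, bold brushstrokes"
weights = train(work)
# Prompted with only the opening characters, the model regurgitates
# the entire work verbatim:
print(generate(weights, work[:8], 100))
```

Real neural networks memorize far less cleanly, which is exactly why the Office frames the question as fact-specific: whether *substantial* protectable expression is retained.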

RAG

The Office’s report adopts a notably more assertive stance on retrieval-augmented generation (RAG) systems than on other AI training methods, focusing on the unique legal risks they present. Unlike conventional generative AI models built from static, pre-assembled datasets, RAG systems actively retrieve and incorporate real-time data from outside sources during output generation.[24] RAG can accordingly be understood as functioning in two steps: the system first copies source materials into a retrieval database, and then, when prompted by a user query, outputs them again. While this architecture improves factual accuracy, both the initial unauthorised reproduction and the later relaying of that material are potential copyright infringements that, in the Office’s view, are unlikely to qualify as fair use. These concerns hold especially true where a system summarizes or abridges copyrighted works such as news stories rather than merely linking to them.
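
The two-step architecture described above can be sketched minimally as follows. Real RAG systems use vector embeddings for retrieval; this illustration substitutes naive keyword overlap, and all names and texts are hypothetical:

```python
# Minimal two-step RAG sketch: (1) source materials are copied into a
# retrieval store; (2) a user query retrieves them and their text is
# relayed again in the output.

RETRIEVAL_DB = []  # step 1: each append reproduces the source material

def ingest(doc_id, text):
    RETRIEVAL_DB.append({"id": doc_id, "text": text})

def retrieve(query, k=1):
    # Naive relevance: count shared words between query and document.
    q = set(query.lower().split())
    scored = sorted(RETRIEVAL_DB,
                    key=lambda d: len(q & set(d["text"].lower().split())),
                    reverse=True)
    return scored[:k]

def answer(query):
    # Step 2: the retrieved expression is output again to the user.
    hits = retrieve(query)
    context = " ".join(d["text"] for d in hits)
    return f"Based on retrieved sources: {context}"

ingest("article-1", "The museum reopened after a decade of restoration.")
ingest("article-2", "A new sculpture garden opened downtown this spring.")
print(answer("when did the museum reopen"))
```

Both the `ingest` copy and the `answer` relay are the reproductions the Office flags; the architecture works precisely because the protected text survives both steps intact.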

This categorical stance stems from RAG’s close connection to traditional content markets. In routine AI training, works are abstracted into patterns and statistical regularities; RAG outputs, by contrast, can retain verbatim excerpts and at times compete directly with the originals, threatening core revenue streams for rights holders. For instance, Perplexity AI,[25] now facing the first US lawsuit targeting RAG technology,[26] allegedly enables users to ‘skip the links’ to source material, diverting traffic and ad revenue away from publishers like The Wall Street Journal whose hyperlinks once brought readers directly to the underlying stories. And unlike the snippet displays at issue in Authors Guild v. Google,[27] RAG output is not designed merely to help users locate sources of information. This is where RAG differs most sharply from past technologies: by blending the original and the derived, it blurs the line between a search utility and a competing commercial service. Because feasible alternatives such as licensed APIs exist,[28] RAG’s heavy reliance on unauthorised sources is a commercial choice rather than a technical necessity. This weakens the transformative-use defence, as RAG outputs frequently replicate the expressive purpose and economic value of the underlying works. In essence, the Office’s sharp condemnation of RAG signals a pivotal shift: as licensing markets for training data mature, unlicensed real-time ingestion faces existential legal threats, and courts are increasingly tasked with reconciling innovation incentives against the uncompensated exploitation that drives what some see as RAG’s double-barreled infringement.

Fair Use Factors

The Office’s report thoroughly refutes the assumption that AI training automatically enjoys broad fair use coverage, emphasising that copying copyrighted works to create datasets constitutes prima facie infringement under 17 U.S.C. § 106(1). Against this backdrop, the Office applies the statutory four-factor test under § 107 with notable rigour, rejecting categorical exemptions for machine learning. The pre-publication guidance explores these factors in depth in Section IV, covered below.

First Factor

The Office’s first-factor analysis, centered on the purpose and character of the use, applies the Supreme Court’s framework in Warhol v. Goldsmith,[29] rejecting absolute claims of transformativeness and instead demanding that the actual uses be closely scrutinized. The Office stresses that transformativeness cannot be judged purely on how models are trained; courts must also consider what those trained models do in the field. This approach explicitly incorporates Warhol’s instruction to evaluate ‘purpose and function’ in relation to the original work, shifting the inquiry from straightforward comparisons of the content incorporated or resembled to whether outputs serve as market substitutes.


Critically, the report dismantles two key industry arguments: first, that training is a purely mechanical, non-expressive form of computational processing, and second, that it parallels human learning.[30] The Office counters that generative models absorb not only semantic meaning but the expressive qualities of copyrighted works as well, studying in particular ‘how words are selected and arranged at the sentence, paragraph, and document level.’ This stands in stark contrast to human memory, where learners retain imperfect impressions filtered through unique perspectives. And while human learning feeds the creative ecosystem on which the marketplace for derivative works depends, AI reproduces content at a speed and scale beyond human capacity, enabling market-disruptive reproduction.

Further, the analysis identifies specific post-deployment considerations. Proof that a developer installed robust guardrails to prevent verbatim output may support transformativeness by evidencing an intent that the system be used for different purposes; as Warhol cautions, however, stated objectives count for little where actual use contradicts them. Conversely, extensive use of pirated datasets weighs against fair use, especially where models generate content competing with the very works that were illegally accessed, a reality now germane to ongoing litigation given most large language models’ dependence on shadow databases.[31]
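
One kind of guardrail the report credits, refusing to emit output that reproduces long verbatim spans from the training corpus, could be sketched as an n-gram overlap filter. This is my own illustrative construction, not a technique the Office prescribes:

```python
# Illustrative output guardrail: flag any candidate output that shares
# a run of `max_overlap` consecutive words with the training corpus.

def ngrams(text, n):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def violates_guardrail(output, corpus_texts, max_overlap=6):
    """Return True if `output` reproduces a verbatim 6-word run
    from any training text."""
    out_grams = ngrams(output, max_overlap)
    return any(out_grams & ngrams(t, max_overlap) for t in corpus_texts)

corpus = ["it was the best of times it was the worst of times"]
print(violates_guardrail("it was the best of times indeed", corpus))   # True
print(violates_guardrail("a completely original sentence here", corpus))  # False
```

Production systems use more sophisticated filters, but the legal logic the Office describes is the same: a deployed constraint that blocks verbatim regurgitation is evidence that the system was intended for transformative purposes.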

Ultimately, the Office adopts a nuanced assessment of transformativeness in generative AI. If models are trained on specific genres to produce content for identical audiences, the use is at best moderately transformative given the shared commercial and expressive purposes. This calculus weighs input-side considerations (data legality, training intent) against output consequences (market substitution, functional divergence), ensuring that transformativeness never overrides the rest of the fair use analysis. As Warhol affirmed and the Office endorses, even a transformative use can infringe an original work if it serves the same purpose and market.

Second Factor

The Office’s examination of the second fair use factor, the nature of the copyrighted work, applies the Supreme Court’s framework recognizing that creative expression resides at the core of copyright’s protective purpose, while factual or functional materials occupy a more peripheral position. Per Campbell v. Acuff-Rose Music,[32] this factor acknowledges that ‘some works are closer to the core of intended copyright protection than others,’ establishing a graduated spectrum on which visual artworks command stronger safeguards than code, scholarly articles, or news reports. This hierarchy, articulated in Sony v. Universal,[33] renders the use of highly creative works less likely to qualify as fair use, a principle carrying particular force in generative AI contexts where training sets also include content that is not highly expressive.

Publication status further informs this analysis as a judicially recognised gloss on the statutory factor. Though Congress amended § 107 to clarify that unpublished status is not dispositive, Swatch Group Management v. Bloomberg LP[34] established that unpublished works weigh against fair use, given copyright’s traditional role in protecting first publication rights. The Office notes that most AI training datasets consist of published materials, which in the consensus view ‘modestly support a fair use argument,’[35] while cautioning that unpublished content, whether inadvertently ingested or deliberately sourced, intensifies infringement risks.

Industry submissions reinforce this bifurcation, observing that training on novels or visual artworks fits squarely within copyright’s protective domain whereas functional code or factual compilations present weaker claims. As the Authors Guild emphasised,[36] the second factor ‘would weigh against fair use where works are highly creative and closer to the heart of copyright,’ particularly for visual artworks whose value lies in expressive singularity. Nevertheless, the Office concurs with commenters who view this factor as rarely decisive alone, noting its doctrinal gravity is typically subordinate to commercial purpose and market harm. Ultimately, the Office concludes that where training relies on unpublished materials or highly expressive works, this factor will disfavor fair use.

Third Factor

The Copyright Office’s third-factor analysis, evaluating the amount and substantiality of the copyrighted material used, confronts the reality that generative AI systems typically ingest entire works during training. Under § 107, this factor examines whether the quantity copied is ‘reasonable in relation to the purpose of the copying,’[37] a context-sensitive inquiry that diverges sharply from precedents like Authors Guild v. Google.[38] Where Google Books’ full-text copying enabled non-expressive search functions and limited snippet displays, the Office emphasises that AI’s wholesale ingestion lacks a comparable transformative justification, observing that ‘the use of entire copyrighted works is less clearly justified in the context of AI training than it was for Google Books or thumbnail image search.’[39]

Crucially, the report rejects categorical condemnation of full-work copying, acknowledging that functional necessity may render such scale reasonable if developers demonstrate both (1) a highly transformative training purpose and (2) robust technical safeguards preventing the output of substantially similar protected expression. This calibration reflects the legacy of Sega Enterprises v. Accolade,[40] where reverse-engineering entire software packages was deemed reasonable for interoperability, while underscoring AI’s distinct risks: absent guardrails, models risk regurgitating protected content at scale. The analysis positions output controls as pivotal mitigators; where effective constraints exist, the third factor’s weight against fair use diminishes proportionally.

Yet the Office tempers this flexibility with stark caution. Training on qualitatively significant portions, such as a photograph’s compositional essence, intensifies infringement concerns even when the copying is quantitatively minor, per Harper & Row’s ‘heart of the work’ doctrine.[41] Unpublished materials attract particular scrutiny, as their unauthorised ingestion deprives rights holders of first publication control. Ultimately, even where full-scale copying proves functionally necessary for model optimisation, its justification remains contingent on evidence that deployment contexts avoid market substitution.

Fourth Factor

The Copyright Office’s analysis of the fourth fair use factor, the effect on the potential market for or value of the copyrighted work, arguably constitutes the report’s most consequential and controversial intervention, introducing market dilution as a novel theory of harm that expands traditional infringement paradigms. While reaffirming established harms such as lost sales from direct displacement by AI-generated substitutes and lost licensing opportunities, and emphasising that feasible markets for training data ‘disfavor fair use where licensing options exist,’[42] the Office contends that generative AI’s unprecedented scale enables uniquely corrosive market effects. Specifically, the report warns that AI’s capacity for stylistic imitation, even absent verbatim copying, could flood markets with outputs that lower prices, reduce demand for original works, and harm authorship by saturating creative sectors with algorithmically generated content. This dilution theory, while acknowledging that copyright traditionally targets infringement rather than competition, posits that the speed and scale of AI output production threaten to devalue human creativity in ways courts have never before confronted.

The Office grounds this theory in statutory language protecting a work’s ‘value’, arguing that style implicates ‘protectable elements of authorship’[43] and that saturation by stylistically derivative AI outputs could diminish a creator’s commercial distinctiveness. Though analogizing to Sony Corp v. Universal City Studios,[44] where the Court considered harms from ‘widespread’ unauthorised copying, the report concedes market dilution enters ‘uncharted territory’ judicially. No court has yet adopted such a framework, and its viability hinges on whether judges accept that non-infringing stylistic competition can constitute cognizable harm under fair use’s fourth factor. The Office acknowledges this theory’s vulnerability, noting courts may demand empirical evidence beyond policy concerns or anecdotal examples and that its persuasive authority under Skidmore deference depends on the strength of its reasoning.

Importantly, the dilution theory faces several doctrinal tensions. First, copyright has historically permitted market competition from non-infringing works, even when it harms original creators.[45] Objections to AI-driven dilution stem from its ease of production, distribution, and resulting scale, raising questions about whether copyright should shield markets from technological disruption. Second, critics contend that recognising dilution could paradoxically stifle creativity by enabling rights holders to suppress tools producing non-infringing works, potentially chilling the production and distribution of new works by human creators leveraging AI ethically.[46] Finally, the Office subtly invokes creators’ ‘economic and moral interests’ in their works’ unique stylistic value, aligning with scholarly views that ‘value’ encompasses non-substitutionary harms like lost attribution or cultural decontextualisation.[47]

Amid ongoing litigation like Kadrey v. Meta, where courts grapple with output-based market effects, the report’s dilution framework offers plaintiffs a strategic tool to argue systemic harm beyond individual infringement. Its ultimate judicial reception remains uncertain, particularly given the Office’s concurrent political upheaval and the theory’s departure from precedent. Nonetheless, the framework challenges the AI industry by inviting courts to reconsider whether copyright’s purpose, protecting the ‘fruits of intellectual labor,’ must evolve to address algorithmic economies of scale.

Licensing

The Office’s report champions voluntary and collective licensing as the optimal path to resolving AI training disputes, explicitly favoring market-driven solutions over regulatory intervention. This approach recognises emerging industry practice: visual media platforms like Getty Images already offer structured reuse agreements.[48] Such real-world models demonstrate that scalable compensation frameworks are feasible, reducing transaction costs while enabling tailored terms for duration, exclusivity, and territorial scope.

For contexts where direct licensing remains impractical, the Office endorses extended collective licensing (ECL) as a supplementary mechanism. Modeled on Scandinavian and UK systems, ECL empowers certified collective management organizations (CMOs) to license entire repertoires, including non-members’ works, under government oversight, subject to robust opt-out rights that preserve creator autonomy. Such frameworks address the ‘copyright iceberg’[49] problem by covering orphan works and simplifying bulk permissions. Crucially, the Office rejects compulsory licensing as premature and incompatible with US copyright principles, noting the absence of any systemic market failure that would justify state-mandated rates. Voluntary agreements between AI developers and publishers, such as Adobe’s compensated artist partnerships for Firefly training,[50] demonstrate functional market dynamics without government coercion. While acknowledging ECL’s potential to bridge gaps, the report cautions against premature regulatory intrusion, emphasizing that licensing markets need space to evolve organically. It instead advocates targeted guardrails: certification standards to ensure CMO representativeness, ironclad opt-out protections, and pilot programs in discrete sectors like academic publishing before broader implementation.
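
The core ECL logic, under which a certified CMO may license any in-scope work unless the rights holder opts out, can be sketched as a simple registry. Class and method names here are hypothetical:

```python
# Hypothetical sketch of ECL's core logic: certified CMOs may license
# entire repertoires (including non-members' works), but opted-out
# works are always excluded, preserving creator autonomy.

class ECLRegistry:
    def __init__(self, certified_cmos):
        self.certified_cmos = set(certified_cmos)
        self.opted_out = set()

    def opt_out(self, work_id):
        # The opt-out right the Office insists must remain "ironclad".
        self.opted_out.add(work_id)

    def may_license(self, cmo, work_id):
        # Only a certified CMO may license, and never an opted-out work.
        return cmo in self.certified_cmos and work_id not in self.opted_out

registry = ECLRegistry(certified_cmos={"VisualArtsCMO"})
registry.opt_out("painting-042")
print(registry.may_license("VisualArtsCMO", "painting-007"))  # True
print(registry.may_license("VisualArtsCMO", "painting-042"))  # False
print(registry.may_license("UncertifiedCMO", "painting-007"))  # False
```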

Concluding Thoughts

Cooper and Grimmelmann’s incisive reminder that AI models are not ‘magical portals’ extracting knowledge from parallel universes but data structures built from human creative labor anchors the Office’s report. The Office methodically establishes that training generative AI implicates reproduction rights at every stage: dataset creation, weight memorization, and RAG’s real-time copying. Its rigorous fair use analysis dismantles industry claims of inherent transformativeness, instead demanding context-specific scrutiny of outputs and market harm. Most provocatively, it endorses market dilution as a cognizable injury, implying that stylistic imitation at scale devalues human artistry even without infringement.

Yet the report’s release amid leadership upheaval and pending litigation leaves its authority in flux. While championing voluntary licensing as the optimal path, its novel doctrinal frameworks, particularly dilution, face untested judicial terrain. Ultimately, the Office charts a pragmatic course, acknowledging AI’s technical necessities while centering copyright’s mandate to protect creative labor. As Cooper and Grimmelmann caution, progress lies not in magical thinking about ‘parallel universes’, but in ethically engaging the human expression fueling these systems. The path forward demands negotiated coexistence, where innovation credits its sources, and creation retains its worth.

Suggested readings:

Why A.I. isn’t going to make art, The New Yorker, August 31, 2024

Understanding artists’ perspectives on generative AI art and transparency, ownership, and fairness, AI Hub, January 14, 2025

How artists are using generative AI to celebrate the natural world, UK Creative Festival, January 15, 2025

Stopping the Trump Administration’s Unlawful Firing of Copyright Office Director, Democracy Forward, May 22, 2025

About the author:

Juliette Groothaert (Summer Intern 2025, Center for Art Law) is a law student at the University of Bristol, graduating in 2025. She is interested in the evolving relationship between intellectual property law and artistic expression, which she hopes to explore further through an LLM next year. As a summer legal intern, she is conducting research in this field while also contributing to the Center’s Nazi-Looted Art Database.

Select Sources:

  1. DALL·E 3 is a text-to-image model developed by OpenAI that uses deep learning to generate digital images from text prompts. Released in October 2023, it is integrated into ChatGPT for Plus and Enterprise users, and is also accessible via OpenAI’s API and Labs platform. ↑
  2. A Feder Cooper and James Grimmelmann, ‘The Files are in the Computer: Copyright, Memorization and Generative AI’ (2023) 23–24 ↑
  3. Adam Zewe, ‘Explained: Generative AI’ (MIT News, 9 November 2023) https://news.mit.edu/2023/explained-generative-ai-1109 ↑
  4. Jordan Hoffmann and others, ‘Training Compute‑Optimal Large Language Models’ (arXiv, 29 March 2022) 1 https://arxiv.org/abs/2203.15556 ↑
  5. Digital Media Licensing Association (DMLA), Initial Comments in response to U.S. Copyright Office Copyright and Artificial Intelligence: Part 3 – Generative AI Training (Pre‑publication version, March 2025) 10–11 ↑
  6. Competition and Markets Authority, AI Foundation Models – Technical update report (GOV.UK, 16 April 2024) 1, 85 https://assets.publishing.service.gov.uk/media/661e5a4c7469198185bd3d62/AI_Foundation_Models_technical_update_report.pdf ↑
  7. Gil Appel, Juliana Neelbauer and David A Schweidel, ‘Generative AI Has an Intellectual Property Problem’ (Harvard Business Review, 7 April 2023) https://hbr.org/2023/04/generative-ai-has-an-intellectual-property-problem ↑
  8. The New York Times Co. v. Microsoft Corp., No. 1:24-cv-00034 (S.D.N.Y. filed Dec. 27, 2023) ↑
  9. Zhang v. Google LLC, No. 3:24-cv-00487 (N.D. Cal. filed Jan. 26, 2024) ↑
  10. Andersen v. Stability AI Ltd., No. 3:23-cv-00201 (N.D. Cal. filed Jan. 13, 2023) ↑
  11. Getty Images (US), Inc. v. Stability AI, Inc., No. 1:23-cv-00135 (D. Del. filed Feb. 3, 2023) ↑
  12. Thomson Reuters Enter. Ctr. GmbH v. Ross Intelligence Inc., No. 1:20-cv-00613 (D. Del. filed May 6, 2020) ↑
  13. Id. ↑
  14. Five Takeaways from the Copyright Office’s Controversial New AI … (Copyright Lately) https://copyrightlately.com/copyright-office-ai-report/ ↑
  15. U.S. Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre‑publication version, 9 May 2025) 1–3 https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf ↑
  16. Lisa O’Carroll, ‘Trump fires copyright office supremo Shira Perlmutter after AI report’ (The Guardian, 12 May 2025) https://www.theguardian.com/us-news/2025/may/12/trump-fires-copyright-office-shira-perlmutter ↑
  17. Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417 (N.D. Cal. filed July 7, 2023) ↑
  18. Bartz v. Anthropic PBC, No. 2:24-cv-01523 (C.D. Cal. filed Mar. 1, 2024) ↑
  19. The Development of Generative Artificial Intelligence from a Copyright Perspective (European Parliament, JURI Committee, Study prepared by University of Turin & Nexa Centre, 12 May 2025) 1–5 https://www.europarl.europa.eu/meetdocs/2024_2029/plmrep/COMMITTEES/JURI/DV/2025/05-12/2025.05.12_item6_Study_GenAIfromacopyrightperspective_EN.pdf ↑
  20. U.S. Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre‑publication version, 9 May 2025) 1–2 https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf ↑
  21. Kadrey v. Meta Platforms, Inc., No. 3:23‑cv‑03417‑VC (N.D. Cal. filed July 7, 2023) ↑
  22. Andersen v. Stability AI Ltd., No. 3:23‑cv‑00201‑WHO (N.D. Cal. filed Jan. 13, 2023) ↑
  23. Aleksander Goranin, ‘A Deep Look at Copyright’s Volitional Conduct Doctrine and Generative Artificial Intelligence’ (forthcoming, Emory Law Journal) ↑
  24. Google Cloud, Retrieval‑Augmented Generation use case (Google Cloud, last updated 13 June 2025) https://cloud.google.com/use-cases/retrieval-augmented-generation ↑
  25. Dan Jasnow, Danielle W Bulger and Nardeen Billan, ‘Generative AI Meets Generative Litigation: News Corp Continues Its Battle Against Perplexity AI’ (National Law Review, 20 December 2024) https://natlawreview.com/article/generative-ai-meets-generative-litigation-news-corp-continues-its-battle-against ↑
  26. Dow Jones & Co., Inc. v. Perplexity AI, Inc., No. 24-CV-7984 (S.D.N.Y. Dec. 11, 2024) ↑
  27. Authors Guild, Inc. v. Google, Inc., 804 F.3d 202 (2d Cir. 2015) ↑
  28. LexisNexis Launches Data+ API for AI Training (Artificial Lawyer, 9 December 2024) https://www.artificiallawyer.com/2024/12/09/lexisnexis-launches-data-api-for-ai-training/ ↑
  29. Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508 (2023) ↑
  30. Tiago Freitas and Eliot Mannoia, ‘Parallels Between Biological and Artificial Brains: Isolation vs Recursive Training’ (BrandKarma, 10 November 2024) https://www.brandkarma.at/opinions/parallels-between-biological-and-artificial-brains-isolation-vs-recursive-training/ ↑
  31. LLM Security and Prompt Engineering Digest: LLM Shadows (Adversa.ai, 3 August 2023) https://adversa.ai/blog/llm-security-and-prompt-engineering-digest-llm-shadows/ ↑
  32. Campbell v. Acuff‑Rose Music, Inc., 510 U.S. 569 (1994) ↑
  33. Sony Corp. of America v. Universal City Studios, Inc., 464 U.S. 417 (1984) ↑
  34. Swatch Group Mgmt. Servs. Ltd. v. Bloomberg L.P., 742 F.3d 17 (2d Cir. 2014) ↑
  35. New Media Rights, Initial Comments in response to US Copyright Office, Copyright and Artificial Intelligence: Part 3 – Generative AI Training (Pre‑publication version, 9 May 2025) 16; Data Provenance Initiative, Initial Comments ibid 10–11; Katherine Lee and others, Initial Comments ibid 102. ↑
  36. The Authors Guild, Initial Comments in response to US Copyright Office, Copyright and Artificial Intelligence: Part 3 – Generative AI Training (Pre‑publication version, 9 May 2025) 20. ↑
  37. United States Code, Title 17 § 107 (2023) ↑
  38. Authors Guild, Inc. v. Google Inc., 804 F.3d 202 (2d Cir. 2015) ↑
  39. US Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre‑publication version, 9 May 2025) 57 https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf ↑
  40. Sega Enters. Ltd. v. Accolade, Inc., 977 F.2d 1510 (9th Cir. 1992) ↑
  41. Harper & Row, Publishers, Inc. v. Nation Enterprises, 471 U.S. 539 (1985) ↑
  42. US Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre‑publication version, 9 May 2025) 54 https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf ↑
  43. TechNet, Initial Comments in response to US Copyright Office, Copyright and Artificial Intelligence: Part 3 – Generative AI Training (30 October 2023) 11 ↑
  44. Sony Corp. of America v. Universal City Studios, Inc., 464 U.S. 417 (1984) ↑
  45. World Intellectual Property Organization, Copyright, Competition and Development (WIPO‑mandated survey by Max Planck Institute, December 2013) https://www.wipo.int/export/sites/www/competition-policy/en/docs/copyright_competition_development.pdf ↑
  46. World Intellectual Property Organization, Copyright, Competition and Development (WIPO‑mandated survey by Max Planck Institute, December 2013) https://www.wipo.int/export/sites/www/competition-policy/en/docs/copyright_competition_development.pdf ↑
  47. Todd A Carpenter, ‘Ensuring attribution is critical when licensing content to AI developers’ (The Scholarly Kitchen, 4 September 2024) https://scholarlykitchen.sspnet.org/2024/09/04/make-attribution-mandatory-in-ai-licensing/ ↑
  48. Getty Images, Content License Agreement (last updated October 2024) https://www.gettyimages.co.uk/eula ↑
  49. George H Pike, ‘AI and Copyright: Steering Around the Iceberg’ (Information Today, vol 40 no 8, October 2023) 24 ↑
  50. Adobe, ‘Adobe’s approach to customer choice in AI models’ (Adobe Blog, 18 March 2025) https://blog.adobe.com/en/publish/2025/03/18/adobes-approach-customer-choice-in-ai-models ↑

Juliette Groothaert

Upon asking DALL-E 3[1] to “create a scenic view of the sea in the style of Van Gogh”, the image appearing on the left was generated within seconds. When compared to The Starry Night on the right, the stylistic resemblance is immediately apparent: swirling skies, radiating light forms, bold brushstrokes, and bright color contrasts.

Yet, as Cooper and Grimmelmann remind us, ‘a model is not a magical portal that pulls fresh information from some parallel universe into our own.’[2]

This basic point provides critical context for understanding the copyright implications of generative AI. Generative AI models, as sophisticated data-driven structures, operate on mathematical constructs derived wholly from their training datasets.[3] The expanding general usability of these models has only intensified the demand for such datasets.[4] To enhance quality, accuracy, and flexibility, industry submissions confirm these systems typically require ‘millions or billions of works for training purposes,’[5] including terabyte-scale datasets for foundation models.[6] As a result, this reliance on pre-existing copyrighted materials has catalyzed numerous legal challenges.[7]

Several prominent examples include The New York Times v. Microsoft Corp,[8] involving the unauthorised use of proprietary journalism to train language models; visual arts disputes such as Zhang v. Google LLC,[9] Andersen v. Stability AI,[10] and Getty Images v. Stability AI;[11] and, most importantly, the landmark ruling in Thomson Reuters v. Ross Intelligence.[12] In Thomson Reuters, although the dispute concerned the use of copyrighted legal materials to train a non-generative AI research tool, the court found that the unauthorised use of legal headnotes to build a competing research product constituted copyright infringement.[13] Collectively, these cases, of which more than forty are now pending,[14] center on a pivotal legal question: whether using copyrighted works for AI training is fair use, particularly when employed in generative systems producing output.

Against this contentious backdrop, the United States Copyright Office (‘Office’) advanced this discourse on May 9, 2025, by releasing a pre-publication draft of Part 3 of its comprehensive AI policy report.[15] In March 2023, it issued guidance confirming that human authorship is required for copyright registration, and that applicants must disclose any AI-generated content exceeding a de minimis threshold, along with a description of the human author’s contribution. Following this, the Office issued a Notice of Inquiry soliciting public comments on AI and copyright, receiving over 10,000 submissions that informed the analysis and recommendations presented in the current report. Part 1 and Part 2 of the Office’s Initiative, addressing digital replicas and copyrightability respectively, laid essential groundwork for this third report; the Center for Art Law has published further commentary on both Part 1 and Part 2. This latest report offers the most detailed articulation yet of how copyright law applies to the training of generative AI models. Yet its release coincides with exceptional institutional turbulence. Register Shira Perlmutter’s dismissal[16] days after the report’s publication raises questions about what changes new management might enact. The timing is particularly delicate for pending cases like Kadrey v. Meta[17] and Bartz v. Anthropic,[18] which directly echo the report’s analysis. Though the report is not legally binding, it enters a legal ecosystem where AI copyright doctrine is actively evolving, and may well shape interpretive norms.

Technical Primer

The Office’s pre-publication report recognizes that answers to these legal questions must be technically precise about how generative AI systems interact with protected works. Before considering fair use defenses, the Office systematically lays out how machine learning workflows inherently implicate exclusive rights under copyright law. This technical foundation identifies three essential pressure points: the reproduction rights implicated when datasets are created, the possible embodiment of protected expression within model parameters through memorization, and the risks characteristic of retrieval-augmented generation systems.

Datasets

Generative AI models, including large-scale language models as well as image generators, are developed through machine learning techniques that deliberately reproduce copyrighted material.[19] Every stage of dataset creation potentially implicates the reproduction right under 17 U.S.C. § 106(1): the initial downloading from online sources, format conversion, cross-medium transfers, and the creation of modified subsets or filtered corpora. Such operations may concurrently implicate the derivative work right under § 106(2) when they recast or transform original expression through abridgements, condensations, or other adaptations.
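To see where copies arise, the stages the report enumerates can be sketched as a toy pipeline. Everything below is hypothetical and purely illustrative — no real crawler or dataset is shown — but each stage produces a new fixation of the ingested text, which is why the Office treats dataset curation as implicating the reproduction right at every step:

```python
def build_training_subset(raw_pages: list[str]) -> list[str]:
    """Toy dataset-curation pipeline; the steps mirror those the report
    identifies, but the names and logic here are invented for illustration."""
    # 1. "Download": copying source material into local storage (a reproduction).
    downloaded = [page for page in raw_pages]
    # 2. Format conversion: e.g. stripping markup into plain text (another copy).
    converted = [p.replace("<p>", "").replace("</p>", "") for p in downloaded]
    # 3. Filtering into a modified subset (a further copy, possibly derivative).
    filtered = [text for text in converted if len(text) > 20]
    return filtered

pages = ["<p>A short snippet.</p>",
         "<p>A longer passage of protected expression from some work.</p>"]
print(build_training_subset(pages))
```

Each intermediate list is a distinct fixation of the underlying works, which is the point of the Office’s stage-by-stage framing.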

Model Weights

The Office finds that model weights, numerical parameters encoding learned patterns, may represent copies of protected expression where there is substantial memorization involved, implicating reproduction and derivative rights under copyright law. As articulated on page 30 of its report:

‘…whether a model’s weights implicate the reproduction or derivative work rights turns on whether the model has retained or memorized substantial protectable expression from the works at issue.’[20]

This determination hinges on a fact-specific inquiry: when weights enable a model to output verbatim or near-identical content from training data, the Office asserts there is a strong argument that copying those weights infringes the memorized works. Judicial approaches to this fact-intensive standard diverge significantly: Kadrey v. Meta Platforms[21] dismissed such claims as ‘nonsensical’ absent allegations of infringing outputs, while Andersen v. Stability AI[22] permitted claims against third-party users where plaintiffs demonstrated that protected elements persisted within the weights. The Office endorses Andersen’s standard, clarifying that infringement turns on whether ‘the model has retained or memorized substantial protectable expression.’ Critically, when protectable material is embedded in weights, subsequent distribution or reuse, even by parties uninvolved in training, could constitute prima facie infringement, creating downstream liability risks that extend far beyond initial model development.[23]
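The memorization inquiry is, at bottom, an empirical question: does the model emit long verbatim spans of its training data? A toy version of such a check, with invented strings and a deliberately naive word-overlap measure (real memorization audits are far more sophisticated), might look like:

```python
def longest_shared_ngram(output: str, training_text: str) -> int:
    """Length (in words) of the longest word sequence the model output shares
    verbatim with the training text -- a rough, illustrative memorization signal."""
    out_words = output.split()
    best = 0
    for i in range(len(out_words)):
        # Only sequences longer than the current best can improve the result.
        for j in range(i + best + 1, len(out_words) + 1):
            if " ".join(out_words[i:j]) in training_text:
                best = max(best, j - i)
            else:
                break  # a longer span containing a missing span cannot match
    return best

train = "call me ishmael some years ago never mind how long precisely"
print(longest_shared_ngram("the model said call me ishmael some years ago today", train))
```

A high score on checks like this is the kind of ‘retained or memorized substantial protectable expression’ the Office says would make copying the weights themselves actionable.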

RAG

The Office’s report adopts a notably more assertive stance on retrieval-augmented generation (RAG) systems than on other AI training methods, focusing particularly on the unique legal risks they present. Unlike conventional generative AI models built from static training datasets, RAG systems actively retrieve and incorporate real-time data from external sources during output generation.[24] Accordingly, RAG can be understood as functioning in two steps: the system first copies the source materials into a retrieval database, and then, when prompted by a user query, outputs them again. While such an architecture improves factual accuracy, both the initial unauthorized reproduction and the later relaying of that material are potential copyright infringements that may not qualify as fair use. These concerns hold especially true where the system summarizes or abridges copyrighted works like news stories rather than merely linking to them.
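The two-step structure described above can be sketched in a few lines. The store, documents, and word-overlap scoring rule here are all invented for illustration and are not drawn from any actual RAG implementation; the point is simply where the two reproduction events the Office flags occur:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    text: str

# Step 1: ingestion -- each source is copied into the retrieval database.
store = [Doc("Market Report", "Widget prices rose 12% in the last quarter."),
         Doc("Weather Brief", "Heavy rain is expected across the region.")]

def retrieve(query: str) -> Doc:
    # Toy relevance: pick the stored doc sharing the most words with the query.
    return max(store, key=lambda d: len(set(query.lower().split())
                                        & set(d.text.lower().split())))

def answer(query: str) -> str:
    doc = retrieve(query)
    # Step 2: generation -- the retrieved text is reproduced in the output.
    return f"According to '{doc.title}': {doc.text}"

print(answer("what happened to widget prices"))
```

Note that the source text appears twice: once copied into `store`, and again verbatim in the output — the ‘double-barreled’ exposure discussed below.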

This categorical stance stems from RAG’s close connection to traditional content markets. With routine AI training, works are dissolved into patterns and statistical norms. But RAG outputs retain verbatim excerpts and at times compete directly with the originals, threatening core revenue streams for rights holders. For instance, Perplexity AI,[25] now facing the first US lawsuit targeting RAG technology,[26] allegedly enables users to ‘skip the links’ to source material, diverting traffic and ad revenue away from publishers like The Wall Street Journal that depend on readers following hyperlinks to the underlying stories. Unlike in established cases like Authors Guild v. Google,[27] RAG does not merely offer snippets to help people locate sources of information. This is what sets RAG apart: by blending the original and the derived, it blurs the line between a search utility and a competing commercial service. And because feasible alternatives such as licensed APIs exist,[28] RAG’s heavy reliance on unauthorized sources is a commercial choice rather than a technical necessity. This weakens any transformative-use defence, as RAG’s outputs frequently replicate the expressive purpose and economic value of the underlying works. In essence, the Office’s sharp condemnation of RAG signals a pivotal shift; as licensing markets for training data mature, unlicensed real-time ingestion faces existential legal threats. Courts are increasingly tasked with reconciling innovation incentives with the uncompensated exploitation that drives what some see as RAG’s double-barreled infringement.

Fair Use Factors

The Office’s report thoroughly refutes the assumption that AI training automatically enjoys broad fair use coverage, emphasising that copying copyrighted works to create training datasets constitutes prima facie infringement under 17 U.S.C. § 106(1). Against this backdrop, the Office applies the statutory four-factor test under § 107 with notable rigour, rejecting categorical exemptions for machine learning. The pre-publication guidance explores these factors in depth in section IV, covered below.

First Factor

The Office’s first factor analysis, centered on the purpose and character of use, applies the Supreme Court’s framework in Warhol v. Goldsmith,[29] rejecting absolute claims of transformativeness and instead demanding that the actual circumstances of use be closely scrutinized. The Office stresses that transformativeness cannot be judged purely on how models are trained; courts must also consider what those trained models do in the field. This approach explicitly incorporates Warhol’s instruction to evaluate the ‘purpose and function’ of the use relative to the original work, moving beyond straightforward textual comparison of what was incorporated or resembled to whether outputs serve as market substitutes.


Critically, the report dismantles two key industry arguments: first, that training is a purely mechanical, non-expressive computational process, and second, that it parallels human learning.[30] The Office counters that generative models absorb not only semantic meaning but the expressive choices of copyrighted works; they study, in particular, ‘how words are selected and arranged at the sentence, paragraph, and document level.’ This stands in stark contrast to human memory, where learners retain imperfect impressions filtered through unique perspectives. And while human learning sustains the creative ecosystem on which the marketplace of derivative work depends, AI reproduces content at a speed and scale beyond any human, enabling market-disruptive reproduction.

Further, the analysis treats post-deployment safeguards as specific indicators. Proof that a developer installed robust guardrails to prevent verbatim output might support transformativeness by revealing an intent that the system be used for different purposes; although, as Warhol cautions, stated intentions count for little where actual use contradicts them. Simultaneously, extensive use of pirated datasets weighs against fair use, especially if models generate content competing with the works illegally accessed, a reality now germane to ongoing litigation given most large language models’ dependence on shadow databases.[31]

Ultimately, the Office adopts a nuanced assessment of transformativeness in generative AI. If models are trained on specific genres to produce content for identical audiences, the use is at best moderately transformative given the shared commercial and expressive purposes. This calculus weighs input-side considerations (data legality, training intent) against output consequences (market substitution, functional divergence), ensuring that transformativeness never displaces the rest of the fair use analysis. As Warhol affirmed and the Office endorses, a transformative use can still infringe upon an original work if it serves the same purpose and market.

Second Factor

The Office’s examination of the second fair use factor, the nature of the copyrighted work, applies the Supreme Court’s framework recognizing that creative expression resides at the core of copyright’s protective purpose, while factual or functional materials occupy a more peripheral position. As per Campbell v. Acuff-Rose Music,[32] this factor acknowledges that ‘some works are closer to the core of intended copyright protection than others,’ establishing a graduated spectrum in which visual artworks command stronger safeguards than code, scholarly articles, or news reports. This hierarchy, articulated in Sony v. Universal,[33] renders the use of highly creative works less likely to qualify as fair use, a principle carrying particular force in generative AI contexts where training sets include highly expressive content.

Publication status further informs this analysis as a judicially recognised gloss on the statutory factor. Though Congress amended § 107 to clarify that unpublished status is not dispositive, Swatch Group Management v. Bloomberg LP[34] established that unpublished works weigh against fair use given copyright’s traditional role in protecting first publication rights. The Office notes that most AI training datasets consist of published materials, which, per the consensus of commenters, ‘modestly support a fair use argument,’[35] while cautioning that unpublished content, whether inadvertently ingested or deliberately sourced, intensifies infringement risks.

Industry submissions reinforce this bifurcation, observing that training on novels or visual artworks fits squarely within copyright’s protective domain whereas functional code or factual compilations present weaker claims. As the Authors Guild emphasised,[36] the second factor ‘would weigh against fair use where works are highly creative and closer to the heart of copyright,’ particularly for visual artworks whose value lies in expressive singularity. Nevertheless, the Office concurs with commenters who view this factor as rarely decisive alone, noting its doctrinal gravity is typically subordinate to commercial purpose and market harm. Ultimately, the Office concludes that where training relies on unpublished materials or highly expressive works, this factor will disfavor fair use.

Third Factor

The Copyright Office’s third-factor analysis, evaluating the amount and substantiality of copyrighted material used, confronts the reality that generative AI systems typically ingest entire works during training. Under §107, this factor examines whether the quantity copied is ‘reasonable in relation to the purpose of the copying,’[37] a context-sensitive inquiry that diverges sharply from precedents like Authors Guild v. Google.[38] Where Google Books’ full-text copying enabled non-expressive search functions and limited snippet displays, the Office emphasises that AI’s wholesale ingestion lacks comparable transformative justification, observing that ‘the use of entire copyrighted works is less clearly justified in the context of AI training than it was for Google books or thumbnail image search.’[39]

Crucially, the report rejects categorical condemnation of full-work copying, acknowledging that functional necessity may render such scale reasonable if developers demonstrate both (1) a highly transformative purpose for training and (2) robust technical safeguards preventing the output of substantially similar protected expression. This nuanced calibration reflects the legacy of Sega Enterprises v. Accolade,[40] where reverse-engineering entire software packages was deemed reasonable for interoperability, while underscoring AI’s distinct risks: absent guardrails, models risk regurgitating protected content at scale. The analysis positions output controls as pivotal mitigators; where effective constraints exist, the third factor’s weight against fair use diminishes proportionally.

Yet the Office tempers this flexibility with stark caution. Training on qualitatively significant portions, such as a photograph’s compositional essence, intensifies infringement concerns even when quantitatively minor, per Harper & Row’s ‘heart of the work’ doctrine.[41] Unpublished materials attract particular scrutiny, as their unauthorised ingestion deprives rights holders of first publication control. Ultimately, while full-scale copying may prove functionally necessary for model optimisation, its justification remains contingent on evidence that deployment contexts avoid market substitution.

Fourth Factor

The Copyright Office’s analysis of the fourth fair use factor, the effect on the potential market for or value of the copyrighted work, arguably constitutes the report’s most consequential and controversial intervention, introducing market dilution as a novel theory of harm that expands traditional infringement paradigms. The Office reaffirms established harms, including lost sales from direct displacement by AI-generated substitutes and lost licensing opportunities, emphasising that feasible markets for training data ‘disfavor fair use where licensing options exist.’[42] But it goes further, contending that generative AI’s unprecedented scale enables uniquely corrosive market effects. Specifically, the report warns that AI’s capacity for stylistic imitation, even absent verbatim copying, could flood markets with outputs that lower prices, reduce demand for original works, and harm authorship by saturating creative sectors with algorithmically generated content. This dilution theory, while acknowledging that copyright traditionally targets infringement rather than competition, posits that the speed and scale of AI output production threaten to devalue human creativity in ways courts have never before confronted.

The Office grounds this theory in statutory language protecting a work’s ‘value’, arguing that style implicates ‘protectable elements of authorship’[43] and that saturation by stylistically derivative AI outputs could diminish a creator’s commercial distinctiveness. Though analogizing to Sony Corp v. Universal City Studios,[44] where the Court considered harms from ‘widespread’ unauthorised copying, the report concedes market dilution enters ‘uncharted territory’ judicially. No court has yet adopted such a framework, and its viability hinges on whether judges accept that non-infringing stylistic competition can constitute cognizable harm under fair use’s fourth factor. The Office acknowledges this theory’s vulnerability, noting courts may demand empirical evidence beyond policy concerns or anecdotal examples and that its persuasive authority under Skidmore deference depends on the strength of its reasoning.

Importantly, the dilution theory faces several doctrinal tensions. First, copyright historically permits market competition from non-infringing works, even when it harms original creators.[45] Objections to AI-driven dilution stem from its ease of production and distribution and the resulting scale, raising questions about whether copyright should shield markets from technological disruption. Second, critics contend that recognising dilution could paradoxically stifle creativity by enabling rights holders to suppress tools producing non-infringing works, potentially chilling the production and distribution of new works by human creators leveraging AI ethically.[46] Finally, the Office subtly invokes creators’ ‘economic and moral interests’ in their works’ unique stylistic value, aligning with scholarly views that ‘value’ encompasses non-substitutionary harms like lost attribution or cultural decontextualisation.[47]

Amid ongoing litigation like Kadrey v. Meta, where courts grapple with output-based market effects, the report’s dilution framework offers plaintiffs a strategic tool to argue systemic harm beyond individual infringement. Yet its ultimate judicial reception remains uncertain, particularly given the Office’s concurrent political upheaval and the theory’s departure from precedent. Even so, the dilution framework challenges the AI industry by inviting courts to reconsider whether copyright’s purpose, protecting the ‘fruits of intellectual labor’, must evolve to address algorithmic economies of scale.

Licensing

The Office’s report champions voluntary and collective licensing as the optimal path to resolving AI training disputes, explicitly favoring market-driven solutions over regulatory intervention. This approach recognises emerging industry practices: visual media platforms like Getty Images already offer structured reuse agreements.[48] These real-world models demonstrate that scalable compensation frameworks are feasible, reducing transaction costs while enabling tailored terms for duration, exclusivity, and territorial scope.

For contexts where direct licensing remains impractical, the Office endorses extended collective licensing (ECL) as a supplementary mechanism. Modeled on Scandinavian and UK systems, ECL empowers certified collective management organizations (CMOs) to license entire repertoires (including non-members’ works) under government oversight, subject to robust opt-out rights that preserve creator autonomy. Such frameworks address the ‘copyright iceberg’[49] problem by covering orphan works and simplifying bulk permissions. Crucially, the Office rejects compulsory licensing as premature and incompatible with US copyright principles, noting the absence of a systemic market failure that would justify state-mandated rates. Voluntary agreements between AI developers and publishers, such as Adobe’s compensated artist partnerships for Firefly training,[50] demonstrate functional market dynamics without government coercion. While acknowledging ECL’s potential to bridge gaps, the report cautions against premature regulatory intrusion, emphasizing that licensing markets require space to evolve organically. Instead it advocates targeted guardrails: certification standards to ensure CMO representativeness, ironclad opt-out protections, and pilot programs in discrete sectors like academic publishing before broader implementation.

Concluding Thoughts

Cooper and Grimmelmann’s incisive reminder, that AI models are not ‘magical portals’ extracting knowledge from parallel universes but data structures built from human creative labor, anchors the Office’s report. The Office methodically establishes that training generative AI implicates reproduction rights at every stage: dataset creation, weight memorization, and RAG’s real-time copying. Its rigorous fair use analysis dismantles industry claims of inherent transformativeness, instead demanding context-specific scrutiny of outputs and market harm. Most provocatively, it endorses market dilution as a cognizable injury, implying that stylistic imitation at scale devalues human artistry even without infringement.

Yet the report’s release amid leadership upheaval and pending litigation leaves its authority in flux. While championing voluntary licensing as the optimal path, its novel doctrinal frameworks, particularly dilution, face untested judicial terrain. Ultimately, the Office charts a pragmatic course, acknowledging AI’s technical necessities while centering copyright’s mandate to protect creative labor. As Cooper and Grimmelmann caution, progress lies not in magical thinking about ‘parallel universes’, but in ethically engaging the human expression fueling these systems. The path forward demands negotiated coexistence, where innovation credits its sources, and creation retains its worth.

Suggested readings:

Why A.I. isn’t going to make art, The New Yorker, August 31 2024

Understanding artists’ perspectives on generative AI art and transparency, ownership, and fairness, AI Hub, January 14 2025

How artists are using generative AI to celebrate the natural world, UK Creative Festival, January 15 2025

Stopping the Trump Administration’s Unlawful Firing Of Copyright Office Director, Democracy Forward, May 22 2025

About the author:

Juliette is a final-year law student at the University of Bristol, graduating in 2025. She is interested in the evolving relationship between intellectual property law and artistic expression, which she hopes to explore further through an LLM next year. As a summer legal intern, she is contributing to research in this field while also working on the Center’s Nazi-Looted Art Database.

  1. DALL·E 3 is a text-to-image model developed by OpenAI that uses deep learning to generate digital images from text prompts. Released in October 2023, it is integrated into ChatGPT for Plus and Enterprise users, and is also accessible via OpenAI’s API and Labs platform. ↑
  2. A Feder Cooper and James Grimmelmann, ‘The Files are in the Computer: Copyright, Memorization and Generative AI’ (2023) 23–24 ↑
  3. Adam Zewe, ‘Explained: Generative AI’ (MIT News, 9 November 2023) https://news.mit.edu/2023/explained-generative-ai-1109 ↑
  4. Jordan Hoffmann and others, ‘Training Compute‑Optimal Large Language Models’ (arXiv, 29 March 2022) 1 https://arxiv.org/abs/2203.15556 ↑
  5. Digital Media Licensing Association (DMLA), Initial Comments in response to U.S. Copyright Office Copyright and Artificial Intelligence: Part 3 – Generative AI Training (Pre‑publication version, March 2025) 10–11 ↑
  6. Competition and Markets Authority, AI Foundation Models – Technical update report (GOV.UK, 16 April 2024) 1, 85 https://assets.publishing.service.gov.uk/media/661e5a4c7469198185bd3d62/AI_Foundation_Models_technical_update_report.pdf ↑
  7. Gil Appel, Juliana Neelbauer and David A Schweidel, ‘Generative AI Has an Intellectual Property Problem’ (Harvard Business Review, 7 April 2023) https://hbr.org/2023/04/generative-ai-has-an-intellectual-property-problem ↑
  8. The New York Times Co. v. Microsoft Corp., No. 1:24-cv-00034 (S.D.N.Y. filed Dec. 27, 2023) ↑
  9. Zhang v. Google LLC, No. 3:24-cv-00487 (N.D. Cal. filed Jan. 26, 2024) ↑
  10. Andersen v. Stability AI Ltd., No. 3:23-cv-00201 (N.D. Cal. filed Jan. 13, 2023) ↑
  31. LLM Security and Prompt Engineering Digest: LLM Shadows (Adversa.ai, 3 August 2023) https://adversa.ai/blog/llm-security-and-prompt-engineering-digest-llm-shadows/ ↑
  32. Campbell v. Acuff‑Rose Music, Inc., 510 U.S. 569 (1994) ↑
  33. Sony Corp. of America v. Universal City Studios, Inc., 464 U.S. 417 (1984) ↑
  34. Swatch Group Mgmt. Servs. Ltd. v. Bloomberg L.P., 742 F.3d 17 (2d Cir. 2014) ↑
  35. New Media Rights, Initial Comments in response to US Copyright Office, Copyright and Artificial Intelligence: Part 3 – Generative AI Training (Pre‑publication version, 9 May 2025) 16; Data Provenance Initiative, Initial Comments ibid 10–11; Katherine Lee and others, Initial Comments ibid 102. ↑
  36. The Authors Guild, Initial Comments in response to US Copyright Office, Copyright and Artificial Intelligence: Part 3 – Generative AI Training (Pre‑publication version, 9 May 2025) 20. ↑
  37. United States Code, Title 17 § 107 (2023) ↑
  38. Authors Guild, Inc. v. Google Inc., 804 F.3d 202 (2d Cir. Oct. 16, 2015) ↑
  39. US Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre‑publication version, 9 May 2025) 57 https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf ↑
  40. Sega Enters. Ltd. v. Accolade, Inc., 977 F.2d 1510 (9th Cir. 1992) ↑
  41. Harper & Row v. Nation Enterprises | 471 U.S. 539 (1985) ↑
  42. US Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre‑publication version, 9 May 2025) 54 https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf ↑
  43. TechNet, Initial Comments in response to US Copyright Office, Copyright and Artificial Intelligence: Part 3 – Generative AI Training (30 October 2023) 11 ↑
  44. Sony Corp. of America v. Universal City Studios, Inc., 464 U.S. 417 (1984) ↑
  45. World Intellectual Property Organization, Copyright, Competition and Development (WIPO‑mandated survey by Max Planck Institute, December 2013) https://www.wipo.int/export/sites/www/competition-policy/en/docs/copyright_competition_development.pdf ↑
  46. World Intellectual Property Organization, Copyright, Competition and Development (WIPO‑mandated survey by Max Planck Institute, December 2013) https://www.wipo.int/export/sites/www/competition-policy/en/docs/copyright_competition_development.pdf ↑
  47. Todd A Carpenter, ‘Ensuring attribution is critical when licensing content to AI developers’ (The Scholarly Kitchen, 4 September 2024) https://scholarlykitchen.sspnet.org/2024/09/04/make-attribution-mandatory-in-ai-licensing/ ↑
  48. Getty Images, Content License Agreement (last updated October 2024) https://www.gettyimages.co.uk/eula ↑
  49. George H Pike, ‘AI and Copyright: Steering Around the Iceberg’ (Information Today, vol 40 no 8, October 2023) 24 ↑
  50. Adobe, ‘Adobe’s approach to customer choice in AI models’ (Adobe Blog, 18 March 2025) https://blog.adobe.com/en/publish/2025/03/18/adobes-approach-customer-choice-in-ai-models ↑

 

Disclaimer: This article is for educational purposes only and does not constitute legal advice. Readers should not rely on any comment or statement in this article as legal advice. For legal advice, readers should consult an attorney.