The rapid rise of generative AI has sparked a heated debate about the use of copyrighted material in training datasets. A new bill introduced in the US Congress on Tuesday, the Generative AI Copyright Disclosure Act, aims to bring transparency to this contentious issue by requiring AI companies to disclose the copyrighted works they use to train their models.

Everything you need to know:
✓ The Generative AI Copyright Disclosure Act would require AI companies to disclose copyrighted works used in training datasets.
✓ The bill has garnered support from numerous entertainment industry organizations and unions.
✓ AI companies argue that their use of copyrighted material falls under fair use, setting the stage for a major legal battle.
Introduced by California Democratic congressman Adam Schiff, the bill would mandate that AI companies submit a detailed summary of any copyrighted works used in their training datasets to the Register of Copyrights at least 30 days before releasing new generative AI systems. Failure to comply with this requirement would result in financial penalties.
The legislation comes amid a growing number of lawsuits and government investigations into whether major AI companies have made illegal use of copyrighted works, such as songs, visual art, books, and movies, to build their tools. The bill has received support from numerous entertainment industry organizations and unions, including the Recording Industry Association of America, Professional Photographers of America, Directors Guild of America, and the Screen Actors Guild-American Federation of Television and Radio Artists (SAG-AFTRA).
Duncan Crabtree-Ireland, SAG-AFTRA’s national executive director and chief negotiator, emphasized the importance of protecting human creative content, stating, “Everything generated by AI ultimately originates from a human creative source. That’s why human creative content–intellectual property–must be protected.”
However, prominent artificial intelligence companies such as OpenAI, which are facing lawsuits over their alleged use of copyrighted works, have denied wrongdoing. They claim that their use of copyrighted material falls under fair use, a legal doctrine that allows some unlicensed use of copyrighted material under certain conditions. This legal strategy poses a major test for copyright law, with the outcome likely to significantly affect either artists’ livelihoods or OpenAI’s bottom line.
In a submission to a UK government committee earlier this year, lawyers for OpenAI contended that “legally, copyright law does not forbid training.” They also stated that without access to copyrighted works, their tools would cease to function. This stance has drawn criticism from entertainment industry workers who argue that generative AI poses a threat to artists’ rights.
Last week, a group of over 200 high-profile musical artists released an open letter calling for increased protections against AI and demanding that companies refrain from developing tools that could undermine or replace musicians and songwriters. The letter highlights the growing concern among creative professionals about the potential impact of generative AI on their livelihoods.
The Generative AI Copyright Disclosure Act is one of several attempts by lawmakers to address the challenges posed by the rapid advancement of AI technology. While the bill does not go as far as mandating that AI developers license copyrighted works, it aims to bring much-needed transparency to the process of AI training.
The proposed legislation is likely to face pushback from AI companies, which have argued that AI training datasets are often so large that cataloging every instance of copyrighted content would be extremely expensive and inefficient. However, the non-profit group Fairly Trained, established earlier this year to certify AI models based on their respect for creators’ rights, has challenged this claim.
Fairly Trained recently certified its first large language model (LLM), demonstrating that it is possible for AI developers to work in a way that respects creators’ rights. The organization stated:
“There is no fundamental reason that large language model developers can’t work in a way that respects creators’ rights. Today’s announcement answers this question, and strengthens our belief in a future in which a fair approach to training data is the norm.”
As the debate surrounding generative AI and copyrighted material continues, the Generative AI Copyright Disclosure Act represents a significant step towards bringing transparency and accountability to the industry. The outcome of this legislation, along with ongoing legal battles, will likely shape the future of AI development and its relationship with the creative industries.