The AI Musicpreneur

The Atlantic publishes 4 databases of songs used to train AI music models

By Christopher Wieduwilt
[Image: Recording studio control room with a mixing console. Photo by Mike Logan, CC BY-SA 3.0, via Wikimedia Commons]

The Atlantic has published four searchable databases of music used to train AI models, putting hard numbers on a question the industry has argued about for two years. The largest set lists 12 million tracks. A second holds 9 million. The reporting, by staff writer Alex Reisner, names hit songs from Taylor Swift, Bad Bunny, and a long list of other major artists.

What The Atlantic’s AI music databases reveal

The four datasets map the copyrighted music behind popular AI music generators, including the models from Suno, Udio, and Google. Until now, the training sources were mostly hidden, and the AI companies have leaned on fair use to defend scraping the songs without licenses.

The databases change that by making the sources searchable. Artists and labels can look up specific tracks, which is the kind of evidence that has been hard to produce in court.

The songs Suno reproduced, from “Thriller” to “Shape of You”

The investigation also shows what comes out the other end. Suno has generated tracks that strongly resemble Michael Jackson’s “Thriller,” Ed Sheeran’s “Shape of You,” Chuck Berry’s “Johnny B. Goode,” and others.

The “Thriller” example is one of dozens the major labels submitted in their copyright case against Suno. Asked about it, the company pushed back on the idea that this is normal output.

Suno uses safeguards to protect against unauthorized distribution, impersonation and manipulations, and reproductions of training data should not happen.
— Rachel Racusen, Suno spokesperson

Why this matters for artists and the AI lawsuits

The scale is the story. Spotify said last year it pulled 75 million spammy AI tracks, and Deezer now reports that close to half of the songs uploaded to it each day are AI-generated. The Atlantic’s databases show part of what fed that flood.

For artists, searchable proof of which songs trained a model is the hard evidence these cases have lacked. It strengthens the labels’ cases against Suno and Udio, and it raises the pressure on platforms that still will not disclose what their own tools were trained on. You can read the full investigation at The Atlantic, with additional coverage from Engadget.

Frequently asked questions

What did The Atlantic's AI music investigation find?

The Atlantic, in reporting by staff writer Alex Reisner, published four searchable databases of music used to train AI models. The largest holds 12 million tracks and a second holds 9 million, with two smaller sets of about 100,000 each. The work documents the scale of copyrighted music behind tools like Suno, Udio, and Google's models.

How many songs are in The Atlantic's AI training databases?

One database lists 12 million tracks, another lists 9 million, and two more hold roughly 100,000 songs each. Together they map a large share of the copyrighted music used to train popular AI music generators.

What does Suno say about reproducing training data?

A Suno spokesperson said the platform uses safeguards against unauthorized distribution, impersonation, and manipulation, and pointed to a statement from its chief product officer that reproductions of training data should not happen. Suno did not address specific tracks named in the labels' lawsuit.

About the author

Christopher Wieduwilt

AI Music Educator & Journalist

Covering AI music tools, industry shifts, and news for music creators and professionals. Twice-weekly newsletter at aimusicpreneur.com.
