The AI Musicpreneur

Anghami and Cyanite’s AI metadata deal reveals how Western training data fails Arabic music

Published by Christopher Wieduwilt · 4 min read
Anghami and Cyanite logos connected by a link icon on a pastel gradient background.

Key highlights

  • Cyanite’s auto-tagging API now covers 2.5M songs across Anghami’s catalog in 16 MENA countries
  • A 2025 MBZUAI study found 94% of generative AI music training data is Western, with Middle Eastern music making up 0.4% of training hours
  • Audio-based AI tagging bypasses the cold-start problem that buries regional tracks on behavioral recommendation platforms

Cyanite tags 2.5M songs across Anghami’s catalog

Anghami (NASDAQ: ANGH), the MENA streaming platform with 120M registered users across 16 countries, has integrated Cyanite’s auto-tagging API across 2.5 million songs. Each track now gets AI-generated metadata for mood, genre, energy, instrumentation, voice type, era, BPM, and tempo.

Cyanite’s system analyzes audio waveforms directly. It doesn’t rely on user behavior data (skips, saves, playlist adds) the way collaborative filtering models do. The company has tagged 45M+ songs for 150+ clients and acquired aptone in 2023 to handle catalog-scale infrastructure.

Anghami merged with OSN+ in April 2024, expanding its reach as a combined entertainment platform.

Western bias in AI music training data

The deal matters because of what a 2025 study from MBZUAI (published at NAACL) quantified: 94% of generative AI music model training data is Western. The Missing Melodies study found European music accounts for 6,127 hours in major datasets, while Middle Eastern music sits at 569 hours, or 0.4% of total training data.

The MBZUAI training data analysis pinpointed a specific technical failure: Arabic Maqam uses 24 quarter tones, compared to the 12-semitone Western chromatic scale. The researchers wrote: “SunoAI, when attempting to generate Maqamat music of the Middle East, may round off the microtones to the nearest Western equivalent, resulting in a piece that lacks the distinctive sound of Arabic music.”

Their conclusion: “addressing dataset bias is essential to building inclusive music generation systems.”
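The rounding the researchers describe can be illustrated with a little pitch arithmetic. The sketch below is not from the study. It assumes A4 = 440 Hz as a reference pitch and treats the 24 quarter tones as an equal division of the octave, which simplifies real Maqam intonation. All function names are illustrative.

```python
import math

A4_HZ = 440.0  # reference pitch; an assumption for this sketch


def freq_from_steps(steps: float, divisions: int) -> float:
    """Frequency `steps` steps above A4 in an equal division of the octave."""
    return A4_HZ * 2 ** (steps / divisions)


def snap_to_12_tone(freq_hz: float) -> float:
    """Round a frequency to the nearest pitch on the 12-semitone grid."""
    semitones = 12 * math.log2(freq_hz / A4_HZ)
    return A4_HZ * 2 ** (round(semitones) / 12)


def cents_between(f1_hz: float, f2_hz: float) -> float:
    """Signed distance from f1 to f2 in cents (100 cents = 1 semitone)."""
    return 1200 * math.log2(f2_hz / f1_hz)


# A pitch three quarter tones above A4, i.e. halfway between two semitones.
quarter_tone = freq_from_steps(3, 24)
snapped = snap_to_12_tone(quarter_tone)
error_cents = cents_between(quarter_tone, snapped)

print(f"quarter-tone pitch: {quarter_tone:.2f} Hz")
print(f"snapped to 12-tone grid: {snapped:.2f} Hz")
print(f"pitch shift introduced: {abs(error_cents):.1f} cents")
```

Under these assumptions, a pitch that falls on a quarter tone sits 50 cents from either neighbouring semitone, so snapping it to the 12-tone grid shifts it by half a semitone in whichever direction the rounding goes.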

Audio-based tagging solves the cold-start problem

Cyanite product screenshots showing Auto-Tagging (genre, mood, and BPM tags for tracks by Jungle, Charli xcx, and Skepta) and Auto-Descriptions (a natural-language summary of a Jungle track)

Behavioral recommendation systems need listen and skip data to work. Regional tracks with no play history get zero discovery signals. Cyanite’s audio-based approach tags songs from zero plays, pulling metadata from the waveform itself.
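To make the contrast concrete, here is a toy comparison of the two approaches. It is a simplified sketch, not Cyanite’s or Anghami’s actual system: the listening histories and tags are invented, the tags are written by hand rather than derived from audio, co-occurrence counting stands in for collaborative filtering, and Jaccard overlap stands in for content-based similarity.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical listening histories, one list per listener.
histories = [
    ["track_a", "track_b"],
    ["track_a", "track_b", "track_c"],
    ["track_b", "track_c"],
]

# Hypothetical content tags. "new_track" has never been played.
tags = {
    "track_a": {"pop", "upbeat", "female_vocals"},
    "track_b": {"pop", "upbeat", "synth"},
    "track_c": {"ballad", "calm", "piano"},
    "new_track": {"pop", "upbeat", "oud"},
}


def co_occurrence(histories):
    """Count how often two tracks appear in the same listener's history."""
    counts = defaultdict(int)
    for history in histories:
        for x, y in combinations(sorted(set(history)), 2):
            counts[(x, y)] += 1
            counts[(y, x)] += 1
    return counts


def behavioral_neighbors(seed, counts):
    """Rank tracks by co-listening with the seed; unplayed tracks never appear."""
    scores = {b: n for (a, b), n in counts.items() if a == seed}
    return sorted(scores, key=lambda t: (-scores[t], t))


def jaccard(a, b):
    """Share of tags two tracks have in common."""
    return len(a & b) / len(a | b)


def tag_neighbors(seed, tags):
    """Rank tracks by tag overlap with the seed; play counts are not used."""
    scores = {t: jaccard(tags[seed], s) for t, s in tags.items() if t != seed}
    return sorted(scores, key=lambda t: (-scores[t], t))


counts = co_occurrence(histories)
print("behavioral:", behavioral_neighbors("track_a", counts))
print("tag-based: ", tag_neighbors("track_a", tags))
```

In this toy data, `new_track` never shows up in the behavioral ranking because no listener has played it, while the tag-based ranking can place it next to `track_a` on the strength of shared tags alone.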

Markus Schwarzer, Cyanite CEO, said: “Ensuring that Arabic repertoire is tagged with the same precision as Western music is not trivial. We’re proud that our audio-based AI can support music discovery at this scale and across such a rich regional landscape.”

Elias El Khoury, Anghami’s VP of Information and Content Systems, added: “Arabic music carries immense depth, emotion and cultural nuance. Through our partnership with Cyanite, we’re ensuring that this richness is understood at a data level, allowing us to power more accurate personalisation and elevate discovery for millions of listeners.”

MENA streaming revenue is growing fast

The IFPI 2026 MENA figures show the region’s recorded music revenue grew 15.2% in 2025, making it the joint second-fastest-growing market globally. Streaming accounts for 97.5% of MENA revenues.

For music professionals, this deal points to a structural gap: platforms serving non-Western audiences can’t wait for generative AI models to fix their training data. Audio-based metadata fills the gap now. The same problem applies to other regional traditions. A Delhi startup called RaagaPay is building an ethical AI dataset of purpose-built training data for Hindustani classical music, a tradition of similar complexity that calls for 80 metadata parameters, compared to 12-15 for Western music.

Platforms are still working out how to handle AI-generated content alongside human-made music. Apple Music has introduced AI transparency labels, Deezer’s AI detection catches 60,000 AI tracks daily, and the broader rules for AI on streaming platforms are still evolving. The metadata infrastructure question (who tags what, and how accurately) sits underneath all of it.

Frequently asked questions

Why can’t Western AI models tag Arabic music accurately?

Arabic music uses the Maqam system with 24 quarter tones, while Western music theory relies on 12 semitones. AI models trained on Western data tend to round microtones to the nearest Western pitch, stripping the distinctive sound of Arabic scales. The Missing Melodies study found Middle Eastern music makes up 0.4% of major training datasets, so models have almost no reference material for these tonal systems.

What is the cold-start problem in music streaming?

Behavioral recommendation engines (like Spotify’s collaborative filtering) need user interaction data, such as listens, skips, and saves, to recommend tracks. New or regional songs with no play history get no signals, so they don’t surface in recommendations. Audio-based tagging systems like Cyanite’s analyze the sound directly, generating metadata even for tracks with zero plays. You can read more about why Spotify buries AI songs and how recommendation systems shape visibility.

How many songs has Cyanite tagged in total?

Cyanite has tagged over 45M songs across 150+ clients. The Anghami deal adds 2.5M songs to this total. The company’s acquisition of aptone in 2023 gave it the catalog-scale infrastructure needed for deals of this size.

About the author


Christopher Wieduwilt

AI Music Educator & Journalist

Covering AI music tools, industry shifts, and news for music creators and professionals. Twice-weekly newsletter at aimusicpreneur.com.
