Here at Pandora, we have recently completely redesigned the way we analyze music for the Music Genome Project, with a new system we call MGP2. We’ve developed a collection of new taxonomies that we use to describe songs, and a new, text-based tagging system that allows us to annotate music much more accurately and completely. This new way of annotating has improved all of our downstream systems and led to important improvements in the data science, machine learning, and content understanding that make our recommendation systems the best in the world.
What is MGP2?
When the Music Genome Project first began over 20 years ago, music analysis was done with pencil and paper, with analysts manually ripping CDs (remember those?) into the ingestion systems. [Photo: Music Analysts analyzing music in the early days of Pandora] The Music Genome Project Interface was soon developed, consisting of a series of “genes” that needed to be scored on a ten-point scale. This system enabled us to gather a massive amount of uniform data on millions of songs. However, as the years went by and music evolved, the original fixed set of genes could not keep up. As the tech stack aged, making changes to the system proved too costly and impractical. In the rapidly changing and increasingly genre-agnostic world of modern music, we needed a more fluid and flexible way to analyze songs.
Enter MGP2. This semantic, tag-based system lets us listen to songs and then add tags drawn from a set of taxonomies covering dimensions such as genre, mood, rhythmic feel, and meter.
Having fluid and dynamic control over these taxonomies allows us to add or adjust things as needed. These tags also turn out to be much easier for humans to interpret, and easier for machine learning models to use as inputs.
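To illustrate why text tags are convenient model inputs, here is a minimal sketch of multi-hot encoding a song’s tag set over a fixed vocabulary. The vocabulary, tag names, and encoding are illustrative assumptions, not Pandora’s actual feature pipeline.

```python
# Hypothetical sketch: turn a song's set of text tags into a multi-hot
# 0/1 feature vector that a machine learning model can consume.
# The vocabulary below is a tiny illustrative sample, not the real taxonomy.
TAG_VOCABULARY = ["Afro-Latin Feel", "Merengue Feel", "Odd Meter", "5/4 Meter"]
TAG_INDEX = {tag: i for i, tag in enumerate(TAG_VOCABULARY)}

def multi_hot(song_tags):
    """Encode a set of text tags as a 0/1 vector over the vocabulary."""
    vec = [0] * len(TAG_VOCABULARY)
    for tag in song_tags:
        if tag in TAG_INDEX:          # unknown tags are simply ignored
            vec[TAG_INDEX[tag]] = 1
    return vec

print(multi_hot({"Odd Meter", "5/4 Meter"}))  # → [0, 0, 1, 1]
```

Because each tag is a named dimension, a human can read the feature vector directly, which is harder to do with raw numeric gene scores.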
MGP2 also allows us to provide a more detailed and in-depth level of song analysis. Here is an example: in our old MGP1 system, we could indicate if a song contained Afro-Latin rhythms, and to what degree. In the MGP2 system, we can still indicate a general “Afro-Latin Feel” if we want, but now we can get more specific, with the Afro-Latin Feel tag acting as a parent to a number of more specific tags, such as Merengue Feel, Bomba Feel, or Afro-Peruvian Feel.
Another example: previously we could indicate if a song was in an odd meter. Now, we can say specifically what meter the song is in, with children of the Odd Meter tag including 5/4 Meter, 7/4 Meter, and Mixed Meter (e.g., 7/8 + 5/4). This gives us a more accurate understanding of the music so we can craft excellent listening experiences.
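The parent/child structure in the two examples above can be sketched as a simple child-to-parent mapping, where tagging a specific feel automatically implies its more general ancestors. The data structure and function here are a hypothetical illustration; the tag names come from the examples above.

```python
# Hypothetical sketch of a hierarchical tag taxonomy as a
# child -> parent mapping. Top-level tags have no entry.
PARENT = {
    "Merengue Feel": "Afro-Latin Feel",
    "Bomba Feel": "Afro-Latin Feel",
    "Afro-Peruvian Feel": "Afro-Latin Feel",
    "5/4 Meter": "Odd Meter",
    "7/4 Meter": "Odd Meter",
    "Mixed Meter": "Odd Meter",
}

def with_ancestors(tags):
    """Expand a tag set so every specific tag also implies its parents."""
    expanded = set(tags)
    for tag in tags:
        while tag in PARENT:          # walk up until a top-level tag
            tag = PARENT[tag]
            expanded.add(tag)
    return expanded

print(sorted(with_ancestors({"Merengue Feel"})))
# → ['Afro-Latin Feel', 'Merengue Feel']
```

With this expansion, a search for the general “Afro-Latin Feel” still surfaces songs tagged only with the more specific “Merengue Feel”.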
We also wrote thousands of translations, essentially a set of rules, to convert the numeric scores of the 2.2 million songs we analyzed in our old system into the text-based tags we created for MGP2. Once our science team updated our machine-learning models to use these tags as inputs, we saw immediate improvements in the scale and accuracy of all downstream systems, including predictions, recommendations, track grouping, and more. These models allow us to extend the insights from the 2.2 million songs we have analyzed to the rest of the tens of millions of songs in our complete catalog. You can read more about this groundbreaking work in our science team’s publications.
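A translation rule of this kind might look like the following sketch, which maps a ten-point MGP1 gene score to MGP2 tags. The gene name, thresholds, and output tags are illustrative assumptions, not the actual rules.

```python
# Hypothetical sketch of one MGP1 -> MGP2 "translation" rule:
# convert a numeric gene score (ten-point scale) into text tags.
def translate_afro_latin(score):
    """Map an assumed 'Afro-Latin rhythm' gene score to MGP2 tags."""
    tags = set()
    if score >= 3:    # assumed threshold: the feel is clearly present
        tags.add("Afro-Latin Feel")
    if score >= 7:    # assumed threshold: the feel is strong and central
        tags.add("Strong Afro-Latin Feel")
    return tags

print(sorted(translate_afro_latin(8)))
# → ['Afro-Latin Feel', 'Strong Afro-Latin Feel']
```

Running thousands of such rules over the back catalog lets every previously analyzed song participate in the new tag-based system without re-analysis.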
Perhaps the two most significant dimensions we can now capture with MGP2 are Genre and Mood. Our newest taxonomy is the Analysis Mood Taxonomy (AMT), which allows us to tag songs with a wide range of specific emotional states. Our comprehensive genre taxonomy, the AGT, is a detailed hierarchy of genres with over 1400 specific sub-genres, painstakingly organized by our expert Music Analysts.
Altogether, the new MGP2 system, built on a modern tech stack with flexible and dynamic taxonomies, is a major leap forward for the systems that give Pandora the most thorough and sophisticated content understanding, and the best music recommendations in the world.