By Ben Belchak, Head of Site Reliability Engineering, Shazam
At Shazam, we’ve been heavy users of graphics processing units (GPUs) for our recognition services since 2012, starting with the NVIDIA TESLA M2090 and working our way up to the K80 today. We’ve traditionally used bare metal servers because GPUs in the cloud were either unavailable or, when they were offered, far too expensive and not performant enough for our needs. Only recently have the economics of GPUs in the cloud really made sense for our business. This is what kicked off our journey to Google Cloud Platform (GCP).
For certain tasks, GPUs are a cost-effective and high-performance alternative to traditional CPUs. They work great with Shazam’s core music recognition workload, in which we match snippets of user-recorded audio fingerprints against our catalog of over 40 million songs. We do that by taking the audio signatures of each and every song, compiling them into a custom database format and loading them into GPU memory. Whenever a user Shazams a song, our algorithm uses GPUs to search that database until it finds a match. This happens successfully over 20 million times per day.
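To make the matching step concrete, here's a minimal sketch of fingerprint lookup. This is not Shazam's actual algorithm, database format, or GPU code; it just assumes each song can be reduced to a set of hash values and that a query snippet matches the song whose fingerprint shares the most hashes with it.

```python
# Hypothetical fingerprint matching sketch: songs and snippets are sets of
# hashes, and the best match is the song with the largest overlap.

def best_match(snippet_hashes, catalog):
    """Return (song_id, score) for the catalog entry sharing the most
    hashes with the snippet, or (None, 0) if nothing overlaps."""
    best_song, best_score = None, 0
    for song_id, song_hashes in catalog.items():
        score = len(snippet_hashes & song_hashes)
        if score > best_score:
            best_song, best_score = song_id, score
    return best_song, best_score

# Toy two-song catalog (real fingerprints contain far more hashes).
catalog = {
    "thinking_out_loud": {101, 102, 103, 104, 105},
    "lithuanian_polka": {201, 202, 203},
}

print(best_match({102, 103, 104, 999}, catalog))  # → ('thinking_out_loud', 3)
```

In production this comparison runs in parallel across millions of songs, which is exactly the kind of embarrassingly parallel workload GPUs excel at.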
To meet that demand, we’ve been maintaining a fleet of GPUs on dedicated bare metal servers that we lease from a managed services provider. Because of the time it takes to source and provision a new physical server, we provision enough to meet peak demand and then run that capacity 24/7, 365 days a year. We kept costs under control by improving our algorithms and by taking advantage of ever-evolving GPU architectures and the performance improvements they brought. About six months ago, though, we began experimenting with GPUs running on Compute Engine. Thanks to the speed with which we can dial new instances up and down, we maintain GPU infrastructure to handle average use instead of the full capacity for our maximum peak load. Thus far, we’ve migrated about one-third of our infrastructure into Google Cloud.
In order to efficiently search our massive catalog of music, we maintain multiple levels of GPU server clusters that we call "tiers." A first tier searches against a database of the most popular songs’ audio signatures, while subsequent tiers search longer samples against progressively more and more obscure music databases. In this way, Shazam identifies, say, "Thinking Out Loud" by Ed Sheeran in a single pass from a short sample, but might need several passes and a much longer sample to identify a 1950s recording of a Lithuanian polka group (being able to match really obscure music in addition to popular music is what makes using Shazam such a magical experience for our users).
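The tiered lookup described above can be sketched as a simple fallback cascade. The tier structure comes from the post, but the matcher interface, confidence scores, and threshold below are invented for illustration: query the small popular-music tier first, and only fall through to the larger, more obscure tiers when no confident match is found.

```python
# Illustrative tiered search: tiers are ordered most-popular first, and
# each matcher returns a (song_id, confidence) pair.

def tiered_search(sample, tiers, threshold=0.8):
    """Try each tier in order; return (song_id, tier_index) for the first
    confident match, or (None, None) if every tier misses."""
    for tier_index, match in enumerate(tiers):
        song, confidence = match(sample)
        if song is not None and confidence >= threshold:
            return song, tier_index
    return None, None

# Toy matchers: tier 0 only knows the hit, tier 1 knows the obscure track.
tier0 = lambda s: ("thinking_out_loud", 0.95) if s == "hit" else (None, 0.0)
tier1 = lambda s: ("lithuanian_polka", 0.90) if s == "polka" else (None, 0.0)

print(tiered_search("hit", [tier0, tier1]))    # → ('thinking_out_loud', 0)
print(tiered_search("polka", [tier0, tier1]))  # → ('lithuanian_polka', 1)
```

The payoff of this layout is that the vast majority of queries are answered by the small first tier, so the expensive deep tiers only see the hard cases.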
Increasing the hit rate on the first line of servers depends on keeping the index files up to date with the latest popular music. That’s hard to do given how quickly music falls in and out of favor. Some surges in traffic we can plan and pre-scale for, such as the Super Bowl, the Grammys, or even our first branded game show, "BEAT SHAZAM." Other surges we cannot predict — say, a local radio station in a large market reviving an old R&B hit, or when a track that was never popular is suddenly featured in a TV advertisement. And that’s not counting new music, which we add to our catalog every day through submissions from labels as well as by in-house music experts who are constantly searching for new music.
Of course, running on bare metal servers, we also need to provision extra capacity for the inevitable failure scenarios we all experience when operating services at scale. One of the amazing benefits of running in Google is that we can now replace a failed node in just minutes with a brand new one “off the shelf” instead of keeping a pool of nodes around just waiting for failures. With our managed services provider, we had to provision GPUs in groups of four cards per machine, with two dies per card. That meant that we could lose up to eight shards of our database when a node failed. Now, in Google, we provision one VM per shard, which localizes the impact of a node failure to a single shard instead of eight.
An unexpected benefit of using Google Cloud GPUs has been to increase how often we recalculate and update our audio signature database, which is actually quite computationally intense. On dedicated infrastructure, we update the index of popular songs daily. On Google Cloud, we can recompile the index and reimage the GPU instance in well under an hour, so the index files are always up-to-date.
This flexibility allows us to begin considering dynamic cluster configurations. For instance, because of the way our algorithm works, it’s much easier for us to identify songs that were Shazamed in a car, which is a relatively quiet environment, than it is to identify songs Shazamed in a restaurant, where talking and clanging obscure the song’s audio signature. With the flexibility that cloud-based GPUs afford us, we have many more options available to us for configuring our tiers to match the specific demands that our users throw at us at different times of day. For example, we may be able to reconfigure our clusters according to time of day — morning drive time vs. happy hour at the bar.
It’s exciting to think about the possibilities that using GPUs in Google Cloud opens up, and we look forward to working with Google Cloud as it adds new GPU offerings to its lineup.
You can find out more details about our move to Google Cloud Platform here: http://bit.ly/2psMcjc