TheRealSally

AI everywhere: The stealthy advent of Deep Learning in the Music Industry

Learn how AI is transforming the Music Industry and revolutionising your personal music streaming experience!

— By Preyansh Agarwal, SLAM researcher @ Sally Robotics.

Photo by James Stamler on Unsplash
“If I were not a physicist, I would probably be a musician. I often think in music. I live my daydreams in music. I see my life in terms of music.” — Albert Einstein

In this fast-paced world, Artificial Intelligence has penetrated almost every industry. One sector we didn't expect it to enter, however, was the Music Industry. Yet people have used Deep Learning to optimise and enhance many aspects of music. The most prominent is Music Generation, where companies use AI to create, enhance and complement music content. Another is Music Streaming, where machine learning and deep learning methods recommend personalised content based on user activity data; Spotify is a well-known example. A third is Music Monetisation, where AI platforms help artists monetise their music and generate revenue.

Let us first look into the history of neural networks and music!


History of Neural Networks and Music

The first wave of work was initiated in 1988 by Lewis and Todd, who proposed using neural networks for automatic music composition. Lewis used a multi-layer perceptron for an approach he called "creation by refinement", which rests on the same idea later popularised by DeepDream: using gradients to create art. Todd, on the other hand, used an auto-regressive recurrent neural network (RNN) to generate music sequentially, one step at a time, a principle that modern autoregressive models such as LSTM (Long Short-Term Memory) networks and WaveNet still follow. Later, Eck and Schmidhuber, in their paper Finding temporal structure in music: blues improvisation with LSTM, tried to address one of the most significant issues that algorithmically composed music had (and still has): the lack of global coherence or structure.

Before 2009, most works addressed the problem of algorithmic music composition and tried to solve it with RNNs. In 2009, the AI winter ended, and the first deep learning works began to impact the field of music and audio AI as people started tackling more complex problems with deep learning classifiers. Lee and his colleagues built the first deep convolutional neural network for music genre classification. This foundational work established the basis for a generation of deep learning researchers who spent great effort designing better models to recognise high-level (semantic) concepts from music spectrograms.

Current research focuses mainly on two areas: music information retrieval, which aims to design models capable of recognising the semantics present in music signals, and algorithmic composition, which aims to computationally generate new, appealing pieces of music.

Now, let's look at the various fields of music where AI currently plays a significant role.


Music Generation

The goal of researchers in this area is to artificially generate human-sounding music, or music in the style of a specific artist such as Mozart. Some of the approaches in this area include:

An open-source research project exploring the role of machine learning as a tool in the creative process.

This project, Magenta, developed by Google Brain, aims to create new tools that artists can use when writing and developing songs. The team has developed several models to generate music. At the end of 2016, they published an LSTM model tuned with Reinforcement Learning. The exciting idea was to use Reinforcement Learning to teach the model to follow specific rules while still retaining the information it had learned from data. Magenta Studio is also available as a plugin for Ableton Live.
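To make the idea of tuning with Reinforcement Learning more concrete, here is a minimal sketch of a reward that mixes the two signals described above, assuming a pretrained model that supplies the probability of each chosen note. The key, the specific rules and `rule_weight` are hypothetical choices for illustration, not Magenta's actual reward; in a full setup this reward would drive an RL algorithm that updates the model, which is not shown here.

```python
import math

# Hypothetical key: pitch classes of C major (C, D, E, F, G, A, B).
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}


def rule_reward(prev_notes, note):
    """Toy music-theory score for choosing `note` (a MIDI number) after `prev_notes`."""
    reward = 1.0 if note % 12 in C_MAJOR else -1.0  # reward in-key notes, penalise the rest
    # Discourage playing the same note four or more times in a row.
    if len(prev_notes) >= 3 and all(n == note for n in prev_notes[-3:]):
        reward -= 1.0
    return reward


def combined_reward(model_prob, prev_notes, note, rule_weight=0.5):
    """Data term (log-probability under the pretrained model) plus a weighted rule term."""
    return math.log(model_prob) + rule_weight * rule_reward(prev_notes, note)


# Same model probability, but F (65) is in key while F# (66) is not.
print(combined_reward(0.2, [60, 62, 64], 65) > combined_reward(0.2, [60, 62, 64], 66))  # True
```

The log-probability term keeps the model close to what it learned from data, while the rule term nudges it towards the constraints; `rule_weight` sets the balance between the two.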

MuseNet is a deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles.

MuseNet is an OpenAI music generation model. It uses the same kind of architecture as OpenAI's GPT-2 language model: a large-scale transformer trained to predict the next token in a sequence. This lets it blend the styles of different famous composers as well as various music genres.
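Predicting the next token is what makes this kind of generation autoregressive: each sampled token is appended to the sequence and fed back in as context for the next prediction. The sketch below shows that loop with a deliberately tiny stand-in, a bigram counter over hypothetical note tokens, in place of a transformer. Unlike MuseNet, which attends to the whole preceding sequence, this toy model only looks at the last token.

```python
import random
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count which token follows which (a toy stand-in for a learned next-token model)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, start, length, seed=0):
    """Sample one token at a time, feeding each output back in as the next context."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        successors = counts.get(out[-1])
        if not successors:
            break  # no known continuation for this token
        tokens = list(successors)
        weights = [successors[t] for t in tokens]
        out.append(rng.choices(tokens, weights=weights, k=1)[0])
    return out


# Hypothetical training melodies written as note-name tokens.
corpus = [
    ["C4", "E4", "G4", "C5"],
    ["C4", "E4", "G4", "E4", "C4"],
    ["G4", "E4", "C4"],
]
model = train_bigram(corpus)
print(generate(model, "C4", 8))
```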

Popgun’s technology helps everyone sing, play instruments, compose songs and master audio.

This Australian startup uses deep learning through a platform called ALICE to accompany or augment musical compositions. ALICE tries to predict what a musician will play, accompanies the musician, and attempts to improvise on what the musician is playing.

Amper Score™ enables enterprise teams to compose custom music in seconds and reclaim the time spent searching through stock music.

Amper is a cloud-based, AI-driven music composition platform. The system reportedly generates unique musical selections based on the mood, style, and duration parameters selected by the user. Once these selections are made, the user can make additional edits before the composition is complete.

Music Streaming

Photo by Zarak Khan on Unsplash

The entry of Artificial Intelligence into Music Streaming has completely changed the user's streaming experience. Music streaming apps such as Joox, QQ Music and KuGou use AI to analyse their listeners' preferences and recommend specially curated playlists for a personalised experience. These AI-based recommendation engines examine each listener's history and recommend new songs accordingly. Spotify is one music giant that has been tremendously successful in this area.

The real impact of AI comes from filtering engines, which scan thousands of newly uploaded songs to build playlists and recommendations targeted at each individual, sparing listeners from browsing through thousands of songs to pick out favourites. Moreover, AI filtering engines do not restrict personalisation to single genres; instead, they give the word genre a whole new definition by generating playlists of seemingly unrelated songs that a particular listener considers good music. Apple Music has been regularly optimising its For You section using this principle.

Endel is a sound startup signed by Warner Music Group. It is a cross-platform audio ecosystem that creates personalised, sound-based, adaptive environments to help people focus and relax. Even though its music was never going to top the charts or make the Billboard Hot 100, Warner signed a 20-album deal with the company. This shows the impact of AI in today's world, where user experience and comfort are of the utmost importance.

Let us take a closer look at one of the biggest music streaming platforms actively using AI in its products.


Photo by Heidi Fin on Unsplash
Spotify is a digital music service that gives you access to millions of songs.

Spotify is arguably the most significant on-demand music service today. The company has a record of pushing technological boundaries, using AI and machine learning to enhance the user experience through nuanced insights from customer data. One major early success was the launch of the Discover Weekly playlist, which reached 40 million people in its first year. Every Monday, each user receives a customised list of thirty songs made up of tracks they may not have heard before, recommended on the basis of their listening history and likely music preferences. Generating this playlist relies on a combination of three types of models: Collaborative Filtering, Natural Language Processing and Audio Models.


The collaborative filtering algorithm finds users who are similar to each other based on their usage, that is, the songs they have both listened to, and then recommends to each user the songs that the other has listened to but they have not. As with any new product, Spotify also faces the cold-start problem, where there is no user data to act on. To tackle this, Spotify runs CNNs (Convolutional Neural Networks) over the audio of the song itself to find songs with similar acoustic patterns to recommend.
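Here is a minimal sketch of the user-based collaborative filtering just described, assuming a hypothetical dictionary of who played what. It measures how similar two users are with cosine similarity over their binary listening vectors, then scores each song a user hasn't heard by the similarity of the users who played it. This illustrates the general technique rather than Spotify's implementation; systems at that scale typically rely on more scalable methods such as matrix factorisation.

```python
import math

# Hypothetical listening data: user -> set of songs they have played.
plays = {
    "ana": {"song_a", "song_b", "song_c"},
    "ben": {"song_a", "song_b", "song_d"},
    "cleo": {"song_e", "song_f"},
}


def similarity(a, b):
    """Cosine similarity between two users' binary listening vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(user, plays, k=1):
    """Rank songs the user hasn't heard by the summed similarity of users who played them."""
    scores = {}
    for other, songs in plays.items():
        if other == user:
            continue
        sim = similarity(plays[user], songs)
        if sim == 0:
            continue  # users with nothing in common tell us nothing
        for song in songs - plays[user]:
            scores[song] = scores.get(song, 0.0) + sim
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:k]


print(recommend("ana", plays))  # ['song_d']
```

Ana and Ben share two of their three songs, so Ben's remaining song is recommended to Ana. Cleo shares nothing with anyone and gets no recommendations at all, which is exactly the cold-start gap that the audio models are meant to fill.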

For NLP, Spotify uses a technique called Word2Vec, which encodes each word as a vector so that words appearing in similar contexts end up with similar vectors, and vector similarity roughly tracks similarity in meaning. Spotify treats each playlist as a paragraph or block of text and each song in the playlist as an individual word. The result is a vector representation of every song that can be used to determine whether two pieces of music are similar. In this way, Spotify can decide which songs resemble each other, which helps it tackle the cold-start problem and recommend songs with very few plays.
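The playlist-as-paragraph idea can be illustrated without training a neural network. The sketch below is a simpler, count-based cousin of Word2Vec, not Word2Vec itself: it represents each song by the songs that appear near it in playlists and compares songs with cosine similarity. The playlists and the `window` size are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical playlists: each playlist is a "sentence" and each song ID a "word".
playlists = [
    ["song_a", "song_c", "song_d"],
    ["song_b", "song_c", "song_d"],
    ["song_x", "song_y", "song_z"],
]


def cooccurrence_vectors(playlists, window=2):
    """Map each song to counts of the songs found within `window` positions of it."""
    vectors = defaultdict(Counter)
    for playlist in playlists:
        for i, song in enumerate(playlist):
            lo, hi = max(0, i - window), min(len(playlist), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[song][playlist[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[song] for song, count in u.items() if song in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


vecs = cooccurrence_vectors(playlists)
print(cosine(vecs["song_a"], vecs["song_b"]))  # close to 1.0: identical contexts
print(cosine(vecs["song_a"], vecs["song_x"]))  # 0.0: no shared contexts
```

`song_a` and `song_b` never appear in the same playlist, yet they come out as near-identical because they share the same neighbours. Word2Vec's learned vectors capture this kind of shared-context similarity in a compact, dense form.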

Audio models analyse data from raw audio tracks and categorise songs accordingly. This lets the platform evaluate every song when creating recommendations, regardless of how much coverage it has online. For instance, if a new artist releases a song on the platform, NLP models might miss it if coverage online and on social media is low. Using song data from audio models, however, the collaborative filtering model can still analyse the track and recommend it to similar users alongside more popular songs.
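The article describes Spotify's audio models as CNNs run over the audio itself, which would need a deep learning framework to demonstrate. As a much simpler stand-in, the sketch below describes each track with two hand-crafted features, zero-crossing rate (a rough proxy for pitch and brightness) and RMS loudness, and finds the catalogue track nearest to a brand-new one. The synthetic tones, the sample rate and the distance measure are all assumptions; the point is only that similarity can be computed from the audio alone, with no play counts or online coverage.

```python
import math

SAMPLE_RATE = 8000  # samples per second, assumed for the synthetic test tones


def tone(freq, amp=0.5, seconds=1.0):
    """Synthetic stand-in for a track: a pure sine wave."""
    n = int(SAMPLE_RATE * seconds)
    return [amp * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


def features(signal):
    """Zero-crossing rate (crossings per second) and RMS loudness of a signal."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    zcr = crossings * SAMPLE_RATE / len(signal)
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return zcr, rms


def distance(f1, f2):
    """Sum of relative differences, so both features count on a comparable scale."""
    return sum(abs(a - b) / max(a, b) for a, b in zip(f1, f2) if max(a, b) > 0)


# Hypothetical catalogue of existing tracks.
catalogue = {
    "low_hum": tone(110),
    "mid_tone": tone(440),
    "high_whistle": tone(1500, amp=0.2),
}
catalogue_features = {name: features(sig) for name, sig in catalogue.items()}

new_track = features(tone(450))  # a brand-new upload with no plays yet
closest = min(catalogue_features, key=lambda name: distance(catalogue_features[name], new_track))
print(closest)  # mid_tone
```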

Spotify also uses outlier detection to separate what a user actually likes from one-off listens. For example, a song from a different genre with only one or two plays is likely to be treated as an outlier and excluded when generating playlists, since it might just be a misclick. This ensures my weekly playlist won't fill up with K-pop songs instead of Anjunabeats releases just because my sister decided to experiment with my account 😄.
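The article does not describe how Spotify's outlier detection works internally, so the following is only a minimal sketch under one assumption: a genre counts as an outlier when it makes up less than a fixed share (`min_share`, here a hypothetical 10%) of a user's recent plays, and those plays are dropped before recommendations are generated.

```python
from collections import Counter


def filter_outlier_genres(history, min_share=0.1):
    """Drop plays from genres that make up less than `min_share` of `history`.

    `history` is a list of (song, genre) pairs; `min_share` is an assumed threshold.
    """
    if not history:
        return []
    genre_counts = Counter(genre for _, genre in history)
    total = len(history)
    return [(song, genre) for song, genre in history
            if genre_counts[genre] / total >= min_share]


# 28 trance plays from the account owner, 2 stray K-pop plays from a sibling.
history = [(f"anjuna_{i}", "trance") for i in range(28)] + [
    ("kpop_1", "k-pop"),
    ("kpop_2", "k-pop"),
]
kept = filter_outlier_genres(history)
print(len(kept), {genre for _, genre in kept})  # 28 {'trance'}
```

A fixed share is the simplest possible rule; lowering `min_share` keeps more minority genres at the risk of letting stray plays through.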

Music Monetisation

Photo by Samuel Ramos on Unsplash

Another exciting area where AI has taken root is Music Monetisation. AI is currently helping many undiscovered artists get noticed by big record labels, and helping labels find them. This happens in several ways, such as affordable audio mastering, which is crucial in Electronic Dance Music production, and A&R (Artists and Repertoire) discovery, where an algorithm reviews social, streaming and touring data to find promising talent.
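A&R discovery tools are not described in detail here, so the following is a hypothetical sketch of how such an algorithm might rank artists: it blends month-on-month streaming growth, social-follower growth and touring activity into a single score. The `ArtistStats` fields, the weights and the cap of ten booked shows are illustrative assumptions, not any real platform's method.

```python
from dataclasses import dataclass


@dataclass
class ArtistStats:
    """Hypothetical monthly metrics for one artist."""
    name: str
    streams_prev: int
    streams_now: int
    followers_prev: int
    followers_now: int
    shows_booked: int


def growth(prev, now):
    """Relative month-on-month growth, treating a zero baseline as no signal."""
    return (now - prev) / prev if prev else 0.0


def talent_score(a, w_streams=0.5, w_social=0.3, w_touring=0.2):
    """Weighted blend of streaming growth, social growth and touring activity (assumed weights)."""
    return (w_streams * growth(a.streams_prev, a.streams_now)
            + w_social * growth(a.followers_prev, a.followers_now)
            + w_touring * min(a.shows_booked / 10, 1.0))


artists = [
    ArtistStats("steady_act", 100_000, 105_000, 20_000, 20_500, 4),
    ArtistStats("rising_act", 5_000, 15_000, 1_000, 2_500, 6),
]
ranked = sorted(artists, key=talent_score, reverse=True)
print([a.name for a in ranked])  # ['rising_act', 'steady_act']
```

Ranking by growth rather than raw totals is what lets a small but fast-rising act outrank an established one, which is the kind of signal a talent scout is after.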

LANDR is one of the companies streamlining the process of mixing and mastering for local and underground producers. It uses ML algorithms trained on the standard steps a sound engineer follows when mastering music. Mastering is free for MP3 files at bitrates of up to 192 kbps, and an instant preview of the mastered track is available as soon as the song is uploaded.
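LANDR's actual pipeline is not described beyond being trained on a sound engineer's standard steps, and real mastering typically involves equalisation, compression and limiting, among other stages. As a heavily simplified illustration of two such steps, the sketch below raises a quiet track to a target RMS level and then applies a tanh-based soft limiter so that no sample exceeds a peak ceiling. `target_rms` and `ceiling` are hypothetical values on a -1.0 to 1.0 sample scale.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples in the range -1.0 to 1.0."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def master(samples, target_rms=0.2, ceiling=0.9):
    """Toy two-step 'master': gain to a target RMS level, then a tanh soft limiter.

    The limiter maps any input into the range -ceiling..ceiling, rounding off peaks
    gradually instead of hard-clipping them.
    """
    level = rms(samples)
    if level == 0:
        return list(samples)  # silence or empty input: nothing to normalise
    gain = target_rms / level
    return [ceiling * math.tanh(s * gain / ceiling) for s in samples]


# A quiet synthetic track: 5 cycles of a sine wave peaking at 0.05.
quiet = [0.05 * math.sin(2 * math.pi * 5 * i / 1000) for i in range(1000)]
loud = master(quiet)
print(round(rms(quiet), 3), round(rms(loud), 3), max(abs(s) for s in loud) <= 0.9)
```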



To sum up, we saw how the relationship between Music and Artificial Intelligence began with neural networks. We then explored some fantastic technologies in the field of music generation and learned how Spotify uses AI to stay the best at what it does. Finally, we saw how AI is helping independent artists clear the massive hurdle of getting their songs mastered and published. All in all, the next time your playlist isn't corrupted because your sibling played K-pop, you know whom to thank! 😂


Read the article on Medium.
