How Netflix Uses Machine Learning To Decide What Content To Create Next For Its 260M Users
Learn how metadata, pre-training and embeddings drive decision-making at Netflix
To read more on this topic, see the references section at the bottom.
What started as a mail-based DVD rental service [6] has grown into a multi-billion-dollar business.
Netflix streams billions of hours of content every month (that’s a lot of popcorn) across 190 countries.
Netflix has something for everyone: series, movies, anime, you name it.
Making Hit Shows is Hard
Netflix creates content at an unprecedented scale. From movies to series, it releases thousands of hours of content across hundreds of titles each year under the “Netflix Originals” banner. But creating something great is expensive: each project competes for budget and talent.
The big question? Greenlight or axe the project? Choosing wrong could mean missing out on the next “Squid Game” or “Stranger Things”. That’s a huge bummer not only for the producers but also for us viewers!
Netflix uses data and machine learning to predict a project’s success before it is filmed. This helps it make better-informed greenlight decisions and, hopefully, avoid ditching the next big hit.
1. Scoping The Problem
Before machine learning (ML) systems were in place, deciding what content to make next relied on gut instinct and past trends. Think: “Spaghetti westerns are a hit, let’s make another one!” ML goes beyond the obvious: it analyzes data to uncover non-obvious patterns, which are then used to answer key questions [3] about a potential project and decide whether to make it:
Similar Movies and Shows: Which existing movies or series are most similar to the candidate project? Is this the next Stranger Things or a forgotten B-movie?
Regional Appeal: How large an audience will the project draw across demographics and geographic locations? Will teens in Tokyo love it as much as families in France?
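The article does not say how similar titles are found. One common approach, shown here purely as an assumption, is nearest-neighbor search: represent every title as a vector of numbers and rank catalog titles by cosine similarity to the candidate. The catalog names, the 3-dimensional vectors, and the helpers `cosine_similarity` and `most_similar` are all hypothetical.

```python
import math

# Hypothetical 3-dimensional title vectors, invented for illustration only.
# Real title vectors would come from trained models and be much longer.
CATALOG: dict[str, list[float]] = {
    "sci-fi mystery series": [0.9, 0.1, 0.3],
    "supernatural drama series": [0.8, 0.2, 0.4],
    "baking competition show": [0.1, 0.9, 0.2],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(
    candidate: list[float], catalog: dict[str, list[float]], k: int = 2
) -> list[tuple[str, float]]:
    """Return the k catalog titles whose vectors point closest to the candidate's."""
    scored = [(title, cosine_similarity(candidate, vec)) for title, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical vector for a new project pitch.
pitch = [0.85, 0.15, 0.35]
for title, score in most_similar(pitch, CATALOG):
    print(f"{title}: {score:.3f}")
```

Cosine similarity compares the direction of two vectors rather than their length, which is a typical choice when only the relative mix of features matters.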
So let’s look at the ML building blocks Netflix uses.
2. Learning Meaningful Embeddings
For each movie and series in its catalog, Netflix maintains a database of metadata attributes, such as genre, runtime, whether the title is a series or a movie, plot summary, theme, and descriptive tags.
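A catalog entry like the one described above can be pictured as a simple record. This minimal sketch uses a Python dataclass; the class `TitleMetadata`, its field names, the `labels` helper, and the example record are hypothetical and do not reflect Netflix’s actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class TitleMetadata:
    """One catalog entry. Field names are hypothetical, not Netflix's schema."""
    title: str
    genre: str
    runtime_hours: float
    is_series: bool
    plot_summary: str
    themes: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)

    def labels(self) -> dict[str, object]:
        """Attributes a model could learn to predict for this title."""
        return {
            "genre": self.genre,                  # categorical -> classification
            "runtime_hours": self.runtime_hours,  # numeric -> regression
            "is_series": self.is_series,          # binary -> classification
        }


# A made-up record for illustration only.
example = TitleMetadata(
    title="Hypothetical Heist Movie",
    genre="thriller",
    runtime_hours=1.9,
    is_series=False,
    plot_summary="A crew plans one last robbery.",
    themes=["loyalty", "greed"],
    tags=["suspenseful", "ensemble cast"],
)
print(example.labels())
```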
Once this metadata is collected, Netflix pre-trains multiple models, each predicting one of these attributes from the title. Each model is pre-trained on a different task:
Model #1 is a classification model: Input: Title → Output: Genre
Model #2 is a regression model: Input: Title → Output: Runtime in hours
Model #3 is a classification model: Input: Title → Output: Movie or Series
…
Model #N is a regression model: Input: Title → Output: Box office revenue (in millions of dollars)
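The list mixes classification tasks (discrete outputs) with regression tasks (numeric outputs), and each kind is usually trained with a different loss. The sketch below assumes the standard pairing of cross-entropy for classification and squared error for regression; the source does not state which losses Netflix uses. `PretrainingTask`, `TASKS`, and `task_loss` are hypothetical names, and the task list keeps only the examples given above.

```python
import math
from dataclasses import dataclass
from typing import Literal, Sequence


@dataclass(frozen=True)
class PretrainingTask:
    """Predict one metadata attribute (`target`) from a title's input features."""
    name: str
    target: str
    kind: Literal["classification", "regression"]


# Hypothetical registry mirroring the examples above (the full list is elided).
TASKS = [
    PretrainingTask("model_1", "genre", "classification"),
    PretrainingTask("model_2", "runtime_hours", "regression"),
    PretrainingTask("model_3", "is_series", "classification"),
    PretrainingTask("model_n", "box_office_millions", "regression"),
]


def cross_entropy(probs: Sequence[float], true_class: int) -> float:
    """Classification loss: negative log-probability of the correct class."""
    return -math.log(probs[true_class])


def squared_error(pred: float, true: float) -> float:
    """Regression loss: squared difference between prediction and label."""
    return (pred - true) ** 2


def task_loss(task: PretrainingTask, prediction, label) -> float:
    """Pick the loss that matches the task type (an assumed, common pairing)."""
    if task.kind == "classification":
        return cross_entropy(prediction, label)
    return squared_error(prediction, label)


# Genre model: probabilities over 3 hypothetical genres; the true genre is index 2.
print(task_loss(TASKS[0], [0.25, 0.25, 0.5], 2))  # -ln(0.5), about 0.693
# Runtime model: predicts 2.0 hours for a 1.5-hour movie.
print(task_loss(TASKS[1], 2.0, 1.5))  # 0.25
```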
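The output of an intermediate layer of such a model is called an embedding: a vector of numbers that summarizes what the model has learned about the title. For a given title, embeddings from all the models can be extracted and used as input features for downstream tasks, such as finding similar titles or predicting regional appeal.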
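To make “output of an intermediate layer” concrete, here is a minimal sketch assuming small two-layer networks: each model maps a title’s input features to a hidden layer and then to its task output, and the hidden layer’s activations serve as the embedding. The weights are made up rather than trained, the input vector `x` is a hypothetical feature encoding of one title, and `TwoLayerModel` and `title_features` are illustrative names. Embeddings from several models are concatenated into a single feature vector.

```python
from dataclasses import dataclass

Vector = list[float]
Matrix = list[list[float]]


def matvec(W: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute W @ x + b using plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


@dataclass
class TwoLayerModel:
    """input -> hidden layer (ReLU) -> task-specific output layer."""
    W1: Matrix
    b1: Vector
    W2: Matrix
    b2: Vector

    def forward(self, x: Vector) -> tuple[Vector, Vector]:
        hidden = relu(matvec(self.W1, x, self.b1))
        output = matvec(self.W2, hidden, self.b2)
        return hidden, output

    def embed(self, x: Vector) -> Vector:
        # The embedding is the intermediate (hidden) layer's output;
        # the task-specific output is discarded.
        hidden, _ = self.forward(x)
        return hidden


def title_features(models: list[TwoLayerModel], x: Vector) -> Vector:
    """Concatenate each pre-trained model's embedding into one feature vector."""
    features: Vector = []
    for model in models:
        features.extend(model.embed(x))
    return features


# Made-up weights standing in for trained models (not real parameters).
genre_model = TwoLayerModel(
    W1=[[0.5, -1.0, 0.25], [-0.5, 0.5, 1.0]], b1=[0.0, 0.0],
    W2=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], b2=[0.0, 0.0, 0.0],
)
runtime_model = TwoLayerModel(
    W1=[[1.0, 1.0, -1.0]], b1=[0.5],
    W2=[[2.0]], b2=[0.1],
)

x = [1.0, 0.0, 2.0]  # hypothetical input features for one title
print(title_features([genre_model, runtime_model], x))  # [1.0, 1.5, 0.0]
```

Dropping each model’s output layer keeps only the learned representation, so the concatenated vector can feed a separate downstream model without retraining the pre-trained ones.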