How Netflix Uses Machine Learning To Decide What Content To Create Next For Its 260M Users
Learn how metadata, pre-training and embeddings drive decision-making at Netflix
To read more on this topic, see the references section at the bottom.
What started as a mail-based DVD movie rental [6] has transformed into a multi-billion-dollar business.
Netflix streams billions of hours (that’s a lot of popcorn) of content monthly across 190 countries.
Netflix has something for everyone: series, movies, anime, you name it.
Making Hit Shows is Hard
Netflix creates content at an unprecedented scale. From movies to series, they release thousands of hours of content across hundreds of titles each year under “Netflix Originals”. But creating something great is expensive! Each project competes for budget and talented people.
The big question? Greenlight or axe the project? Choosing wrong could mean missing out on the next “Squid Game” or “Stranger Things”. That's a huge bummer not only for the producers but also for us viewers!
To make well-informed decisions, Netflix uses data and machine learning to predict a project's success before filming it, hopefully avoiding ditching the next big hit.
1. Scoping The Problem
Before machine learning (ML) systems were in place, picking what content to make next relied on gut instinct and past trends. Think "Spaghetti westerns are a hit, let's make another one!". Machine Learning goes way beyond the obvious. It analyzes data to discover non-obvious patterns. This is then used to answer key questions [3] and make the decision about a potential project:
Similar Movies and Shows: What are similar movies or series to the candidate project? Is this the next Stranger Things or a forgotten B-movie?
Regional Appeal: Predicting audience sizes across demographics and geographic locations. Will teens in Tokyo love it as much as families in France?
So let’s look at the ML building blocks Netflix uses.
2. Learning Meaningful Embeddings
For each movie and series in its catalog, Netflix has a database of interesting metadata attributes: the genre, runtime, tags, whether it is a series or a movie, plot summary, and theme, among others.
Once all this is collected, Netflix pre-trains multiple models, each predicting one of these metadata attributes from the title. They pre-train models on different tasks:
Model #1 is a classification model: Input: Title → Output: Genre
Model #2 is a regression model: Input: Title → Output: Runtime in hours
Model #3 is a classification model: Input: Title → Output: Movie or Series
…
Model #N is a regression model: Input: Title → Output: Box office collection (in millions of dollars)
The output of an intermediate layer of a model is called an embedding. For a given title, embeddings from all the models can be extracted and used as input features to downstream tasks.
The model can be broken down into 2 parts: the encoder and the decoder. The encoder takes the input and uses a stack of layers to output the intermediate representation, i.e. the embeddings. The decoder then takes the intermediate representation and predicts the final output of the model.
To generate feature-rich embeddings, the pre-trained models pair a powerful encoder with a simple decoder on top that performs the pre-training task. This forces the embeddings to be sufficiently discriminative for a simple decoder to perform well on the pre-training task.
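A minimal sketch of this encoder/decoder split in PyTorch. The layer sizes, input features, and genre-classification head are illustrative assumptions, not Netflix's actual architecture:

```python
import torch
import torch.nn as nn

class PretrainModel(nn.Module):
    """A powerful encoder with a simple decoder head for one pre-training task."""
    def __init__(self, in_dim=512, emb_dim=128, num_genres=20):
        super().__init__()
        # Deep encoder: produces the reusable, feature-rich embedding.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim), nn.ReLU(),
        )
        # Simple decoder: a single layer solving the pre-training task (genre logits).
        self.decoder = nn.Linear(emb_dim, num_genres)

    def forward(self, x):
        emb = self.encoder(x)          # intermediate representation = the embedding
        return self.decoder(emb), emb  # pre-training prediction + reusable embedding

model = PretrainModel()
logits, embedding = model(torch.randn(4, 512))  # batch of 4 title feature vectors
print(embedding.shape, logits.shape)
```

Because the decoder is just one linear layer, the encoder has to pack most of the discriminative signal into the embedding itself.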
Choice Of Loss Functions
Classification models are optimized for the negative log-likelihood (NLL) loss [8] e.g. to classify the genre. Regression models are optimized for mean squared error (MSE) loss e.g. to predict the runtime in hours.
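As a concrete, illustrative example: PyTorch's `CrossEntropyLoss` is NLL applied to log-softmax outputs, and `MSELoss` covers the regression case (the shapes and values below are made up):

```python
import torch
import torch.nn as nn

# Classification (e.g. genre): cross-entropy = NLL over log-softmax of the logits.
genre_logits = torch.randn(4, 20)          # 4 titles, 20 candidate genres
genre_labels = torch.tensor([3, 7, 0, 19])
cls_loss = nn.CrossEntropyLoss()(genre_logits, genre_labels)

# Regression (e.g. runtime in hours): mean squared error.
pred_runtime = torch.tensor([1.8, 2.1, 0.9, 1.5])
true_runtime = torch.tensor([2.0, 2.0, 1.0, 1.5])
reg_loss = nn.MSELoss()(pred_runtime, true_runtime)

print(cls_loss.item(), reg_loss.item())
```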
Encoding Text
For text features such as the plot summary or the theme, Netflix uses BERT [1] as an encoder.
Encoding Categorical Variables
For tags and countries, an embedding module [5] is used. This is a “trainable” dictionary that maps each category to a learned vector, e.g. {“France”: [0.3, 0.95, …, 0.84], …, “Italy”: [0.54, 0.32, …, 0.12]}. The vectors are parameters and optimized during training.
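A quick sketch using PyTorch's `nn.Embedding`; the country vocabulary and dimensions here are made up for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary mapping each country to an integer index.
countries = {"France": 0, "Italy": 1, "Japan": 2}

# Trainable lookup table: one learned vector per country.
country_emb = nn.Embedding(num_embeddings=len(countries), embedding_dim=8)

# Look up the vector for "France"; it is updated by backprop like any other weight.
idx = torch.tensor([countries["France"]])
vec = country_emb(idx)
print(vec.shape)
```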
Note: Netflix learns separate models for each metadata attribute for explainability reasons. Alternatively, they could use a multi-task learning variant of this setup where a single model predicts all the metadata attributes.
3. Using Transfer Learning For Downstream Tasks
Netflix uses transfer learning¹ [2] to leverage the feature-rich embeddings from the pre-trained models. These are now used to solve the 2 downstream tasks we mentioned previously.
(a) Similarity Search In Embeddings Space
Using the pre-training scheme outlined above, Netflix generates embeddings of the candidate title in a high-dimensional space. Since the pre-training is performed on related tasks, this embedding space is highly semantic and organizes itself into clusters of similar items.
This allows Netflix to see in which cluster (in this rich embedding space) a candidate title would fall. Is the project an action movie, or is it more of a comedy? Not only that, it puts the project in perspective with existing projects.
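The idea can be sketched as a nearest-neighbor search by cosine similarity. The catalog and its tiny 3-dimensional embeddings below are made up for illustration, not real title embeddings:

```python
import torch
import torch.nn.functional as F

# Hypothetical catalog embeddings (one vector per existing title) and a candidate.
catalog = {
    "Stranger Things": torch.tensor([0.9, 0.1, 0.2]),
    "Squid Game":      torch.tensor([0.8, 0.2, 0.1]),
    "Cooking Show":    torch.tensor([0.1, 0.9, 0.7]),
}
candidate = torch.tensor([0.85, 0.15, 0.15])

# Rank catalog titles by cosine similarity to the candidate's embedding.
scores = {name: F.cosine_similarity(candidate, emb, dim=0).item()
          for name, emb in catalog.items()}
nearest = max(scores, key=scores.get)
print(nearest)
```

In a real system this search would run over millions of vectors with an approximate nearest-neighbor index rather than a Python loop.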
(b) Predicting Audience Size
It is extremely important to know ahead of time in which demographics and geographic regions a project is likely to succeed.
Netflix can then plan ahead of time on:
how to market in different age groups and locations [4],
which languages the project should be dubbed and subtitled in,
what visuals and assets to use for different age groups, etc.
To predict the audience size, Netflix uses supervised learning. The features include:
embedding of the project.
embedding of the geographic location/country.
distribution of audience sizes for past projects similar to the candidate.
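A hedged sketch of what such a supervised model could look like, assuming the three feature groups are simply concatenated; the dimensions and architecture are illustrative guesses, not Netflix's actual setup:

```python
import torch
import torch.nn as nn

# Hypothetical feature vector: project embedding (128 dims) + country embedding
# (8 dims) + summary stats of audience sizes of similar past titles (4 quantiles).
project_emb = torch.randn(128)
country_emb = torch.randn(8)
similar_stats = torch.tensor([0.2, 0.5, 1.1, 3.0])  # e.g. audience-size quantiles
features = torch.cat([project_emb, country_emb, similar_stats])  # shape: (140,)

# Simple supervised regressor predicting the audience size for that region.
regressor = nn.Sequential(nn.Linear(140, 64), nn.ReLU(), nn.Linear(64, 1))
pred_audience = regressor(features)
print(pred_audience.shape)
```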
Outro
Answering these key questions helps drive Netflix’s decision-making. Netflix pre-trains models to generate meaningful intermediate representations called embeddings. The embeddings are used to train for downstream tasks. This helps Netflix avoid missing out on hidden gems and greenlight shows you'll love.
Did you know Netflix also uses recommender systems [7], another core concept in machine learning? These personalize your experience and help you find your next favorite shows. Netflix generates recommendations based on your preferences and those of other users. Stay tuned for future posts where we dive deeper into how Netflix personalizes your experience.
References
[1] BERT: https://huggingface.co/docs/transformers/model_doc/bert
[2] Transfer Learning: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
[3] Supporting content decision makers with machine learning: https://netflixtechblog.com/supporting-content-decision-makers-with-machine-learning-995b7b76006f
[4] Data Science and the Art of Producing Entertainment at Netflix: https://netflixtechblog.com/studio-production-data-science-646ee2cc21a1
[5] Embeddings: https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html
[6] Netflix: https://en.wikipedia.org/wiki/Netflix
[7] Recommender Systems: https://www.nvidia.com/en-us/glossary/recommendation-system/#:~:text=A%20recommendation%20system%20(or%20recommender,exponentially%20growing%20number%20of%20options
[8] Negative log-likelihood: https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html
¹ Imagine a student who aces math class. They can probably do well in physics too, right? Transfer learning uses the same idea: it takes knowledge from one task (like predicting movie genres) and applies it to another related task (like predicting audience size).