Netflix Recommender System — A Big Data Case Study

The story behind Netflix’s famous Recommendation System

Published in Towards Data Science · Jun 28, 2020

What is Netflix and what do they do?

Netflix is an American media services provider. It offers movies, television shows, and in-house produced content through a subscription-based streaming model. Netflix began by selling DVDs and operating a DVD-rental-by-mail service; it discontinued DVD sales about a year after launch but kept the rental service. In 2007 it launched its online streaming service, and since then it has grown into one of the largest and best-regarded streaming services in the world (Netflix, 2020).

Netflix has also taken an active role in producing movies and TV shows. The company is heavily data-driven. Netflix sits at the intersection of the internet and storytelling and describes itself as inventing internet television. Its main source of income is subscription fees. Members can stream from a wide catalog of movies and TV shows at any time on a variety of internet-connected devices (Gomez-Uribe & Hunt, 2016).

What is the domain (subject matter area) of their study?

Netflix's primary asset is its technology, especially its recommendation system. Recommendation systems are a branch of information filtering systems (Recommender system, 2020). Information filtering systems remove redundant or unwanted information from a data stream before it reaches a human. Recommendation systems recommend products or predict the rating a user would assign to an item. They are widely used to generate playlists and watch lists by companies such as YouTube, Spotify, and Netflix, and Amazon uses them to recommend products to its customers. Most recommender systems learn about users from their history. There are two primary approaches: collaborative filtering and content-based filtering. Collaborative filtering relies on the idea that people who agreed in their preferences in the past will tend to agree again in the future. Content-based filtering is useful when information is known about the items but not about the user; it treats recommendation as a user-specific classification task, modeling the user's likes and dislikes in terms of item characteristics.
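To make the two approaches concrete, here is a toy sketch in Python (with invented ratings and genre features, not Netflix code): a user-based collaborative-filtering score that weights other users' ratings by similarity, and a content-based score that compares a title's features with a profile of what the user already liked.

```python
import numpy as np

# Toy example: rows are users, columns are titles, entries are star ratings (0 = not rated).
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Collaborative filtering: score an unseen title for a user by weighting
# other users' ratings with user-user similarity.
def cf_score(R, user, item):
    others = [u for u in range(R.shape[0]) if u != user]
    sims = np.array([cosine(R[user], R[u]) for u in others])
    ratings = np.array([R[u, item] for u in others])
    mask = ratings > 0
    return (sims[mask] @ ratings[mask]) / (sims[mask].sum() + 1e-9)

# Content-based filtering: compare a title's (hypothetical) genre features
# with a profile built from titles the user already rated highly.
item_features = np.array([
    [1, 0],  # title 0: drama
    [1, 0],  # title 1: drama
    [0, 1],  # title 2: comedy
    [0, 1],  # title 3: comedy
], dtype=float)

def content_score(R, item_features, user, item):
    liked = R[user] >= 4
    profile = item_features[liked].mean(axis=0)
    return cosine(profile, item_features[item])

print(round(cf_score(R, user=0, item=2), 2))             # ~2 of 5: the most similar user disliked title 2
print(round(content_score(R, item_features, 0, 2), 2))   # 0.0: user 0's profile is all drama, title 2 is comedy
```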

Why did they want/need to do a big data project?

Netflix's model has shifted from renting and selling DVDs to global streaming (Netflix Technology Blog, 2017a). Unlike cable TV, internet TV is all about choice. Netflix wanted to help viewers choose among the numerous options available through its streaming service. Cable TV is rigid with respect to geography, whereas an internet TV catalog can offer a broad range of titles across genres and demographics to appeal to people of different tastes.

When Netflix sold and rented DVDs, the recommendation problem was to predict the number of stars, from 1 to 5, that a user would give a DVD. That was the only task they concentrated on, because a star rating was the only feedback they received from a member who had already watched the title; they had no visibility into the viewing experience itself and got no statistics or feedback during viewing. When Netflix became a streaming service, it gained access to a huge amount of member activity data, including the device used, the time of day, the day of the week, and the frequency of watching. As the number of subscribers and viewers grew, the task became a big data project.

What questions did they want to answer?

Netflix is all about recommending the next piece of content to its users. The single question they want to answer is 'How can Netflix be personalized as much as possible for each user?'. Though it is one question, it covers almost everything Netflix aims to solve: recommendation is embedded in every part of the site.

Recommendation starts as soon as you log into Netflix. For example, the first screen you see after logging in consists of 10 rows of titles that you are most likely to watch next. Awareness is another important part of personalization: Netflix lets its audience know how it is adapting to their tastes, which both invites feedback and builds trust in the system. It explains why it thinks you would watch a particular title, using phrases like 'Based on your interest in ...' and 'Your taste preferences created this row'. Similarity is another part of personalization.

Netflix conceptualizes similarity broadly: similarity between movies, between members, between genres, and so on. It surfaces this with phrases such as 'Similar titles to watch instantly' and 'More like ...'. Search is another important aspect of the Netflix recommendation system.

Data Sources:

According to (Netflix Technology Blog, 2017b), the data sources for the Netflix recommendation system include explicit member ratings, title popularity, streaming plays (along with details such as duration, time of day, and device), queue additions, title metadata such as actors, directors, and genres, social data, search terms, and external signals such as demographics and critic reviews.

What is the size of the data in the study? That is, approximately how much data storage was required?

Netflix ran a famous contest from 2006 to 2009, asking participants to design an algorithm that could improve its in-house recommender system 'Cinematch' by 10%; whoever delivered the best improvement past that threshold would be awarded $1 million. The dataset released to contestants contained 100,480,507 ratings that 480,189 users gave to 17,770 movies. In 2009, the prize was awarded to the team 'BellKor's Pragmatic Chaos'. Netflix has since stated that the winning algorithm was scaled up to handle its more than 5 billion ratings (Netflix Technology Blog, 2017a), so the production dataset behind Netflix's recommender system is believed to cover its entire catalog and well over 5 billion ratings.
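A quick back-of-the-envelope calculation with the figures quoted above shows just how sparse the Netflix Prize ratings matrix was:

```python
# Back-of-the-envelope check using the figures quoted above.
ratings = 100_480_507
users = 480_189
movies = 17_770

possible = users * movies
density = ratings / possible
print(f"possible user-movie pairs: {possible:,}")   # ~8.5 billion
print(f"observed ratings: {ratings:,}")
print(f"matrix density: {density:.2%}")             # ~1.18%, i.e. ~98.8% of entries are missing
```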

What data access rights, data privacy issues, and data quality issues were encountered?

As mentioned in (Netflix Prize, 2020), although Netflix tried to anonymize the dataset and protect users' privacy, several privacy issues arose around the competition data. In 2007, researchers at the University of Texas at Austin were able to re-identify users in the anonymized Netflix dataset by matching their ratings against ratings on the Internet Movie Database. In 2009, four people filed a lawsuit against Netflix alleging violations of United States fair trade laws and the Video Privacy Protection Act. Following this, Netflix cancelled the planned follow-up competition in 2010.

What organizational (non-technical) challenges did they face?

As per (Maddodi et al., 2019), Netflix suffered large losses in its early days; however, as internet adoption grew, it changed its business model from conventional DVD rental and sales to online video streaming in 2007. Netflix also anticipated the arrival of competitors like Disney and Amazon and invested heavily in data science from a very early stage. Many of those investments are still paying off, keeping Netflix at the forefront of the media streaming industry.

What technical challenges did they face?

Some of the technical challenges the team faced while building the prize-winning system, such as the sheer size of the dataset and the highly skewed distribution of ratings across users and movies, are described in (Töscher et al., 2009).

With respect to the search service related to recommendations, a paper published by Netflix engineers (Lamkhede & Das, 2019) discusses the challenges specific to search on streaming services.

Why was this a “big data” problem?

Volume: As of May 2019, Netflix had around 13,612 titles worldwide (Gaël, 2019); its US library alone contained 5,087 titles. By 2016, Netflix had completed its migration to Amazon Web Services, moving tens of petabytes of data to AWS (Brodkin, 2016), including engineering data, corporate data, and other documentation. From (AutomatedInsights, n.d.), it can be estimated that Netflix stores roughly 105 TB of video data alone. The dataset behind the recommendation algorithms is expected to be much larger, since it needs to incorporate all the signals mentioned above. Focusing only on the Netflix Prize task, the data given to contestants was around 2 GB and contained only 100 million movie ratings; at the time, Netflix stated that it had 5 billion ratings, which scales roughly to 100 GB of rating data alone, and the figure today would be far greater.
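The rating-data estimate above can be reproduced by scaling linearly from the prize dataset (roughly 2 GB for 100 million ratings) to the 5 billion ratings Netflix reported at the time:

```python
# Rough linear scaling of rating-data volume, using the figures above.
prize_ratings = 100e6          # ratings released for the Netflix Prize
prize_size_gb = 2.0            # approximate size of that dataset
total_ratings = 5e9            # ratings Netflix reported having at the time

bytes_per_rating = prize_size_gb * 1e9 / prize_ratings   # ~20 bytes per rating
estimated_gb = total_ratings * bytes_per_rating / 1e9
print(f"{bytes_per_rating:.0f} bytes per rating -> roughly {estimated_gb:.0f} GB of ratings")
```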

Velocity: By the end of 2019, Netflix had well over 150 million subscribers (BusinessofApps, 2020). Every time a viewer watches something on Netflix, it collects usage statistics such as viewing history, ratings of titles, the behavior of other members with similar tastes, service preferences, and information about the titles themselves such as actors, genres, directors, and year of release. In addition, Netflix collects the time of day, the type of device the content is watched on, and how long it is watched (Netflix, n.d.). On average, each Netflix subscriber watches about 2 hours of video content per day (Clark, 2019), which adds up to hundreds of millions of hours of streaming each day. Although the full set of features is not stated explicitly anywhere, Netflix is believed to collect a very large amount of information from its users.
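As an illustration of the kinds of per-play signals described above, a playback event could be modeled as a record like the following; the fields and names are hypothetical, not Netflix's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical playback event capturing the signals mentioned above
# (device, time, duration, title); not Netflix's actual schema.
@dataclass
class PlaybackEvent:
    member_id: str
    title_id: str
    device_type: str        # e.g. "smart_tv", "mobile", "browser"
    started_at: datetime
    watched_seconds: int
    completed: bool

event = PlaybackEvent(
    member_id="m_123",
    title_id="t_456",
    device_type="smart_tv",
    started_at=datetime(2020, 6, 28, 21, 15),
    watched_seconds=3600,
    completed=False,
)
print(event)
```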

Veracity: Veracity concerns bias, noise, and abnormalities in data. In the Netflix Prize data there was a wide variance: not all movies and users were rated or rating equally often; one movie had only 3 ratings, whereas a single user rated over 17,000 movies (Töscher et al., 2009). Given the type and amount of information Netflix collects, its data will certainly contain plenty of abnormalities, bias, and noise.

Variety: Netflix says it collects most of its data in a structured format: time of day, duration of watch, popularity, social data, search-related information, stream-related data, and so on. However, Netflix could also be using unstructured data. It has been very outspoken about the thumbnail images it personalizes, meaning the thumbnails shown for the same video can differ from member to member, so it could also be dealing with images and image processing.

Who are the people/organizations with an interest in the conduct and outcome of the study?

The primary stakeholders of Netflix are its subscribers and viewers; they are the ones directly affected by this project. The recommender system has been very successful for the company and has been a major factor in growing subscriber and viewer numbers. The secondary stakeholders are Netflix's employees, in particular the research and engineering teams directly involved in developing and maintaining the algorithms and the system. Competitors such as Amazon, Hulu, Disney+, Sony, and HBO also take a keen interest in the conduct and outcome of Netflix's experiments. After all, many of them produce the movies themselves; why would they want an intermediary like Netflix to take a share? Many have launched their own streaming platforms, but Netflix has stayed at the top of the game by investing significantly in data and algorithms from the very beginning.

What HW/SW resources did they use to conduct the project?

In order to build a recommender system and perform large-scale analytics, Netflix invested heavily in hardware and software. Netflix has presented an architecture showing how it handles the task (Basilico, 2013).

Recommendation is performed in three layers: offline, nearline, and online (Netflix Technology Blog, 2017c). Offline computation is applied to large batches of data and is not concerned with real-time responsiveness for the user. Execution-time requirements are relaxed, and algorithms are trained in batches without pressure to process a given amount of data within a fixed time interval, although they need to be retrained frequently to incorporate the latest information. Tasks such as model training and batch computation of results are performed offline. Because they deal with a lot of data, it is beneficial to run them on Hadoop through Pig or Hive. The results must then be published not just to HDFS but also to other stores such as S3 and Cassandra. For this, Netflix developed an in-house tool called Hermes, a publish-subscribe framework similar to Kafka that provides additional features such as 'multi-DC support, a tracking mechanism, JSON to Avro conversion, and a GUI called Hermes console' (Morgan, 2019); the goal was a tool that could monitor, alert, and handle errors transparently.

The nearline layer holds results from offline computation and other intermediate results, stored in Cassandra, MySQL, and EVCache. The priority here is not how much data is stored but how to store it in the most efficient manner. Real-time event flow at Netflix is supported by an internally developed tool called Manhattan, which is similar to Twitter's Storm but tailored to Netflix's internal requirements. Data flows are logged through Chukwa into Hadoop.

Netflix relies heavily on Amazon Web Services for its hardware requirements, specifically EC2 instances that are readily scalable and fault-tolerant. All of its infrastructure runs in the AWS cloud.
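To make the three-layer split concrete, here is a minimal, self-contained sketch (plain Python with invented data structures, not Netflix's actual components): the offline step batch-computes per-member results, the nearline step updates a lightweight signal after each play event, and the online step only combines precomputed pieces at request time.

```python
from collections import defaultdict

# Toy illustration of the offline / nearline / online split described above;
# the data structures and names are invented, not Netflix's actual components.

def offline_batch(play_history):
    """Offline: batch-compute each member's most-watched titles from full history."""
    counts = defaultdict(lambda: defaultdict(int))
    for member, title in play_history:
        counts[member][title] += 1
    return {m: sorted(t, key=t.get, reverse=True) for m, t in counts.items()}

def nearline_update(recent, member, title):
    """Nearline: shortly after each play event, refresh a lightweight signal."""
    recent[member] = title

def online_recommend(member, precomputed, recent):
    """Online: at request time, only combine precomputed results (latency-bounded)."""
    titles = list(precomputed.get(member, []))
    if recent.get(member) in titles:
        titles.remove(recent[member])        # don't re-suggest what was just watched
    return titles[:10]

history = [("alice", "Dark"), ("alice", "Ozark"), ("alice", "Dark"), ("bob", "Ozark")]
precomputed = offline_batch(history)         # slow batch path (e.g. a Hadoop job)
recent = {}
nearline_update(recent, "alice", "Dark")     # fast incremental path
print(online_recommend("alice", precomputed, recent))   # -> ['Ozark']
```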

What people/expertise resources did they need to conduct the project?

Netflix invests heavily in data science. It is a data-driven company that uses analytics for decision making at almost every level. According to (Vanderbilt, 2018), around 800 engineers work at Netflix's Silicon Valley headquarters. Netflix hires some of the brightest talent, and the average salary for its data scientists is very high. It employs engineers with expertise in data engineering, machine learning, deep learning, artificial intelligence, and video stream engineering.
With respect to the Netflix Prize challenge, the winning team 'BellKor's Pragmatic Chaos' consisted of Andreas Töscher and Michael Jahrer (BigChaos), Robert Bell and Chris Volinsky (AT&T) and Yehuda Koren (Yahoo) (team BellKor), and Martin Piotte and Martin Chabbert (Pragmatic Theory).

What processes and technology did they need?

Apart from the engineering technology mentioned above, a paper by Netflix engineers Carlos A. Gomez-Uribe and Neil Hunt (Gomez-Uribe & Hunt, 2016) states that their recommendation system uses supervised approaches such as classification and regression, and unsupervised approaches such as dimensionality reduction and clustering/compression using topic modeling. It also relies on matrix factorization, singular value decomposition, factorization machines, connections to probabilistic graphical models, and methods that can easily be extended and tailored to different problems.
With respect to the Netflix Prize challenge, 107 algorithms were combined in an ensemble to predict a single output. Matrix factorization, singular value decomposition, and Restricted Boltzmann Machines were among the most important techniques and gave the best results.
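As a rough illustration of the matrix-factorization idea mentioned above (not the prize-winning implementation), the following sketch learns low-dimensional user and item factors with stochastic gradient descent on a toy ratings list:

```python
import numpy as np

# Minimal matrix-factorization sketch on toy data: learn user and item factors
# so that their dot product approximates the observed ratings.
rng = np.random.default_rng(0)
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 1.0), (2, 2, 5.0), (2, 1, 2.0)]
n_users, n_items, k = 3, 3, 2

P = 0.1 * rng.standard_normal((n_users, k))   # user factors
Q = 0.1 * rng.standard_normal((n_items, k))   # item factors
lr, reg = 0.05, 0.02

for epoch in range(200):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])    # gradient step on user factors
        Q[i] += lr * (err * P[u] - reg * Q[i])    # gradient step on item factors

rmse = np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings]))
print(f"training RMSE after factorization: {rmse:.3f}")
print("predicted rating for user 0, item 2:", round(float(P[0] @ Q[2]), 2))
```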

What was the approximate project schedule/duration?

According to (Netflix Technology Blog, 2017a), the engineers who won the Netflix Prize reported that more than 2,000 hours of work were required to build the ensemble of 107 algorithms that won them the prize. Netflix then took the source code and worked on overcoming its limitations, most notably scaling it from 100 million ratings to 5 billion ratings.

What results/answers were achieved? What value to the organization and to the stakeholders was obtained as a result of the project?

As mentioned in (Gomez-Uribe & Hunt, 2016), the combined effect of personalization and recommendation is estimated to save Netflix more than $1 billion per year, and recommendations influence the large majority of hours streamed on the service.

With respect to the Netflix Prize task, the winning algorithm improved on 'Cinematch' by 10.06% in rating prediction accuracy (Netflix Prize, 2020). According to (Netflix Technology Blog, 2017b), singular value decomposition alone reduced the RMSE to 0.8914, Restricted Boltzmann Machines achieved an RMSE of 0.8990, and a linear blend of the two brought the RMSE down to 0.88.

Was the project successful?

Investing in data science and technology has helped Netflix stay at the top of the video-streaming industry. Personalization and recommendation save the company an estimated $1 billion a year and are one of the most important factors in attracting new subscribers to the platform. Moreover, many components of the winning algorithm from the Netflix Prize competition are still used in the recommendation system today (Netflix Technology Blog, 2017b). Hence, the project can be regarded as successful.

Were there any surprises discovered?

As per (Töscher et al., 2009), the team was surprised to discover that the binary information of which movies a user chose to rate was itself informative: people do not select and rate movies at random. A surprisingly strong one-day effect was also observed in the dataset, which could be due either to multiple people sharing the same account or to the changing moods of a single person.

What lessons were learned from conducting the project?

Ensembling techniques deliver good results: instead of refining a single technique, multiple techniques were combined to predict a single outcome.
Training and tuning models individually does not deliver optimal results; the results are best when the ensemble as a whole strikes a precise tradeoff between diversity and accuracy. A lot of open research has also been contributed to the domain of collaborative filtering, and competitions such as the Netflix Prize help promote such open ideas and research.
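The ensembling lesson can be illustrated with a tiny blending example (invented predictions standing in for the 107 models): fit linear blend weights that minimize RMSE on held-out ratings, and the blend is never worse than either model alone.

```python
import numpy as np

# Toy illustration of blending: find the linear combination of two predictors
# that minimizes squared error against the true ratings.
truth = np.array([5.0, 3.0, 4.0, 1.0, 2.0])
pred_a = np.array([4.5, 3.4, 3.8, 1.5, 2.6])   # e.g. an SVD-style model
pred_b = np.array([4.9, 2.5, 4.4, 0.8, 1.7])   # e.g. an RBM-style model

X = np.column_stack([pred_a, pred_b])
weights, *_ = np.linalg.lstsq(X, truth, rcond=None)   # least-squares blend weights
blend = X @ weights

rmse = lambda p: np.sqrt(np.mean((p - truth) ** 2))
print("RMSE A:", round(rmse(pred_a), 3), "RMSE B:", round(rmse(pred_b), 3),
      "RMSE blend:", round(rmse(blend), 3))
print("blend weights:", np.round(weights, 3))
```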

What specific actions were taken as a result of the project?

As a result of the competition, Netflix reworked the winning code to scale from 100 million ratings to 5 billion ratings (Netflix Technology Blog, 2017b), and code derived from the winning project is still used in its recommender system today. Netflix owes much of its success in the video-streaming industry to this project and to its continued research and development.

How could the project have been improved?

The procedure and steps for A/B testing could be improved by evaluating recommendations against the circumstances of viewing rather than purely algorithmic metrics. Netflix could also use reinforcement learning to provide recommendations, as opposed to the traditional recommendation methodology: the reward would be user satisfaction, the state the currently watched content, and the action the next best content to recommend.
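A minimal sketch of this suggested reinforcement-style framing is an epsilon-greedy bandit, where actions are candidate titles and the reward stands in for user satisfaction; the satisfaction probabilities below are invented, and this only illustrates the proposal, not Netflix's documented approach.

```python
import random

# Toy epsilon-greedy bandit: actions are candidate titles, reward stands in for
# user satisfaction. The satisfaction probabilities are invented for the example.
random.seed(42)
titles = ["title_a", "title_b", "title_c"]
true_satisfaction = {"title_a": 0.2, "title_b": 0.6, "title_c": 0.4}

counts = {t: 0 for t in titles}
value = {t: 0.0 for t in titles}          # running estimate of reward per title
epsilon = 0.1

for step in range(5000):
    if random.random() < epsilon:
        choice = random.choice(titles)                     # explore
    else:
        choice = max(titles, key=value.get)                # exploit best estimate
    reward = 1.0 if random.random() < true_satisfaction[choice] else 0.0
    counts[choice] += 1
    value[choice] += (reward - value[choice]) / counts[choice]   # incremental mean

print({t: round(value[t], 2) for t in titles})   # estimates approach the true rates
print("most recommended:", max(counts, key=counts.get))
```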

Definitions for Complex Terms:

RMSE (Root Mean Square Error): RMSE measures how far predictions fall from the observed values and is used to understand the spread of the residuals. It is calculated by taking the square root of the mean of the squared errors.
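In code, the definition above amounts to:

```python
import math

# RMSE as defined above: square root of the mean of squared prediction errors.
def rmse(predicted, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

print(round(rmse([4.5, 3.0, 2.0], [5.0, 3.0, 1.0]), 3))   # ~0.645
```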

A/B testing: A/B testing is a statistical procedure for comparing two variants of a product or experience. First, a hypothesis is proposed; then users are randomly split between variant A and variant B and statistical evidence is collected; finally, the data is analyzed to accept or reject the hypothesis.
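For example, a two-proportion z-test is a common way to analyze such a test; the engagement counts below are invented for illustration.

```python
import math

# Minimal two-proportion z-test for an A/B test like the one described above.
conv_a, n_a = 1200, 10000    # control: e.g. plays started from the old row layout
conv_b, n_b = 1290, 10000    # variant: e.g. plays started from the new row layout

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
print(f"uplift: {p_b - p_a:.3%}, z = {z:.2f}")   # |z| > 1.96 -> significant at ~5% level
```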

Restricted Boltzmann Machines: An artificial neural network that can learn the underlying probability distribution of a set of inputs. It can be used in both supervised and unsupervised learning, with applications in classification, recommendation engines, topic modeling, and more.

EC2: The term EC2 stands for Elastic Compute Cloud. It is one of the important parts of the Amazon Cloud Computing platform. Any company can deploy its service/application over EC2 machines and get them running within a short period of time.

Hadoop: Hadoop is a set of software tools that makes distributed computing possible. It is built around the MapReduce model for the storage and processing of big data, and many companies use it for large-scale data processing and analytics.

HDFS: It stands for Hadoop Distributed File System, one of the core components of the Hadoop ecosystem. It serves as the storage layer, splitting files into blocks replicated across the cluster, and provides high aggregate bandwidth across the cluster.

References
AutomatedInsights. (n.d.). Netflix Statistics: How Many Hours Does the Catalog Hold. Retrieved April 12, 2020, from https://automatedinsights.com/blog/netflix-statistics-how-many-hours-does-catalog-hold

Basilico, J. (2013, October 13). Recommendation at Netflix Scale. Retrieved April 12, 2020, from https://www.slideshare.net/justinbasilico/recommendation-at-netflix-scale

Brodkin, J. (2016, February 11). Netflix finishes its massive migration to the Amazon cloud. Retrieved April 12, 2020, from https://arstechnica.com/information-technology/2016/02/netflix-finishes-its-massive-migration-to-the-amazon-cloud/

BusinessofApps. (2020, March 6). Netflix Revenue and Usage Statistics. Retrieved April 12, 2020, from https://www.businessofapps.com/data/netflix-statistics/

Clark, T. (2019, March 13). Netflix says its subscribers watch an average of 2 hours a day — here’s how that compares with TV viewing. Retrieved April 12, 2020, from https://www.businessinsider.com/netflix-viewing-compared-to-average-tv-viewing-nielsen-chart-2019-3

Figure 1. System Architectures for Personalization and Recommendation [Digital image]. (2013). Netflix Technology Blog. Retrieved April 12, 2020, from https://netflixtechblog.com/system-architectures-for-personalization-and-recommendation-e081aa94b5d8

Gaël. (2019, May 14). How Many Titles Are Available on Netflix in Your Country? Retrieved April 12, 2020, from https://cordcutting.com/blog/how-many-titles-are-available-on-netflix-in-your-country/

Gomez-Uribe, C. A., & Hunt, N. (2016). The Netflix Recommender System. ACM Transactions on Management Information Systems, 6(4), 1–19. doi: 10.1145/2843948

Lamkhede, S., & Das, S. (2019). Challenges in Search on Streaming Services. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19). doi: 10.1145/3331184.3331440

Maddodi, S., & K, K. P. (2019). Netflix Bigdata Analytics: The Emergence of Data Driven Recommendation. SSRN Electronic Journal. doi: 10.2139/ssrn.3473148

Morgan, A. (2019, May 20). Allegro Launches Hermes 1.0, a REST-based Message Broker Built on Top of Kafka. Retrieved April 12, 2020, from https://www.infoq.com/news/2019/05/launch-hermes-1/

Netflix Technology Blog. (2017a, April 18). Netflix Recommendations: Beyond the 5 stars (Part 1). Retrieved April 12, 2020, from https://netflixtechblog.com/netflix-recommendations-beyond-the-5-stars-part-1-55838468f429

Netflix Technology Blog. (2017b, April 18). Netflix Recommendations: Beyond the 5 stars (Part 2). Retrieved April 12, 2020, from https://netflixtechblog.com/netflix-recommendations-beyond-the-5-stars-part-2-d9b96aa399f5

Netflix Technology Blog. (2017c, April 18). System Architectures for Personalization and Recommendation. Retrieved April 12, 2020, from https://netflixtechblog.com/system-architectures-for-personalization-and-recommendation-e081aa94b5d8

Netflix. (2020, April 10). Retrieved April 12, 2020, from https://en.wikipedia.org/wiki/Netflix

Netflix. (n.d.). How Netflix’s Recommendations System Works. Retrieved April 12, 2020, from https://help.netflix.com/en/node/100639

Recommender system. (2020, April 10). Retrieved April 12, 2020, from https://en.wikipedia.org/wiki/Recommender_system

Töscher, A., Jahrer, M., & Bell, R. M. (2009). The BigChaos Solution to the Netflix Grand Prize. Netflix Prize documentation, 1–52.