Recommendation Engines
Recommendation Engines: The Problem we want to Solve
Example: Netflix wants its users to enjoy movies … but the TV screen can only display a small number of movies …
◼ How can Netflix ensure that users enjoy the movie they watch?
◼ If you leave it for the users to pick …
◼ either they will have to scroll and search a lot (poor experience)
◼ or they might quickly choose a bad movie (poor experience).
◼ Netflix wants to optimize the user experience by predicting which movies users will like … and recommending those movies to them.
“People don’t know what they want until you show it to them.”
- Steve Jobs
Which Companies Care about this Problem?
Others
◼ Goodreads
◼ Yelp
How would you solve this problem?
One Idea
Some Hurdles in Designing Recommendation Engines
◼ Say Alice watched W, X, Y … Bob watched X, Y, Z … and now Steve is a new user who has watched X and Y …
◼ What would you recommend to Steve?
◼ Would you take some average of W and Z? What would that mean?
◼ If Steve watched Terminator, Matrix, and Bourne Identity … are you only going to recommend action movies?
◼ Are you sure Steve would not like comedy? Or sci-fi?
◼ When you are starting out as a company, you don’t have much user data … what do you do?
◼ How do you know whether your recommendation worked well?
Quick Foundation: Vector Spaces
Visualizing entities as numbers in vector spaces.
Single numbers (1D)
{1,4,6,3,19,7}
In[]:=
NumberLinePlot[{1,4,6,3,19,7}]
Pairs of numbers (2D)
{{1,3},{3,4},{5,6},{8,10}}
In[]:=
ListPlot[{{1,3},{3,4},{5,6},{8,10}}]
{{41.837,-87.681},{39.763,-89.670},{40.115,-88.273}}
In[]:=
GeoListPlot[GeoPosition/@{{41.837,-87.681},{39.763,-89.670},{40.115,-88.273}}]
Triplets (3D)
Out[]//InputForm=
{RGBColor[0.030009094449084506, 0.9458743515708412,
0.5349437361178879], RGBColor[0.45707409799586807,
0.8302259318744558, 0.5787578013378236],
RGBColor[0.054177783055553874, 0.5876155409685255,
0.3283893340070769], RGBColor[0.5054257846493777,
0.07545391475387686, 0.40012318676180625],
RGBColor[0.12168482542261927, 0.7770910870900012,
0.4562468991341806], RGBColor[0.9859967340208537,
0.21309127999083577, 0.1321895946407079],
RGBColor[0.11292691785732512, 0.25649237153334803,
0.005605027142021157], RGBColor[0.14820748332595746,
0.3936331741052075, 0.7080861005922827],
RGBColor[0.1429131750858066, 0.31930070669069477,
0.8171823816627646], RGBColor[0.6952038516304551,
0.9671864021267482, 0.7103507962458251]}
N Dimensions
{Comedy, Sci Fi, Action}
Instead of three, choose as many dimensions as you want.
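To make the idea of "entities as points" concrete, here is a minimal Python sketch, assuming three feature axes named after the slide (comedy, sci-fi, action). The movie names and scores are invented for illustration; the distance function works unchanged for any number of dimensions.

```python
import math

# Hypothetical feature axes, taken from the slide: (comedy, sci_fi, action).
# The titles and scores below are invented for illustration, not real data.
movies = {
    "Movie A": (0.9, 0.1, 0.2),
    "Movie B": (0.8, 0.0, 0.3),
    "Movie C": (0.1, 0.9, 0.8),
}

def euclidean_distance(u, v):
    """Straight-line distance between two points with the same number of dimensions."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

d_ab = euclidean_distance(movies["Movie A"], movies["Movie B"])
d_ac = euclidean_distance(movies["Movie A"], movies["Movie C"])
print(f"A-B: {d_ab:.3f}, A-C: {d_ac:.3f}")
```

Movies with similar feature scores end up close together, which is what lets "similar" be measured as "nearby". Euclidean distance is only one possible choice of measure.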
Movies in Feature Space
{Comedy, Tragedy, Informative, Action, SciFi, Romance, Historic, Apocalyptic, ..... Strong Female Protagonist, ..... Year in which it was created, Won an Oscar,
People in Feature Space
Three Main Types of Recommendation Engine Techniques
◼ Content-based filtering
◼ Collaborative filtering
◼ Hybrid techniques
CONTENT-BASED FILTERING
MOVIES IN FEATURE SPACE
- Convert every movie into a point in a “feature space”
- Mark Alice’s already-watched movies in that same “feature space”
- Find movies in the “neighborhood” of Alice’s already-watched movies.
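The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the feature axes (comedy, sci-fi, action), all scores, and the titles "Space Comedy", "Robot Uprising", and "Office Laughs" are hypothetical, and "neighborhood" is taken to mean the smallest Euclidean distance to any watched movie.

```python
import math

# Hypothetical catalog: each movie is a point (comedy, sci_fi, action).
# All scores are made up; some titles are invented placeholders.
catalog = {
    "Terminator": (0.1, 0.9, 0.9),
    "Matrix": (0.1, 1.0, 0.8),
    "Bourne Identity": (0.0, 0.1, 1.0),
    "Space Comedy": (0.8, 0.7, 0.2),
    "Robot Uprising": (0.0, 0.9, 0.8),
    "Office Laughs": (1.0, 0.0, 0.0),
}

def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def recommend(watched, catalog, k=2):
    """Return the k unwatched movies closest to any movie the user has watched."""
    candidates = []
    for title, point in catalog.items():
        if title in watched:
            continue
        nearest = min(distance(point, catalog[w]) for w in watched)
        candidates.append((nearest, title))
    candidates.sort()  # smallest distance first
    return [title for _, title in candidates[:k]]

alice_watched = {"Terminator", "Matrix"}
print(recommend(alice_watched, catalog, k=2))
```

With these invented scores the sketch suggests only movies that resemble what Alice has already seen, which is the "are you only going to recommend action movies?" hurdle.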
PEOPLE IN FEATURE SPACE
- Convert each person into a point in a "feature space"
- Find other people in the "neighborhood" of Alice
- Recommend movies they have watched that Alice has not
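As a rough sketch of the people-based variant, the Python below reuses the Alice/Bob/Steve watch histories from the hurdles slide and makes recommendations for Steve. The extra user Carol, the choice of a 0/1 "watched" vector as the feature space, and cosine similarity as the neighborhood measure are all assumptions made for illustration.

```python
import math

MOVIES = ["W", "X", "Y", "Z"]

# Watch histories from the Alice/Bob/Steve example; Carol is a made-up extra user.
history = {
    "Alice": {"W", "X", "Y"},
    "Bob": {"X", "Y", "Z"},
    "Carol": {"Z"},
    "Steve": {"X", "Y"},
}

def to_vector(watched):
    """Represent a person as a 0/1 point with one dimension per movie."""
    return [1.0 if m in watched else 0.0 for m in MOVIES]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend_from_neighbors(user, history, k=2):
    """Find the k most similar people, then collect movies they watched that the user has not."""
    target = to_vector(history[user])
    scored = sorted(
        ((cosine_similarity(target, to_vector(w)), name)
         for name, w in history.items() if name != user),
        reverse=True,
    )
    neighbors = [name for _, name in scored[:k]]
    recs = set()
    for name in neighbors:
        recs |= history[name] - history[user]
    return neighbors, sorted(recs)

neighbors, recs = recommend_from_neighbors("Steve", history, k=2)
print(neighbors, recs)
```

Alice and Bob are equally close to Steve here, so the sketch returns both W and Z without ranking them, which is the ambiguity the hurdles slide asks about.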
COLLABORATIVE FILTERING
- Design M representative users, called EIGEN USERS
- Express any new user as a weighted combination of eigen users.
- Derive the recommendation from these weights.
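The slides do not say how the eigen users are designed, so the sketch below simply writes down two hypothetical ones by hand (M = 2) and shows the remaining two steps: fitting a new user's weights by least squares on the movies that user has rated, then predicting ratings for the unseen movies. All ratings, the titles "Office Laughs" and "Rom Com", and the helper names `solve`, `fit_weights`, and `predict` are assumptions for illustration.

```python
MOVIES = ["Terminator", "Matrix", "Bourne Identity", "Office Laughs", "Rom Com"]

# Two hypothetical eigen users: each row holds that representative user's
# rating (0 to 1) for every movie in MOVIES. Values are invented.
EIGEN_USERS = [
    [1.0, 0.9, 0.8, 0.1, 0.0],  # leans toward action
    [0.0, 0.2, 0.1, 1.0, 0.9],  # leans toward comedy
]

def solve(a, b):
    """Solve the square linear system a·x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("known ratings do not determine the weights uniquely")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_weights(known, eigen_users):
    """Least-squares weights so the weighted sum of eigen users matches the known ratings.

    `known` maps movie index -> rating for the movies the new user has rated.
    """
    idx = sorted(known)
    m = len(eigen_users)
    gram = [[sum(eigen_users[i][j] * eigen_users[k][j] for j in idx) for k in range(m)]
            for i in range(m)]
    rhs = [sum(eigen_users[i][j] * known[j] for j in idx) for i in range(m)]
    return solve(gram, rhs)

def predict(weights, eigen_users):
    """Predicted rating for every movie: the weighted combination of eigen users."""
    n = len(eigen_users[0])
    return [sum(w * e[j] for w, e in zip(weights, eigen_users)) for j in range(n)]

# New user rated Terminator 0.9, Matrix 0.8, Office Laughs 0.2 (made-up numbers).
new_user = {0: 0.9, 1: 0.8, 3: 0.2}
weights = fit_weights(new_user, EIGEN_USERS)
scores = predict(weights, EIGEN_USERS)
unseen = [j for j in range(len(MOVIES)) if j not in new_user]
best = max(unseen, key=lambda j: scores[j])
print(MOVIES[best], [round(w, 3) for w in weights])
```

In practice the representative users are usually learned from a large user-by-movie rating matrix rather than written by hand, for example with a matrix factorization; that step is outside this sketch.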
Social Implications (Privacy, Bias, Fairness …)
◼ Companies need data for content-based or collaborative filtering. Where are they getting the data?
- Cookies in your browser
- The websites you visit
- Your shopping patterns
- Your search queries on the Internet
◼ This data feeds recommendation engines … but it also leaks a lot of information about you to the Internet.
◼ What if, tomorrow, a government says … you have been eating junk food, so we are revoking your medical insurance!
◼ Companies use data to shortlist candidates for a job …
- Suppose the intelligent algorithm uses data from past candidates who were, or were not, recruited.
- It trains the eigen users on this data.
What’s the problem?
◼ What other biases can you think of … when data is used to create the “representative” samples … the EIGEN ITEMS? Are there other issues with fairness?