Recommendation Engines

Recommendation Engines: The Problem We Want to Solve

Example: Netflix wants its users to enjoy movies … but the TV screen can only display a small number of movies …

  • How can Netflix ensure that users enjoy the movie they watch?
  • If you leave it to the users to pick …
    - either they will have to scroll and search a lot (poor experience)
    - or they might quickly choose a bad movie (poor experience).
  • Netflix wants to optimize user experience by predicting movies
    users will like … and recommending them.
  • “People don’t know what they want until you show it to them.”
    - Steve Jobs

    Which Companies Care about this Problem?

    Others

  • Goodreads
  • Yelp

    How would you solve this problem?

    One Idea

    Out[]=
    ◼
  • ​
  • ◼
  • ​
  • ◼
  • ​
    Some Hurdles in Designing Recommendation Engines

  • Say Alice watched W, X, Y … Bob watched X, Y, Z … and now Steve is a new user who has watched X and Y …
  • What would you recommend to Steve?
  • Would you take some average of W and Z? What would that mean?
  • If Steve watched The Terminator, The Matrix, and The Bourne Identity … are you only going to recommend action movies?
  • Are you sure Steve would not like comedy? Or sci-fi?
  • When you are starting out as a company, you don’t have much user data … what do you do?
  • How do you know whether your recommendation worked well?
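
To make the Alice, Bob, and Steve question concrete, here is a minimal Wolfram Language sketch. It scores each existing user by how much their history overlaps with Steve's, then scores the movies Steve has not seen. The Jaccard overlap measure and the names `watched`, `jaccard`, `similarity`, and `candidates` are assumptions chosen for illustration; the slides do not prescribe a method.

```wolfram
(* Watch histories taken from the example above *)
watched = <|"Alice" -> {"W", "X", "Y"}, "Bob" -> {"X", "Y", "Z"}|>;
steve = {"X", "Y"};

(* Jaccard similarity: shared movies divided by all movies either person watched *)
jaccard[a_, b_] := Length[Intersection[a, b]]/Length[Union[a, b]];

(* Similarity of Steve to every existing user *)
similarity = Map[jaccard[steve, #] &, watched]
(* <|"Alice" -> 2/3, "Bob" -> 2/3|> *)

(* Movies Steve has not seen, each weighted by the similarity of whoever watched it *)
candidates = Merge[
  KeyValueMap[
    Function[{user, movies},
      AssociationMap[similarity[user] &, Complement[movies, steve]]],
    watched],
  Total]
(* <|"W" -> 2/3, "Z" -> 2/3|> *)
```

W and Z come out tied, which is the hurdle itself: overlap alone cannot decide between them.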

    Quick Foundation: Vector Spaces

    Visualizing entities as numbers in vector spaces.

    Single numbers (1D)

    {1,4,6,3,19,7}
    In[]:=
    NumberLinePlot[{1,4,6,3,19,7}]
    Out[]=

    Pairs of numbers (2D)

    {{1,3},{3,4},{5,6},{8,10}}
    In[]:=
    ListPlot[{{1,3},{3,4},{5,6},{8,10}}]
    Out[]=
    {{41.837,-87.681},{39.763,-89.670},{40.115,-88.273}}
    In[]:=
    GeoListPlot[GeoPosition/@{{41.837,-87.681},{39.763,-89.670},{40.115,-88.273}}]
    Out[]=

    Triplets (3D)

    Out[]//InputForm=
    {RGBColor[0.030009094449084506, 0.9458743515708412,
    0.5349437361178879], RGBColor[0.45707409799586807,
    0.8302259318744558, 0.5787578013378236],
    RGBColor[0.054177783055553874, 0.5876155409685255,
    0.3283893340070769], RGBColor[0.5054257846493777,
    0.07545391475387686, 0.40012318676180625],
    RGBColor[0.12168482542261927, 0.7770910870900012,
    0.4562468991341806], RGBColor[0.9859967340208537,
    0.21309127999083577, 0.1321895946407079],
    RGBColor[0.11292691785732512, 0.25649237153334803,
    0.005605027142021157], RGBColor[0.14820748332595746,
    0.3936331741052075, 0.7080861005922827],
    RGBColor[0.1429131750858066, 0.31930070669069477,
    0.8171823816627646], RGBColor[0.6952038516304551,
    0.9671864021267482, 0.7103507962458251]}

    N Dimensions

    {Comedy, Sci Fi, Action}
    Instead of three, choose as many dimensions as you want.
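
As a small illustration of why this representation is useful, the sketch below places three movies on the {Comedy, Sci Fi, Action} axes and measures how far apart they are. The scores and the names `movieA`, `movieB`, and `movieC` are hypothetical, and Euclidean distance is only one of several reasonable choices.

```wolfram
(* Hypothetical scores on the axes {Comedy, Sci Fi, Action}, each between 0 and 1 *)
movieA = {0.1, 0.9, 0.8};
movieB = {0.2, 0.8, 0.9};
movieC = {0.9, 0.0, 0.1};

(* A smaller distance means the two movies sit closer together in the space *)
EuclideanDistance[movieA, movieB]  (* about 0.173 *)
EuclideanDistance[movieA, movieC]  (* about 1.393 *)

(* The same function works unchanged for vectors with more dimensions *)
EuclideanDistance[{0.1, 0.9, 0.8, 0.3, 0.5}, {0.2, 0.8, 0.9, 0.1, 0.4}]
```

Movies A and B land close together, while movie C sits far from both. Adding dimensions changes the length of the vectors but not the code.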

    Movies in Feature Space

    {Comedy, Tragedy, Informative, Action, SciFi, Romance, Historic, Apocalyptic, …, Strong Female Protagonist, …, Year in which it was created, Won an Oscar, …}

    People in Feature Space

    Three Main Types of Recommendation Engine Techniques

  • Content-based filtering
  • Collaborative filtering
  • Hybrid techniques

    CONTENT-BASED FILTERING

    MOVIES IN FEATURE SPACE
    - Convert each movie into a point in a “feature space”.
    - Mark Alice’s already-watched movies in that same feature space.
    - Find movies in the “neighborhood” of Alice’s already-watched movies.
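
The three steps above can be sketched as follows. The catalog, its scores, and the names `catalog`, `aliceWatched`, and `aliceTaste` are hypothetical. Summarizing Alice's watched movies by their mean is one simple way to define the neighborhood and is an assumption of this sketch, not something the slides specify.

```wolfram
(* Hypothetical catalog: title -> scores on {Comedy, Sci Fi, Action} *)
catalog = <|
   "Movie A" -> {0.1, 0.9, 0.8},
   "Movie B" -> {0.2, 0.8, 0.9},
   "Movie C" -> {0.9, 0.0, 0.1},
   "Movie D" -> {0.0, 0.7, 0.9},
   "Movie E" -> {0.8, 0.1, 0.0}|>;

(* Movies Alice has already watched *)
aliceWatched = {"Movie A", "Movie B"};

(* Summarize Alice's taste as the mean of her watched movies *)
aliceTaste = Mean[Lookup[catalog, aliceWatched]];

(* Rank only the unwatched movies by their distance to that point *)
unwatched = KeyDrop[catalog, aliceWatched];
recommendations = Keys[SortBy[unwatched, EuclideanDistance[#, aliceTaste] &]]
(* {"Movie D", "Movie E", "Movie C"} *)
```

Movie D ranks first because its scores are closest to the two movies Alice has already watched.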
    PEOPLE IN FEATURE SPACE
    - Convert each person into a point in a “feature space”.
    - Find other people in the “neighborhood” of Alice.
    - Recommend movies they have watched.
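
The people-based variant can be sketched the same way, using the built-in `Nearest` function to find Alice's neighbors. All names, taste scores, and watch histories below are hypothetical.

```wolfram
(* Hypothetical people: name -> taste scores on {Comedy, Sci Fi, Action} *)
people = <|
   "Bob" -> {0.2, 0.9, 0.8},
   "Carol" -> {0.9, 0.1, 0.0},
   "Dave" -> {0.1, 0.8, 0.9}|>;
alice = {0.1, 0.9, 0.9};

(* Hypothetical watch histories *)
history = <|
   "Alice" -> {"Movie A"},
   "Bob" -> {"Movie A", "Movie B"},
   "Carol" -> {"Movie C"},
   "Dave" -> {"Movie B", "Movie D"}|>;

(* Names of the two people closest to Alice, nearest first *)
neighbors = Nearest[Values[people] -> Keys[people], alice, 2]
(* {"Dave", "Bob"} *)

(* Movies the neighbors have watched that Alice has not *)
recommendations = Complement[Union @@ Lookup[history, neighbors], history["Alice"]]
(* {"Movie B", "Movie D"} *)
```

The choice of two neighbors is arbitrary here. A larger neighborhood gives more candidates but includes people less similar to Alice.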

    COLLABORATIVE FILTERING

    - Design M representative users, called EIGEN USERS.
    - Express any new user as a weighted combination of the eigen users.
    - Derive the recommendation from these weights.
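
One common way to obtain such representative users is a truncated singular value decomposition of a ratings matrix. The sketch below takes that route as an assumption, since the slides do not say how the eigen users are built. The ratings and the names `ratings`, `eigenUsers`, `newUser`, `weights`, and `predicted` are hypothetical. Treating an unrated movie as a rating of 0 is a simplification.

```wolfram
(* Hypothetical ratings: rows are users, columns are movies; 0 means not rated *)
ratings = N[{
    {5, 4, 0, 1, 0},
    {4, 5, 1, 0, 0},
    {0, 1, 5, 4, 4},
    {1, 0, 4, 5, 5}}];

(* Truncated SVD keeping m = 2 components: ratings ≈ u . s . Transpose[v] *)
m = 2;
{u, s, v} = SingularValueDecomposition[ratings, m];

(* Each row is one eigen user: a taste profile across the five movies *)
eigenUsers = Transpose[v];

(* A new user who has rated only the first two movies *)
newUser = {5, 5, 0, 0, 0};

(* Weights of the new user on each eigen user (projection onto the profiles) *)
weights = eigenUsers . newUser;

(* Predicted scores: the weighted combination of the eigen users *)
predicted = weights . eigenUsers;

(* Rank the movies the new user has not rated by predicted score, best first *)
unrated = Flatten[Position[newUser, 0]];
SortBy[unrated, -predicted[[#]] &]
```

The result is a list of column indices, ordered from most to least recommended. Choosing M is a trade-off: too few eigen users blur distinct tastes together, while too many start to reproduce noise in the ratings.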

    Social Implications (Privacy, Bias, Fairness …)

  • Companies need data for content-based or collaborative filtering. Where are they getting the data?
    - Cookies in your browser
    - The websites you visit
    - Your shopping patterns
    - Your search queries on the Internet
  • This data feeds recommendation engines … but it also leaks a lot of information about you to the Internet.
  • What if, tomorrow, a government says … you have been eating junk food, so we are revoking your medical insurance!
  • Companies use data to shortlist candidates for a job …
    - Suppose the intelligent algorithm uses data from past candidates who were, or were not, recruited.
    - It trains the eigen users from this data.
    - What’s the problem?
  • What other biases can you think of … when data is used to create the “representative” samples … the EIGEN ITEMS? Are there other issues with fairness?