ECE101 Fall 2024 Exam 2

Study Guide

1. Search Engines

How do Search Engines work?

Think of it as a five-step process.
  • 1. Crawl the web to gather billions of documents (text, images, videos).
  • 2. Organize the documents for fast searching, called indexing the documents (like the index in a textbook), to create a set S.
  • 3. Order documents in S by decreasing reputation.
    These three steps occur before anyone does a search.

    Two More Steps Starting with the Search Phrase P

  • 4. Use phrase P to filter documents in S in order to find relevant set R.
  • 5. Do another round of reordering based on knowledge of the user (search history, YouTube preferences, travel, purchases, and so on) to create the list L for display (ads go in front!).
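
The five steps can be sketched on a toy corpus. This is only an illustration, not how any real search engine is built: the documents, the reputation scores, the userBoost values, and the helper names relevant and rankFor are all invented here, and step 1 (crawling) is replaced by a hand-written corpus.

```wolfram
(* Step 1 stand-in: a tiny hand-made corpus, document id -> text (invented data). *)
docs = <|
   "d1" -> "tornado warning issued for central illinois",
   "d2" -> "best pizza deals in chicago",
   "d3" -> "illinois weather forecast and tornado safety"|>;

(* Step 2: an inverted index maps each word to the documents that contain it. *)
wordToDoc = Flatten[KeyValueMap[
    Function[{id, text}, (# -> id) & /@ DeleteDuplicates[StringSplit[text]]],
    docs]];
index = GroupBy[wordToDoc, First -> Last];

(* Step 3: reputation scores computed ahead of time (assumed values). *)
reputation = <|"d1" -> 0.5, "d2" -> 0.2, "d3" -> 0.3|>;

(* Step 4: the relevant set R holds the documents containing every word of phrase P. *)
relevant[phrase_String] :=
  Intersection @@ (Lookup[index, #, {}] & /@ StringSplit[phrase]);

(* Step 5: reorder R using reputation plus a per-user boost (hypothetical personalization). *)
userBoost = <|"d3" -> 0.4|>;
rankFor[phrase_String] :=
  ReverseSortBy[relevant[phrase], reputation[#] + Lookup[userBoost, #, 0] &];

rankFor["tornado illinois"]
(* {"d3", "d1"} *)
```

By reputation alone d1 would come first; the user-specific boost moves d3 to the front, which is the effect step 5 describes.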

    What information is being used?

    The companies also use information about
    your IP address (where are you?),
    your browser, and
    more or less anything else that they can deduce by having figured out who you are and consulting their records on your preferences.

    Tracking can have advantages and disadvantages

    Sometimes tracking is really useful. When?
    Weather tracking - today’s tornado warning
    Personalized experience
    Find deals you like
    You don’t mind providing info ...

    Tracking can be compromising: Lack of privacy

    GDPR

    Europe Has Led the Way in Privacy Regulation: General Data Protection Regulation (GDPR)
    Thanks to European privacy laws, companies are being forced to make the data that they have collected on a person available to that person (and editable, and subject to deletion).

    Ranking pages... By reputation.

    A Page is “Good” if Others Point to It.
    Idea: if a page is important, other pages will link to it.
    Importance: The number of incoming arcs in the Web graph

    Links from Important Pages are More Important

    When we count incoming arcs, we also want to count the “importance” of the pages from which the incoming arcs come, and we should count the “importance” of the pages that link to those pages, and so on.
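
As a first approximation, importance is just the number of incoming arcs. The sketch below counts them on a small made-up graph (the page names and links are assumptions for illustration only).

```wolfram
(* A hypothetical web graph: a -> b means page a links to page b. *)
web = Graph[{"A" -> "B", "A" -> "C", "B" -> "C", "C" -> "A", "D" -> "C"},
   VertexLabels -> "Name"];

(* Count the incoming arcs of each page. *)
AssociationThread[VertexList[web], VertexInDegree[web]]
(* <|"A" -> 1, "B" -> 1, "C" -> 3, "D" -> 0|> *)
```

Page C wins on raw counts. Weighting each incoming arc by the importance of the page it comes from is what turns this count into a ranking by reputation.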

    Page Rank Intuition: People Walking at Random

    (Example graph g; the figure is not included in this export.)
    Imagine a person at every node.
    Each person chooses an outgoing link at random (equal probability, independently of past/future decisions) and walks to another node.

    Repeat the process many times, then count how many people are at a node to find the node’s “importance” (rank).

    Page Rank: Expected Number of “People” At a Node

    Some pages include no URLs. Anyone at such a node can ‘start over’ by choosing a new node at random instead of going down a link.
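
The random-walk picture can be simulated directly. The graph, the number of walkers, and the number of steps below are arbitrary choices for illustration; the helper step is a name introduced here, not something from the lecture.

```wolfram
(* Hypothetical web graph; page "D" has no outgoing links. *)
pages = {"A", "B", "C", "D"};
edges = {"A" -> "B", "A" -> "C", "B" -> "C", "C" -> "A", "C" -> "D"};

(* Outgoing links of each page; pages without links are simply absent. *)
out = GroupBy[edges, First -> Last];

(* One step: follow a random outgoing link, or start over at a random page if there is none. *)
step[p_] := With[{next = Lookup[out, p, {}]},
   If[next === {}, RandomChoice[pages], RandomChoice[next]]];

(* Put 1000 walkers on every page, let each take 50 steps, then count where they are. *)
SeedRandom[1];
walkers = Flatten[ConstantArray[pages, 1000]];
final = Nest[Map[step], walkers, 50];
N[Counts[final]/Length[final]]

(* Built-in version for comparison. It also restarts at random with probability 1 - 0.85
   at every step, so its numbers will be close to, but not the same as, the simulation. *)
PageRankCentrality[Graph[pages, edges], 0.85]
```

The fraction of walkers at each page estimates the expected number of “people” at that node, which is the page's rank.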

    2. Recommendation Engines

    Companies that care about the problem:
    - Netflix (movies)
    - Amazon (shopping)
    - Search engines (ranking news items)
    - Spotify (recommending music)
    - Google News (customizing news recommendations)
    - Yelp (recommending restaurants and services)
    - Goodreads (book recommendations)
    ... many many more

    Vector Spaces

    Represent objects (people, movies, recipes, books, etc.) as data vectors.

    In one dimension:

    In[]:=
    peopleAges={7,11,18,20,21,50,67};
    In[]:=
    NumberLinePlot[peopleAges]
    Out[]=

    In two dimensions:

    In[]:=
    places={{41.8375511`,-87.6818441`},{39.7639077`,-89.6708323`},{40.115057`,-88.2736523`},{40.7523087`,-89.6170968`}};
    In[]:=
    ListPlot[places,LabelingFunction->Callout]
    Out[]=
    In[]:=
    GeoListPlot[Callout[GeoPosition[#],#]&/@places]
    Out[]=

    In three dimensions:

    In[]:=
    colorRGBs={{0.28,0.62,0.43},{0.54,0.078,0.15},{0.56,0.24,0.006},{0.84,0.42,0.19},{0.92,0.45,0.16},{0.3,1.,0.17},{0.46,0.64,0.078},{0.96,0.79,0.56},{0.11,0.1,0.65},{0.29,0.37,0.9},{0.8,0.69,0.13},{0.67,0.83,0.18},{0.18,0.1,1.},{0.98,0.76,0.92},{0.42,0.33,0.82},{0.056,0.94,0.83},{0.6,0.83,0.79},{0.37,0.95,0.16},{0.39,0.15,0.88},{0.86,0.2,0.35},{0.71,0.16,0.68},{0.65,0.044,0.81},{0.98,0.71,0.3},{0.36,0.43,0.84},{0.82,0.94,0.026},{0.87,0.7,0.78},{0.74,0.2,0.97},{0.092,0.84,0.23},{0.62,0.64,0.73},{0.59,0.42,0.52}};
    ListPointPlot3D[Callout[#,RGBColor[#]]&/@colorRGBs,AxesLabel->{Red,Green,Blue}]
    Out[]=

    Visualizing Data in Feature Space

    Reduce data points to feature vectors. For a movie, the features could be: {romance, horror, action, comedy, informational, thriller, drama}
    Plot in n-dimensional feature space and find the movie closest to another movie.
    The original notebook shows an example with pet images here (output not rendered in this export).
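
A minimal sketch of “find the closest movie” in feature space. The titles and feature scores are invented, and closestTo is a helper name introduced here; Nearest uses Euclidean distance for numeric vectors by default.

```wolfram
(* Feature order: {romance, horror, action, comedy, informational, thriller, drama}.
   Titles and scores are made up for illustration. *)
movies = <|
   "Movie A" -> {0.9, 0.0, 0.1, 0.7, 0.0, 0.0, 0.4},
   "Movie B" -> {0.8, 0.0, 0.2, 0.6, 0.0, 0.1, 0.5},
   "Movie C" -> {0.0, 0.9, 0.5, 0.0, 0.0, 0.8, 0.2},
   "Movie D" -> {0.1, 0.1, 0.9, 0.3, 0.0, 0.6, 0.1}|>;

(* Look among all other movies for the one whose feature vector is nearest. *)
closestTo[title_String] := With[{rest = KeyDrop[movies, title]},
   First[Nearest[Values[rest] -> Keys[rest], movies[title]]]];

closestTo["Movie A"]
(* "Movie B" *)
```

A recommendation engine built this way would suggest Movie B to someone who liked Movie A, because their feature vectors are close together.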

    Recommendation Engine Techniques

    Supervised Learning

    You provide labeled examples that were created by some expert.
    Classification: Answers questions of the type “Is this A or B (or C or D or E)?”
    Regression: Answers questions of the type “How much or how many?”

    Classification

  • Infer a function from the data, mapping from feature values to label
  • Given a new data point, use this function to return a label based on the feature values
  • Day or Night
  • Hammer or Nails
  • Dog or Cat or Hamster or Goldfish
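
A small classification sketch for the “Day or Night” example, assuming a single feature (a light-sensor reading between 0 and 1). The training values and the name dayOrNight are invented for illustration.

```wolfram
(* Labeled examples: sensor reading -> label supplied by a person (invented data). *)
training = {0.05 -> "Night", 0.10 -> "Night", 0.15 -> "Night", 0.20 -> "Night",
   0.70 -> "Day", 0.80 -> "Day", 0.90 -> "Day", 0.95 -> "Day"};

(* Classify infers a function from feature values to labels. *)
dayOrNight = Classify[training];

(* Use the learned function to label new readings. *)
dayOrNight /@ {0.08, 0.85}
```

With data this cleanly separated the classifier should label the first reading "Night" and the second "Day", although Classify picks its method automatically and real data is rarely this tidy.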

    Regression

  • Infer a function from the data—mapping from the feature values to the numeric target value
  • Given a new data point, use the regression function to compute a target value optimal for the given features
  • Wine score
  • Financial forecasting (like house price estimates, or stock prices)
  • Sales forecasting
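
A minimal regression sketch for the house-price example, assuming one feature (floor area) and a straight-line model. The numbers are invented; LinearModelFit is used here for simplicity, and Predict is the machine-learning counterpart that chooses a method automatically.

```wolfram
(* Invented examples: {floor area in square meters, price in thousands}. *)
houses = {{50, 150}, {80, 240}, {100, 310}, {120, 355}, {150, 450}};

(* Fit price as a linear function of area. *)
model = LinearModelFit[houses, area, area];

(* Estimate the price of a new 110 square meter house. *)
model[110]
```

Unlike classification, the result is a number on a continuous scale, which is what “how much or how many” questions call for.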

    Unsupervised Learning: Clustering

    This is used to answer questions like:
  • How is the data organized?
  • Do the samples separate into groups of some kind?
  • Are there samples that are very different from most of the group (outliers)?

    The goal is to partition a dataset into clusters of similar elements (all sorts of data: numerical, textual, image, as well as dates and times).
    Collect “clusters” of similar colors into separate lists:
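
The notebook cell for this step is not in the export, so here is a stand-in sketch using FindClusters on a handful of invented RGB triples.

```wolfram
(* Three reddish and three bluish colors as RGB triples (invented values). *)
colors = {{0.90, 0.10, 0.10}, {0.85, 0.20, 0.15}, {0.95, 0.05, 0.20},
   {0.10, 0.10, 0.90}, {0.15, 0.20, 0.85}, {0.05, 0.25, 0.95}};

(* Ask for two clusters; no labels are given. *)
clusters = FindClusters[colors, 2]

(* Turn each triple into a color swatch to inspect the groups. *)
Map[RGBColor, clusters, {2}]
```

Each sublist of clusters is one group of similar colors.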

    Classification based on Clustering

    When you don’t have enough labeled data, you can use clustering and cluster-membership as pseudo-class-labels to proceed with classification.
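
One way to sketch this idea is with ClusterClassify, which clusters unlabeled data and returns a classifier whose classes are the cluster indices. The points and the name byCluster are assumptions made for illustration.

```wolfram
(* Unlabeled 2D points forming two loose groups (invented values). *)
points = {{1.0, 1.1}, {0.9, 1.3}, {1.2, 0.8}, {5.0, 5.2}, {5.3, 4.9}, {4.8, 5.1}};

(* Cluster the data and get back a classifier over the two clusters. *)
byCluster = ClusterClassify[points, 2];

(* Cluster membership now serves as a pseudo-label for new points. *)
byCluster /@ {{1.1, 1.0}, {5.1, 5.0}}
```

The returned labels are cluster numbers rather than meaningful names; a person still has to decide what each cluster represents.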

    Perceptron: mimics a human neuron
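
A perceptron computes a weighted sum of its inputs and “fires” when the sum crosses a threshold. The sketch below hand-picks weights so that it behaves like a logical AND; the weights, bias, and function name are choices made here for illustration, not learned values.

```wolfram
(* Weights and bias chosen by hand so the unit fires only when both inputs are 1. *)
w = {1, 1};
b = -1.5;

(* Output 1 if the weighted sum plus bias is non-negative, otherwise 0. *)
perceptron[x_List] := UnitStep[w . x + b];

perceptron /@ Tuples[{0, 1}, 2]
(* {0, 0, 0, 1} *)
```

Training a perceptron means adjusting w and b from labeled examples instead of setting them by hand.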

    Neural Net Layers

    Deep neural networks: Depth = number of hidden layers
    Three things that helped with the success of neural networks:
  • new architectures that leveraged relationships between the inputs,
  • deeper networks to capture more complex functions more quickly,
  • easily programmable graphics processing units (GPUs), which offered much more raw computational power than general-purpose processors (CPUs).
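
To make “depth = number of hidden layers” concrete, here is a sketch of a network with two hidden layers, sized for 28x28 digit images flattened to 784 numbers. The layer sizes are arbitrary assumptions and the network is untrained.

```wolfram
(* Two hidden layers (depth 2) followed by an output layer with one value per digit. *)
net = NetChain[{
    LinearLayer[64], Ramp,            (* hidden layer 1 *)
    LinearLayer[32], Ramp,            (* hidden layer 2 *)
    LinearLayer[10], SoftmaxLayer[]   (* output: a probability for each digit 0-9 *)
   }, "Input" -> 784];

(* Fill in random starting weights; NetTrain on labeled digits would be the next step. *)
net = NetInitialize[net];

(* Even untrained, the output is 10 probabilities that add up to approximately 1. *)
Total[net[RandomReal[1, 784]]]
```

Adding more LinearLayer and Ramp pairs makes the network deeper.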

    Deep Learning Derives Features from Data

    What is really going on in the neural network?

    E.g. Recognizing hand-written digits

    Applications of Deep Learning

  • Image identification
  • Audio identification
  • Speech to text conversion
  • Language translation
  • Generating text, images, audio and even video

    Popular deep learning models

  • GPT
  • DALL·E

    Authentication

    Authentication: the process or action of verifying the identity of a user or process.
    Encryption is the process of encoding information.
    Cryptography is the practice and study of techniques for secure communication, i.e., communication in the presence of an adversary.
    Uses of cryptography:
  • user and message authentication,
  • protection from illegitimate changes to messages,
  • protection from eavesdropping, etc.
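
Two of these uses can be sketched with built-in functions. The message text and the helper name digest are invented for illustration. A bare hash only detects accidental or naive changes; a real protocol would use a keyed message authentication code or a digital signature so that an adversary cannot simply recompute the digest.

```wolfram
(* Protection from eavesdropping: encrypt with a secret key shared by sender and receiver. *)
key = GenerateSymmetricKey[];
ciphertext = Encrypt[key, "Meet at noon"];
Decrypt[key, ciphertext]
(* "Meet at noon" *)

(* Detecting changes: a different message produces a different digest. *)
digest[msg_String] := Hash[msg, "SHA256", "HexString"];
digest["Meet at noon"] === digest["Meet at nine"]
(* False *)
```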

    Role of Computing in Security

  • improves and coordinates sensors
  • learns habits and preferences
  • mimics human presence
  • integrates with personal computing
  • preserves data

    Examples of technologies used in security

  • Motion sensing
  • Sensing perimeter state: circuit along windows and doors
  • Audio processing of sound picked up by microphones
  • Image processing of pictures taken by high-resolution, low-power cameras
  • automated mapping of home environment
  • understanding habits and personal preferences
  • control systems to mimic normal behavior
  • integration with personal computing (wifi, mobile phones)
  • preservation of data (protected in tamper-proof storage and in the cloud)

    Application beyond security

    For senior citizens requiring assisted living:
  • Health monitoring
  • General assistance

    6. Ethics and Privacy

    Problems of using something like Machine Learning

  • biases in data and absences of data produce biased / partial models.
  • training is much more costly than evaluation, so learning is typically not continuous.

    What’s Better, ML or Humans?

    Advantages:

  • Capital costs lower.
  • Usage more flexible (same chip executes almost any task, unlike skilled humans).
  • Operating costs much lower (and go down rather than up in off-peak hours).
  • Operates 24/7 in absence of failure.
  • Lower failure rate.
  • No psychological issues.
  • Minimal safety issues (only
  • Replication and replacement nearly instantaneous.
  • Lightweight and portable

    Disadvantages:

  • Requires electrical power.
  • Results may not be as good (balance against advantages).
  • Another major disadvantage: decisions not explainable.
  • Can’t learn from ML models.
  • No “reason” for results, correct or otherwise.
  • Bias in models due to bias in training data. E.g. https://sitn.hms.harvard.edu/flash/2020/racial-discrimination-in-face-recognition-technology/

    Is it Ethical? Does it compromise Privacy?

    Is machine intelligence ethical?

    Ethics: moral principles that govern ... the conducting of an activity
    Does machine intelligence allow for privacy?

    Privacy: the state or condition of being free from being observed or disturbed by other people
    Tradeoff between privacy and benefits from surveillance.
    Net neutrality: The principle that an internet service provider (ISP) has to provide access to all sites, content and applications at the same speed, under the same conditions without blocking or giving preference to any content.

    Provide examples where

  • You will trade your privacy for internet service benefits
  • You will not trade your privacy for internet service benefits

    7. Match the words with the definitions question

    Pay attention to the last slide on most lecture presentations. Those are good candidates for this set of questions.