ECE101 Fall 2024 Exam 2
Study Guide
1. Search Engines
How do Search Engines work?
Think of it as a five-step process.
◼
1. Crawl the web to gather billions of documents (text, images, videos).
◼
2. Organize the documents for fast searching, called indexing the documents (like the index in a textbook), to create a set S.
◼
3. Order documents in S by decreasing reputation.
These three steps occur before anyone does a search.
Two More Steps Starting with the Search Phrase P
◼
4. Use phrase P to filter documents in S in order to find relevant set R.
◼
5. Do another round of reordering, based on knowledge of the user (search history, YouTube preferences, travel, purchases, and so on), to create the list L for display (ads go in front!).
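The five steps can be sketched on a toy scale. This is only an illustration under stated assumptions: the corpus `DOCS`, its reputation scores, the helper names `build_index` and `search`, and the size of the personal boost are all made up, and crawling (step 1) is replaced by a hard-coded dictionary.

```python
from collections import defaultdict

# Hypothetical toy corpus standing in for the crawl (step 1):
# document id -> (text, precomputed reputation score from step 3).
DOCS = {
    "a": ("cheap flights to chicago", 0.9),
    "b": ("chicago pizza guide", 0.7),
    "c": ("flights and hotels in paris", 0.4),
}

def build_index(docs):
    """Step 2: map each word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, (text, _) in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index

def search(phrase, docs, index, user_interests=()):
    """Steps 4 and 5: filter by phrase P, then reorder for this user."""
    words = phrase.lower().split()
    if not words:
        return []
    # Step 4: relevant set R = documents containing every word of P.
    relevant = set.intersection(*(index.get(w, set()) for w in words))

    def score(doc_id):
        text, reputation = docs[doc_id]
        # Step 5 (assumed rule): add 1.0 for each known interest in the text.
        boost = sum(1.0 for w in user_interests if w in text.split())
        return reputation + boost

    return sorted(relevant, key=score, reverse=True)

index = build_index(DOCS)
print(search("flights", DOCS, index))                            # ['a', 'c']
print(search("flights", DOCS, index, user_interests=["paris"]))  # ['c', 'a']
```

The same phrase gives a different list L once the user's interests are known, which is the point of step 5.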
What information is being used?
The companies also use information about
◦ your IP address (where are you?),
◦ your browser, and
◦ more or less anything else that they can deduce by having figured out who you are and consulting their records on your preferences.
Tracking can have advantages and disadvantages
Sometimes tracking is really useful. When?
• Weather tracking - today’s tornado warning
• Personalized experience
• Find deals you like
• You don’t mind providing info ...
Tracking can be compromising: Lack of privacy
GDPR
Europe Has Led the Way in Privacy Regulation: General Data Protection Regulation (GDPR)
Thanks to European privacy laws
◦ companies are being forced
◦ to make the data that they have collected
◦ on a person available to that person
◦ (and editable, and subject to deletion).
Ranking pages... By reputation.
A Page is “Good” if Others Point to It.
Idea: if a page is important, other pages will link to it.
Importance: The number of incoming arcs in the Web graph; Links from Important Pages are More Important
When we count incoming arcs,
◦ we also want to count
◦ the “importance” of the pages from which the incoming arc is coming.
◦ and we should count the “importance” of the pages that link to those pages.
◦ And so on.
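One way to make the "and so on" concrete is to repeat the count many times, each time passing importance along the arcs. A minimal sketch, assuming each page splits its current importance equally among its outgoing links and that every page has at least one outgoing link; the three-page graph `LINKS` and the function name `importance` are hypothetical.

```python
def importance(links, rounds=100):
    """Repeatedly pass each page's importance along its outgoing links."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}  # start with equal importance
    for _ in range(rounds):
        new_rank = {p: 0.0 for p in pages}
        for page, targets in links.items():
            share = rank[page] / len(targets)    # split equally among out-links
            for target in targets:
                new_rank[target] += share        # incoming arcs add importance
        rank = new_rank
    return rank

# Hypothetical Web graph: page -> pages it links to.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

for page, value in importance(LINKS).items():
    print(page, round(value, 3))
```

Here B receives only half of A's importance, while C receives the other half plus all of B's, so C ends up more important than B even though the link structure is tiny.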
Page Rank Intuition: People Walking at Random
Imagine a person at every node
Each person
◦ chooses an outgoing link at random
◦ (equal probability, independently of past/future decisions)
◦ and walks to another node
Repeat the process many times, then
◦ count how many people are at a node
◦ to find the node’s “importance” (rank).
Page Rank: Expected Number of “People” At a Node
Some pages include no URLs.
◦ Anyone at such a node
◦ can ‘start over’
◦ by choosing a new node at random instead of going down a link.
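The walking-people picture, including the "start over" rule for pages with no URLs, can be simulated directly. This is a sketch only: the graph `GRAPH`, the number of walkers, the number of steps, and the fixed random seed are all assumptions chosen for illustration.

```python
import random

def random_walk_counts(links, walkers_per_node=1000, steps=50, seed=0):
    """Count how many walkers end at each node after many random steps."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    nodes = list(links)
    # Start with the same number of people at every node.
    positions = [n for n in nodes for _ in range(walkers_per_node)]
    for _ in range(steps):
        positions = [
            # Follow a random outgoing link, or start over at a random
            # node when the current page has no links.
            rng.choice(links[p]) if links[p] else rng.choice(nodes)
            for p in positions
        ]
    counts = {n: 0 for n in nodes}
    for p in positions:
        counts[p] += 1
    return counts

# Hypothetical graph; page D includes no URLs.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A", "D"], "D": []}

counts = random_walk_counts(GRAPH)
print(sorted(counts.items(), key=lambda item: item[1], reverse=True))
```

In this graph C collects the most people and B the fewest, so C would get the highest rank.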
2. Recommendation Engines
Companies that care about the problem:
- Netflix (movies)
- Amazon (shopping)
- Search engines (ranking news items)
- Spotify (recommending music)
- Google News (customizing news recommendations)
- Yelp (recommending restaurants and services)
- Goodreads (book recommendations)
... many many more
Vector Spaces
Represent objects (people, movies, recipes, books, etc.) as data vectors.
In one dimension:
In[]:=
peopleAges={7,11,18,20,21,50,67};
In[]:=
NumberLinePlot[peopleAges]
In two dimensions:
In[]:=
places={{41.8375511`,-87.6818441`},{39.7639077`,-89.6708323`},{40.115057`,-88.2736523`},{40.7523087`,-89.6170968`}};
In[]:=
ListPlot[places,LabelingFunction->Callout]
In[]:=
GeoListPlot[Callout[GeoPosition[#],#]&/@places]
In three dimensions:
In[]:=
colorRGBs={{0.28,0.62,0.43},{0.54,0.078,0.15},{0.56,0.24,0.006},{0.84,0.42,0.19},{0.92,0.45,0.16},{0.3,1.,0.17},{0.46,0.64,0.078},{0.96,0.79,0.56},{0.11,0.1,0.65},{0.29,0.37,0.9},{0.8,0.69,0.13},{0.67,0.83,0.18},{0.18,0.1,1.},{0.98,0.76,0.92},{0.42,0.33,0.82},{0.056,0.94,0.83},{0.6,0.83,0.79},{0.37,0.95,0.16},{0.39,0.15,0.88},{0.86,0.2,0.35},{0.71,0.16,0.68},{0.65,0.044,0.81},{0.98,0.71,0.3},{0.36,0.43,0.84},{0.82,0.94,0.026},{0.87,0.7,0.78},{0.74,0.2,0.97},{0.092,0.84,0.23},{0.62,0.64,0.73},{0.59,0.42,0.52}};
ListPointPlot3D[Callout[#,RGBColor[#]]&/@colorRGBs,AxesLabel->{Red,Green,Blue}]
Visualizing Data in Feature Space
Reduce data points to feature vectors. For a movie, the features could be: {romance, horror, action, comedy, informational, thriller, drama}
Plot in n-dimensional feature space and find the movie closest to another movie.
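Finding the closest movie is a distance computation between feature vectors. A minimal sketch, assuming Euclidean distance as the measure of closeness; the movie titles and feature values in `MOVIES` are invented for illustration.

```python
import math

FEATURES = ["romance", "horror", "action", "comedy",
            "informational", "thriller", "drama"]

# Hypothetical feature vectors, one value per entry of FEATURES.
MOVIES = {
    "Movie A": [0.9, 0.0, 0.1, 0.7, 0.0, 0.0, 0.4],
    "Movie B": [0.8, 0.0, 0.0, 0.8, 0.0, 0.1, 0.3],
    "Movie C": [0.0, 0.9, 0.5, 0.0, 0.0, 0.8, 0.2],
    "Movie D": [0.1, 0.1, 0.9, 0.2, 0.0, 0.6, 0.1],
}

def closest_movie(title, movies):
    """Return the other movie whose feature vector is nearest to `title`."""
    target = movies[title]
    others = (t for t in movies if t != title)
    return min(others, key=lambda t: math.dist(target, movies[t]))

print(closest_movie("Movie A", MOVIES))  # Movie B
print(closest_movie("Movie C", MOVIES))  # Movie D
```

A recommender could then suggest Movie B to someone who liked Movie A.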
Here’s an example with pet images: (figure not rendered in this export)
Recommendation Engine Techniques
Supervised Learning
You provide labeled examples that were created by some expert.
Classification: Answers questions of the type “Is this A or B (or C or D or E)?”
Regression: Answers questions of the type “How much or how many?”
Classification
◼
Infer a function from the data, mapping from feature values to label
◼
Given a new data point, use this function to return a label based on the feature values
◼
Day or Night
◼
Hammer or Nails
◼
Dog or Cat or Hamster or Goldfish
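A small Day or Night classifier shows both bullets: a function is inferred from labeled examples, then applied to a new data point. This sketch uses a nearest-centroid rule, which is just one simple technique picked for illustration; the two features and the labeled examples are made up.

```python
import math
from collections import defaultdict

def train_nearest_centroid(examples):
    """Infer a labeling function from (features, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    # One average feature vector (centroid) per label.
    centroids = {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }

    def classify(features):
        # Return the label whose centroid is closest to the new point.
        return min(centroids, key=lambda lab: math.dist(features, centroids[lab]))

    return classify

# Hypothetical labeled photos: (average brightness, fraction of blue pixels).
EXAMPLES = [
    ((0.9, 0.6), "day"), ((0.8, 0.5), "day"), ((0.7, 0.7), "day"),
    ((0.1, 0.1), "night"), ((0.2, 0.05), "night"), ((0.15, 0.2), "night"),
]

classify = train_nearest_centroid(EXAMPLES)
print(classify((0.85, 0.55)))  # day
print(classify((0.05, 0.1)))   # night
```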
Regression
◼
Infer a function from the data, mapping from the feature values to the numeric target value
◼
Given a new data point, use the regression function to compute the predicted target value for the given features
◼
Wine score
◼
Financial forecasting (like house price estimates, or stock prices)
◼
Sales forecasting
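For a house price estimate with a single feature, the regression function can be a straight line fitted by least squares. A sketch under stated assumptions: the areas and prices below are invented (and lie exactly on a line to keep the arithmetic easy to check), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept; returns a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

# Hypothetical training data: floor area (square meters) -> price (thousands).
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

predict = fit_line(areas, prices)
print(round(predict(90), 2))  # 270.0
```

Unlike classification, the answer is a number ("how much?") rather than a label.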
Unsupervised Learning: Clustering
This is used to answer questions like:
◼
How is the data organized?
◼
Do the samples separate into groups of some kind?
◼
Are there samples that are very different from most of the group (outliers)?
The goal is to partition a dataset into clusters of similar elements (all sorts of data: numerical, textual, image, as well as dates and times).
Collect “clusters” of similar colors into separate lists:
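A minimal clustering sketch for colors, assuming k-means as the algorithm (the notebook's own clustering code is not shown in this guide, so this is a stand-in). The RGB values in `COLORS`, the choice of k = 2, and the rule of using the first k points as starting centers are all assumptions.

```python
import math

def k_means(points, k, rounds=20):
    """Group points into k lists of similar elements (basic k-means)."""
    # Assumption: use the first k points as the initial cluster centers.
    centers = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(rounds):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center.
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:
                # Move each center to the average of its members.
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return clusters

# Hypothetical RGB colors: three reddish, three bluish. No labels are given.
COLORS = [
    (0.9, 0.1, 0.1), (0.1, 0.1, 0.9), (0.8, 0.2, 0.1),
    (0.2, 0.1, 0.8), (0.95, 0.05, 0.2), (0.1, 0.2, 0.95),
]

for cluster in k_means(COLORS, k=2):
    print(cluster)
```

The reddish and bluish colors end up in separate lists even though no one labeled them, which is what makes this unsupervised.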
Classification based on Clustering
When you don’t have enough labeled data, you can use clustering and cluster-membership as pseudo-class-labels to proceed with classification.
Perceptron: mimics a human neuron
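A perceptron computes a weighted sum of its inputs and "fires" (outputs 1) when the sum crosses a threshold. The sketch below trains one on the logical AND function using the classic perceptron learning rule; the training data, learning rate, and number of passes are assumptions for illustration.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of inputs plus bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(examples, epochs=20, rate=1.0):
    """Perceptron rule: nudge weights toward the target after each mistake."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)  # -1, 0, or 1
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Logical AND: output 1 only when both inputs are 1.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_EXAMPLES)
print([predict(weights, bias, x) for x, _ in AND_EXAMPLES])  # [0, 0, 0, 1]
```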
Neural Net Layers
Deep neural networks: Depth = number of hidden layers
Three things that helped with the success of neural networks:
◼
new architectures that leveraged relationships between the inputs,
◼
deeper networks to capture more complex functions more quickly,
◼
easily programmable graphics processing units (GPUs), which offered much more raw computational power than general-purpose processors (CPUs).
Deep Learning Derives Features from Data
What is really going on in the neural network?
E.g. Recognizing hand-written digits
Applications of Deep Learning
◼
Image identification
◼
Audio identification
◼
Speech to text conversion
◼
Language translation
◼
Generating text, images, audio and even video
Popular deep learning models
◼
GPT
◼
DALL·E
Authentication
Authentication: the process or action of verifying the identity of a user or process.
Encryption is the process of encoding information.
Cryptography is the practice and study of techniques for secure communication, i.e., communication in the presence of an adversary.
Uses of cryptography:
◼
user and message authentication,
◼
protection from illegitimate changes to messages,
◼
protection from eavesdropping, etc.
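Message authentication and protection from illegitimate changes can be illustrated with a keyed hash (HMAC) from Python's standard library. This is a sketch only: the shared key and the messages are made up, and HMAC is one common technique chosen here, not necessarily the one covered in lecture. It does not hide the message, so protection from eavesdropping would additionally need encryption.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> str:
    """Compute an authentication tag that only key holders can produce."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Check that the message matches the tag (constant-time comparison)."""
    return hmac.compare_digest(sign(key, message), tag)

KEY = b"shared-secret-key"  # hypothetical key known to sender and receiver
message = b"transfer 10 dollars to Bob"
tag = sign(KEY, message)

print(verify(KEY, message, tag))                          # True: genuine
print(verify(KEY, b"transfer 1000 dollars to Bob", tag))  # False: altered
```

An adversary who changes the message in transit cannot produce a matching tag without the key.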
Role of Computing in Security
◼
improves and coordinates sensors
◼
learns habits and preferences
◼
mimics human presence
◼
integrates with personal computing
◼
preserves data
Examples of technologies used in security
◼
Motion sensing
◼
Sensing perimeter state: circuit along windows and doors
◼
Audio processing of sound picked up by microphones
◼
Image processing of pictures taken by high-resolution, low-power cameras
◼
automated mapping of home environment
◼
understanding habits and personal preferences
◼
control systems to mimic normal behavior
◼
integration with personal computing (Wi-Fi, mobile phones)
◼
preservation of data (protected in tamper-proof storage and in the cloud)
Application beyond security
For senior citizens requiring assisted living:
◼
Health monitoring
◼
General assistance
6. Ethics and Privacy
Problems of using something like Machine Learning
◼
biases in data and absences of data produce biased / partial models.
◼
training is much more costly than evaluation, so learning is typically not continuous.
What’s Better, ML or Humans?
Advantages:
◼
Capital costs lower.
◼
Usage more flexible (same chip executes almost any task, unlike skilled humans).
◼
Operating costs much lower (and go down rather than up in off-peak hours).
◼
Operates 24/7 in absence of failure.
◼
Lower failure rate.
◼
No psychological issues.
◼
Minimal safety issues (only
◼
Replication and replacement nearly instantaneous.
◼
Lightweight and portable
Disadvantages:
◼
Requires electrical power.
◼
Results may not be as good (balance against advantages).
◼
Another major disadvantage: decisions not explainable.
◼
Can’t learn from ML models.
◼
No “reason” for results, correct or otherwise.
◼
Bias in models due to bias in training data. E.g. https://sitn.hms.harvard.edu/flash/2020/racial-discrimination-in-face-recognition-technology/
Is it Ethical? Does it compromise Privacy?
Is machine intelligence ethical?
Ethics: moral principles that govern ... the conducting of an activity
Does machine intelligence allow for privacy?
Privacy: the state or condition of being free from being observed or disturbed by other people
Tradeoff between privacy and benefits from surveillance.
Net neutrality: The principle that an internet service provider (ISP) has to provide access to all sites, content and applications at the same speed, under the same conditions without blocking or giving preference to any content.
Provide examples where
◼
You will trade your privacy for internet service benefits
◼
You will not trade your privacy for internet service benefits
7. Match the words with the definitions question
Pay attention to the last slide on most lecture presentations. Those are good candidates for this set of questions.