ECE101 Fall 2024 Exam 2

Study Guide

1. Search Engines

How do Search Engines work?

Think of it as a five-step process.

◼ 1. Crawl the web to gather billions of documents (text, images, videos).
◼ 2. Organize the documents for fast searching, called indexing the documents (like the index in a textbook), to create a set S.
◼ 3. Order documents in S by decreasing reputation.

These three occur before anyone does a search.

Two More Steps Starting with the Search Phrase P

◼ 4. Use phrase P to filter documents in S in order to find relevant set R.
◼ 5. Do another round of reordering based on knowledge of user (search history, YouTube preferences, travel, purchases, and so on) to create the list L for display (ads go in front!).
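The steps above can be sketched on a toy scale. This is only an illustration under stated assumptions: the three documents, the reputation scores, and the helper names (docs, index, reputation, relevant, search) are all made up, and step 5 (personalization and ads) is left out.

```wolfram
(* Step 1 stand-in: a tiny "crawled" corpus, document id -> text (made up). *)
docs = <|
   "d1" -> "cheap flights to chicago",
   "d2" -> "chicago weather and tornado warning",
   "d3" -> "cheap hotels in chicago"|>;

(* Step 2: index the documents, word -> documents containing it (the set S). *)
index = GroupBy[
   Flatten[KeyValueMap[
     Function[{id, text}, Thread[DeleteDuplicates[StringSplit[text]] -> id]],
     docs]],
   First -> Last];

(* Step 3: hypothetical reputation scores, computed before any search. *)
reputation = <|"d1" -> 0.2, "d2" -> 0.5, "d3" -> 0.3|>;

(* Step 4: relevant set R = documents that contain every word of phrase P. *)
relevant[p_String] := Intersection @@ (Lookup[index, #, {}] & /@ StringSplit[p]);

(* List R by decreasing reputation. *)
search[p_String] := SortBy[relevant[p], -reputation[#] &];

search["cheap chicago"]   (* {"d3", "d1"} *)
```

Only the filtering in step 4 depends on the phrase; the index and the reputation scores are built ahead of time, which is why the first three steps can happen before anyone searches.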

What information is being used?

The companies also use information about
◦ your IP address (where are you?),
◦ your browser, and
◦ more or less anything else that they can deduce by having figured out who you are and consulting their records on your preferences.

Tracking can have advantages and disadvantages

Sometimes tracking is really useful. When?
• Weather tracking - today’s tornado warning
• Personalized experience
• Find deals you like
• You don’t mind providing info ...

Tracking can be compromising: Lack of privacy

GDPR

Europe Has Led the Way in Privacy Regulation: General Data Protection Regulation (GDPR)
Thanks to European privacy laws
◦ companies are being forced
◦ to make the data that they have collected
◦ on a person available to that person
◦ (and editable, and subject to deletion).

Ranking pages... By reputation.

A Page is “Good” if Others Point to It.

Idea: if a page is important, other pages will link to it.
Importance: The number of incoming arcs in the Web graph

Links from Important Pages are More Important

When we count incoming arcs,
◦ we also want to count
◦ the “importance” of the pages from which the incoming arcs come,
◦ and we should count the “importance” of the pages that link to those pages,
◦ and so on.

Page Rank Intuition: People Walking at Random

Imagine a person at every node
Each person
◦ chooses an outgoing link at random
◦ (equal probability, independently of past/future decisions)
◦ and walks to another node

Repeat the process many times, then
◦ count how many people are at a node
◦ to find the node’s “importance” (rank).

Page Rank: Expected Number of “People” At a Node

Some pages include no URLs.
◦ Anyone at such a node
◦ can ‘start over’
◦ by choosing a new node at random instead of going down a link.
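The random-walk picture can be sketched by tracking the expected fraction of people at each node. This is a minimal sketch, assuming a made-up four-page web (links) and hypothetical helper names (step, rank). It models only what is described above: equal-probability link choices, and starting over at a random node from a page with no links. It does not include the extra random jumps that some formulations of PageRank add at every page.

```wolfram
(* A made-up four-page web: page -> pages it links to. Page 4 has no links. *)
links = <|1 -> {2, 3}, 2 -> {3}, 3 -> {1}, 4 -> {}|>;
n = Length[links];

(* One round of walking. p[[i]] is the expected fraction of people at page i.
   People at a page split evenly among its outgoing links; people at a page
   without links start over at a page chosen uniformly at random. *)
step[p_] := Module[{next = ConstantArray[0, n]},
   Do[
    If[links[i] === {},
     next += p[[i]]/n,
     Do[next[[j]] += p[[i]]/Length[links[i]], {j, links[i]}]],
    {i, n}];
   next];

(* Start with people spread evenly, then repeat the process many times. *)
rank = Nest[step, ConstantArray[1/n, n], 100];
N[rank]
```

For this particular graph the values settle near 0.4, 0.2, 0.4 and essentially 0: pages 1 and 3 collect the most people, and page 4, which nobody links to, ends up nearly empty.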

2. Recommendation Engines

Companies that care about the problem:

- Netflix (movies)
- Amazon (shopping)
- Search engines (ranking news items)
- Spotify (recommending music)
- Google News (customizing news recommendations)
- Yelp (recommending restaurants and services)
- Goodreads (book recommendations)
... many many more

Vector Spaces

Represent objects (people, movies, recipes, books, etc.) as data vectors.

In one dimension:

In[]:=

peopleAges={7,11,18,20,21,50,67};

In[]:=

NumberLinePlot[peopleAges]

Out[]=

In two dimensions:

In[]:=

places={{41.8375511`,-87.6818441`},{39.7639077`,-89.6708323`},{40.115057`,-88.2736523`},{40.7523087`,-89.6170968`}};

In[]:=

ListPlot[places,LabelingFunction->Callout]

Out[]=

In[]:=

GeoListPlot[Callout[GeoPosition[#],#]&/@places]

Out[]=

In three dimensions:

In[]:=

colorRGBs={{0.28,0.62,0.43},{0.54,0.078,0.15},{0.56,0.24,0.006},{0.84,0.42,0.19},{0.92,0.45,0.16},{0.3,1.,0.17},{0.46,0.64,0.078},{0.96,0.79,0.56},{0.11,0.1,0.65},{0.29,0.37,0.9},{0.8,0.69,0.13},{0.67,0.83,0.18},{0.18,0.1,1.},{0.98,0.76,0.92},{0.42,0.33,0.82},{0.056,0.94,0.83},{0.6,0.83,0.79},{0.37,0.95,0.16},{0.39,0.15,0.88},{0.86,0.2,0.35},{0.71,0.16,0.68},{0.65,0.044,0.81},{0.98,0.71,0.3},{0.36,0.43,0.84},{0.82,0.94,0.026},{0.87,0.7,0.78},{0.74,0.2,0.97},{0.092,0.84,0.23},{0.62,0.64,0.73},{0.59,0.42,0.52}};

ListPointPlot3D[Callout[#,RGBColor[#]]&/@colorRGBs,AxesLabel->{Red,Green,Blue}]

Out[]=

Visualizing Data in Feature Space

Reduce data points to feature vectors. For a movie, the features could be: {romance, horror, action, comedy, informational, thriller, drama}

Plot in n-dimensional feature space and find the movie closest to another movie.
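Finding the closest movie can be sketched as a distance computation in the seven-dimensional feature space. The titles and feature scores below are invented, and closestTo is a hypothetical helper. Euclidean distance is one reasonable choice of "closeness", not the only one.

```wolfram
(* Hypothetical feature scores, in the order
   {romance, horror, action, comedy, informational, thriller, drama}. *)
movies = <|
   "Movie A" -> {0.9, 0.0, 0.1, 0.7, 0.0, 0.1, 0.5},
   "Movie B" -> {0.8, 0.0, 0.2, 0.6, 0.0, 0.2, 0.6},
   "Movie C" -> {0.0, 0.9, 0.6, 0.0, 0.0, 0.8, 0.2},
   "Movie D" -> {0.1, 0.1, 0.9, 0.3, 0.0, 0.6, 0.1}|>;

(* Measure the distance from the given movie to every other movie
   and keep the title with the smallest distance. *)
closestTo[title_String] := First[
   MinimalBy[DeleteCases[Keys[movies], title],
    EuclideanDistance[movies[title], movies[#]] &]];

closestTo["Movie A"]   (* "Movie B" *)
```

A recommendation engine built this way would suggest "Movie B" to someone who liked "Movie A", because their feature vectors are nearly the same.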

Here’s an example with pet images:

Out[]=

Contents cannot be rendered at this time; please try again later or download this notebook for full functionality »

Recommendation Engine Techniques

Supervised Learning

You provide labeled examples that were created by some expert.

Classification: Answers questions of the type “Is this A or B (or C or D or E)?”
Regression: Answers questions of the type “How much or how many?”

Classification

◼ Infer a function from the data, mapping from feature values to label
◼ Given a new data point, use this function to return a label based on the feature values
◼ Day or Night
◼ Hammer or Nails
◼ Dog or Cat or Hamster or Goldfish
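As a small illustration of "infer a function, then label a new data point", here is a sketch of a Day or Night classifier. The two features (average brightness and fraction of blue pixels), the labeled examples, and the name dayOrNight are all assumptions made for this example. The rule used, copying the label of the closest labeled example, is just one simple way to build such a function.

```wolfram
(* Labeled examples from an "expert":
   {average brightness, fraction of blue pixels} -> label. Values are invented. *)
examples = {
   {0.90, 0.60} -> "Day", {0.80, 0.50} -> "Day", {0.70, 0.70} -> "Day",
   {0.10, 0.05} -> "Night", {0.20, 0.10} -> "Night", {0.15, 0.00} -> "Night"};

(* The inferred function: return the label of the nearest labeled example. *)
dayOrNight[x_List] :=
  Last[First[MinimalBy[examples, EuclideanDistance[First[#], x] &]]];

dayOrNight[{0.85, 0.55}]   (* "Day" *)
dayOrNight[{0.05, 0.10}]   (* "Night" *)
```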

Regression

◼ Infer a function from the data, mapping from the feature values to the numeric target value
◼ Given a new data point, use the regression function to compute a target value optimal for the given features
◼ Wine score
◼ Financial forecasting (like house price estimates, or stock prices)
◼ Sales forecasting
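A house price estimate makes a compact regression example. The sketch below assumes four invented sales and fits a straight line by least squares. A real estimator would use many more features and data points.

```wolfram
(* Invented training data: {house size in square feet, sale price in $1000s}. *)
sales = {{1000, 200}, {1500, 290}, {2000, 410}, {2500, 500}};

(* Infer the function: the least-squares line giving price in terms of size x. *)
price = Fit[sales, {1, x}, x];

(* Use the fitted function to estimate the price of a new 1800 sq ft house. *)
price /. x -> 1800
```

The answer is a number (about 360, in thousands of dollars) rather than a label, which is the difference between regression and classification.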

Unsupervised Learning: Clustering

This is used to answer questions like:

◼ How is the data organized?
◼ Do the samples separate into groups of some kind?
◼ Are there samples that are very different from most of the group (outliers)?

The goal is to partition a dataset into clusters of similar elements (all sorts of data: numerical, textual, image, as well as dates and times).

Collect “clusters” of similar colors into separate lists:
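One way to do this is with the built-in FindClusters function. The six colors below are made up to keep the example short. The colorRGBs list from the three-dimensional example could be passed in the same way.

```wolfram
(* A few RGB triples: two reddish, two bluish, two greenish. *)
colors = {{0.9, 0.1, 0.1}, {0.8, 0.2, 0.15}, {0.1, 0.1, 0.9},
   {0.15, 0.2, 0.8}, {0.1, 0.9, 0.2}, {0.2, 0.8, 0.1}};

(* Ask for (at most) 3 clusters. No labels are given; the grouping comes
   only from how close the points are to each other. *)
clusters = FindClusters[colors, 3];

(* Show each cluster as a list of color swatches. *)
Map[RGBColor, clusters, {2}]
```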

Classification based on Clustering

When you don’t have enough labeled data, you can use clustering and cluster-membership as pseudo-class-labels to proceed with classification.

Perceptron: mimics a human neuron
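A perceptron can be sketched in a few lines: it takes a weighted sum of its inputs, adds a bias, and "fires" (outputs 1) only if the result is positive. The names perceptron and andGate are hypothetical, and the weights are picked by hand here rather than learned from data.

```wolfram
(* Weighted sum of the inputs plus a bias, followed by a threshold at 0. *)
perceptron[w_List, b_][x_List] := If[w . x + b > 0, 1, 0];

(* Hand-picked weights that make the perceptron compute logical AND. *)
andGate = perceptron[{1, 1}, -1.5];

andGate /@ {{0, 0}, {0, 1}, {1, 0}, {1, 1}}   (* {0, 0, 0, 1} *)
```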

Neural Net Layers

Deep neural networks: Depth = number of hidden layers
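To make "depth" concrete, here is a sketch of a forward pass through a network with two hidden layers. The weights are arbitrary illustrative numbers, not trained values, and layer, hidden1, hidden2, output and network are hypothetical names. Each layer applies weights and a bias and then the Ramp (ReLU) function.

```wolfram
(* One layer: weight matrix w and bias vector b, then Ramp (negative values become 0). *)
layer[{w_, b_}][x_] := Ramp[w . x + b];

hidden1 = {{{1, -1}, {0.5, 0.5}, {-1, 1}}, {0, 0, 0}};  (* 2 inputs -> 3 units *)
hidden2 = {{{1, 1, 1}, {1, -1, 0}}, {0, 0.1}};          (* 3 units -> 2 units *)
output = {{{1, -1}}, {0}};                              (* 2 units -> 1 output *)

(* Feed the input through the layers in order. Depth = 2 hidden layers. *)
network[x_] := Fold[layer[#2][#1] &, x, {hidden1, hidden2, output}];

network[{1, 2}]   (* {2.5} *)
```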

Three things that helped with success of neural networks:

◼ new architectures that leveraged relationships between the inputs,
◼ deeper networks to capture more complex functions more quickly,
◼ easily programmable graphics processing units (GPUs), which offered much more raw computational power than conventional processors (CPUs).

Deep Learning Derives Features from Data

What is really going on in the neural network?

E.g. Recognizing hand-written digits

Applications of Deep Learning

◼ Image identification
◼ Audio identification
◼ Speech to text conversion
◼ Language translation
◼ Generating text, images, audio and even video

Popular deep learning models

◼ GPT
◼ DALL·E

Authentication

Authentication: the process or action of verifying the identity of a user or process.

Encryption is the process of encoding information so that only authorized parties can read it.

Cryptography is the practice and study of techniques for secure communication, i.e., communication in the presence of an adversary.

Uses of cryptography:

◼ user and message authentication,
◼ protection from illegitimate changes to messages,
◼ protection from eavesdropping, etc.
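Two of these uses can be sketched with built-in functions. The messages and the helper name fingerprint are made up. This is a simplified illustration: a plain hash only reveals a change if the adversary cannot also replace the fingerprint, so real systems typically rely on keyed codes or digital signatures.

```wolfram
(* Protection from illegitimate changes: compare SHA-256 fingerprints. *)
fingerprint[msg_String] := Hash[msg, "SHA256", "HexString"];

original = "Pay Alice $10";
tampered = "Pay Alice $1000";
fingerprint[original] === fingerprint[tampered]   (* False: the change shows up *)

(* Protection from eavesdropping: encrypt with a secret key shared by both sides. *)
key = GenerateSymmetricKey[];
secret = Encrypt[key, original];
Decrypt[key, secret] === original                 (* True for whoever holds the key *)
```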

Role of Computing in Security

◼ improves and coordinates sensors
◼ learns habits and preferences
◼ mimics human presence
◼ integrates with personal computing
◼ preserves data

Examples of technologies used in security

◼ Motion sensing
◼ Sensing perimeter state: circuit along windows and doors
◼ Audio processing of sound picked up by microphones
◼ Image processing of pictures taken by high-resolution, low-power cameras
◼ Automated mapping of home environment
◼ Understanding habits and personal preferences
◼ Control systems to mimic normal behavior
◼ Integration with personal computing (Wi-Fi, mobile phones)
◼ Preservation of data (protected in tamper-proof storage and in the cloud)

Application beyond security

For senior citizens requiring assisted living:

◼ Health monitoring
◼ General assistance

6. Ethics and Privacy

Problems of using something like Machine Learning

◼ Biases in data and absences of data produce biased / partial models.
◼ Training is much more costly than evaluation, so learning is typically not continuous.

What’s Better, ML or Humans?

Advantages:

◼ Capital costs lower.
◼ Usage more flexible (same chip executes almost any task, unlike skilled humans).
◼ Operating costs much lower (and go down rather than up in off-peak hours).
◼ Operates 24/7 in absence of failure.
◼ Lower failure rate.
◼ No psychological issues.
◼ Minimal safety issues (only
◼ Replication and replacement nearly instantaneous.
◼ Lightweight and portable

Disadvantages:

◼ Requires electrical power.
◼ Results may not be as good (balance against advantages).
◼ Another major disadvantage: decisions not explainable.
◼ Can’t learn from ML models.
◼ No “reason” for results, correct or otherwise.
◼ Bias in models due to bias in training data. E.g. https://sitn.hms.harvard.edu/flash/2020/racial-discrimination-in-face-recognition-technology/

Is it Ethical? Does it compromise Privacy?

Is machine intelligence ethical?

Ethics: moral principles that govern ... the conducting of an activity

Does machine intelligence allow for privacy?

Privacy: the state or condition of being free from being observed or disturbed by other people

Tradeoff between privacy and benefits from surveillance.

Net neutrality: The principle that an internet service provider (ISP) has to provide access to all sites, content and applications at the same speed, under the same conditions without blocking or giving preference to any content.

Provide examples where

◼ You will trade your privacy for internet service benefits
◼ You will not trade your privacy for internet service benefits

7. Match the words with the definitions question

Pay attention to the last slide on most lecture presentations. Those are good candidates for this set of questions.
