Exam 3 Review

Lectures 19 & 20: Sense-Compute-Communicate-Actuate

Think in terms of the Sense-Compute-Communicate-Actuate loop


Examples of the Sense-Compute-Communicate-Actuate loop

• Drone delivery system
• Autonomous vehicles
• Digital assistants (e.g. Siri, Alexa)
• Cleaning robots (e.g. Roomba)
• Smart treadmill
• Your new invention

A note on moving robots (vacuum cleaners or self-driving cars)

• SLAM: Simultaneous Localization And Mapping
• A robot uses information from its moving parts (e.g. wheel rotations) and sensors (e.g. cameras) to figure out where it is and how far it has moved. This is localization.
• It simultaneously uses that information to create a map of the obstacles around it (so it can avoid them) and of its surroundings (so it knows where it has already been). This is mapping.
• It does localization and mapping at the same time; algorithms that help it do so are called SLAM algorithms. A minimal localization sketch follows this list.
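A minimal localization sketch in Python (not from the lecture; the wheel dimensions and rotation values are made-up assumptions), showing dead reckoning: estimating a robot's pose from wheel rotations alone. Real SLAM algorithms correct the drift this accumulates by matching sensor data against the map.

import math

# Dead-reckoning sketch for a two-wheeled robot: integrate wheel rotations
# into a pose estimate (x, y, heading). Constants are illustrative.
WHEEL_RADIUS = 0.03   # meters (assumed)
WHEEL_BASE = 0.25     # distance between the drive wheels, meters (assumed)

def update_pose(x, y, heading, left_rot, right_rot):
    """Advance the pose given each wheel's rotation in radians."""
    left_dist = left_rot * WHEEL_RADIUS    # arc length rolled by each wheel
    right_dist = right_rot * WHEEL_RADIUS
    forward = (left_dist + right_dist) / 2        # average forward motion
    turn = (right_dist - left_dist) / WHEEL_BASE  # heading change, radians
    heading += turn
    x += forward * math.cos(heading)
    y += forward * math.sin(heading)
    return x, y, heading

# Both wheels turn ~2 rad; the right turns slightly more, so the robot veers left.
print(update_pose(0.0, 0.0, 0.0, 2.0, 2.2))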
Types of Sensors Available

• Visual: regular cameras, infrared cameras, thermal imaging sensors, LIDAR
• Audio: microphones for both audible sound and ultrasound
• Motion: IMU (Inertial Measurement Unit), consisting of an accelerometer, a gyroscope, and a magnetometer
• Wireless signal: GPS, WiFi
• Other assorted sensors: pressure, humidity, proximity, temperature, chemical traces

How do the sensors work?

    Camera

    Basic principle of how a camera works: https://youtu.be/jhBC39xZVnw

LIDAR

• Light Detection and Ranging
• How does it work?
• Sends out laser pulses
• Senses the reflected signals bouncing off surrounding objects (like echoes)
• Computes the time it took for the reflection to arrive back and uses it to calculate the distance the pulse traveled (see the sketch after this list)
• Also measures the intensity of the reflected light
• Creates a 3D point cloud (a collection of points in 3D space) based on this data
• A 3D point cloud shows how far different objects are from the sensor
• Objects at different distances are represented in different colors, so it is easy to see what is near and what is far
• Applications of LIDAR, with examples: https://youtu.be/zREAEdXzOcw
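A minimal time-of-flight sketch in Python (illustrative, not from the lecture): the reflection's round-trip time covers the distance twice, and each pulse's distance plus its firing angles yields one point of the 3D point cloud.

import math

C = 299_792_458.0  # speed of light, m/s

def distance_from_echo(round_trip_seconds):
    """The pulse travels out and back, so halve the round trip."""
    return C * round_trip_seconds / 2

def point_from_pulse(round_trip_seconds, azimuth, elevation):
    """Convert one pulse (time + firing angles, in radians) into an (x, y, z) point."""
    r = distance_from_echo(round_trip_seconds)
    return (r * math.cos(elevation) * math.cos(azimuth),
            r * math.cos(elevation) * math.sin(azimuth),
            r * math.sin(elevation))

# An echo arriving 200 nanoseconds after firing means an object ~30 m away.
print(distance_from_echo(200e-9))          # ~29.98 meters
print(point_from_pulse(200e-9, 0.1, 0.0))  # one point-cloud sample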
Microphone

• How a microphone works: https://youtu.be/d_crXXbuEKE

IMU (Inertial Measurement Unit)

• Three sensors: accelerometer + gyroscope + magnetometer
• Accelerometer: measures the change in velocity of a moving (or, in the case of a phone, vibrating) object
• Gyroscope: measures how fast an object is rotating around an axis (roll around the X axis, pitch around the Y axis, yaw around the Z axis)
• Magnetometer: measures the strength and direction of a magnetic field. It can be used to determine the orientation of an object with respect to the earth's magnetic field.
• Together they help compute an object's movement and position
• Uses a technique called sensor fusion, which combines data from multiple sensors to improve accuracy and eliminate errors (a minimal fusion sketch follows this list)
• Applications:
• Aircraft: navigation, control, stabilization
• Autonomous vehicles: accurate positioning and navigation (along with GPS)
• Phones
• Robotics
• Virtual and augmented reality headsets and controllers
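A minimal sensor-fusion sketch in Python. A complementary filter is one common fusion technique (the lecture does not name a specific algorithm): the gyroscope's pitch estimate is smooth but drifts, the accelerometer's is noisy but anchored to gravity, and blending the two cancels each weakness. All constants and readings below are made up.

import math

ALPHA = 0.98  # weight given to the gyro path (assumed tuning constant)

def fuse_pitch(prev_pitch, gyro_rate, accel_x, accel_z, dt):
    gyro_pitch = prev_pitch + gyro_rate * dt    # integrate rotation rate (drifts)
    accel_pitch = math.atan2(accel_x, accel_z)  # tilt from gravity vector (noisy)
    return ALPHA * gyro_pitch + (1 - ALPHA) * accel_pitch

pitch = 0.0
# (gyro rate in rad/s, accelerometer x and z in m/s^2), sampled at 100 Hz
for gyro_rate, ax, az in [(0.10, 0.05, 9.80), (0.12, 0.06, 9.79)]:
    pitch = fuse_pitch(pitch, gyro_rate, ax, az, dt=0.01)
print(pitch)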
GPS: Global Positioning System

• 30+ navigation satellites in orbit around the earth
• With information about the distance to three satellites and each satellite's location when the signal was sent, a receiver can compute its own three-dimensional position (given it has an atomic clock synchronized to GPS)
• By taking a measurement from a fourth satellite, the receiver avoids the need for an atomic clock. Thus the receiver uses four satellites to compute latitude, longitude, altitude, and time (a minimal solver sketch follows this list).
• https://www.faa.gov/about/office_org/headquarters_offices/ato/service_units/techops/navservices/gnss/gps/howitworks
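A minimal solver sketch in Python using NumPy and SciPy (the satellite positions and the receiver's true state are fabricated for illustration): four pseudorange measurements determine the receiver's position and its clock error.

import numpy as np
from scipy.optimize import least_squares

C = 299_792_458.0  # speed of light, m/s

# Four satellite positions at transmit time, in meters (made-up values)
sats = np.array([
    [15600e3,  7540e3, 20140e3],
    [18760e3,  2750e3, 18610e3],
    [17610e3, 14630e3, 13480e3],
    [19170e3,  6100e3, 18390e3],
])

# Fabricate a ground truth so the measured pseudoranges are self-consistent:
truth = np.array([1111e3, 2222e3, 3333e3])  # true receiver position, meters
clock_bias = 5e-6                           # receiver clock error, seconds
ranges = np.linalg.norm(sats - truth, axis=1) + C * clock_bias

def residuals(u):
    """Mismatch between predicted and measured pseudoranges for a guess u."""
    pos, bias = u[:3], u[3]
    return np.linalg.norm(sats - pos, axis=1) + C * bias - ranges

solution = least_squares(residuals, x0=np.zeros(4)).x
print(solution[:3])  # recovers the position; solution[3] is the clock error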
Lecture 21: Speech and Natural Language Processing

    The Sense-Compute-Communicate-Actuate loop is seen in a voice controlled device/assistant/smart home.
    The sensing part is easy - use a microphone to capture human voice. The computing step needs a lot of work.

    Computing for voice-controlled devices

Computing consists of 3 steps (or sometimes more):
• Use Digital Signal Processing to get rid of noise (other voices, music, television, video games, pets, and so forth; unauthorized voices should also be treated as noise). A minimal filtering sketch follows this list.
• Perform "voice recognition": convert the audio signal into a sequence of words, i.e. text. Machine learning is now widely used for this.
• Voice recognition is difficult: mispronunciations, misuse of grammar, and ambiguous meaning all get in the way.
• Context matters; having a smaller vocabulary can help.
• A hierarchy of interacting probabilistic models is used: filtered audio -> phoneme extraction -> word identification -> grammatical sequencing
• Use Natural Language Processing (NLP) to understand what the human is trying to communicate (process the natural language, e.g. English, and make sense of what they are saying)
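A minimal noise-reduction sketch in Python using SciPy (one simple DSP step, not the lecture's full pipeline): band-pass the audio to roughly 300-3400 Hz, the classic telephone band where most speech energy lives. The sampling rate and test signals are assumptions.

import numpy as np
from scipy.signal import butter, sosfilt

FS = 16_000  # sampling rate, Hz (assumed)

def speech_bandpass(audio):
    """Keep roughly the speech band; attenuate rumble and high-frequency hiss."""
    sos = butter(4, [300, 3400], btype="bandpass", fs=FS, output="sos")
    return sosfilt(sos, audio)

t = np.arange(FS) / FS                     # one second of samples
voiceish = np.sin(2 * np.pi * 440 * t)     # a tone inside the speech band
hum = 0.5 * np.sin(2 * np.pi * 60 * t)     # 60 Hz mains hum (noise)
cleaned = speech_bandpass(voiceish + hum)  # the hum is strongly attenuated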
NLP: Natural Language Processing

• Machine learning is used heavily
• Complex and expensive models
• Uses a lot of probability
• NLP is behind Google Translate, voice assistants (Alexa, Siri, etc.), chatbots, Google searches, voice-operated GPS, and more
• BERT: Bidirectional Encoder Representations from Transformers, a deep neural net model for language processing
• Understanding how words fit together is an important element in processing natural language
• Can help with sentiment analysis, question answering, text prediction, text generation, text summarization, and more
• Words that come after a missing word can help with guessing (e.g. "I took my _____ for a walk"; "dog" is a good choice; "cat", "snake", "Ferrari" are not great choices)
• BERT bidirectionally uses the words on either side of a missing word to predict it (see the masked-word sketch after this list)
• LLMs: Large Language Models (e.g. GPT-4, the Generative Pretrained Transformer used in ChatGPT by OpenAI)
• A "transformer" model is a neural network that learns context, and thus meaning, by tracking relationships in sequential data, like the words in this sentence
• GPT models are neural network-based language prediction models built on the transformer architecture. They analyze natural language queries, known as prompts, and predict the best possible response based on their understanding of language.
• Applications:
• Classification: How much did a reviewer like a movie? What sentiment did they express? Did they mention or suggest any movie categories?
• Interpretation: Is anything in a patient's electronic medical records relevant to the patient's current symptoms?
• Question Answering: Which sentence in a page of text answers a particular question?
• Organization: How are documents and terms related? Which documents are relevant to a given question of interest?
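A minimal masked-word sketch in Python using the Hugging Face transformers library (requires pip install transformers torch; not part of the lecture): a pretrained BERT fills in the blank from the dog/cat/snake example above.

from transformers import pipeline

# fill-mask runs BERT's masked-word prediction; [MASK] marks the blank.
fill = pipeline("fill-mask", model="bert-base-uncased")
for guess in fill("I took my [MASK] for a walk."):
    print(guess["token_str"], round(guess["score"], 3))  # "dog" ranks near the top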
Probability

• Calculating basic probability:
  Rolling a die has 6 possible outcomes.
  The probability of getting a 4 (one of the six possible outcomes) is 1/6.
• Conditional probability: Probability(B | A)
  E.g. the probability of getting a 4 when only 1 die was rolled
  Condition: only 1 die was rolled
  or the probability of getting a 4 when 2 dice were rolled
  Condition: 2 dice were rolled
• Bayes' Theorem can be used to calculate the MLE (Maximum Likelihood Estimate):
  Probability(A AND B) = Probability(B | A) × Probability(A)
• MLE helps us figure out, given an observation, how to choose the explanation that is most likely to produce that observation
• Example: given we are told the result of rolling 1 or 2 dice is 4, what is the likelihood that 1 die was rolled versus 2 dice? (A worked sketch follows this list.)
• "Given an audio input, what sequence of words was spoken?"
• "Given a sequence of words, what did the speaker want to communicate?"
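A minimal worked sketch in Python of the dice MLE question (the equal priors on one vs. two dice are an assumption; the 1/12 is derived by enumeration in the next subsection):

from fractions import Fraction

p_one = Fraction(1, 6)   # P(result is 4 | one die)
p_two = Fraction(1, 12)  # P(total is 4 | two dice); see the enumeration below
prior = Fraction(1, 2)   # assume either setup was equally likely

# Bayes' theorem: P(one die | 4) = P(4 | one die) * P(one die) / P(4)
evidence = p_one * prior + p_two * prior
print(p_one * prior / evidence)  # 2/3: "one die" is the more likely explanation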
Example of calculating probability

• What is the probability of rolling a 4 using 1 die?
  1/6
• What is the probability of rolling a total of 4 using 2 dice?
• How can we get 4 from two dice?
  First die: 1, Second die: 3
  First die: 2, Second die: 2
  First die: 3, Second die: 1
• How many possible combinations are there for 2 dice?
  {1,1} {1,2} {1,3} {1,4} {1,5} {1,6}
  {2,1} {2,2} {2,3} {2,4} {2,5} {2,6}
  {3,1} {3,2} {3,3} {3,4} {3,5} {3,6}
  {4,1} {4,2} {4,3} {4,4} {4,5} {4,6}
  {5,1} {5,2} {5,3} {5,4} {5,5} {5,6}
  {6,1} {6,2} {6,3} {6,4} {6,5} {6,6}
  36 possible combinations.
• Probability of getting a total of 4 out of those 36 combinations: 3/36 = 1/12
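The same count, checked by a short Python enumeration (illustrative):

from itertools import product

combos = list(product(range(1, 7), repeat=2))   # the 36 ordered pairs above
favorable = [c for c in combos if sum(c) == 4]  # (1,3), (2,2), (3,1)
print(len(favorable), "/", len(combos))         # 3 / 36 = 1/12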
Lecture 22: Computer Vision

    What do the following terms mean in the context of computer vision and image processing?

    Edge Detection

Edges of an image are sets of points on the boundaries between image regions, typically computed by linking high-gradient pixels. In practice, an edge can have an arbitrary shape. A minimal gradient sketch follows.
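A toy sketch in Python with NumPy (illustrative, not a full edge detector like Canny): pixels whose brightness changes sharply relative to their neighbors are flagged, and linking such high-gradient pixels yields edges.

import numpy as np

img = np.zeros((6, 6))
img[:, 3:] = 1.0              # a dark region beside a bright one

gy, gx = np.gradient(img)     # per-pixel brightness gradients
magnitude = np.hypot(gx, gy)  # gradient strength at each pixel
edges = magnitude > 0.25      # threshold keeps only the sharp boundary
print(edges.astype(int))      # 1s trace the vertical edge at columns 2-3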

    Color Quantization

Color Quantization: the process of reducing the number of colors used to represent an image, typically to enable efficient compression in file formats such as GIF. A minimal sketch follows.
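A minimal sketch in Python using the Pillow library (pip install Pillow; the file names are hypothetical):

from PIL import Image

img = Image.open("photo.jpg")        # hypothetical input image
quantized = img.quantize(colors=16)  # keep only 16 representative colors
quantized.save("photo_16.gif")       # palette formats like GIF benefit most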

    Image Segmentation

    Image Segmentation: organizing the pixels of an image into groups.

    Lecture 23: Virtual and Augmented Reality

• AR: Augmented reality, designed to add digital elements over real-world views with limited interaction; it enhances a person's vision through the addition of computer-generated imagery
• VR: Virtual reality, immersive experiences that help isolate users from the real world, usually via a headset device and headphones designed for such activities
• MR: Mixed reality, combining AR and VR elements so that digital objects can interact with the real world, which means businesses can design elements anchored within a real environment
• XR: Extended reality, covering all types of technologies that enhance our senses, including the three types previously mentioned
• Applications: think of some existing and futuristic applications of AR and VR

Motion capture

Used in entertainment (movies, games) and sports (feedback to athletes)
• With markers attached to the face or a bodysuit, motion capture can record the relationship between human emotion, speech, and facial expressions
• Actors/players are recorded, features are extracted and then cast onto animated characters
• Markerless motion capture is also possible: e.g. Microsoft Kinect, which blends depth detection with feature identification

Digital twin

• A digital twin of a person can be created from:
• motion-captured recordings
• NLP
• background information on the person, mined from the internet
• machine learning used to generate speech, images, and even video (e.g. deepfakes)

Immersion

Immersion: the "goal" is to enable a user to temporarily forget that the virtual reality world is not the real world.
Virtual reality headsets make it easier to interact with the virtual world; however, the laws of physics still need to be followed.

    Haptics

    Haptics: Simulated touch
    Think about applications: entertainment (movies, games), prosthetic limbs

    Lecture 24: Autonomous Driving

Autonomous driving leverages technologies that we have already discussed:
• computer (robot) vision,
• sensor fusion, and
• machine learning.

Problems with training self-driving cars

• How do we train for unusual events/accidents? (An autonomous vehicle must be able to respond to rare events safely.)
• Possible solution: simulation
• ML models can be brittle
• Possible solution: this is the same problem for all ML models, and progress is being made across the board
• Some problems with the "Actuate" part of the loop have been addressed
• Use augmented reality to execute a 3-point turn (take images and analyze them):
• acquire a model of the local environment,
• select the best location to move to,
• use path planning based on the vehicle dynamics, and
• execute the plan
  • ◼
  • Models of motion: Dynamics can be helpful.
    Realistic dynamics models incorporate:
  • ◼
  • mass distribution,
  • ◼
  • acceleration and braking,
  • ◼
  • suspension and steering,
  • ◼
  • aerodynamics,
  • ◼
  • tires and traction (including issues of slippage both laterally and due to overly rapid braking), and
  • ◼
  • even distortions in the car’s and tires’ shapes during high-speed turns.
  • Stopping distance

• Formula for stopping:
  distance = velocity^2 / (2 × deceleration)
• Anything closer than the stopping distance demands a choice: hit or swerve
• Higher speeds and lower traction require increased stopping distance (a worked sketch follows this list)
• Adverse conditions affect stopping distance
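A worked sketch in Python (the deceleration values are rough assumptions for dry and wet pavement):

def stopping_distance(speed_mps, decel_mps2):
    """distance = velocity^2 / (2 * deceleration)"""
    return speed_mps ** 2 / (2 * decel_mps2)

print(stopping_distance(13.4, 7.0))  # ~30 mph on dry pavement: about 12.8 m
print(stopping_distance(26.8, 7.0))  # ~60 mph: about 51.3 m (4x, not 2x)
print(stopping_distance(26.8, 3.5))  # ~60 mph with half the traction: ~102.6 m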
Designing Safety

• Think about how you would design safety into autonomous cars
• Practice defensive driving?
• What's more important, a pedestrian's life or passenger safety?
• How much danger does the car need to be in before it would deliberately run down a human?
Applications

• Broader impacts of autonomous driving:
• autonomous delivery
• autonomous shipping
• autonomous transportation
• Think about other applications