Introduction to Machine Learning
Machine Learning by Examples
Let’s look at some examples of Machine Learning.
It can be used to answer questions such as the following:
What language is this?
LanguageIdentify takes pieces of text, and identifies what human language they’re in.
Identify the language each phrase is in:
In[]:=
LanguageIdentify[{"thank you", "merci", "dar las gracias", "감사합니다", "Дякую", "நன்றி"}]
Out[]=
{English, French, Spanish, Korean, Ukrainian, Tamil}
What’s in this image?
Identify what an image is of:
In[]:=
ImageIdentify[…]  (* image omitted *)
Out[]=
What sort of sentiment does this text express?
Classifying the “sentiment” of text:
In[]:=
Classify["Sentiment","I'm so excited about machine learning"]
Out[]=
Positive
In[]:=
Classify["Sentiment","I broke my phone"]
Out[]=
Negative
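At its crudest, sentiment classification can be sketched as counting sentiment-bearing words. The Python sketch below does exactly that; the word lists are invented for illustration, and the model behind Classify["Sentiment", …] is trained on far richer data:

```python
# Toy sentiment scorer: count words from small hand-made polarity lists.
POSITIVE = {"excited", "great", "love", "happy", "wonderful"}
NEGATIVE = {"broke", "bad", "hate", "sad", "terrible"}

def sentiment(text):
    words = text.lower().replace("'", " ").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

print(sentiment("I'm so excited about machine learning"))  # Positive
print(sentiment("I broke my phone"))                       # Negative
```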
The Machine is trying to answer a Question: Is this A or B (or C or D or E)?
◼
Is this English or French (or Arabic or Hindi)?
◼
Is this a cheetah or a tiger or an owl?
◼
Is this an example of positive or negative or neutral sentiment?
Build your own classifier: Human
How do you learn?
Example: Can you tell a Bolete from a Morel?
Train
… -> Bolete
… -> Morel
Test
Build your own classifier: Machine Learning Program
A new way to program
Programming is telling the computer what to do. With machine learning, you program the computer by showing it examples instead of explicitly telling it, step by step, what to do.
You can train a classifier yourself.
Train
In[]:=
mushroomClassifier = Classify[{
    (* each rule maps a mushroom photo, omitted here, to its label *)
    … -> "Bolete", … -> "Morel", … -> "Bolete", … -> "Morel",
    … -> "Morel", … -> "Bolete", … -> "Morel", … -> "Morel",
    … -> "Bolete", … -> "Bolete", … -> "Morel", … -> "Morel",
    … -> "Bolete", … -> "Bolete", … -> "Morel", … -> "Bolete"
  }]
Out[]=
ClassifierFunction
Test
In[]:=
mushroomClassifier[…]  (* a new mushroom photo, omitted *)
Out[]=
Bolete
In[]:=
mushroomClassifier[…]  (* a new mushroom photo, omitted *)
Out[]=
Morel
More Classifiers
Day or Night Classifier
Feed a list of images, each with a label “Day” or “Night”, to Classify:
In[]:=
daynight = Classify[{
    (* each rule maps a photo, omitted here, to "Day" or "Night" *)
    … -> "Night", … -> "Day", … -> "Night", … -> "Night", … -> "Day",
    … -> "Night", … -> "Day", … -> "Day", … -> "Night", … -> "Night",
    … -> "Day", … -> "Night", … -> "Night", … -> "Day", … -> "Night",
    … -> "Night", … -> "Day", … -> "Day", … -> "Day", … -> "Day",
    … -> "Night", … -> "Night", … -> "Day", … -> "Night", … -> "Night",
    … -> "Day", … -> "Day", … -> "Night", … -> "Day"
  }]
The result is a classifier function that accepts new examples as input and returns the class or label that it believes best fits each input:
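One way such a classifier could work is to reduce each image to a single feature, say its average brightness, and label new images by their nearest training example. A minimal Python sketch under that assumption (the brightness values are invented for illustration):

```python
# Toy "day or night" classifier on one feature: the image's average
# brightness (0 = black, 1 = white). All numbers are invented.
train = [(0.82, "Day"), (0.75, "Day"), (0.12, "Night"), (0.20, "Night")]

def day_or_night(brightness):
    # 1-nearest-neighbor along the brightness axis
    nearest = min(train, key=lambda sample: abs(sample[0] - brightness))
    return nearest[1]

print(day_or_night(0.9))   # Day
print(day_or_night(0.05))  # Night
```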
Try your own Classifier
Try using Classify to build a classifier for images of any of the following famous pairs:
◼
Dogs and cats
◼
Tom and Jerry
◼
Chalk and cheese
◼
Hammer and nails
◼
Fish and chips
◼
Ant and Dec
◼
Batman and Robin
◼
Asterix and Obelix
Handwritten Digit Classifier
Here's a simple example of classifying handwritten digits as 0 or 1. Give the classifier a collection of training examples followed by a particular handwritten digit, and it will tell you whether that digit is a 0 or a 1.
With training examples, Classify correctly identifies a handwritten 0:
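Stripped to its essentials, such a digit classifier can be sketched in Python as nearest-neighbor matching on pixel bitmaps. The 5×5 bitmaps below are hand-made for illustration; real systems train on thousands of scanned digits:

```python
# Toy 0-vs-1 digit classifier on 5x5 bitmaps, using nearest neighbor
# with Hamming distance. The bitmaps are hand-made for illustration.
ZERO = ["01110", "10001", "10001", "10001", "01110"]
ONE  = ["00100", "01100", "00100", "00100", "01110"]
train = [("".join(ZERO), "0"), ("".join(ONE), "1")]

def hamming(a, b):
    # number of pixel positions where the two flattened bitmaps differ
    return sum(x != y for x, y in zip(a, b))

def classify_digit(bitmap_rows):
    flat = "".join(bitmap_rows)
    return min(train, key=lambda sample: hamming(sample[0], flat))[1]

# A slightly noisy zero: one pixel flipped from the training ZERO
noisy_zero = ["01110", "10001", "10011", "10001", "01110"]
print(classify_digit(noisy_zero))  # 0
```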
Can a classifier handle lots of examples?
We looked at only a few examples:
How about looking at many more examples?
Note: constructed from handwritten digits extracted from handwriting samples.
There’s no way a human could process that data in seconds, right?
Try the classifier on these samples:
How do the Classifiers Work? (Hint: Remember Vector Spaces)
In one dimension
Say the goal is to predict if a vehicle is a car or a truck based on its weight in tons:
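With a single feature, the classifier can amount to learning one decision threshold. A Python sketch of this idea; the vehicle weights are invented for illustration:

```python
# One feature: vehicle weight in tons. Labels: "car" or "truck".
# The training weights are invented for illustration.
train = [(1.2, "car"), (1.6, "car"), (2.0, "car"), (8.5, "truck"), (11.0, "truck")]

# Learn a decision threshold halfway between the heaviest car
# and the lightest truck.
heaviest_car = max(w for w, label in train if label == "car")      # 2.0
lightest_truck = min(w for w, label in train if label == "truck")  # 8.5
threshold = (heaviest_car + lightest_truck) / 2                    # 5.25

def classify_vehicle(weight):
    return "truck" if weight > threshold else "car"

print(classify_vehicle(1.5))   # car
print(classify_vehicle(9.0))   # truck
```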
In two dimensions
In three dimensions
Colors in 3D feature space (Red, Green, Blue):
Based on how close they are to each other we can cluster them into three groups: the Red group, the Green group and the Blue group:
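The grouping above can be sketched in Python by assigning each color to whichever primary corner of the RGB cube is nearest in the 3D feature space; the sample colors are invented for illustration:

```python
import math

# Cluster RGB colors (components 0-255) by which primary corner of the
# color cube they are nearest to. The sample colors are invented.
CENTERS = {"Red": (255, 0, 0), "Green": (0, 255, 0), "Blue": (0, 0, 255)}

def nearest_group(rgb):
    # Euclidean distance in 3D feature space decides the group
    return min(CENTERS, key=lambda name: math.dist(CENTERS[name], rgb))

colors = [(250, 30, 20), (200, 60, 40), (10, 240, 50), (30, 40, 250)]
groups = {name: [] for name in CENTERS}
for c in colors:
    groups[nearest_group(c)].append(c)

print(groups["Red"])  # [(250, 30, 20), (200, 60, 40)]
```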
Any data sample can be represented as a vector of numbers
Calculate nearness based on numbers
Find what number in a list is nearest to what you supply.
Find what number in the list is nearest to 22:
Find the nearest three numbers:
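The same computation can be sketched in a few lines of Python; the list of numbers here is invented for illustration:

```python
# Find the element(s) of a list nearest to a query value,
# by sorting on absolute distance. The list is invented.
numbers = [10, 20, 30, 47, 85]

def nearest(values, query, n=1):
    return sorted(values, key=lambda v: abs(v - query))[:n]

print(nearest(numbers, 22))     # [20]
print(nearest(numbers, 22, 3))  # [20, 30, 10]
```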
Convert any data to numbers
In the Wolfram Language, the Nearest function can work with a variety of data, including numerical, geospatial, textual, and visual data, as well as dates and times.
Find the 3 colors in the list that are nearest to the color you give:
Find the 5 words nearest to “electric” in the list of words:
There’s a notion of nearness for images too.
Find the nearest image from a dataset of dog images:
Train a nearest function:
Find the image from the dataset that is nearest to these new sample images:
Nearness as a step toward identifying
Human’s Approach
When we compare things—whether they’re colors or pictures of animals—we can think of identifying certain features that allow us to distinguish them.
◼
For colors, a feature might be how light the color is, or how much red it contains.
◼
For pictures of animals, a feature might be how furry the animal looks, or how pointy its ears are.
Machine’s Approach
The machine learning function is able to identify an image because it has previously seen similar images and decides that this image is closest to the examples of “cheetah” images it has seen before:
What if we try to provide the same image to the machine learning function but blur it a bit every time, i.e. we “muddy the input”?
Progressively blur a picture of a cheetah:
When the picture gets too blurred, ImageIdentify no longer thinks it's a cheetah. What do you think this is an image of?
What does the computer think of the images?
Take one of the blurred images and look at all possible answers ImageIdentify came up with.
ImageIdentify thinks this might be a cheetah, but it’s also likely to be a liger, or it could be a lion or a wildcat.
When the image is sufficiently blurred, ImageIdentify can have wild ideas about what it might be.
So what is machine learning?
◼
How computers recognize patterns without being explicitly programmed...
◼
A different way to program a computer.
Instead of writing rules and providing explicit instructions, you program with the help of data
(showing lots and lots of examples;
learning by trial and error, with lots of practice and data).
AI, Machine Learning and Neural Networks
Supervised vs. Unsupervised Machine Learning
In machine learning, one often provides training data that explicitly says, for example, “this is a cheetah”, “this is a lion”.
This is known as “Supervised Learning”. You provide labeled examples that were created by some expert.
But one also often just wants to automatically pick out categories of things without providing specific labels.
This is “Unsupervised Learning”.
Supervised Learning
This is used to answer questions of the type:
◼
Is this A or B (or C or D or E)? (Classification)
◼
How much or how many? (Regression)
The Task of Classification
Predict a label for the sample:
Training data is usually a list of labeled samples:
◼
Infer a function from the data, mapping feature values to labels
◼
Given a new data point, use this function to return a label based on the feature values
Example of a Classifier to Recognize a Dog or a Cat
Well-known algorithms available to perform Classification
Set up some example data:
Classify automatically picks a method most suitable for the input data:
It is also possible to specifically set the method to be used:
NearestNeighbors
Find known data points that are nearest to the input sample in feature space and use only those to infer the class or value.
LogisticRegression
Fit class probabilities using logistic sigmoid functions of linear combinations of the features.
SupportVectorMachine
Find the hyperplane that best partitions the data (maximum-margin hyperplane).
RandomForest
Construct decision trees that repeatedly partition the data, and combine them into an ensemble whose trees vote on the predicted class or value.
NaiveBayes
Determine the class using Bayes’s theorem and assuming that features are independent given the class.
NeuralNetwork
Model class probabilities or predict the value distribution using a neural network.
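To make one of these methods concrete, here is a minimal naive Bayes classifier in Python for categorical features, with add-one smoothing. The weather features and labels are invented for illustration, and Classify's built-in implementation is far more sophisticated:

```python
from collections import Counter, defaultdict

# Minimal naive Bayes for categorical features, with add-one smoothing.
# The training data is invented: weather features -> "play"/"stay" label.
train = [
    ({"sky": "sunny", "wind": "weak"}, "play"),
    ({"sky": "sunny", "wind": "strong"}, "play"),
    ({"sky": "rainy", "wind": "weak"}, "play"),
    ({"sky": "rainy", "wind": "strong"}, "stay"),
    ({"sky": "rainy", "wind": "strong"}, "stay"),
]

labels = Counter(label for _, label in train)
counts = defaultdict(Counter)  # (feature, label) -> value counts
for features, label in train:
    for feat, value in features.items():
        counts[(feat, label)][value] += 1

def predict(features):
    def score(label):
        p = labels[label] / len(train)  # prior P(label)
        for feat, value in features.items():
            c = counts[(feat, label)]
            # smoothed likelihood; +2 because each feature has two values
            p *= (c[value] + 1) / (sum(c.values()) + 2)
        return p
    return max(labels, key=score)

print(predict({"sky": "sunny", "wind": "weak"}))    # play
print(predict({"sky": "rainy", "wind": "strong"}))  # stay
```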
The Task of Regression
Compute a target value for a sample:
The training data contains samples with recorded values:
◼
Infer a function from the data, mapping feature values to the numeric target
◼
Given a new data point, use the regression function to compute a target value from its feature values
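The simplest regression model is a least-squares line. A Python sketch on invented house-size/price data:

```python
# Regression: infer a numeric target from a feature value by fitting
# a least-squares line. The size/price data is invented.
sizes  = [50, 70, 100, 120]   # square meters
prices = [150, 200, 280, 330] # thousands

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
         / sum((x - mean_x) ** 2 for x in sizes))
intercept = mean_y - slope * mean_x

def predict_price(size):
    return intercept + slope * size

print(round(predict_price(85), 1))  # 240.0
```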
Example of Sales Predictor
Import a dataset with data about customer purchases:
Use the model to predict the amount spent by a new customer:
Use the model to predict the most likely spending by location:
Well-known algorithms available for regression
Take Away
Terminology
◼
Machine learning
◼
Vector space
◼
1 dimension
◼
2 dimensions
◼
3 dimensions
◼
n dimensions
◼
Features
◼
Nearness
◼
Supervised learning
◼
Labels
◼
Classification
◼
Regression
◼
Unsupervised learning
◼
Clustering
Concepts
◼
Examples of machine learning
◼
Identifying language from example text
◼
Identifying contents in an image
◼
Identifying sentiment from example text
◼
Answering questions of the type “Is this A or B (or C or D or E)?”
◼
Features/identifying properties of a sample
◼
Any type of data can be represented by numeric features
◼
Computers recognize patterns without being explicitly programmed
◼
Machine learning is a way to program with data instead of writing explicit code
◼
“Supervised Learning” needs labeled examples
◼
Classification can answer questions of the type: Is this A or B (or C or D or E)?
◼
Dog or cat
◼
Day or night
◼
Sneaker or boot
◼
Regression can answer questions of the type: How much or how many?
◼
Cost of a home
◼
Score representing quality of wine
◼
Price of stock
◼
“Unsupervised Learning” can be done when the data does not have labels.
◼
Clustering can answer questions like:
◼
How is the data organized?
◼
Do the samples separate into groups of some kind?
◼
Are there samples that are very different from most of the group (outliers)?
◼
The goal of clustering is to partition a dataset into clusters of similar elements
◼
When you don’t have labeled data, you can use clustering to separate samples into groups and then use the group membership as pseudo-labels to classify more data.
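Clustering itself can be sketched with a tiny k-means loop in Python. The 2D points are invented and form two obvious groups; library implementations add better initialization and convergence checks:

```python
import math

# Unsupervised learning: cluster unlabeled 2D points with a tiny k-means.
# The points are invented and form two obvious groups.
points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]

def kmeans(data, k, steps=10):
    centers = data[:k]  # naive initialization: first k points
    for _ in range(steps):
        # assignment step: each point joins its nearest center's cluster
        clusters = [[] for _ in range(k)]
        for p in data:
            i = min(range(k), key=lambda j: math.dist(centers[j], p))
            clusters[i].append(p)
        # update step: move each center to the mean of its cluster
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

a, b = kmeans(points, 2)
print(sorted(a))  # the three points near (1, 1)
print(sorted(b))  # the three points near (5, 5)
```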