
Lab 6: Machine Learning

NetID:
Link to published notebook:
In this lab, we will try some examples of Machine Learning.

Part 1: Classification (Supervised Learning)

Sneaker or Boot

In this exercise you will use the “FashionMNIST” dataset from the Wolfram Data Repository to train a classifier to distinguish a sneaker from a boot.
Download the training dataset:
In[]:=
trainingData=ResourceData["FashionMNIST","TrainingData"];
Download the test dataset:
In[]:=
testData=ResourceData["FashionMNIST","TestData"];
Check the number of samples in each dataset:
In[]:=
Length[trainingData]
In[]:=
Length[testData]
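For reference, the FashionMNIST resource contains 60,000 training and 10,000 test examples, so those are the counts you should see above.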
The dataset has images labeled with one of the following 10 labels:
In[]:=
ResourceData["FashionMNIST","ClassLabels"]
The following piece of code selects training and test examples labeled as 7 ("Sneaker") or 9 ("Ankle boot"):
In[]:=
selectedTrainingData=Select[trainingData,MatchQ[Values@#,7|9]&];
In[]:=
selectedTestData=Select[testData,MatchQ[Values@#,7|9]&];
Check the number of samples in the filtered datasets:
In[]:=
Length[selectedTrainingData]
In[]:=
Length[selectedTestData]
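FashionMNIST is balanced across its 10 classes (6,000 training and 1,000 test images per class), so the filtered sets should contain 12,000 and 2,000 samples, respectively.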
Look at some random samples from the selected training set:
In[]:=
RandomSample[selectedTrainingData,5]

Remember the example from the lecture

Train the classifier:
In[]:=
daynight=Classify[{image1->"Night",image2->"Day",…}]
(* the lecture's 30 training photos, each paired with a "Day" or "Night" label, are not reproduced in this export; image1, image2, … stand in for them *)
Out[]=
ClassifierFunction[Input type: Image, Classes: Day, Night]
Use the classifier on test samples:
In[]:=
daynight[{testImage1,testImage2,testImage3,testImage4}] (* the four test photos are likewise not reproduced here *)
Out[]=
{Day,Night,Night,Night}

Problem 1:

Build a classifier using the selected training data set:
(*write your code here*)
yourClassifier=
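One possible approach (a minimal sketch; Classify chooses a method automatically when none is specified):
In[]:=
yourClassifier=Classify[selectedTrainingData]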
Pull out ten random images from the selected test data set:
In[]:=
Keys[RandomSample[selectedTestData,10]]
Use your classifier on the images to see if they are identified as “sneaker” or “boot”:
(*write your code here*)
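A minimal sketch, assuming yourClassifier was built as above; the replacement rules at the end just translate the numeric class labels back into names:
In[]:=
testImages=Keys[RandomSample[selectedTestData,10]];
yourClassifier[testImages]/.{7->"Sneaker",9->"Ankle boot"}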
You can evaluate the performance of your classifier against the entire test data set (edit the following piece of code):
In[]:=
ClassifierMeasurements[yourClassifier,selectedTestData]
Comment on the performance of your classifier.
  • Is it doing well or is it pretty bad at classifying sneakers and boots? What information are you using to come to this conclusion?
  • What method/algorithm did Classify use for this problem? (See the sketch after this list for one way to query it.)
Answer
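For the questions above, here is a minimal sketch of properties you could query (property names as documented for ClassifierMeasurements and ClassifierFunction):
In[]:=
cm=ClassifierMeasurements[yourClassifier,selectedTestData];
cm["Accuracy"]
cm["ConfusionMatrixPlot"]
Information[yourClassifier,"Method"]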

Part 2: Clustering (Unsupervised Learning)

In this part, we will use Spotify’s “Audio Features” data to map different songs as points on a 2D feature space. The data has been taken from three Spotify playlists with different genres: classical, electronic, and rap.
Import the data:
In[]:=
audioFeaturesRawData=Import["https://www.wolframcloud.com/obj/abritac/Published/ECE101/spotify_data.csv"];
Let’s look at the size of the data:
In[]:=
Dimensions[audioFeaturesRawData]
This means the dataset has 201 rows and 23 columns.
Let’s look at the first 2 rows:
In[]:=
audioFeaturesRawData[[1;;2]]
The first row shows the headers for each column:
In[]:=
columnHeaders=audioFeaturesRawData[[1]]
How many headers are there?
In[]:=
Length[columnHeaders]
We will use only some of the 23 columns or features of the dataset. The following features seem useful:
  • (2) danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
  • (3) energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
  • (5) loudness: The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks.
  • (7) speechiness: Speechiness detects the presence of spoken words in a track.
  • (8) acousticness: A measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
  • (9) instrumentalness: Whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context.
  • (10) liveness: Detects the presence of an audience in the recording.
  • (11) valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track.
  • (12) tempo: The pace of music, measured in beats per minute (BPM).
  • (18) duration_ms: The duration of the track in milliseconds.
  • (19) time_signature: An estimated overall time signature of a track. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure).
Let’s pull out the headers that might be able to characterize the track well (just the headers here, not the actual sample values):
In[]:=
usefulHeaders=columnHeaders[[{2,3,5,7,8,9,10,11,12,18,19}]]
Let's pull out the sample values for these features for each song/track:
In[]:=
dataFeatures=audioFeaturesRawData[[2;;,{2,3,5,7,8,9,10,11,12,18,19}]];
This is what the data now looks like:
In[]:=
dataFeatures[[1;;5]]
Each song is now represented by a numeric score for each of the features we have selected.
Additionally, there is some other information that will be useful in labeling the tracks in a visualization:
  • (20) song: name of the song/track
  • (21) artist: name of the artist
Let’s use the name of the track for labeling each sample:
In[]:=
labelingInfo=audioFeaturesRawData[[2;;,20]];
Remember these are the feature columns in our dataset now:
In[]:=
usefulHeaders
Out[]=
{danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature}
“Danceability” is the first feature, “loudness” is the third feature.
Visualize the samples in only these two dimensions:
In[]:=
ListPlot[dataFeatures[[All,{1,3}]],AxesLabel->{usefulHeaders[[1]],usefulHeaders[[3]]}]
Let’s use the labeling information to identify each point:
In[]:=
labeledData=MapThread[Callout[#1,#2]&,{dataFeatures[[All,{1,3}]],labelingInfo}];
ListPlot[labeledData,AxesLabel->{usefulHeaders[[1]],usefulHeaders[[3]]}]
Let’s recreate the plot for “danceability” (feature 1) and “speechiness” (feature 4):
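A minimal sketch, reusing the ListPlot pattern above with columns 1 and 4:
In[]:=
ListPlot[dataFeatures[[All,{1,4}]],AxesLabel->{usefulHeaders[[1]],usefulHeaders[[4]]}]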

Problem 2

Here is a more interactive visualization of the audio features:
Change the x & y axes so that it becomes easy to differentiate classical music from the rest. Which two features did you plot to achieve that?
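The interactive element itself does not survive this export; here is a minimal Manipulate sketch that rebuilds something comparable (the control names are my own):
In[]:=
Manipulate[
 ListPlot[dataFeatures[[All,{x,y}]],AxesLabel->{usefulHeaders[[x]],usefulHeaders[[y]]}],
 {{x,1,"x feature"},Range[Length[usefulHeaders]]},
 {{y,3,"y feature"},Range[Length[usefulHeaders]]}]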

Answer

Problem 3

A Spotify user often listens to music with 0.5 energy and 0.2 danceability. Name a track that should be recommended to this user.
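One way to look this up programmatically (a sketch: danceability and energy are columns 1 and 2 of dataFeatures, so the query point is {0.2, 0.5}):
In[]:=
Nearest[dataFeatures[[All,{1,2}]]->labelingInfo,{0.2,0.5}]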

Answer

Problem 4

Here is a two-dimensional visualization of all the songs, created using information from all of our selected features. (Hover over a data point to see the name of the song.)
What do the two clusters represent?
If someone recently listened to Beethoven’s Sonata No. 14 “Moonlight” in C-Sharp Minor, which of the above clusters would be a better pool for recommending new songs to them?
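The plot itself is not embedded in this export; a minimal FeatureSpacePlot sketch that produces a comparable labeled 2D projection, using Tooltip wrappers so track names appear on hover:
In[]:=
FeatureSpacePlot[MapThread[Tooltip,{dataFeatures,labelingInfo}]]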

Answer

Problem 5

This is the “Flight of the Bumblebee” track:
Here are some other tracks:
The FeatureNearest function can find the member of a list that is nearest to a given sample. For example:
Find the track from the list “tracks” that is “nearest” to “bumblebee”:
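A minimal sketch, assuming the notebook defines bumblebee (the “Flight of the Bumblebee” sample) and tracks (the list of other tracks shown above), both of which are omitted from this export:
In[]:=
FeatureNearest[tracks][bumblebee]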

Submitting your work

1. Publish your notebook:
  1. From the cloud notebook, click on “Publish” at the top right corner.
  2. From the desktop notebook, use the menu option File -> Publish to Cloud.
2. Copy the published link.
3. Add it to the top of the notebook, below your NetID.
4. Print to PDF.
5. Upload to Gradescope.
6. Just to be sure, maybe ping your TA Sattwik on Slack that you have submitted.