Digit Classification
Load Data
res = ResourceObject["MNIST"];
trainingData = ResourceData[res, "TrainingData"];
testData = ResourceData[res, "TestData"];
RandomSample[trainingData, 10]
In[]:=
Data Preprocessing
Our preprocessing mainly serves to reduce network size by eliminating insignificant features from the source images. The Crop+Resize combination forces a digit to fill the entire 28×28 rectangle, effectively reducing noise in the network.
A notable omission is thinning: even though we believe it would have a tremendous impact on the network's performance, no appropriate thinning algorithm was found (e.g. Mathematica's standard Thinning tends to break curves and narrow the hinges of 3 and 8, turning them into a mess that is hard to identify even for humans).
preprocess[img_] := Module[{i = Image[img]},
  ImagePad[
   ImageResize[
    ImageCrop[Binarize[i, FindThreshold[i, Method -> "Cluster"]]],
    {28 - 3*2, 28 - 3*2}, Resampling -> "Cubic"],
   3, 1]];
In[]:=
preprocessKeys[dataset_] := Module[{preprocessRuleKey},
  preprocessRuleKey[rule_] := Module[{k, v},
    k = Keys[rule]; v = Values[rule];
    preprocess[k] -> v];
  ParallelMap[preprocessRuleKey, dataset]];
trainingData = preprocessKeys[trainingData];
testData = preprocessKeys[testData];
RandomSample[trainingData, 10]
In[]:=
Construct Network
We use an even 4×4 kernel in the first layer. It should theoretically "distort" the picture: the lack of a central element gives every pixel an equally significant weight and makes it harder for the network to learn the identity function (something we don't want to happen in such a shallow network).
The 9×9 kernels of the later layers have proven to detect features well enough. Smaller kernels tend to "miss" certain features (e.g. misinterpreting a 2 as a 3).
Batch normalisation is used instead of dropout for conv layers.
net = NetChain[{
   BatchNormalizationLayer[],
   ConvolutionLayer[32, 4], BatchNormalizationLayer[], Ramp,
   ConvolutionLayer[64, 9], BatchNormalizationLayer[], Ramp,
   ConvolutionLayer[64, 9], BatchNormalizationLayer[], Ramp,
   PoolingLayer[2],
   FlattenLayer[],
   500, Ramp, DropoutLayer[0.4],
   10, SoftmaxLayer[]},
  "Output" -> NetDecoder[{"Class", Range[0, 9]}],
  "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}]]
Train Network
We use ADAM for research, but SGD is our final model-verification tool: there is evidence that SGD allows for better generalisation (i.e. reduces variance). SGD is known to degrade at large batch sizes; 64 has proven to be a good trade-off.
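A sketch of the training call implied above; `net`, `trainingData`, and `testData` come from the earlier cells, while the number of training rounds is an assumption:

```
(* ADAM for experimentation; switch Method -> "SGD" for final verification *)
trainedNet = NetTrain[net, trainingData,
  ValidationSet -> testData,
  Method -> "ADAM",
  BatchSize -> 64,          (* the batch-size trade-off discussed above *)
  MaxTrainingRounds -> 10]  (* assumed; tune to taste *)
```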
Debug Network
A step-by-step layer visualisation helps avoid ridiculous mistakes like broken scaling in the NetEncoder.
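One way to sketch such a visualisation is to run a prefix of the chain with NetTake and render each channel of the activations as an image; the layer index and sample choice here are assumptions:

```
(* run the net only up to the first convolution block (layers 1-4 assumed) *)
partial = NetTake[trainedNet, 4];
sample = First[Keys[RandomSample[testData, 1]]];
activations = partial[sample];

(* one image per feature channel, rescaled for visibility *)
ImageAdjust[Image[#]] & /@ Normal[activations]
```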
Test Network
We are using our own set of hand-drawn digits for final testing. Certain irregularities can be spotted (e.g. the second "3" and the last "6"), but their problem seems to be line thickness. As our final app enforces line thickness, it shouldn't be a real issue. There is still room for improvement, though: perhaps line thinning could help, and some data augmentation would also be of use, as the network seems to struggle to maintain the bias/variance balance despite regularisation.
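A minimal accuracy check on the MNIST test set can be sketched as below; `trainedNet` is the network produced by training, and the file pattern for the hand-drawn digits is purely hypothetical:

```
(* overall accuracy on the preprocessed MNIST test set *)
preds = trainedNet[Keys[testData]];
accuracy = N@Mean[Boole[Thread[preds == Values[testData]]]]

(* hand-drawn digits: hypothetical file names; reuse the same preprocessing *)
myDigits = preprocess /@ (Import /@ FileNames["digit_*.png"]);
trainedNet[myDigits]
```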