This project addresses the essential question: “To what extent does varying the complexity of neural networks affect their efficiency in learning mathematical functions, and how can these changes be quantitatively measured?” The intent is to investigate the relationship between the complexity of neural networks and their capability to learn mathematical functions, while simultaneously creating an effective methodology for quantitatively assessing these variations.
Keywords:
Neural network: a model built from interconnected nodes (or neurons) that work together to process information. Neural networks learn from input data by adjusting the strengths of these connections.
Activation function: a mathematical operation that introduces non-linearity into the network’s learning process, enabling the network to learn complex data.
Loss: a function that calculates the difference between the model’s prediction and the actual value from the given data. It represents the precision with which the neural network has learned the inputs.
Defining the first criterion to analyze complexity: The number of neurons in a neural network
Objective in this section: to obtain the appropriate domain of numbers of neurons that could be applied to the neural network model
Model for training - Varying the number of neurons:
Choosing one function to generate data for training
Here, a list of training data is generated from the function Exp[-Norm[{x,y}]] over the range [-3,x,3], [-3,y,3]. These data are then used to train neural networks with different numbers of neurons.
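A minimal sketch of how such a training set might be generated, sampling the function on a grid (the 0.1 step size is an assumption; the original grid resolution is not stated):

```wolfram
(* Sample Exp[-Norm[{x,y}]] on a grid over [-3,3]x[-3,3]; each
   training example maps an input point {x,y} to the function value *)
trainingData = Flatten[
  Table[{x, y} -> Exp[-Norm[{x, y}]], {x, -3., 3., 0.1}, {y, -3., 3., 0.1}]
];
```

Each element of trainingData is a rule of the form {x, y} -> value, the structure NetTrain expects for a network with a two-dimensional input and a scalar output.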
Varying complexities of neural network by changing the number of neurons, n as the variable:
net=NetChain[{2^n,Tanh,1}]
The number of neurons in the neural network doubles each time the network is trained.
An example of training a neural network with 2^5 = 32 neurons in 3 layers:
A trained network exposes many properties that could generate useful data for analyzing efficiency vs. complexity (for example, TotalTrainingTime); however, we will extract only the final average loss to compare performance.
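A hedged sketch of such a training run and the property extraction (the training data is regenerated here for self-containment; the grid step and MaxTrainingRounds value are assumptions):

```wolfram
(* Training data sampled from Exp[-Norm[{x,y}]] as described above *)
trainingData = Flatten[
  Table[{x, y} -> Exp[-Norm[{x, y}]], {x, -3., 3., 0.1}, {y, -3., 3., 0.1}]];

(* 3-layer chain: a 32-neuron (2^5) linear layer, Tanh, a 1-neuron output *)
net = NetChain[{2^5, Tanh, 1}];

(* Request the full results object (All) so named properties can be queried *)
trained = NetTrain[net, trainingData, All, MaxTrainingRounds -> 2];

trained["LossPlot"]       (* loss vs. training round *)
trained["RoundLossList"]  (* mean loss per round; the last entry is the final average loss *)
```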
Training Data with different numbers of neurons:
Generating training data and the RoundLossList across 2^0 to 2^10 neurons. The purpose of this section is to generate enough data for a primary analysis of the proper domain the training should be limited to. Neural networks whose complexity fails to provide adequate learning precision will be excluded from future training.
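The sweep over 2^0 through 2^10 neurons might be sketched as the following loop (data generation is repeated here for self-containment; the step size and round count are assumptions):

```wolfram
(* Training data sampled from Exp[-Norm[{x,y}]] as described above *)
trainingData = Flatten[
  Table[{x, y} -> Exp[-Norm[{x, y}]], {x, -3., 3., 0.1}, {y, -3., 3., 0.1}]];

(* For each exponent n, train a 2^n-neuron net and record the final average loss *)
lossData = Table[
  {n, Last[NetTrain[NetChain[{2^n, Tanh, 1}], trainingData, All,
      MaxTrainingRounds -> 2]["RoundLossList"]]},
  {n, 0, 10}];
```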
Properties with 1 (2^0) starting neuron
Initializing the net, graphing the LossPlot, and extracting the RoundLossList
Observing the data, neural networks with 2^0 to 2^3 starting neurons appear insufficient to train effectively; they are “outlier” data points that do not help in finding the optimized complexity for best performance. Therefore, the properties of 2^0 to 2^3 will be eliminated from the following analysis. Conversely, the neural network with 2^10 neurons displays a significant amount of noise, so 2^10 will also be excluded from future training.
Conclusion:
Pulling out a list of the final average loss of each function.
Graphing the LossList across different numbers of neurons
ListLinePlot[lossData]
Out[]=
The graph of final average loss vs. 2^n neurons provides a direct visualization of a function roughly the shape of a parabola. From this graph we can conclude that neither too few nor too many neurons in a neural network improves accuracy; the parabolic shape also indicates that there is an “optimized” number of neurons that provides the highest precision, which could serve as a representation of the complexity of the math function.
Defining the second criterion to analyze complexity: The number of layers
Objective in this section: to obtain the appropriate domain of numbers of layers that could be applied to the neural network model
Model for training - Varying the number of layers:
Choosing one function to generate data for training
A list of training data is generated from the function -Log[x^2+1] over the range [-3,x,3], [-3,y,3]. These data are then used to train neural networks with different numbers of layers.
Here, we train an example neural network with 5 layers in total, 3 of which are linear layers with 8 neurons each. Although the number of layers varies, the order of the layers follows a fixed pattern: a linear layer with multiple neurons + the activation function Tanh + n linear layers with multiple neurons + a linear layer with one neuron.
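The described pattern can be sketched as a parameterized constructor; the name makeNet and the fixed width of 8 are assumptions drawn from the example above:

```wolfram
(* Build a chain: one linear layer + Tanh + (m-1) more linear layers + a
   1-neuron output, so m weighted hidden layers and m+2 layers in total *)
makeNet[m_, width_: 8] := NetChain[
  Join[{width, Tanh}, ConstantArray[width, m - 1], {1}]];

makeNet[3]  (* 5 layers in total: 8, Tanh, 8, 8, 1 - the example network *)
```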
Pulling out the data from the example multilayer network
In[]:=
MultiLayerTrained["LossPlot"]
Out[]=
[loss plot: training and validation loss vs. rounds]
In[]:=
MultiLayerTrained["RoundLossList"]
Out[]=
{0.0163135,0.0000561487}
Training Data with different numbers of layers:
This subsection generates enough data for a primary analysis of the proper domain of layer counts the training should be limited to. Neural networks that produce high precision will be kept; networks that produce insufficient precision will be excluded from future training.
Properties with 3 layers
Initializing the net, graphing the LossPlot, and extracting the RoundLossList
Neural networks with 3 to 8 layers all produced a decent loss. Therefore the domain of layers is set to 3 to 8 layers, so that it can accommodate a wider range of input functions.
Conclusion:
Pulling out a list of the final average loss of each function:
Graphing the LossList across different numbers of layers:
ListLinePlot[layerData]
Out[]=
The graph of final average loss vs. n layers also displays a function roughly the shape of a parabola. From this graph we can draw a conclusion similar to the one from the neurons chapter: neither too few nor too many layers in a neural network improves accuracy, and there is an “optimized” number of layers that provides the highest precision, which could serve as a representation of the complexity of the math function.
Abstracting the Algorithm
Objective in this section: to abstract the previous code into one function that iterates through all neural networks and outputs the most optimized points
Generating Data:
Making one function that takes in a math formula and generates a list of data
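Such a data-generating function might look like the following sketch (the name generateData and the 0.1 grid step are assumptions):

```wolfram
(* Sample a formula in x (and optionally y) over [-3,3]^2, producing
   training rules of the form {x,y} -> value. Table holds its body, so
   the symbolic x and y in the formula take on the iterator values. *)
generateData[expr_] := Flatten[
  Table[{x, y} -> N[expr], {x, -3., 3., 0.1}, {y, -3., 3., 0.1}]];

generateData[-Log[x^2 + 1]]  (* rules of the form {x,y} -> value *)
```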
Calculating the optimized point according to the lowest loss
In[]:=
opPoint=TakeSmallestBy[l,Last,1]
Out[]=
{{7,1,0.0000704327}}
The opPoint is structured as {neuron exponent n (for 2^n neurons), number of layers, loss}. This data structure allows accessible comparison across multiple functions.
Calling the trainNeuralNet function invokes all the abstracted functions and returns the most optimized point in the format {neurons, layers, loss, function}.
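Under these assumptions, trainNeuralNet could be reconstructed as a grid search over the neuron and layer domains established earlier. This is a hypothetical sketch, not the original code: the grid step, round count, inner-layer width (2^n), and the mapping of the layer count m onto the chain are all assumptions.

```wolfram
(* Hypothetical reconstruction: sample the formula, grid-search neuron
   exponents 4..9 and layer counts 1..6, return the lowest-loss point *)
trainNeuralNet[expr_] := Module[{data, results},
  data = Flatten[
    Table[{x, y} -> N[expr], {x, -3., 3., 0.1}, {y, -3., 3., 0.1}]];
  results = Flatten[Table[
     {n, m, Last[NetTrain[
         NetChain[Join[{2^n, Tanh}, ConstantArray[2^n, m - 1], {1}]],
         data, All, MaxTrainingRounds -> 2]["RoundLossList"]]},
     {n, 4, 9}, {m, 1, 6}], 1];
  (* keep the triple with the smallest loss, then append the formula *)
  Append[First[TakeSmallestBy[results, Last, 1]], expr]
]
```

With this sketch, trainNeuralNet[Sin[x]] would return a point in the format {neurons, layers, loss, function} described above.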
Analysis
Defining the Evaluation matrix
We create an evaluation matrix to plot the most optimized neural net for each math function, while looking for patterns in the plotted points. The matrix has two axes: the x-axis is the number of neurons, and the y-axis is the number of layers.
Setting the domain and range of the evaluation matrix:
ListPlot[{{4,1},{9,6}}]
Out[]=
The domain of the number of neurons covers 2^4 to 2^9 neurons; the range of layers covers 1 to 6 layers, in addition to 1 Tanh activation layer and 1 linear layer. The evaluation matrix displays neural networks of different complexity. For example, {4,3} means the neural network has 2^4 = 16 neurons and 3 + 2 = 5 layers.
Optimized neural net complexity with different group of math functions
Calculating the most optimized net for some example math functions. These functions cover a wide range of mathematical operations, with varied coefficients.
Trigonometric functions
Calling trainNeuralNet with trig functions
In[]:=
trigSin=trainNeuralNet[Sin[x]]
Out[]=
{4,3,0.000156019,Sin[x]}
In[]:=
trig3Sin=trainNeuralNet[3Sin[x]]
Out[]=
{4,3,0.000853865,3Sin[x]}
Polynomials
Calling trainNeuralNet with polynomial functions
In[]:=
polyx2=trainNeuralNet[x^2]
Out[]=
{5,2,0.00175294,x^2}
In[]:=
poly12x2=trainNeuralNet[12x^2]
Out[]=
{4,3,0.134422,12 x^2}
Logarithmic functions
Calling trainNeuralNet with Log functions
In[]:=
log10=trainNeuralNet[Log[10,(x^2)+1]]
Out[]=
{7,1,0.000018131,Log[1+x^2]/Log[10]}
In[]:=
log2=trainNeuralNet[Log[2,(x^2)+1]]
Out[]=
{7,1,0.0000551542,Log[1+x^2]/Log[2]}
Exponential functions
Calling trainNeuralNet with exponential functions
In[]:=
exp2x=trainNeuralNet[2^x]
Out[]=
{8,1,0.00140252,2^x}
In[]:=
expSinx=trainNeuralNet[E^Sin[x]]
Out[]=
{8,1,0.000312915,E^Sin[x]}
Observing patterns from the data
Plotting out the data points on the evaluation matrix
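Such a plot might be sketched as follows, using the optimized points reported in the runs above (the Callout labeling and plot options are assumed presentation choices, not the original code):

```wolfram
(* Optimized points {neuron exponent, layers}, labeled with their functions *)
points = {Callout[{4, 3}, Sin[x]], Callout[{4, 3}, 3 Sin[x]],
   Callout[{5, 2}, x^2], Callout[{7, 1}, Log[10, x^2 + 1]],
   Callout[{8, 1}, 2^x], Callout[{8, 1}, E^Sin[x]]};

ListPlot[points, PlotRange -> {{3.5, 9}, {0.5, 6}},
  AxesLabel -> {"neurons (2^n)", "layers"}]
```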
There are a few neural net complexities {neurons, layers} where the optimized points of the math functions cluster: {4,3}, {7,1}, {8,1}. But how are they related to how the neural net learns and adapts to each mathematical function? We will graph the activation function Tanh[x], as well as the functions that share similar optimized neural net complexities.
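For example, the comparison for the {4,3} cluster might be drawn as (a minimal sketch; the legend option is an assumed choice):

```wolfram
(* Compare the activation function Tanh against the functions
   whose optimized points cluster at {4,3} *)
Plot[{Tanh[x], Sin[x], 3 Sin[x]}, {x, -3, 3},
  PlotLegends -> "Expressions"]
```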
Functions that cluster at {4,3} do not change concavity, or change it only at 0. The coefficient also has no effect on how a neural network learns, as Sin[x] and 3 Sin[x] share the same optimized net.
Neural net cluster at {7,1}
Plotting the functions that cluster around {7, 1}
Functions that cluster at {8, 1} have no intersection with Tanh[x] in the domain [-3, 3]; more adjustments to Tanh[x] are required to model these functions.
Result:
The complexity variation of neural networks depends on the shape of the math function, and is independent of the coefficient and scale of the function.
Conclusion:
Summary:
In conclusion, we’ve developed a new approach to understand the complexity of a mathematical function, by uniquely representing each math function with an optimized neural network. We’ve established a correlation between the number of neurons, number of layers of a neural network, and the complexity of mathematical functions; and created an evaluation matrix, which stands as a helpful tool to quantitatively measure the complexity of a mathematical function.
Limitations:
◼
The activation function for all trained neural networks is Tanh, the hyperbolic tangent function
◼
The function needs to be defined over the entire domain; otherwise there may be errors, since the training data is structured as {x,y} -> value
◼
The neuron and layer values are limited to integers.
◼
The neural network learning models were only trained for 2 rounds
Future Works:
◼
To relate the mathematical principles of backpropagation to the patterns displayed between mathematical functions and the complexities of optimized neural net models
◼
Interpolate data points into mathematical functions, then use the functions to find more precise value for optimization points.
◼
Run the algorithm with more mathematical functions, obtain the optimized points and plot them into the evaluation matrix, for a more accurate, intuitive graph.
◼
Run the algorithm with different activation functions to observe change in the output.
Acknowledgements:
I wish to express my deepest gratitude to my mentor Eric Rimbey, who provided me insightful guidance and constructive criticisms throughout the research. Special thanks to Nicolo Monti for his valuable suggestions during this project, and to Wolfram Summer Research Program for providing me this opportunity to explore in the field of machine learning.