What generic regularities exist in trained vs. untrained networks?

What is the initialized distribution? Generally, the tails of the weight distribution get longer during training...
In[]:=
piecewise = Piecewise[{{0, # < -2}, {1, # < -1}, {0, # < 0}, {1, # < 2}}] &;
SeedRandom[23049829048];
trainingx = RandomReal[{-3, 3}, 5000];
trainingy = piecewise /@ trainingx;
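A quick sanity check on the target (the sample points are arbitrary; Piecewise tries its conditions in order, so this function is 0 on (-∞,-2), 1 on [-2,-1), 0 on [-1,0), 1 on [0,2), and 0 beyond):
In[]:=
piecewise /@ {-2.5, -1.5, -0.5, 0.5, 2.5}
Out[]=
{0, 1, 0, 1, 0}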
In[]:=
(* === Training function === *)
TrainPiecewise[net_, seed_Integer] := Module[{res, checkpoints = {}, initialNet},
  initialNet = NetInitialize[net, RandomSeeding -> seed];
  res = NetTrain[initialNet, trainingx -> trainingy, All,
    TrainingProgressFunction -> {(AppendTo[checkpoints, #Net] &), "Interval" -> Quantity[10, "Rounds"]},
    MaxTrainingRounds -> 15000, BatchSize -> 5000, RandomSeeding -> seed];
  {res, Prepend[checkpoints, initialNet]}]
In[]:=
numInstances = 5; architecture = {8, 8, 8};
In[]:=
RRnet[lws_List] := NetChain[Join[Riffle[LinearLayer /@ lws, Ramp], {Ramp, LinearLayer[]}], "Input" -> "Real", "Output" -> "Real"]
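With architecture {8, 8, 8} and scalar input/output, the chain's linear layers have widths 1→8→8→8→1. A sketch of the expected weight-plus-bias count (the widths list here is reconstructed by hand, not computed from the net):
In[]:=
With[{ws = {1, 8, 8, 8, 1}}, Total[MapThread[#1 #2 + #2 &, {Most[ws], Rest[ws]}]]]
Out[]=
169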
In[]:=
allResults = Table[(Echo[i, "Training instance:"]; TrainPiecewise[RRnet[architecture], 123456*i]), {i, numInstances}];
» Training instance: 1
» Training instance: 2
» Training instance: 3
» Training instance: 4
» Training instance: 5
In[]:=
allResults
Out[]=
$Aborted
In[]:=
allCheckpoints = allResults[[All, 2]];

FlattenWeights[net_] := Join @@ Table[
   With[{layer = NetExtract[net, i]},
    If[Head[layer] === LinearLayer,
     Join[Flatten[Normal[NetExtract[layer, "Weights"]]],
      Flatten[Normal[NetExtract[layer, "Biases"]]]],
     {}]],
   {i, Length[net]}];

allWeights = Map[FlattenWeights, allCheckpoints, {2}];
Print["Weight dimensions: ", Dimensions[allWeights]];

allWeightsFlat = Flatten[allWeights, 1];
Weight dimensions: {5,1501,169}
In[]:=
Histogram /@ Take[allWeights[[1]], {1, -1, 100}]
Out[]=

,
,
,
,
,
,
,
,
,
,
,
,
,
,
,

In[]:=
ListDensityPlot[BinCounts[#, {-5, 5, .25}] & /@ allWeights[[1]]]
Out[]=
In[]:=
ListPlot3D[BinCounts[#, {-5, 5, .25}] & /@ allWeights[[1]]]
Out[]=
In[]:=
ListPlot3D[BinCounts[#, {-5, 5, .25}] & /@ allWeights[[1]], PlotRange -> All, ScalingFunctions -> {Identity, Identity, "Log"}]
Out[]=
In[]:=
ListLinePlot[Variance/@allWeights[[1]]]
Out[]=
200
400
600
800
1000
1200
1400
0.5
1.0
1.5
In[]:=
ListLinePlot[(Variance/@#)&/@allWeights]
Out[]=
200
400
600
800
1000
1200
1400
0.5
1.0
1.5
2.0
2.5
3.0
3.5

General claim: as the network trains, the distribution of weight values broadens.
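One way to quantify the longer-tails part of this claim beyond variance (a sketch, assuming allWeights as computed above; kurtosis is just one of several reasonable tail statistics):
In[]:=
ListLinePlot[Kurtosis /@ allWeights[[1]], AxesLabel -> {"checkpoint", "kurtosis"}]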

Microscopically, we could study correlations between weight values (e.g. how correlated are weights in different parts of the network; correlations might be exponentially damped with distance between layers, or they might not be).
One can also look at this as a function of network depth.
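A first pass at such correlations (a sketch: it treats each of the 169 flattened weight positions of instance 1 as a time series over checkpoints and plots all pairwise correlations; reading network distance off the block structure is an assumption):
In[]:=
MatrixPlot[Correlation[allWeights[[1]]]]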

More ...

Investigations

Look at MLPs of different widths and depths
Look at larger single networks vs. ensembles of networks
◼ Distribution of all weights as a function of training
◼ Distribution of weights per layer as a function of training
◼ Individual weights vs. training rounds; also make this plot normalized by the total norm of the weights. Take grid networks: can one visually see training?
◼ How do these results change if one changes the target function?
◼ Richard observes that adversarially changing a few weights usually won't affect the results.
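For the per-layer distribution item, a minimal sketch (assuming allCheckpoints from above; in the RRnet chain the linear layers sit at positions 1, 3, 5, and 7):
In[]:=
finalNet = allCheckpoints[[1, -1]];
Histogram[Flatten[Normal[NetExtract[NetExtract[finalNet, #], "Weights"]]]] & /@ {1, 3, 5, 7}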

Claim: as training progresses, the rank of the weight matrices decreases (?)

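One way to probe the rank claim is to track an effective-rank measure of a hidden layer's weight matrix across checkpoints (a sketch; the entropy-based effective rank and the choice of layer 3, the second linear layer, are assumptions, not something computed above):
In[]:=
EffectiveRank[mat_] := With[{p = Normalize[SingularValueList[mat], Total]}, Exp[-Total[p Log[p]]]];
ListLinePlot[Table[EffectiveRank[Normal[NetExtract[NetExtract[net, 3], "Weights"]]], {net, allCheckpoints[[1]]}]]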