1011 / Sin / Sinc

In[]:=

piecewise=Piecewise[{{0,#<-2},{1,#<-1},{0,#<0},{1,#<2}}]&;

In[]:=

Plot[{piecewise[x],Sin[x]},{x,-6,6},Exclusions->None]

Out[]=

In[]:=

SeedRandom[23049829048];trainingx=RandomReal[{-3,3},5000];

In[]:=

trainingy=piecewise/@trainingx;

In[]:=

trainingySin=Sin/@trainingx;

In[]:=

SeedRandom[40985093850948];trainingxSinc=RandomReal[{-32,32},5000];

In[]:=

trainingySinc=Sinc/@trainingxSinc;

In[]:=

ClearAll[TrainPiecewise];
(* Train net on x->y from a seeded initialization, saving a checkpoint net every 10 rounds. *)
TrainPiecewise[net_,x_,y_,seed_Integer:182731,opts___]:=Module[{res,checkpoints={},initNet},
  initNet=NetInitialize[net,RandomSeeding->seed];
  res=NetTrain[initNet,x->y,All,opts,
    TrainingProgressFunction->{Function[AppendTo[checkpoints,#Net];],"Interval"->Quantity[10,"Rounds"]},
    MaxTrainingRounds->15000,BatchSize->5000];
  (* Result: {training results object, {initial net, checkpoint nets...}} *)
  {res,Prepend[checkpoints,initNet]}]

In[]:=

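(* Real-to-real fully connected net: one LinearLayer per width in lws, each followed by activationFn, then a final scalar LinearLayer. opts is accepted but currently unused. *)
RRnet[lws_List,activationFn_:Ramp,opts___]:=NetChain[Join[Riffle[LinearLayer/@lws,activationFn],{activationFn,LinearLayer[]}],"Input"->"Real","Output"->"Real"]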

1011

In[]:=

experiment=TrainPiecewise[RRnet[{5,5,5,5}],trainingx,trainingy,182731,TargetDevice->"GPU"];

In[]:=

ListLinePlot[First[experiment]["RoundLossList"][[;;;;10]],ScalingFunctions->{"Log","Log"},Frame->True,AspectRatio->1/3]

Out[]=

In[]:=

With[{net=experiment[[-1,-10]]},ListLinePlot[Table[net[x],{x,-3,3,1/16}]]]

Out[]=

In[]:=

experiment2=TrainPiecewise[RRnet[{8,8,8}],trainingx,trainingy,TargetDevice->"GPU"];

In[]:=

ListLinePlot[First[experiment2]["RoundLossList"][[;;;;10]],ScalingFunctions->{"Log","Log"},Frame->True,AspectRatio->1/3]

Out[]=

In[]:=

With[{net=experiment2[[-1,-1]]},ListLinePlot[Table[net[x],{x,-3,3,1/16}]]]

Out[]=

Sin[x]

In[]:=

experimentSin=TrainPiecewise[RRnet[{5,5,5,5}],trainingx,trainingySin,TargetDevice->"GPU"];

In[]:=

ListLinePlot[First[experimentSin]["RoundLossList"][[;;;;10]],ScalingFunctions->{"Log","Log"},Frame->True,AspectRatio->1/3]

Out[]=

In[]:=

With[{net=experimentSin[[-1,-1]]},ListLinePlot[Table[net[x],{x,-3,3,1/16}]]]

Out[]=

In[]:=

experimentSin2=TrainPiecewise[RRnet[{8,8,8}],trainingx,trainingySin,TargetDevice->"GPU"];

In[]:=

ListLinePlot[First[experimentSin2]["RoundLossList"][[;;;;10]],ScalingFunctions->{"Log","Log"},Frame->True,AspectRatio->1/3]

Out[]=

In[]:=

With[{net=experimentSin2[[-1,-1]]},ListLinePlot[Table[net[x],{x,-3,3,1/128}]]]

Sinc[x]

“Should also make this plot normalizing by total norm of the weights....”

GZip complexity

Networks become less compressible throughout training, indicating an increase in information content.

WeightGzipComplexity[] - Compression ratio (gzipped size / raw size) of the weights serialized as Real64 and then gzipped. Lower = more compressible = more structure.
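
A minimal sketch of how such a ratio could be computed. The body of WeightGzipComplexity is not shown in these notes, so the helper names netWeights and weightCompressionRatio are hypothetical. As an assumption, zlib-compressed BinarySerialize (PerformanceGoal->"Size") stands in for gzip; both are DEFLATE-based, but the byte counts will differ somewhat from a true gzip stream.

```wolfram
(* Hypothetical helper: flatten every array of a net into one packed Real64 vector. *)
netWeights[net_]:=Developer`ToPackedArray[N[Flatten[Normal/@Values[Information[net,"Arrays"]]]]]

(* Compressed size / raw size in bytes; raw size is 8 bytes per Real64 value. *)
(* Assumption: zlib (DEFLATE) compression stands in for gzip here. *)
weightCompressionRatio[weights_List]:=Module[{packed=Developer`ToPackedArray[N[weights]]},
  N[Length[BinarySerialize[packed,PerformanceGoal->"Size"]]/(8*Length[packed])]]

(* Synthetic sanity checks, not results from the experiments above. *)
weightCompressionRatio[ConstantArray[0.,10000]]      (* highly compressible: ratio close to 0 *)
weightCompressionRatio[RandomReal[{-1,1},10000]]     (* nearly incompressible: ratio close to 1 *)
```

Mapped over the saved nets, e.g. weightCompressionRatio[netWeights[#]]&/@Last[experiment], this would give one ratio per checkpoint.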

Weight histograms

3D surface of weight value histograms over training.
X-axis: weight value, Y-axis: checkpoint, Z-axis: bin count.

Density plot of all weight values (all layers combined) over training.
X-axis: weight value, Y-axis: checkpoint, colour: bin count.

Density plot of weight value distributions per layer over training.
X-axis: weight value, Y-axis: checkpoint, colour: bin count.
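
The plots described above all rest on one matrix of bin counts with a row per checkpoint. The following is a sketch of that construction, not the code used for the original figures: weightHistogramMatrix is a hypothetical helper, and the weights are synthetic stand-ins generated only so that the example runs on its own.

```wolfram
(* Hypothetical helper: one row of bin counts per checkpoint. *)
(* weightLists is a list of flat weight vectors, one per checkpoint. *)
weightHistogramMatrix[weightLists_List,{lo_,hi_,dx_}]:=BinCounts[#,{lo,hi,dx}]&/@weightLists

(* Synthetic stand-in for checkpoints: a distribution that widens over time. Not experimental data. *)
SeedRandom[1];
syntheticWeights=Table[RandomVariate[NormalDistribution[0,0.2+0.05*t],2000],{t,20}];
counts=weightHistogramMatrix[syntheticWeights,{-3,3,0.1}];

(* Rows are checkpoints (y), columns are weight-value bins (x). *)
ListPlot3D[counts,DataRange->{{-3,3},{1,20}},AxesLabel->{"weight","checkpoint","count"}]
ListDensityPlot[counts,DataRange->{{-3,3},{1,20}},FrameLabel->{"weight","checkpoint"}]
```

Passing only one layer's weights per checkpoint would give the per-layer variant.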

Claim: as training progresses the rank of the weight matrices decreases (?)

Heat map of singular value magnitudes per layer over training. X-axis: SV index, Y-axis: checkpoint, colour: magnitude.

Effective rank via exponential entropy of normalized singular values. 1 = all energy in one SV, n = energy spread equally across n SVs.
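
A minimal sketch of this measure, assuming the singular values are normalized to sum to 1 and the Shannon entropy uses the natural logarithm (normalizing the squared singular values instead would be the other reading of "energy"). The name effectiveRank is hypothetical.

```wolfram
(* Hypothetical helper: exp of the Shannon entropy of the normalized singular values. *)
effectiveRank[m_?MatrixQ]:=Module[{s,p},
  s=SingularValueList[N[m],Tolerance->0];  (* keep all singular values *)
  p=Select[s/Total[s],Positive];           (* drop exact zeros so Log is defined *)
  Exp[-Total[p*Log[p]]]]

effectiveRank[IdentityMatrix[4]]   (* equal singular values: approximately 4 *)
effectiveRank[{{1,1},{1,1}}]       (* rank one: approximately 1 *)
```

For a trained net, the matrix would be the weights of one linear layer, for instance Normal[NetExtract[net,{1,"Weights"}]].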

MNIST (a more “information dense” problem)

Loss / Error curves

Effective Rank

Weight density

GZip complexity

Correlations

“Microscopically, we could study correlations between weight values... (e.g. how correlated are the weights across the network; could be that correlations are exponentially damped.... or maybe not)”

Weight change correlation vs layer distance over training.
For each checkpoint, computes how each layer’s weights changed from initialization (ΔW).
For each pair of layers, correlates the magnitude distributions of their ΔW after quantile-quantile (QQ) alignment.
Plots mean correlation as a function of layer distance (x) and checkpoint (y).
Tests whether co-adaptation between layers decays with depth separation.
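
One possible reading of this procedure, sketched under stated assumptions: "QQ-aligned" is taken to mean that both |ΔW| distributions are sampled at the same quantile levels, so that layers with different numbers of weights can be compared. The helpers qqMagnitudeCorrelation and correlationByDistance are hypothetical names, and the original computation may differ.

```wolfram
(* Hypothetical helper: Pearson correlation between the quantile functions of |dW| for two layers. *)
(* Assumption: quantiles are taken at n evenly spaced mid-point levels in (0,1). *)
qqMagnitudeCorrelation[dw1_,dw2_,n_Integer:101]:=Module[{q=(Range[n]-0.5)/n},
  Correlation[Quantile[Abs[Flatten[N[dw1]]],q],Quantile[Abs[Flatten[N[dw2]]],q]]]

(* Mean correlation over all layer pairs at each layer distance d, for one checkpoint. *)
(* deltas is a list of per-layer weight changes (checkpoint weights minus initial weights), in layer order. *)
correlationByDistance[deltas_List]:=Table[
  Mean[Table[qqMagnitudeCorrelation[deltas[[i]],deltas[[i+d]]],{i,Length[deltas]-d}]],
  {d,1,Length[deltas]-1}]
```

Evaluating correlationByDistance once per checkpoint yields a matrix with checkpoints as rows and layer distances as columns, which could then be shown with ListDensityPlot.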