In[]:=
deploy
Out[]=
Wed 14 Aug 2019 21:53:40
Init
Batch-size quantities
In[]:=
SeedRandom[0]; Clear[X, Y, w0];
d = 2; n = 1000; yvar = 1;
{X, Y, w0} = generateXY[0.01, yvar, d, n];
ListPlot@Transpose@X
Out[]=
(scatter plot of the input example columns of X)
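generateXY comes from the Init section, which this export omits. A minimal sketch consistent with the shapes used below, assuming the first argument sets the smallest eigenvalue of the input covariance and yvar is the label-noise variance (the names cov and wtrue and the eigenvalue schedule are guesses):

(* hypothetical sketch of generateXY; not the original Init definition *)
generateXY[eps_, yvar_, d_, n_] := Module[{cov, X, wtrue, Y, w0},
  cov = DiagonalMatrix@Table[N[eps^((i - 1)/(d - 1))], {i, d}];  (* eigenvalues decay from 1 to eps *)
  X = Transpose@RandomVariate[MultinormalDistribution[ConstantArray[0., d], cov], n];  (* inputs, (d,n) *)
  wtrue = {RandomReal[{-1, 1}, d]};  (* ground-truth weights, (1,d) *)
  Y = wtrue.X + RandomVariate[NormalDistribution[0, Sqrt[yvar]], {1, n}];  (* noisy labels, (1,n) *)
  w0 = {ConstantArray[0., d]};  (* initial weights, (1,d) *)
  {X, Y, w0}]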
Notation:
d: dimension, n: number of inputs
X: matrix of input examples as columns, for consistency with the linear regression literature, shape (d, n)
Y: output labels as columns, (1, n)
J: Jacobian of the output function that works on a batch of examples. Rows correspond to inputs, so Jᵢⱼ gives the sensitivity of input i to parameter j, shape (n, d)
G: gradients matrix, gradients as rows (to match the shape of w)
e: row of residuals, shape (1, n)
w: weights vector, (1, d)
g: true gradient, (1, d)
Σ: gradient covariance, G⊤G/n, (d, d)
H: Hessian matrix, (d, d)
hᵢ: Hessian of the loss with respect to the output for example i
L: loss, average of losses across all examples
Clear[update];
update[w_] := Module[{},
  (* J: Jacobian of output function that works on a batch of examples.
     Jᵢⱼ gives the sensitivity of example i to parameter j, shape (n,d) *)
  J = Transpose[X];
  (* e: row of residuals, to match shape of X, (1,n) *)
  e = w.X - Y;
  (* G: gradients matrix, gradients as rows (to match shape of w) *)
  G = DiagonalMatrix[Flatten@e].J;
  (* g: true gradient, (1,d) *)
  g = rowsum[G]/n;
  (* Σ: noise covariance (Jain), uncentered gradient expected second moment *)
  Σ = Transpose[G].G/n;
  (* Σc: centered gradient noise covariance (OpenAI) *)
  Σc = Σ - Transpose[g].g;
  (* H: Hessian of expected loss *)
  H = Transpose[J].J/n;
  (* loss averaged over all examples *)
  loss = toscalar[e.Transpose[e]]/(2 n);
  (* excess loss compared to minimizer (from Newton decrement) *)
  excess = g.Inverse[H].Transpose[g]/2 // toscalar;
  (* optimal step size and critical batch size, from the OpenAI paper *)
  zeroG = (Norm[g] == 0);
  zeroS = (Norm[Σ] == 0);
  stepOpenAI = If[zeroG, ∞, Norm[g]^2/(g.H.Transpose[g]) // toscalar];
  batchOpenAI = If[zeroG, ∞, Tr[H.Σc]/(g.H.Transpose[g]) // toscalar];
  (* gradient diversity, Berkeley paper *)
  diversity = If[zeroG, ∞, Norm[G, "Frobenius"]^2/Norm[g, 2]^2];
  (* noise variance v from Jain, minimax error goes to zero at this rate *)
  noiseVariance = Tr[Inverse[H].Σ];
  (* measure of misfit, from Jain: mean-to-max eigenvalue ratio of H⁻¹Σ, equal to 1 when Σ ∝ H *)
  rho = If[zeroS, 1, Tr[Inverse[H].Σ]/(rank[Inverse[H].Σ] Norm[Inverse[H].Σ])];
  (* divergent learning rate for batch=1, Jain *)
  stepMin = 2/Tr[H];
  (* divergent learning rate for batch=∞, Jain *)
  stepMax = 2/Norm[H];
  (* divergent learning rate for batch=1, adjusted for misspecification *)
  stepMinAdjusted = stepMin rho;
  (* critical batch sizes: regular and adjusted for misspecification *)
  batchJain = 1 + rank[H];
  batchJainAdjusted = 1 + rank[H] rho;
 ]
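update relies on three small helpers defined in the Init section, which this export also omits. Plausible definitions, inferred from how they are used above (assumptions, not the originals):

toscalar[m_] := m[[1, 1]];   (* extract the scalar from a 1×1 matrix such as g.H.Transpose[g] *)
rowsum[m_] := {Total[m]};    (* sum the rows of an (n,d) matrix into a (1,d) row *)
rank[m_] := MatrixRank[m];   (* numerical matrix rank *)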
In[]:=
Σc
Out[]=
{{0.,0.},{0.,0.}}
In[]:=
update[w0]
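update stores its results in globals rather than returning them, so after this call any of the diagnostics can be read off directly, e.g. (illustrative, not an original cell):

{stepMin, stepMax, batchJain, batchOpenAI}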
Sanity checks
In[]:=
Clear[X, Y, w0];
d = 2; n = 1000; yvar = 1;
{X, Y, w0} = generateXY[0.01, yvar, d, n];
ListPlot@Transpose@X
Out[]=
(scatter plot of the input example columns of X)
In[]:=
update[w0];
{loss0, excess0} = {loss, excess};
(* take a Newton step *)
update[w0 - g.Inverse[H]];
(* make sure it converged *)
Print[Norm[g]];
Print[Norm[excess]];
(* check that the Newton decrement calculation was correct *)
Print[Norm[loss0 - loss - excess0]];
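What this cell verifies: the loss is quadratic, so a full Newton step w0 - g.Inverse[H] lands exactly on the minimizer, and the drop in loss must equal the Newton decrement g.Inverse[H].Transpose[g]/2 that update stores in excess. The same identity in one dimension, as a symbolic check (an illustration, not an original cell):

(* loss h w^2/2 has gradient h w; the Newton step lands at 0; the decrement is (h w)^2/(2 h) *)
Simplify[h w^2/2 - h (w - h w/h)^2/2 - (h w)^2/(2 h)]  (* returns 0 *)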
Unit test