https://math.stackexchange.com/questions/4648662/minimizing-frac-operatornametrh22-operatornametrh3-operatorname
In[]:=
Thu 9 Mar 2023 16:15:19
Single step of Gradient Descent
What quadratic results in the least progress, in terms of loss reduction, from a single step of gradient descent?
https://math.stackexchange.com/questions/4648662/minimizing-frac-operatornametrh22-operatornametrh3-operatorname
TLDR: efficiency drops as 1/√d, and the greedy step size is asymptotically 3x more efficient than the Lipschitz step size.
For the single-step error drop, see: https://www.wolframcloud.com/obj/yaroslavvb/nn-linear/forum-step1-minimizer.nb
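Written out, the problem from the linked question (my transcription of the title; d is the dimension of H): find the positive semidefinite H minimizing

  Tr(H²)² / (Tr(H³) Tr(H))

Its minimum value is the worst-case relative loss drop for one greedy step.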
Util
In[]:=
rsort[l_] := Reverse[Sort[l]];        (* sort descending *)
traceNormalize[h_] := h/Total[h];     (* rescale eigenvalues so they sum to 1 *)
nf[x_] := NumberForm[N[x], {3, 3}];   (* 3-digit display *)
SF = StringForm;
On[Assert];
Greedy step size
For f(w) = ½ wᵀHw, a gradient step w → w − αHw reduces the expected loss over an isotropic start by 2α Tr(H²) − α² Tr(H³), up to a constant factor, so:

Lipschitz drop (α = 1/‖H‖): at least Tr(H²)/‖H‖
Greedy drop (α = Tr(H²)/Tr(H³)): Tr(H²)²/Tr(H³)

Relative to the initial loss, which is proportional to Tr(H), the greedy drop becomes Tr(H²)²/(Tr(H³) Tr(H)), the ratio minimized below.
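A quick symbolic check of the greedy step size, assuming the drop formula above; the names tr2, tr3 are mine, standing in for Tr(H²) and Tr(H³):
In[]:=
(* maximize drop[a] = 2 a Tr(H^2) - a^2 Tr(H^3) over the step size a *)
Clear[a, tr2, tr3];
drop = 2 a tr2 - a^2 tr3;
greedyStep = Solve[D[drop, a] == 0, a]   (* a* = tr2/tr3 *)
Simplify[drop /. greedyStep]             (* greedy drop: tr2^2/tr3 *)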
Numeric minimization
In[]:=
Clear[h]; d0 = 10;
Block[{d = d0},
 hvec = Array[h, d];   (* eigenvalues of H; Tr applied to a vector sums its entries *)
 obj = Tr[hvec*hvec]^2/(Tr[hvec*hvec*hvec] Tr[hvec]);
 sol = NMinimize[{obj, hvec > 0}, hvec];
 hh = rsort@traceNormalize[hvec /. Last@sol];
 Print["Largest eval: ", nf@Max[hh], ", other eval: ", nf@Min[hh]];
 ListPlot[hh, PlotRange -> All, Filling -> Axis]]
Largest eval: 0.430, other eval: 0.063
Out[]=
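A sanity check one could append here (my addition, assuming the cell above has run, so hh holds the trace-normalized minimizer): the ratio at hh should reproduce the Δloss found below.
In[]:=
(* hh sums to 1, so Tr[hh] == 1 and the Tr(H) factor can be dropped *)
Assert[Abs[Tr[hh*hh]^2/Tr[hh*hh*hh] - 0.597] < 10^-3]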
Nicer syntax but less robust
In[]:=
(* Constraining c to the unit simplex forces Tr[c] == 1, so the Tr(H) factor drops out *)
sol = NMinimize[{Tr[c*c]^2/Tr[c*c*c], c ∈ Simplex@IdentityMatrix[d0]}, c];
Print["Δloss: ", nf@First@sol];
Print[SF["d=``, h=``", d0, rsort[nf /@ (c /. Last@sol)]]]
Δloss: 0.597
d=10, h={0.430,0.063,0.063,0.063,0.063,0.063,0.063,0.063,0.063,0.063}
user1551 solution
In[]:=
Clear[d, lambda2, xd, xk, lossDrop, worstLossDrop];
d0 = 10;
(* user1551's closed form: top eigenvalue xd0 with multiplicity 1, the remaining
   d-1 eigenvalues equal to xk0, normalized so the trace is 1 *)
lambda2 = 1 + 2 Sqrt[d] Cos[(1/3) ArcCos[1/Sqrt[d]]];
xd0 = lambda2/(lambda2 + d - 1);
xk0 = 1/(lambda2 + d - 1);
lossDrop = (xd^2 + (d - 1) xk^2)^2/(xd^3 + (d - 1) xk^3);
worstLossDrop = lossDrop /. {xd -> xd0, xk -> xk0};
Block[{d = d0},
 Print["Eigenvalues normalized: ", xd + (d - 1) xk == 1 /. Thread[{xd, xk} -> {xd0, xk0}]];
 Print[SF["Top eval for d=`` is ``", d, nf[xd0]]];
 Print[SF["Other eval for d=`` is ``", d, nf[xk0]]];
 Print[SF["Δloss for d=`` is ``", d, nf[worstLossDrop]]];];
Plot[xk0, {d, 1, 100000}]
Print["Top eval: ", xd0 // FullSimplify];
Print["Other eval: ", xk0 // FullSimplify];
xda = Asymptotic[xd0, {d, ∞, 1}];
xka = Asymptotic[xk0, {d, ∞, 1}];
Print["Asymptotic top eval: ", xda];
Print["Asymptotic other eval: ", xka];
asymptoticWorstLossDrop = lossDrop /. {xd -> xda, xk -> xka} // Simplify;
asymptoticWorstLossDrop = Asymptotic[asymptoticWorstLossDrop, {d, ∞, 1}];
Print["Asymptotic loss drop: ", asymptoticWorstLossDrop];
Plot[{worstLossDrop, asymptoticWorstLossDrop}, {d, 1, 1000}, PlotLegends -> {"Δloss", "Δloss asympt"}]
Eigenvalues normalized: True
Top eval for d=10 is 0.430
Other eval for d=10 is 0.063
Δloss for d=10 is 0.597
Out[]=
Top eval: 1 + (1 - d)/(d + 2 Sqrt[d] Cos[ArcSec[Sqrt[d]]/3])
Other eval: 1/(d + 2 Sqrt[d] Cos[ArcSec[Sqrt[d]]/3])
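Cross-checking the 1/√d scaling from the TLDR against the closed form (the constant 16/(3 Sqrt[3 d]) is my own evaluation of lossDrop at the leading-order eigenvalues xd ≈ Sqrt[3/d], xk ≈ 1/d, not an output copied from the notebook); the two values should agree closely at large d:
In[]:=
(* worstLossDrop still contains the symbol d, so Block substitutes a concrete value *)
Block[{d = 10^6}, N[{worstLossDrop, 16/(3 Sqrt[3 d])}]]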
Lipschitz step size
Ryszard solution
Greedy vs Lipschitz
Related
Related:
- Which p maximizes r for given R?
https://math.stackexchange.com/questions/4633085/bounds-on-max-i-p-i-in-terms-of-sum-i-p-i2
- how to do Convex optimization faster in Mathematica
https://mathematica.stackexchange.com/a/280128/217