In[]:=
CompoundExpression[
]
​​deploy
Thu 9 Mar 2023 16:15:19

Single step of Gradient Descent

What quadratic results in the least progress, in terms of loss reduction, for a single step of gradient descent?
https://math.stackexchange.com/questions/4648662/minimizing-frac-operatornametrh22-operatornametrh3-operatorname

TLDR: efficiency drops as 1/√d, and the greedy step size is asymptotically 3x more efficient than the Lipschitz step size.

For the single-step error drop, see: https://www.wolframcloud.com/obj/yaroslavvb/nn-linear/forum-step1-minimizer.nb
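As an added sanity check (not part of the original notebook): for f(x) = 1/2 x.H.x, one gradient step with exact line search uses step size (g.g)/(g.H.g) and drops the loss by (g.g)^2/(2 g.H.g); heuristically, averaging over x ~ N(0, I) and taking ratios of expectations turns these into the trace expressions used below.

(* Added sketch, not from the original notebook: one exact-line-search step on
   f(x) = 1/2 x.H.x with a random PSD Hessian; both differences should be ~ 0. *)
SeedRandom[1];
Block[{d = 5, a, hm, x, g, drop, eta, etaStar},
 a = RandomReal[{-1, 1}, {d, d}];
 hm = a.Transpose[a];                      (* random positive-semidefinite Hessian *)
 x = RandomReal[{-1, 1}, d];
 g = hm.x;                                 (* gradient of 1/2 x.H.x *)
 drop[eta_] := (x.hm.x - (x - eta g).hm.(x - eta g))/2;
 etaStar = eta /. First@Solve[D[drop[eta], eta] == 0, eta];
 {etaStar - (g.g)/(g.hm.g), drop[etaStar] - (g.g)^2/(2 g.hm.g)}
]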

Util

In[]:=
rsort[l_] := Reverse[Sort[l]];
traceNormalize[h_] := h/Total[h];
nf[x_] := NumberForm[N[x], {3, 3}];
SF = StringForm;
On[Assert];
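A quick illustration of the helpers (added; not an original cell):

With[{l = {3, 1, 2}},
 {rsort[l],             (* {3, 2, 1} *)
  traceNormalize[l],    (* {1/2, 1/6, 1/3} *)
  nf[1/3]}              (* 0.333 *)
]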

Greedy step size

Lipschitz drop: Tr(H^2)/||H||

Greedy drop: Tr(H^2)^2 / (Tr(H^3) Tr(H))
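To tie the formulas to the code below, here is a direct numerical evaluation of both expressions for an arbitrary trace-normalized spectrum (added illustration; the spectrum Range[10] is just a placeholder):

(* Evaluate both drop expressions on an eigenvalue list h, i.e. H diagonal with Tr(H) = 1. *)
With[{h = traceNormalize@N@Range[10]},
 {Tr[h*h]/Max[h],                  (* Lipschitz drop: Tr(H^2)/||H|| *)
  Tr[h*h]^2/(Tr[h*h*h] Tr[h])}     (* greedy drop: Tr(H^2)^2/(Tr(H^3) Tr(H)) *)
]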

Numeric minimization

In[]:=
Clear[h];
d0 = 10;

Block[{d = d0},
 hvec = Array[h, d];
 obj = Tr[hvec*hvec]^2/(Tr[hvec*hvec*hvec] Tr[hvec]);
 poscons = hvec > 0;
 sol = NMinimize[{obj, poscons}, hvec];
 hh = rsort@traceNormalize[hvec /. Last@sol];
 Print["Largest eval: ", nf@Max[hh], ", other eval: ", nf@Min[hh]];
 ListPlot[hh, PlotRange -> All, Filling -> Axis]
]
Largest eval: 0.430, other eval: 0.063
Out[]=

Nicer syntax but less robust

In[]:=
sol = NMinimize[Tr[c*c]^2/Tr[c*c*c], c ∈ Simplex@IdentityMatrix[d0]];
Print["Δloss: ", nf@First@sol];
Print[SF["d=``, h=``", d0, rsort[nf /@ (c /. Last@sol)]]]
Δloss: 0.597
d=10, h={0.430,0.063,0.063,0.063,0.063,0.063,0.063,0.063,0.063,0.063}

user1551 solution

https://math.stackexchange.com/a/4648859/998
In[]:=
Clear[d, lambda2, xd, xk, lossDrop, worstLossDrop];

d0 = 10;
lambda2 = 1 + 2 Sqrt[d] Cos[ArcCos[1/Sqrt[d]]/3];
xd0 = lambda2/(lambda2 + d - 1);
xk0 = 1/(lambda2 + d - 1);
lossDrop = (xd^2 + (d - 1) xk^2)^2/(xd^3 + (d - 1) xk^3);
worstLossDrop = lossDrop /. {xd -> xd0, xk -> xk0};

Block[{d = d0},
 Print["Eigenvalues normalized: ", (xd + (d - 1) xk == 1) /. Thread[{xd, xk} -> {xd0, xk0}]];
 Print[SF["Top eval for d=`` is ``", d, nf[xd0]]];
 Print[SF["Other eval for d=`` is ``", d, nf[xk0]]];
 Print[SF["Δloss for d=`` is ``", d, nf[worstLossDrop]]];
];

Plot[xk0, {d, 1, 100000}]
Print["Top eval: ", xd0 // FullSimplify];
Print["Other eval: ", xk0 // FullSimplify];

xda = Asymptotic[xd0, {d, ∞, 1}];
xka = Asymptotic[xk0, {d, ∞, 1}];

Print["Asymptotic top eval: ", xda];
Print["Asymptotic other eval: ", xka];

asymptoticWorstLossDrop = lossDrop /. {xd -> xda, xk -> xka} // Simplify;
asymptoticWorstLossDrop = Asymptotic[asymptoticWorstLossDrop, {d, ∞, 1}];
Print["Asymptotic loss drop: ", asymptoticWorstLossDrop];
Plot[{worstLossDrop, asymptoticWorstLossDrop}, {d, 1, 1000}, PlotLegends -> {"Δloss", "Δloss asympt"}]
Eigenvalues normalized: True
Top eval for d=10 is 0.430
Other eval for d=10 is 0.063
Δloss for d=10 is 0.597
Out[]=
Top eval: 1 + (1 - d)/(d + 2 Sqrt[d] Cos[ArcSec[Sqrt[d]]/3])
Other eval: 1/(d + 2 Sqrt[d] Cos[ArcSec[Sqrt[d]]/3])
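As an added consistency check (not in the original cell), rebuilding the full spectrum from the closed-form xd0, xk0 at d = d0 should reproduce the NMinimize spectrum {0.430, 0.063, …} found above:

(* Added check: closed-form spectrum at d = d0, sorted and formatted like the numeric result. *)
Block[{d = d0}, nf /@ rsort@Join[{xd0}, ConstantArray[xk0, d - 1]]]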

Lipschitz step size

Ryszard solution

https://math.stackexchange.com/questions/4652340/minimizing-ratio-of-l2-norm-and-linf-norm
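An added sketch (not from the original): the Lipschitz-drop analogue of the greedy minimization above. With Tr(H) = 1, the Lipschitz drop Tr(H^2)/||H|| becomes ||h||_2^2/||h||_∞ over the simplex, which is the quantity in the linked question.

(* Added sketch: numerically minimize the Lipschitz drop over trace-normalized spectra,
   mirroring the simplex-constrained greedy cell above. *)
solLip = NMinimize[Tr[c*c]/Max[c], c ∈ Simplex@IdentityMatrix[d0]];
Print["Lipschitz Δloss: ", nf@First@solLip];
Print[SF["d=``, h=``", d0, rsort[nf /@ (c /. Last@solLip)]]]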

Greedy vs Lipschitz

Related

- Which p maximizes r for given R?
  https://math.stackexchange.com/questions/4633085/bounds-on-max-i-p-i-in-terms-of-sum-i-p-i2
- How to do convex optimization faster in Mathematica:
  https://mathematica.stackexchange.com/a/280128/217