In[]:=
Wed 22 Mar 2023 15:26:20
Fast powers of diagonal + rank1 matrices
Fast powers of diagonal + rank1 matrices
Goal: ≃ exp(-t(A-B))For diagonal A, symmetric rank1 B, can we approximate the exponential quantity above quickly and accurately?Mathoverflow discussion.Top-level notes: Diagonal + rank1 section
t
(I-(A-B))
Code
Code
Background:
Gives diagonal, rank1 matrices from eq Eq 5. from Bordelon “structured” paper
generateDiagRank1 returns DPR1 matrix A-B such that exp(-t(A-B)) is the state transition matrix at time t
{a0,b0}=generateDiagRank1[d];
A=DiagonalMatrix[a0];
B={b0}.{b0};
Related notebooks:
- forum-velikanov-berthier-example.nb, simulations for p=1, p=1.1, p=1.4
- gd-vs-sgd.nb utilities for plotting loss curves, finding step times
Notation:
b -- batch size b
p -- power-law decay constant 1<p<2
d -- dimensions
t -- timestep(important values, t=2, t=d, )
Gives diagonal, rank1 matrices from eq Eq 5. from Bordelon “structured” paper
generateDiagRank1 returns DPR1 matrix A-B such that exp(-t(A-B)) is the state transition matrix at time t
{a0,b0}=generateDiagRank1[d];
A=DiagonalMatrix[a0];
B={b0}.{b0};
Related notebooks:
- forum-velikanov-berthier-example.nb, simulations for p=1, p=1.1, p=1.4
- gd-vs-sgd.nb utilities for plotting loss curves, finding step times
Notation:
b -- batch size b
p -- power-law decay constant 1<p<2
d -- dimensions
1<d<
10
10
t -- timestep
1<t<
2
d
t=
2
d
(batch-size-formulas.graffle / bertier), https://www.dropbox.com/s/cwyhmkzsavj9pdh/batch-size-formulas.graffle?dl=0
In[]:=
(*A-B is DPR1 matrix for A=diag(a),B=b.b exp(-t(A-B)) gives the state transition matrix where{a,b}=generateDiagRank1[d], then () *)generateDiagRank1[d_]:=Module{},(* Largest value of step size contractive in Schatten-1 norm *)p=1.1;h=Table[,{i,1,d}];h=hTr[h];α=;a0=2 α h(1-α h);b0=α h;(* For small sizes, run sanity checks *)If[d<=100,On[Assert];A=DiagonalMatrix[a0];B={b0}.{b0};Assert[And@@Thread[{Norm[A],Norm[B],Norm[A-B]}<=1]];Assert[And@@(PositiveSemidefiniteMatrixQ/@{A,B,A-B})];(* check that matrices are non-negative *)nnMatrixQ[mat_]:=And@@Thread[Flatten[mat]>=0];Assert[nnMatrixQ[A]];Assert[nnMatrixQ[B]];];{a0,b0};(* Faster version of A.x DPR1 matrix A stored globally as dd,u *)fastMatmul[x_]:=x-dd*x+u*u.x;(* {{"Exact value: ",ones.MatrixPower[ii-(A-B),t,ones]},{"Exact fast: ",ones.Nest[fastMatmul,ones,t]}}//TableForm *)d=100; (* dimensions *){a0,b0}=generateDiagRank1[d];A=DiagonalMatrix[a0];B={b0}.{b0};ones=ConstantArray[1.,d];ii=IdentityMatrix[d];exact[t_]:=ones.MatrixPower[ii-(A-B),t,ones];exp[t_]:=ones.MatrixExp[-t(A-B)].ones;On[Assert];Assert[exact[d]==61.54928435838899`]Assert[exp[d]==61.62225463274567`]
-p
i
1
2Max[h]+Tr[h]
Tests
Tests
Evaluation
Evaluation
In[]:=
d=100;(*dimensions*){a0,b0}=generateDiagRank1[d];A=DiagonalMatrix[a0];B={b0}.{b0};ii=IdentityMatrix[d];ones=ConstantArray[1.,d];t=d;TableForm[{{"Exact value: ",ones.MatrixPower[ii-(A-B),t,ones]},{"exp approximation: ",ones.MatrixExp[-t(A-B)].ones},{"cheaper exp approximation: ",ones.MatrixExp[-t(A)].ones}}]
Out[]//TableForm=
Exact value: | 61.5493 |
exp approximation: | 61.6223 |
cheaper exp approximation: | 55.54 |
In[]:=
ones.MatrixPower[ii-(A-B),t,ones]