In[]:=
deploy
Fri 15 Jul 2022 11:49:19
- Python notebook: https://colab.research.google.com/drive/1tXmS6kZcJ8hsfvzKaZSCvXf7iNRT25UG
- Notes (NN<>Least Squares): https://notability.com/n/06JGhcgRtAcAR7~nqNsLaO

Whitening connection Aw=b

In[]:=
setupDataFullRank;
Print["X=", MatrixForm@X];

B = getB[X];
Print["B= ", B // MatrixForm]

getResiduals[X_] := X - X.getB[X];
R = getResiduals[X];
Print["Residuals=", Diagonal[Transpose[R].R]]

(* after transforming, covariance of new dataset with old dataset is diagonal *)
Assert[DiagonalMatrixQ[cov[X - X.getB[X], X]]]

(* D2 matrix of 5.6 *)
R = getResiduals[X];
D2 = DiagonalMatrix[1/Diagonal@(Transpose[R].R)];
Print["D2= ", D2 // MatrixForm]

Print[Inverse[Transpose[X].X] // MatrixForm];

(* check for inverse correctness *)
precisionMat = (iiMat - B).D2;
Assert[precisionMat == Inverse[cov@X]]

(* diagonal entries of precision matrix are inverse residuals *)
Assert[Diagonal[precisionMat] == 1/Diagonal[Transpose[R].R]]

(* dividing by covariance is transpose of the pseudo-inverse *)
Assert[X.Inverse[Transpose[X].X] == Transpose[PseudoInverse[X]]]

Assert[phiToInv[getB@X] == Inverse@cov@X];
Assert[invToPhi[Inverse@cov@X] == getB@X];
Assert[invToPhi@phiToInv@getB[X] == getB[X]];
Assert[covToPhi@cov[X] == getB[X]];
Assert[covToPhiRobust@cov[X] == getB[X]];
X = {{2, -2, 2}, {-1, 1, 1}, {-2, -2, 1}, {1, 1, 3}}
B = {{0, 4/67, 2/5}, {4/73, 0, -1/5}, {20/73, -10/67, 0}}
Residuals = {650/73, 650/67, 13}
D2 = {{73/650, 0, 0}, {0, 67/650, 0}, {0, 0, 1/13}}
Inverse[Transpose[X].X] = {{73/650, -2/325, -2/65}, {-2/325, 67/650, 1/65}, {-2/65, 1/65, 1/13}}
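The printed identities can be cross-checked in NumPy (an independent sketch, separate from the linked Colab notebook), with `X` copied from the output above:

```python
import numpy as np

# X as printed above (4 examples, 3 features).
X = np.array([[2, -2, 2], [-1, 1, 1], [-2, -2, 1], [1, 1, 3]], dtype=float)

P = np.linalg.inv(X.T @ X)         # precision matrix
D2 = np.diag(np.diag(P))           # D2[i,i] = 1/(squared residual norm of feature i)
B = np.eye(3) - P / np.diag(P)     # B[i,j] = -P[i,j]/P[j,j], zero diagonal

R = X - X @ B                      # residuals of predicting each feature from the others

assert np.allclose(np.diag(B), 0)                     # no feature predicts itself
assert np.allclose(R.T @ X, np.diag(1 / np.diag(P)))  # residuals decorrelated from X
assert np.allclose((np.eye(3) - B) @ D2, P)           # precision = (I - B).D2
print(B[0, 1], B[0, 2])            # 4/67 and 2/5, matching B printed above
```

The `B` recovered from the precision matrix alone reproduces the two-sided regression coefficients printed by the cell.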

One sided

In[]:=
(* for cov, returns lower triangular C and sets global "diag" such that cov = C.diag.diag.Transpose[C] *)
mchol[cov_] := Module[{},
  chol = CholeskyDecomposition[cov]; (* upper triangular U with Transpose[U].U == cov *)
  diag = DiagonalMatrix@Diagonal@chol;
  Transpose[chol].Inverse[diag]
];

(* inverse modified Cholesky *)
imchol[cov_] := Module[{},
  Inverse[mchol@cov]
];

(* One dimensional view from Cholesky. Does not have direct mapping to decorrelation, because need to invert Cholesky factor *)

(* Returns coefficients needed to predict i'th coordinate from previous coords *)
phiL2R[i_, data_] := Module[{},
  coefs = Switch[i,
    1,
    {}, (* empty list, since no "previous coordinates" for the first coord *)
    _,
    Y1 = onlyFirstKColumns[data, i - 1]; (* predictors, columns < i *)
    Y2 = extractKColumn[data, i]; (* target variable, column i *)

    (* prediction coeffs, 3.26 of Pourahmadi *)
    p = PseudoInverse[Transpose[Y1].Y1].(Transpose[Y1].Y2);

    (* form residuals, check that they are uncorrelated with predictor *)
    residuals = Y2 - Y1.p;
    Assert[Total[cov[residuals, Y1], 2] ≐ 0];
    Flatten@p
  ];
  Print["coefs=", coefs];
  (* pad column with zeros *)
  PadRight[coefs, n]
];

(* Returns coefficients needed to predict i'th coordinate from remaining ones *)
phiR2L[i_, data_] := Module[{},
  coefs = Switch[i,
    n,
    {}, (* empty list, since no "previous coordinates" for the last coord *)
    _,
    Y1 = onlyLastKColumns[data, n - i]; (* predictors, columns > i *)
    Y2 = extractKColumn[data, i]; (* target variable, column i *)

    (* prediction coeffs, 3.26 of Pourahmadi *)
    p = PseudoInverse[Transpose[Y1].Y1].(Transpose[Y1].Y2);

    (* form residuals, check that they are uncorrelated with predictor *)
    residuals = Y2 - Y1.p;
    Assert[Total[cov[residuals, Y1], 2] ≐ 0];
    Flatten@p
  ];
  (* pad column with zeros *)
  PadLeft[coefs, n]
];

(* transpose to turn coefficients into columns, because multiply on left: x - x.phiL2R *)
coefsLeftToRight = Transpose@Table[phiL2R[i, X], {i, n}];
coefsRightToLeft = Transpose@Table[phiR2L[i, X], {i, n}];
Print["phiL2R=", coefsLeftToRight];

(* test modified Cholesky *)
mm = mchol[cov@X];
Assert[mm.diag.diag.Transpose[mm] == cov@X];

coefsChol = IdentityMatrix[n] - imchol[cov@X];
coefsChol2 = IdentityMatrix[n] - imchol[Inverse@cov@X];
Print["Cholesky coefs match LR: ", coefsLeftToRight == coefsChol];
Print["Inverse Cholesky coefs match RL: ", coefsRightToLeft == coefsChol2];

t = imchol[cov@X];

d = DiagonalMatrix[Diagonal@CholeskyDecomposition@cov@X];
Print["Cholesky inverse match: ", Transpose[t].MatrixPower[d, -2].t == Inverse[cov@X]];

(* connection to residuals *)
R2 = X - X.coefsLeftToRight;
Assert[Transpose[R2].R2 == diag.diag]

residualNorm[mat_] := Tr[Transpose[mat].mat];
Print["residuals from learned left-sided: ", residualNorm[X - X.coefsLeftToRight]]
Print["residuals from transposed left-sided: ", residualNorm[X - X.Transpose[coefsLeftToRight]]]
Print["residuals from learned right-sided: ", residualNorm[X - X.coefsRightToLeft] // N]
Print["residuals from cholesky: ", residualNorm[X - X.(iiMat - imchol[cov@X])]];
Print["residuals from two-sided: ", residualNorm[X - X.B] // N];

With[{errors = X - X.coefsLeftToRight},
  Print["cholesky diagonal matches residuals: ", Sqrt@Diagonal[Transpose[errors].errors] == Diagonal@CholeskyDecomposition@cov@X];
]

(* If feature x subtracts 1/2 of feature y during L-to-R whitening, then also subtract 1/2 of x from y to get the inverse *)
Print["inverse as double cholesky whitening: ",
  X.Transpose[IdentityMatrix[n] - coefsChol].MatrixPower[d, -2].(IdentityMatrix[n] - coefsChol) == X.Inverse[cov@X]]
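The left-to-right connection can be sketched in NumPy as well. Note the convention difference: `np.linalg.cholesky` returns the lower-triangular factor, while Mathematica's `CholeskyDecomposition` returns the upper one. Here `X` is the data matrix from the first section, and the sequential regression coefficients are recomputed with `lstsq` rather than the notebook's `phiL2R` helper:

```python
import numpy as np

X = np.array([[2, -2, 2], [-1, 1, 1], [-2, -2, 1], [1, 1, 3]], dtype=float)
S = X.T @ X

# Modified Cholesky: S = C D2 C^T with C unit lower triangular.
L = np.linalg.cholesky(S)                 # S = L L^T, L lower triangular
d = np.diag(L)
C = L / d                                 # scale columns -> unit diagonal
D2 = np.diag(d**2)
assert np.allclose(C @ D2 @ C.T, S)

# Row i of Phi = least-squares coefficients predicting column i of X
# from columns < i (left-to-right regressions).
n = X.shape[1]
Phi = np.zeros((n, n))
for i in range(1, n):
    Phi[i, :i] = np.linalg.lstsq(X[:, :i], X[:, i], rcond=None)[0]

# The same coefficients fall out of the inverse Cholesky factor.
assert np.allclose(Phi, np.eye(n) - np.linalg.inv(C))

# Residuals of the sequential regressions are decorrelated, with norms d^2.
E = X @ (np.eye(n) - Phi).T
assert np.allclose(E.T @ E, D2)
```

So one-sided whitening is exactly inverting the unit-triangular Cholesky factor: the regression coefficients are read off `I - inv(C)`, and the diagonal of the Cholesky factor squared gives the residual norms.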

Singular matrix experiments

TL;DR: Tikhonov always kind of works. The rest are brittle. Least squares regression has a regime where it switches from ignoring features to not ignoring them.

pinv(X'X) gives a non-symmetric result for ill-conditioned matrices; use pinv(X)pinv(X)^T instead.
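A small NumPy illustration of this with a synthetic matrix (not the notebook's data): squaring an ill-conditioned X pushes its small singular values below pinv's default cutoff, so the two formulas can disagree dramatically, and pinv(X)pinv(X)^T is symmetric by construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ill-conditioned 4x2 matrix with singular values 1 and 1e-9.
U, _ = np.linalg.qr(rng.standard_normal((4, 2)))
V, _ = np.linalg.qr(rng.standard_normal((2, 2)))
X = U @ np.diag([1.0, 1e-9]) @ V.T

# Way 1: squaring makes the small singular value 1e-18, which falls
# below pinv's default relative cutoff and gets truncated.
P1 = np.linalg.pinv(X.T @ X)

# Way 2: pinv(X) still resolves the 1e-9 singular value, and the product
# with its own transpose is symmetric by construction.
P2 = np.linalg.pinv(X) @ np.linalg.pinv(X).T

print(np.linalg.norm(P1))          # ~1: small direction dropped
print(np.linalg.norm(P2))          # ~1e18: small direction inverted
assert np.allclose(P2, P2.T)
```

Which behavior is "better" depends on whether the small direction is signal or noise, which is exactly what the experiments below probe.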

Rank deficiency=1

Underparameterized

Rank deficiency=2

Rank deficiency=1 + noise

noise=0.001: Tikhonov(0.001) starts ignoring the noise feature
noise=0.000001: pseudo-inverse #1 explodes
noise=0.0000001: lstsq starts ignoring the noise feature, orthoprojection kicks in
noise=0.000000001: orthoprojection method stops working

Tikhonov is fine in this setting for a range of lambda values. For noise=0.000001, the two ways of computing the pseudo-inverse give dramatically different results; the second way is much better for the formula. At noise=0.0000000000001 or lower, the orthoprojected method starts working, but the second pseudo-inverse stops working.
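The regime switch can be reproduced with a made-up version of the experiment (a feature duplicated up to noise; the data and scales here are assumptions, not the Colab's exact setup). The switch in `lstsq` happens when the noise-sized singular value crosses its internal `rcond` cutoff:

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.standard_normal(100)
y = base.copy()                      # target is exactly the first feature

def fit(noise):
    # Rank deficiency 1 + noise: second feature is a near-copy of the first.
    X = np.column_stack([base, base + noise * rng.standard_normal(100)])
    tik = np.linalg.solve(X.T @ X + 1e-6 * np.eye(2), X.T @ y)  # Tikhonov
    ls = np.linalg.lstsq(X, y, rcond=None)[0]                   # least squares
    return tik, ls

for noise in [1e-3, 1e-16]:
    print(noise, fit(noise))
```

At noise=1e-3 both methods return roughly [1, 0] (the noise feature is used to cancel itself); at noise=1e-16 the SVD inside `lstsq` treats the two columns as identical and switches to the minimum-norm answer [0.5, 0.5], which Tikhonov also gives. Tikhonov degrades gracefully across the whole range because lambda bounds the inverted singular values.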

Overparameterized + noise

4 examples, 3 features, and 2 noise features
Tikhonov(0.000001) switches to ignoring noise features around noise=0.0001.
The pseudo-inverse always gives an incorrect result (more features than examples).
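A minimal stand-in for this setup (random data, not the Colab's): with more features than examples there are infinitely many interpolating solutions, and the pseudo-inverse returns the minimum-norm one, which is the projection of the generating coefficients onto the row space of X and so generally differs from them:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for the setup above: 4 examples, 5 features total
# (3 signal features plus 2 that carry zero true weight).
X = rng.standard_normal((4, 5))
w_true = np.array([1.0, -2.0, 3.0, 0.0, 0.0])
y = X @ w_true

w_pinv = np.linalg.pinv(X) @ y     # minimum-norm interpolating solution

print(np.allclose(X @ w_pinv, y))  # True: fits all 4 examples exactly
print(np.allclose(w_pinv, w_true)) # False: not the generating coefficients
```

Zero training error is therefore no evidence of correct coefficients in the overparameterized regime; some prior (minimum norm, Tikhonov, sparsity) decides which interpolant you get.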