In[]:=
deploy
Fri 15 Jul 2022 11:49:19
- Python notebook: https://colab.research.google.com/drive/1tXmS6kZcJ8hsfvzKaZSCvXf7iNRT25UG
- Notes (NN<>Least Squares): https://notability.com/n/06JGhcgRtAcAR7~nqNsLaO

Whitening connection Aw=b

In[]:=
setupDataFullRank;
Print["X=", MatrixForm@X];

B = getB[X];
Print["B= ", B // MatrixForm]

getResiduals[X_] := X - X.getB[X];
R = getResiduals[X];
Print["Residuals=", Diagonal[Transpose[R].R]]

(* after transforming, covariance of new dataset with old dataset is diagonal *)
Assert[DiagonalMatrixQ[cov[X - X.getB[X], X]]]

(* D2 matrix of 5.6 *)
R = getResiduals[X];
D2 = DiagonalMatrix[1/Diagonal@(Transpose[R].R)];
Print["D2= ", D2 // MatrixForm]

Print[Inverse[Transpose[X].X] // MatrixForm];

(* check for inverse correctness *)
precisionMat = (iiMat - B).D2;
Assert[precisionMat == Inverse[cov@X]]

(* diagonal entries of precision matrix are inverse residuals *)
Assert[Diagonal[precisionMat] == 1/Diagonal[Transpose[R].R]]

(* dividing by covariance is transpose of the pseudo-inverse *)
Assert[X.Inverse[Transpose[X].X] == Transpose[PseudoInverse[X]]]

Assert[phiToInv[getB@X] == Inverse@cov@X];
Assert[invToPhi[Inverse@cov@X] == getB@X];
Assert[invToPhi@phiToInv@getB[X] == getB[X]];
Assert[covToPhi@cov[X] == getB[X]];
Assert[covToPhiRobust@cov[X] == getB[X]];
X = {{2, -2, 2}, {-1, 1, 1}, {-2, -2, 1}, {1, 1, 3}}
B = {{0, 4/67, 2/5}, {4/73, 0, -1/5}, {20/73, -10/67, 0}}
Residuals = {650/73, 650/67, 13}
D2 = {{73/650, 0, 0}, {0, 67/650, 0}, {0, 0, 1/13}}
Inverse[Transpose[X].X] = {{73/650, -2/325, -2/65}, {-2/325, 67/650, 1/65}, {-2/65, 1/65, 1/13}}
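The printed identities can be cross-checked in NumPy (an independent sketch, separate from the linked Colab notebook), with `X` copied from the output above:

```python
import numpy as np

# X as printed above (4 examples, 3 features).
X = np.array([[2, -2, 2], [-1, 1, 1], [-2, -2, 1], [1, 1, 3]], dtype=float)

P = np.linalg.inv(X.T @ X)         # precision matrix
D2 = np.diag(np.diag(P))           # D2[i,i] = 1/(squared residual norm of feature i)
B = np.eye(3) - P / np.diag(P)     # B[i,j] = -P[i,j]/P[j,j], zero diagonal

R = X - X @ B                      # residuals of predicting each feature from the others

assert np.allclose(np.diag(B), 0)                     # no feature predicts itself
assert np.allclose(R.T @ X, np.diag(1 / np.diag(P)))  # residuals decorrelated from X
assert np.allclose((np.eye(3) - B) @ D2, P)           # precision = (I - B).D2
print(B[0, 1], B[0, 2])            # 4/67 and 2/5, matching B printed above
```

The `B` recovered from the precision matrix alone reproduces the two-sided regression coefficients printed by the cell.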

One sided

In[]:=
(* for cov, returns lower triangular C and sets global "diag" such that cov = C.diag.diag.Transpose[C] *)
mchol[cov_] := Module[{},
  chol = CholeskyDecomposition[cov]; (* upper triangular U with Transpose[U].U == cov *)
  diag = DiagonalMatrix@Diagonal@chol;
  Transpose[chol].Inverse[diag]
];

(* inverse modified Cholesky *)
imchol[cov_] := Module[{},
  Inverse[mchol@cov]
];

(* One dimensional view from Cholesky. Does not have direct mapping to decorrelation, because need to invert Cholesky factor *)

(* Returns coefficients needed to predict i'th coordinate from previous coords *)
phiL2R[i_, data_] := Module[{},
  coefs = Switch[i,
    1,
    {}, (* empty list, since no "previous coordinates" for the first coord *)
    _,
    Y1 = onlyFirstKColumns[data, i - 1]; (* predictors, columns < i *)
    Y2 = extractKColumn[data, i]; (* target variable, column i *)

    (* prediction coeffs, 3.26 of Pourahmadi *)
    p = PseudoInverse[Transpose[Y1].Y1].(Transpose[Y1].Y2);

    (* form residuals, check that they are uncorrelated with predictor *)
    residuals = Y2 - Y1.p;
    Assert[Total[cov[residuals, Y1], 2] ≐ 0];
    Flatten@p
  ];
  Print["coefs=", coefs];
  (* pad column with zeros *)
  PadRight[coefs, n]
];

(* Returns coefficients needed to predict i'th coordinate from remaining ones *)
phiR2L[i_, data_] := Module[{},
  coefs = Switch[i,
    n,
    {}, (* empty list, since no "previous coordinates" for the last coord *)
    _,
    Y1 = onlyLastKColumns[data, n - i]; (* predictors, columns > i *)
    Y2 = extractKColumn[data, i]; (* target variable, column i *)

    (* prediction coeffs, 3.26 of Pourahmadi *)
    p = PseudoInverse[Transpose[Y1].Y1].(Transpose[Y1].Y2);

    (* form residuals, check that they are uncorrelated with predictor *)
    residuals = Y2 - Y1.p;
    Assert[Total[cov[residuals, Y1], 2] ≐ 0];
    Flatten@p
  ];
  (* pad column with zeros *)
  PadLeft[coefs, n]
];

(* transpose to turn coefficients into columns, because multiply on left: x - x.phiL2R *)
coefsLeftToRight = Transpose@Table[phiL2R[i, X], {i, n}];
coefsRightToLeft = Transpose@Table[phiR2L[i, X], {i, n}];
Print["phiL2R=", coefsLeftToRight];

(* test modified Cholesky *)
mm = mchol[cov@X];
Assert[mm.diag.diag.Transpose[mm] == cov@X];

coefsChol = IdentityMatrix[n] - imchol[cov@X];
coefsChol2 = IdentityMatrix[n] - imchol[Inverse@cov@X];
Print["Cholesky coefs match LR: ", coefsLeftToRight == coefsChol];
Print["Inverse Cholesky coefs match RL: ", coefsRightToLeft == coefsChol2];

t = imchol[cov@X];

d = DiagonalMatrix[Diagonal@CholeskyDecomposition@cov@X];
Print["Cholesky inverse match: ", Transpose[t].MatrixPower[d, -2].t == Inverse[cov@X]];

(* connection to residuals *)
R2 = X - X.coefsLeftToRight;
Assert[Transpose[R2].R2 == diag.diag]

residualNorm[mat_] := Tr[Transpose[mat].mat];
Print["residuals from learned left-sided: ", residualNorm[X - X.coefsLeftToRight]]
Print["residuals from transposed left-sided: ", residualNorm[X - X.Transpose[coefsLeftToRight]]]
Print["residuals from learned right-sided: ", residualNorm[X - X.coefsRightToLeft] // N]
Print["residuals from cholesky: ", residualNorm[X - X.(iiMat - imchol[cov@X])]];
Print["residuals from two-sided: ", residualNorm[X - X.B] // N];

With[{errors = X - X.coefsLeftToRight},
  Print["cholesky diagonal matches residuals: ", Sqrt@Diagonal[Transpose[errors].errors] == Diagonal@CholeskyDecomposition@cov@X];
]

(* If feature x subtracts 1/2 of feature y during L-to-R whitening, then also subtract 1/2 of x from y to get the inverse *)
Print["inverse as double cholesky whitening: ",
  X.Transpose[IdentityMatrix[n] - coefsChol].MatrixPower[d, -2].(IdentityMatrix[n] - coefsChol) == X.Inverse[cov@X]]
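The left-to-right connection can be sketched in NumPy as well. Note the convention difference: `np.linalg.cholesky` returns the lower-triangular factor, while Mathematica's `CholeskyDecomposition` returns the upper one. Here `X` is the data matrix from the first section, and the sequential regression coefficients are recomputed with `lstsq` rather than the notebook's `phiL2R` helper:

```python
import numpy as np

X = np.array([[2, -2, 2], [-1, 1, 1], [-2, -2, 1], [1, 1, 3]], dtype=float)
S = X.T @ X

# Modified Cholesky: S = C D2 C^T with C unit lower triangular.
L = np.linalg.cholesky(S)                 # S = L L^T, L lower triangular
d = np.diag(L)
C = L / d                                 # scale columns -> unit diagonal
D2 = np.diag(d**2)
assert np.allclose(C @ D2 @ C.T, S)

# Row i of Phi = least-squares coefficients predicting column i of X
# from columns < i (left-to-right regressions).
n = X.shape[1]
Phi = np.zeros((n, n))
for i in range(1, n):
    Phi[i, :i] = np.linalg.lstsq(X[:, :i], X[:, i], rcond=None)[0]

# The same coefficients fall out of the inverse Cholesky factor.
assert np.allclose(Phi, np.eye(n) - np.linalg.inv(C))

# Residuals of the sequential regressions are decorrelated, with norms d^2.
E = X @ (np.eye(n) - Phi).T
assert np.allclose(E.T @ E, D2)
```

So one-sided whitening is exactly inverting the unit-triangular Cholesky factor: the regression coefficients are read off `I - inv(C)`, and the diagonal of the Cholesky factor squared gives the residual norms.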

Singular matrix experiments

TL;DR: Tikhonov always kind of works. The rest are brittle. Least squares regression has a regime where it switches from ignoring features to not ignoring them.

pinv(X'X) gives a non-symmetric result for ill-conditioned matrices; use pinv(X)pinv(X)^T instead.
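A small NumPy illustration of this with a synthetic matrix (not the notebook's data): squaring an ill-conditioned X pushes its small singular values below pinv's default cutoff, so the two formulas can disagree dramatically, and pinv(X)pinv(X)^T is symmetric by construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ill-conditioned 4x2 matrix with singular values 1 and 1e-9.
U, _ = np.linalg.qr(rng.standard_normal((4, 2)))
V, _ = np.linalg.qr(rng.standard_normal((2, 2)))
X = U @ np.diag([1.0, 1e-9]) @ V.T

# Way 1: squaring makes the small singular value 1e-18, which falls
# below pinv's default relative cutoff and gets truncated.
P1 = np.linalg.pinv(X.T @ X)

# Way 2: pinv(X) still resolves the 1e-9 singular value, and the product
# with its own transpose is symmetric by construction.
P2 = np.linalg.pinv(X) @ np.linalg.pinv(X).T

print(np.linalg.norm(P1))          # ~1: small direction dropped
print(np.linalg.norm(P2))          # ~1e18: small direction inverted
assert np.allclose(P2, P2.T)
```

Which behavior is "better" depends on whether the small direction is signal or noise, which is exactly what the experiments below probe.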

Rank deficiency=1

Underparameterized

Rank deficiency=2

Rank deficiency=1 + noise

noise=0.001: Tikhonov(0.001) starts ignoring the noise feature
noise=0.000001: pseudo-inverse #1 explodes
noise=0.0000001: lstsq starts ignoring the noise feature, orthoprojection kicks in
noise=0.000000001: orthoprojection method stops working

Tikhonov is fine in this setting for a range of lambda values. For noise=0.000001, the two ways of computing the pseudo-inverse give dramatically different results; the second way is much better for the formula. At noise=0.0000000000001 or lower, the orthoprojected method starts working, but the second pseudo-inverse stops working.
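The regime switch can be reproduced with a made-up version of the experiment (a feature duplicated up to noise; the data and scales here are assumptions, not the Colab's exact setup). The switch in `lstsq` happens when the noise-sized singular value crosses its internal `rcond` cutoff:

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.standard_normal(100)
y = base.copy()                      # target is exactly the first feature

def fit(noise):
    # Rank deficiency 1 + noise: second feature is a near-copy of the first.
    X = np.column_stack([base, base + noise * rng.standard_normal(100)])
    tik = np.linalg.solve(X.T @ X + 1e-6 * np.eye(2), X.T @ y)  # Tikhonov
    ls = np.linalg.lstsq(X, y, rcond=None)[0]                   # least squares
    return tik, ls

for noise in [1e-3, 1e-16]:
    print(noise, fit(noise))
```

At noise=1e-3 both methods return roughly [1, 0] (the noise feature is used to cancel itself); at noise=1e-16 the SVD inside `lstsq` treats the two columns as identical and switches to the minimum-norm answer [0.5, 0.5], which Tikhonov also gives. Tikhonov degrades gracefully across the whole range because lambda bounds the inverted singular values.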

Overparameterized + noise

4 examples, 3 features, and 2 noise features
Tikhonov(0.000001) switches to ignoring noise features around noise=0.0001.
The pseudo-inverse always gives an incorrect result (more features than examples).
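A minimal stand-in for this setup (random data, not the Colab's): with more features than examples there are infinitely many interpolating solutions, and the pseudo-inverse returns the minimum-norm one, which is the projection of the generating coefficients onto the row space of X and so generally differs from them:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for the setup above: 4 examples, 5 features total
# (3 signal features plus 2 that carry zero true weight).
X = rng.standard_normal((4, 5))
w_true = np.array([1.0, -2.0, 3.0, 0.0, 0.0])
y = X @ w_true

w_pinv = np.linalg.pinv(X) @ y     # minimum-norm interpolating solution

print(np.allclose(X @ w_pinv, y))  # True: fits all 4 examples exactly
print(np.allclose(w_pinv, w_true)) # False: not the generating coefficients
```

Zero training error is therefore no evidence of correct coefficients in the overparameterized regime; some prior (minimum norm, Tikhonov, sparsity) decides which interpolant you get.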