- Python notebook: https://colab.research.google.com/drive/1tXmS6kZcJ8hsfvzKaZSCvXf7iNRT25UG
- Notes (NN<>Least Squares): https://notability.com/n/06JGhcgRtAcAR7~nqNsLaO
Whitening connection Aw=b
In[]:=
setupDataFullRank;
Print["X=", MatrixForm@X];
B = getB[X];
Print["B= ", B // MatrixForm]
getResiduals[X_] := X - X.getB[X];
R = getResiduals[X];
Print["Residuals=", Diagonal[Transpose[R].R]]
(* after transforming, the covariance of the new dataset with the old dataset is diagonal *)
Assert[DiagonalMatrixQ[cov[X - X.getB[X], X]]]
(* D2 matrix of 5.6 *)
R = getResiduals[X];
D2 = DiagonalMatrix[1/Diagonal@(Transpose[R].R)];
Print["D2= ", D2 // MatrixForm]
Print[Inverse[Transpose[X].X] // MatrixForm];
(* check for inverse correctness *)
precisionMat = (iiMat - B).D2;
Assert[precisionMat == Inverse[cov@X]]
(* diagonal entries of the precision matrix are inverse residuals *)
Assert[Diagonal[precisionMat] == 1/Diagonal[Transpose[R].R]]
(* dividing by the covariance is the transpose of the pseudo-inverse *)
Assert[X.Inverse[Transpose[X].X] == Transpose[PseudoInverse[X]]]
Assert[phiToInv[getB@X] == Inverse@cov@X];
Assert[invToPhi[Inverse@cov@X] == getB@X];
Assert[invToPhi@phiToInv@getB[X] == getB[X]];
Assert[covToPhi@cov[X] == getB[X]];
Assert[covToPhiRobust@cov[X] == getB[X]];
X=
( 2  -2  2)
(-1   1  1)
(-2  -2  1)
( 1   1  3)
B=
(  0      4/67   2/5)
( 4/73    0     -1/5)
(20/73  -10/67   0  )
Residuals={650/73, 650/67, 13}
D2=
(73/650    0      0  )
(  0     67/650   0  )
(  0       0     1/13)
Inverse[Transpose[X].X]=
( 73/650  -2/325  -2/65)
( -2/325  67/650   1/65)
( -2/65    1/65    1/13)
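The check above verifies the identity Inverse[cov] == (I - B).D2, where B holds the coefficients for predicting each feature from the others and D2 holds the reciprocals of the residual sums of squares. Since setupDataFullRank, getB, cov, and iiMat are defined elsewhere in the notebook, the following is a self-contained sketch of the same check; it assumes cov[X] = Transpose[X].X and uses the standard formula B[[i, j]] = -prec[[i, j]]/prec[[j, j]] for the regression coefficients.

(* Self-contained sketch of the check above; getB, cov, and iiMat are defined
   elsewhere in the notebook, so the quantities are recomputed here under the
   assumption cov[X] = Transpose[X].X. B[[i, j]] is the coefficient of feature i
   when regressing feature j on all the other features. *)
X = {{2, -2, 2}, {-1, 1, 1}, {-2, -2, 1}, {1, 1, 3}};
n = Length[First[X]];
prec = Inverse[Transpose[X].X];                      (* precision matrix *)
B = Table[If[i == j, 0, -prec[[i, j]]/prec[[j, j]]], {i, n}, {j, n}];
R = X - X.B;                                         (* per-feature residuals *)
D2 = DiagonalMatrix[1/Diagonal[Transpose[R].R]];     (* inverse residual sums of squares *)
(IdentityMatrix[n] - B).D2 == prec                   (* True *)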
One-sided
In[]:=
(* For cov(X), returns the lower-triangular C and sets global "diag" such that cov(X) == C.diag.diag.Transpose[C].
   Note: CholeskyDecomposition returns the upper factor, hence the Transpose. *)
mchol[cov_] := Module[{},
  chol = CholeskyDecomposition[cov];
  diag = DiagonalMatrix@Diagonal@chol;
  Transpose[chol].Inverse[diag]];
(* inverse modified Cholesky *)
imchol[cov_] := Module[{}, Inverse[mchol@cov]];

(* One-dimensional view from Cholesky. Does not have a direct mapping to decorrelation, because we need to invert the Cholesky factor. *)
(* Returns the coefficients needed to predict the i'th coordinate from the previous coordinates. *)
phiL2R[i_, data_] := Module[{},
  coefs = Switch[i,
    1, {}, (* empty list, since there are no "previous coordinates" for the first coordinate *)
    _,
    (* Drop[data, -(n - i + 1)]; predictors, drop columns > i *)
    Y1 = onlyFirstKColumns[data, i - 1]; (* predictors, columns < i *)
    Y2 = extractKColumn[data, i]; (* target variable, column i *)
    (* prediction coefficients, 3.26 of Pourahmadi *)
    p = PseudoInverse[Transpose[Y1].Y1].(Transpose[Y1].Y2);
    (* form residuals, check that they are uncorrelated with the predictors *)
    residuals = Y2 - Y1.p;
    Assert[Total[cov[residuals, Y1], 2] ≐ 0];
    Flatten@p];
  Print["coefs=", coefs];
  (* pad the column with zeros *)
  PadRight[coefs, n]];
(* Returns the coefficients needed to predict the i'th coordinate from the remaining ones. *)
phiR2L[i_, data_] := Module[{},
  coefs = Switch[i,
    n, {}, (* empty list, since there are no "following coordinates" for the last coordinate *)
    _,
    Y1 = onlyLastKColumns[data, n - i]; (* predictors, columns > i *)
    Y2 = extractKColumn[data, i]; (* target variable, column i *)
    (* prediction coefficients, 3.26 of Pourahmadi *)
    p = PseudoInverse[Transpose[Y1].Y1].(Transpose[Y1].Y2);
    (* form residuals, check that they are uncorrelated with the predictors *)
    residuals = Y2 - Y1.p;
    Assert[Total[cov[residuals, Y1], 2] ≐ 0];
    Flatten@p];
  (* pad the column with zeros *)
  PadLeft[coefs, n]];

(* transpose to turn the coefficient rows into columns, because we multiply on the left: x - x.phiL2R *)
coefsLeftToRight = Transpose@Table[phiL2R[i, X], {i, n}];
coefsRightToLeft = Transpose@Table[phiR2L[i, X], {i, n}];
Print["phiL2R=", coefsLeftToRight];

(* test the modified Cholesky *)
mm = mchol[cov@X];
Assert[mm.diag.diag.Transpose[mm] == cov@X];
coefsChol = IdentityMatrix[n] - Transpose@imchol[cov@X];
coefsChol2 = IdentityMatrix[n] - Transpose@imchol[Inverse@cov@X];
Print["Cholesky coefs match LR: ", coefsLeftToRight == coefsChol];
Print["Inverse Cholesky coefs match RL: ", coefsRightToLeft == coefsChol2];
t = imchol[cov@X];
d = DiagonalMatrix[Diagonal@CholeskyDecomposition@cov@X];
Print["Cholesky inverse match: ", Transpose[t].MatrixPower[d, -2].t == Inverse[cov@X]];

(* connection to residuals *)
R2 = X - X.coefsLeftToRight;
Assert[Transpose[R2].R2 == diag.diag]
residualNorm[mat_] := Tr[Transpose[mat].mat];
Print["residuals from learned left-sided: ", residualNorm[X - X.coefsLeftToRight]]
Print["residuals from transposed left-sided: ", residualNorm[X - X.Transpose[coefsLeftToRight]]]
Print["residuals from learned right-sided: ", residualNorm[X - X.coefsRightToLeft] // N]
Print["residuals from cholesky: ", residualNorm[X - X.(iiMat - imchol[cov@X])]];
Print["residuals from two-sided: ", residualNorm[X - X.B] // N];
With[{errors = X - X.coefsLeftToRight},
  Print["cholesky diagonal matches residuals: ", Sqrt@Diagonal[Transpose[errors].errors] == Diagonal@CholeskyDecomposition@cov@X];]
(* If feature x subtracts 1/2 of feature y during L-to-R whitening, then also subtract 1/2 of x from y to get the inverse. *)
Print["inverse as double cholesky whitening: ", X.(IdentityMatrix[n] - coefsChol).MatrixPower[d, -2].Transpose[IdentityMatrix[n] - coefsChol] == X.Inverse[cov@X]]
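A possible point of confusion in mchol is that Mathematica's CholeskyDecomposition returns the upper factor. Here is a self-contained sketch of the modified Cholesky factorization it targets, on the same X and with the same unnormalized covariance Transpose[X].X as above.

(* Self-contained sketch: CholeskyDecomposition returns the upper factor U1 with
   Transpose[U1].U1 == S, so C1 = Transpose[U1].Inverse[d1] is unit lower triangular
   and S == C1.d1.d1.Transpose[C1]. *)
X = {{2, -2, 2}, {-1, 1, 1}, {-2, -2, 1}, {1, 1, 3}};
S = Transpose[X].X;
U1 = CholeskyDecomposition[S];
d1 = DiagonalMatrix[Diagonal[U1]];
C1 = Transpose[U1].Inverse[d1];            (* unit lower triangular *)
Simplify[C1.d1.d1.Transpose[C1]] == S      (* True *)
IdentityMatrix[3] - Inverse[C1]            (* row i: coefficients predicting column i from the earlier columns *)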
Singular matrix experiments
TL;DR: Tikhonov always kind of works. The rest are brittle. Least-squares regression has a regime where it switches from ignoring features to not ignoring them.
pinv(X'X) gives a non-symmetric result for ill-conditioned matrices; use pinv(X) pinv(X)^T instead.
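A minimal sketch of that comparison; the matrix below is an arbitrary, nearly rank-deficient stand-in, and how large the asymmetry and the discrepancy come out depends on the conditioning and on PseudoInverse's default tolerance.

(* Arbitrary ill-conditioned example: the third column is almost col1 + col2. *)
Xbad = N@{{1, 1, 2 + 10^-7},
          {1, 2, 3},
          {2, 1, 3},
          {1, 1, 2}};
g1 = PseudoInverse[Transpose[Xbad].Xbad];                 (* pinv(X'X) directly *)
g2 = PseudoInverse[Xbad].Transpose[PseudoInverse[Xbad]];  (* pinv(X).pinv(X)' *)
Print["asymmetry of pinv(X'X):        ", Max@Abs[g1 - Transpose[g1]]];
Print["asymmetry of pinv(X)pinv(X)':  ", Max@Abs[g2 - Transpose[g2]]];
Print["max difference between the two: ", Max@Abs[g1 - g2]];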
Rank deficiency=1
Underparameterized
Rank deficiency=2
Rank deficiency=1 + noise
TL;DR: Tikhonov always kind of works. The rest are brittle. Least-squares regression has a regime where it switches between ignoring and not ignoring the noise feature.
- noise=0.001: Tikhonov(0.001) starts ignoring the noise feature.
- noise=0.000001: pseudo-inverse1 explodes.
- noise=0.0000001: lstsq starts ignoring the noise feature; orthoprojection kicks in.
- noise=0.000000001: the orthoprojection method stops working.
Tikhonov is fine in this setting for a range of lambda. At noise=0.000001, the two ways of computing the pseudo-inverse give dramatically different results; the second way is much better for the formula. At noise=0.0000000000001 or lower, the orthoprojection method starts working, but the second pseudo-inverse stops working.
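A hedged sketch of how such a comparison can be set up; the notebook's actual data, target, and the orthoprojection method are not reproduced here, so the design matrix, target, noise level, and lambda below are arbitrary stand-ins meant to be varied.

(* Stand-in setup: one informative feature plus a redundant "noise" feature that
   duplicates it up to a perturbation of size noise; the target uses only the
   informative feature. Vary noise and lambda to probe the regimes above. *)
noise = 10.^-6;
x1 = N@{1, 2, 3, 4, 5, 6};
x2 = x1 + noise*{1, -1, 1, -1, 1, -1};                   (* redundant + noise *)
Z = Transpose[{x1, x2}];
y = 2*x1;
lambda = 10.^-3;
tikhonov = Inverse[Transpose[Z].Z + lambda*IdentityMatrix[2]].(Transpose[Z].y);
pinv1 = PseudoInverse[Transpose[Z].Z].(Transpose[Z].y);  (* pinv(Z'Z).Z'y *)
pinv2 = PseudoInverse[Z].y;                              (* pinv(Z).y *)
lstsq = LeastSquares[Z, y];
Print["Tikhonov(", lambda, "): ", tikhonov];
Print["pseudo-inverse1: ", pinv1];
Print["pseudo-inverse2: ", pinv2];
Print["lstsq:           ", lstsq];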
Overparameterized + noise
4 examples, 3 features, and 2 noise features.
- Tikhonov(0.000001) switches to ignoring the noise features around noise=0.0001.
- The pseudo-inverse always gives an incorrect result (more features than examples).
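A hedged sketch of the overparameterized regime; the notebook's actual data and target are not shown, so everything below is an arbitrary stand-in that only contrasts the Tikhonov-regularized normal equations with pseudo-inverse-based solutions when there are more features than examples.

(* Stand-in: 4 examples, 3 real features, 2 small noise features. *)
SeedRandom[1];
noise = 10.^-4;
real  = N@{{2, -2, 2}, {-1, 1, 1}, {-2, -2, 1}, {1, 1, 3}};   (* 4 x 3 *)
junk  = noise*RandomReal[{-1, 1}, {4, 2}];                    (* 4 x 2 noise features *)
Z     = Join[real, junk, 2];                                  (* 4 x 5 *)
G     = Transpose[Z].Z;                                       (* 5 x 5, rank <= 4 *)
Print["rank of Z'Z: ", MatrixRank[G]];
(* pinv(Z'Z) is not a true inverse when features > examples *)
Print["max |pinv(G).G - I|: ", Max@Abs[PseudoInverse[G].G - IdentityMatrix[5]]];
(* coefficients for a stand-in target that uses only the real features *)
y        = real.{1, 2, -1};
lambda   = 10.^-6;
tikhonov = Inverse[G + lambda*IdentityMatrix[5]].(Transpose[Z].y);
minNorm  = PseudoInverse[Z].y;
Print["Tikhonov coefficients:       ", tikhonov];
Print["pseudo-inverse coefficients: ", minNorm];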