TL;DR: Tikhonov always kind of works. The rest are brittle. Least squares regression has a regime where it switches from ignoring features to not ignoring them. pinv(X'X) gives a non-symmetrical result for ill-conditioned matrices; use pinv(X)pinv(X)^T instead.
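A minimal NumPy sketch of the two pseudo-inverse routes. The matrix here is an illustrative assumption (an X built with singular values 1 and 1e-9), not the original experiment: squaring X in X'X pushes the small singular value below pinv's default truncation cutoff, so the two formulas diverge.

```python
import numpy as np

# Illustrative ill-conditioned design matrix with singular values 1 and 1e-9
# (assumed for demonstration; not the note's original data).
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((2, 2)))
V, _ = np.linalg.qr(rng.standard_normal((2, 2)))
X = U @ np.diag([1.0, 1e-9]) @ V.T

# Route 1: pinv(X'X). Squaring X turns the 1e-9 singular value into 1e-18,
# below pinv's default rcond cutoff, so that direction gets truncated away.
A = np.linalg.pinv(X.T @ X)

# Route 2: pinv(X) @ pinv(X).T. The SVD of X itself keeps the 1e-9 direction,
# and the product is symmetric by construction.
P = np.linalg.pinv(X)
B = P @ P.T

print(np.linalg.norm(A - A.T))  # any asymmetry introduced by route 1
print(np.linalg.norm(B - B.T))  # route 2: symmetric up to roundoff
print(np.linalg.norm(A - B))    # the two routes differ enormously here
```

With this construction route 2 has an eigenvalue near 1e18 while route 1 silently drops that direction, which is one way the two formulas can disagree dramatically.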
Rank deficiency=1
Rank deficiency=1 + noise
Tikhonov always kind of works; the rest are brittle. As the noise shrinks, least squares switches regimes:
noise=0.001: Tikhonov(0.001) starts ignoring the noise feature.
noise=0.000001: the first pseudo-inverse formula explodes.
noise=0.0000001: lstsq starts ignoring the noise feature; orthoprojection kicks in.
noise=0.000000001: the orthoprojection method stops working. Tikhonov stays fine in this setting for a range of lambda.
For noise=0.000001, the two ways of computing the pseudo-inverse give dramatically different results; the second way is much better for the formula. At noise=0.0000000000001 or lower, the orthoprojected method starts working, but the second pseudo-inverse stops working.
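A rough reconstruction of this rank-deficiency-1 setup, assuming NumPy: one feature duplicated up to a small perturbation eps. The sizes, eps, and lambda are illustrative choices, not the note's exact values; the point is that the Tikhonov solution stays tame while lstsq's behavior depends on where eps sits relative to its rcond cutoff.

```python
import numpy as np

# Assumed setup: a feature and a near-duplicate of it, perturbed at scale eps.
rng = np.random.default_rng(1)
n = 100
f = rng.standard_normal(n)
eps = 1e-7  # illustrative perturbation scale
X = np.column_stack([f, f + eps * rng.standard_normal(n)])
y = f  # the target depends only on the shared direction

# Tikhonov / ridge: (X'X + lam*I)^{-1} X'y. Stable for a range of lam;
# the weight is split evenly across the two near-identical columns.
lam = 1e-3
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# Plain least squares: whether the tiny second direction is used or dropped
# flips depending on eps vs the rcond truncation threshold.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_ridge)  # roughly [0.5, 0.5]
print(w_lstsq)
```

Sweeping eps (and lam) over the magnitudes listed above is how one would reproduce the regime switches described in the note.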
Overparameterized + noise
4 examples, 3 features, and 2 noise features. Tikhonov(0.000001) switches to ignoring the noise features around noise=0.0001. The pseudo-inverse always gives an incorrect result (more features than examples).
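A sketch of the overparameterized case, assuming NumPy and a noiseless target built from the 3 signal features; the seed, noise scale, and lambda are illustrative assumptions. With 5 features and 4 examples, X'X is singular, so the normal-equations pseudo-inverse route rests on truncation behavior rather than a true inverse, while small-lambda Tikhonov still produces a well-behaved interpolating solution.

```python
import numpy as np

# Assumed setup: 4 examples, 3 signal features, 2 small noise features.
rng = np.random.default_rng(2)
S = rng.standard_normal((4, 3))
noise = 1e-4  # illustrative noise-feature scale
N = noise * rng.standard_normal((4, 2))
X = np.column_stack([S, N])
w_true = np.array([1.0, -2.0, 0.5])
y = S @ w_true

# Tikhonov with a small lambda: the regularized normal equations are
# nonsingular even though X'X (5x5, rank <= 4) is not.
lam = 1e-6
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

# Normal-equations pseudo-inverse route: with more features than examples
# this relies on pinv truncating the zero eigenvalue, and the squared
# conditioning makes it fragile.
w_pinv = np.linalg.pinv(X.T @ X) @ X.T @ y

print(w_ridge)  # close to [1, -2, 0.5, ~0, ~0] in this configuration
print(w_pinv)
```

In this configuration the ridge solution nearly interpolates and puts almost no weight on the noise features; comparing it against w_pinv across noise scales is one way to probe the switch described above.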