# Please Help! Linear Regression: How do I compute the t-values for the beta coefficients of a design matrix with less than full rank?

Is there a way to calculate the standard error estimates and t-values of my beta coefficients if my design matrix does not have full rank, so that X'*X is not invertible?

I am writing a script in Ruby to do least squares regression and analysis of variance. I can compute the beta coefficients for my least squares approximation with

    b = ((X'*X)^(-1))*X'*y'

However, this assumes my matrix has full rank. I have data sets that occasionally contain a set of linearly dependent columns, so X'*X will not invert. I want to keep the linearly dependent columns in my design matrix, so I discovered at PlanetMath that I can compute the betas using the "pseudo-inverse" from singular value decomposition (SVD):

    b = V*(S'*S)^(-1)*S'*U'*y'

The t-values are calculated from the diagonal of the matrix C, the inverse of X'*X:

    t[i] = b[i] / (s * Math.sqrt(C[i][i]))

where

- b[i]: beta for the ith feature
- s: standard error of the entire data set and its estimates
- X: design matrix (the matrix with your independent variables)
- C = ((X'*X)^(-1))

Is there another way to calculate C using the S, U, or V from singular value decomposition?

P.S.: I also found this paper to be quite relevant.

asked 04 Aug '12, 19:40 dpott197
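For reference, the full-rank computation described in the question can be sketched in Ruby with the standard `matrix` library. This is only a minimal sketch, not the asker's script: the helper name `ols_t_values` and the sample data are made up for illustration, and `s` is assumed to be the residual standard error with n - p degrees of freedom.

```ruby
require 'matrix'

# Hypothetical helper: least squares coefficients and t-values for a
# design matrix of full column rank.
# x: Matrix with n rows and p columns (include an intercept column if wanted)
# y: Vector of length n
def ols_t_values(x, y)
  n = x.row_count
  p = x.column_count
  xtx = x.transpose * x
  raise ArgumentError, "X'X is singular: X lacks full column rank" if xtx.singular?

  c = xtx.inverse                 # C = (X'X)^(-1)
  b = c * x.transpose * y         # b = (X'X)^(-1) X' y
  residuals = y - x * b
  # Assumption: s is the residual standard error with n - p degrees of freedom.
  s = Math.sqrt(residuals.inner_product(residuals) / (n - p).to_f)
  t = (0...p).map { |i| b[i] / (s * Math.sqrt(c[i, i])) }
  { b: b, s: s, t: t }
end

# Illustrative data: an intercept column plus one predictor.
x = Matrix[[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = Vector[1.0, 3.0, 4.0, 7.0]

result = ols_t_values(x, y)
puts "b = #{result[:b].to_a.inspect}"
puts "s = #{result[:s]}"
puts "t = #{result[:t].inspect}"
```

The guard on `xtx.singular?` is where a rank-deficient design matrix stops this approach. With floating-point data an exact zero determinant is not guaranteed, so a real script would probably want a tolerance-based rank check instead.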

There is no free lunch here. If X is less than full rank, then there is a unique best (least squared error) set of estimates of y that are linear in x, but there are uncountably many coefficient vectors that produce those estimates. For each variable involved in a linear relation (which may be all of x or a subset), it is guaranteed that at least one of those coefficient vectors has a coefficient of 0 for that variable, and at least one does not (unless, by cosmic coincidence, all the variables in that subset of x get zero coefficients). This would make t-values uninterpretable even if you could get them.

answered 05 Aug '12, 15:52 Paul Rubin

Which makes me wonder... both PCR and PLSR have been developed to overcome problems which arise when $$X$$ is rank-deficient. So: are $$t$$-values calculated from $$\beta$$s obtained through PCR or PLSR interpreted differently from $$t$$-values obtained from "standard" (multiple) LLSR results? [see also: Numerical Linear Algebra in Data Mining, section 3.4] (07 Aug '12, 14:56) fbahr

Oh well... on second thought, "overcome ... rank deficiencies" probably refers to column rank deficiencies (i.e., cases of collinearity) only. (07 Aug '12, 15:41) fbahr

PCR and PLSR linearly transform the original predictor (X) matrix into a full rank matrix (Z) of smaller dimension (fewer variables), which eliminates the multicollinearity problem. You get the beta coefficients and their t-statistics for the regression of the response variable on the z variables the usual way; but the z variables are linear combinations of the x variables, and there is no way to convert t-statistics for the coefficients of the z variables to t-statistics for coefficients of the x variables. (07 Aug '12, 17:46) Paul Rubin
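The non-uniqueness described in the answer above can be seen on a tiny example. The sketch below uses a made-up design matrix whose third column is the sum of the first two; all names and numbers are illustrative only. Different coefficient vectors, one with a zero coefficient on the third variable and others without, give exactly the same fitted values.

```ruby
require 'matrix'

# Illustrative design matrix: column 3 = column 1 + column 2, so rank is 2.
x = Matrix[
  [1, 0, 1],
  [1, 1, 2],
  [1, 2, 3],
  [1, 3, 4]
]

# X * null_direction = 0 because col1 + col2 - col3 = 0.
null_direction = Vector[1, 1, -1]

b1 = Vector[1, 2, 0]            # third variable gets coefficient 0
b2 = b1 - null_direction        # => Vector[0, 1, 1], third coefficient nonzero
b3 = b1 + null_direction * 5    # any multiple of null_direction works

fitted1 = x * b1
fitted2 = x * b2
fitted3 = x * b3

puts "rank of X: #{x.rank}"
puts "fitted values agree: #{fitted1 == fitted2 && fitted1 == fitted3}"
```

Since the data cannot distinguish `b1` from `b2` or `b3`, a t-value attached to any one of these coefficients would not have a meaningful interpretation.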
Unfortunately, the matrix C does not exist. You have to consider reducing the size of the matrix X, for example through principal component analysis.

... hmmm, $C := V \times [S'S]^{-1} \times V'$

SVD:

$\begin{eqnarray}
X &:=& U S V' \\
b &:=& [X'X]^{-1} X' y, \quad\textrm{so} \\
b &=& [V S' U' U S V']^{-1} V S' U' y \\
U' U &=& I \\
b &=& [V S' S V']^{-1} V S' U' y = V [S' S]^{-1} V' V S' U' y \\
V' V &=& I \\
b &=& V [S' S]^{-1} S' U' y \\
cov(b) &=& [X'X]^{-1} s^2 = V [S' S]^{-1} V' s^2 \quad ???
\end{eqnarray}$

Isn't the problem here that $$\textrm{det}(S'S)=0$$, so that again $$[S'S]^{-1}$$ does not exist, and thus $$C$$ does not exist either?

PCA:

$\begin{eqnarray}
Z &=& P \times X
\end{eqnarray}$

We can remove some variables from $$Z$$ and estimate $$b_z$$ instead of $$b_x$$. If $$Z$$ has fewer variables, we cannot write something like this:

$\begin{eqnarray}
b_x &=& P' \times b_z \\
t_x &=& P' \times t_z
\end{eqnarray}$

Well done, Florian!

answered 07 Aug '12, 18:06 Slavko

FAQ @ OR-X: Hey, how do I get that fancy math stuff? (08 Aug '12, 04:15) fbahr

Many thanks, Florian! Last night I spent a lot of time looking for an example of how to use LaTeX on this site. (08 Aug '12, 04:53) Slavko

Regarding the PCA derivation above, if X has less than full rank, one or more of the Z variables will have zero variance (i.e., be constant). If we start out with X and Y centered, then the constant columns of Z will be identically zero, which means you will be unable to estimate regression coefficients for them. (08 Aug '12, 17:03) Paul Rubin

The matrix $$Z$$ is smaller, because we omit variables with zero variance. Moreover, the other variables in $$Z$$ are uncorrelated with each other, which results from the method of PCA. (08 Aug '12, 18:49) Slavko

Thank you Paul, in fact PCR is a one-way ticket. If $$Z$$ has fewer variables, it is not possible to estimate regression coefficients for X. Well, yesterday at 2 am I forgot about the variable consisting entirely of 1's... In general, last night these arrays somehow looked different :-) I corrected my answer. I am glad that there is a place like this, where I can get to know the views of wonderful people who are willing to help others. Creating this site was a very valuable initiative. (09 Aug '12, 03:49) Slavko
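The claim in the answer above that $$C$$ does not exist can be checked directly. Because $$V$$ is orthogonal, $$\textrm{det}(X'X) = \textrm{det}(V S'S V') = \textrm{det}(S'S)$$, so a zero determinant of X'X is the same condition as $$\textrm{det}(S'S)=0$$. The sketch below uses a made-up integer design matrix (so the arithmetic is exact) with one linearly dependent column, and then, as a contrast, drops that column. Dropping a column is just one simple way of "reducing the size of X"; it is an assumption of this example, not what PCA does.

```ruby
require 'matrix'

# Illustrative design matrix: column 3 = column 1 + column 2.
x = Matrix[
  [1, 0, 1],
  [1, 1, 2],
  [1, 2, 3],
  [1, 3, 4]
]

xtx = x.transpose * x
puts "det(X'X)  = #{xtx.determinant}"   # exact integer arithmetic
puts "rank(X'X) = #{xtx.rank} of #{xtx.column_count}"
puts "singular? = #{xtx.singular?}"     # true means C = (X'X)^(-1) does not exist

# Contrast: keep only the first two (independent) columns.
x_reduced = x.minor(0, x.row_count, 0, 2)
xtx_reduced = x_reduced.transpose * x_reduced
puts "det(reduced X'X) = #{xtx_reduced.determinant}"
puts "regular?         = #{xtx_reduced.regular?}"
```

With real-valued data the determinant will rarely be exactly zero, so comparing the smallest singular value against a tolerance is usually a more robust test than `singular?`.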