Chapter 13 Appendix

13.1 Brief introduction to matrices

Eigenvectors (characteristic vectors) and eigenvalues (characteristic values) are useful in solving systems of linear equations and provide quicker solutions to tasks such as finding the nth power of a matrix. An eigenvector v of an n x n square matrix M is defined as a non-zero vector such that the product Mv is equal to the product of the scalar eigenvalue λ and v, that is, \(Mv = \lambda v\).

Eigenvector analysis can be used to describe patterns. Here, the eigenvector has an interpretation in terms of the direction of the data and the eigenvalue provides a scaling measure of the length or change of the vector when the two are multiplied. Returning to the task of finding the nth power of a matrix M, the eigenvectors of \(M^n\) remain unchanged while the eigenvalues are raised to the nth power.
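A minimal sketch in Python (using numpy and a hypothetical 2 x 2 matrix) illustrates both points: the eigen decomposition \(M = V\Lambda V^{-1}\) and the fact that \(M^n = V\Lambda^n V^{-1}\), so the eigenvectors are reused while the eigenvalues are raised to the nth power.

```python
import numpy as np

# Hypothetical symmetric matrix used only for illustration
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigen decomposition: columns of V are eigenvectors, lam holds eigenvalues
lam, V = np.linalg.eig(M)

# nth power via the eigenvalues: M^n = V diag(lam^n) V^-1
n = 4
M_power = V @ np.diag(lam**n) @ np.linalg.inv(V)

# Agrees with direct repeated multiplication
print(np.allclose(M_power, np.linalg.matrix_power(M, n)))  # True
```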

In terms of applications, the eigen decomposition approach provides an unbiased method for discovering patterns in complex data; this is typically performed by eigen decomposition of the covariance matrix. This approach has been used to describe patterns of infarction in hypoxic ischaemic injury and in facial recognition.
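As a sketch of the covariance-based approach, the following Python fragment simulates a small hypothetical data set, forms the covariance matrix and reads the dominant pattern from its eigen decomposition; the data and variable count are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 observations of 3 variables driven by one shared factor
shared = rng.normal(size=(100, 1))
X = np.hstack([shared + 0.1 * rng.normal(size=(100, 1)) for _ in range(3)])

# Eigen decomposition of the covariance matrix (symmetric, so eigh is used)
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Eigenvectors give the directions (patterns) in the data;
# eigenvalues give the variance explained along each direction.
order = np.argsort(eigvals)[::-1]
print(eigvals[order])         # variances, largest first
print(eigvecs[:, order[0]])   # the dominant pattern across the 3 variables
```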

In the section on regression, we show that the eigenvalue can be interpreted in terms of the variance of the data. The eigenvalues of M are the solutions λ of the characteristic equation

\(\det(M - \lambda I) = 0\)

where I is the n x n identity matrix.

In geometric terms, the eigenvalue describes the length of its associated eigenvector, and in regression analysis it can be considered as the variance of the data. The eigenvalues are calculated by finding the roots of the characteristic polynomial equation. Consider the 2 x 2 matrix \(M=\left[\begin{array}{cc}1 & -2\\-2 & 1\end{array}\right]\).

The characteristic polynomial is given by \(\det(M-\lambda I)=\lambda^2 - \mathrm{Trace}(M)\,\lambda + \det(M)\).

The Trace(M) is given by the sum of the diagonal elements, \(\mathrm{Trace}(M)=1+1=2\). The determinant is given by \(\det(M)=(1)(1)-(-2)(-2)=-3\), so the characteristic equation is \(\lambda^2 - 2\lambda - 3 = 0\).
The quadratic formula can be used to solve for λ: \(\lambda = \frac{2 \pm \sqrt{4 + 12}}{2} = \frac{2 \pm 4}{2}\).

In this case, the eigenvalues are -1 and 3. The eigenvectors of M are found by solving \((M-\lambda I)v = 0\).

The identity matrix consists of ones on the diagonal and zeros elsewhere. For a 2 x 2 matrix, the identity matrix is \(I=\left[\begin{array}{cc}1 & 0\\0 & 1\end{array}\right]\).

For the eigenvalue -1, the eigenvector is obtained from \((M+I)v=\left[\begin{array}{cc}2 & -2\\-2 & 2\end{array}\right]v = 0\).

The eigenvectors for this eigenvalue include (1, 1). For the eigenvalue 3, the associated eigenvectors include (1, -1). The solution of the characteristic equation becomes more complex as the size of the matrix increases; for large matrices, the power method is used to derive the eigenvalues. The determinant is equal to the product of the eigenvalues (-1 x 3 = -3). The trace is equal to the sum of the eigenvalues (-1 + 3 = 2) or, equivalently, the sum of the diagonal elements of the matrix (1 + 1 = 2).
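The worked example can be checked numerically. The sketch below (Python with numpy, using the 2 x 2 matrix given above) confirms the eigenvalues, eigenvectors, determinant and trace.

```python
import numpy as np

# The 2 x 2 matrix from the worked example above
M = np.array([[1.0, -2.0],
              [-2.0, 1.0]])

lam, V = np.linalg.eig(M)
print(lam)                 # eigenvalues 3 and -1 (order may vary)
print(V)                   # columns proportional to (1, -1) and (1, 1)

print(np.linalg.det(M))    # approximately -3: the product of the eigenvalues
print(np.trace(M))         # 2.0: the sum of the eigenvalues and of the diagonal
```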

A matrix M is invertible (non-singular) if there exists a matrix \(M^{-1}\) that satisfies \(MM^{-1}=M^{-1}M=I\).

The matrix \(M^{-1}\) is the inverse of matrix M. The inverse can be calculated by writing \(M^{-1}\) with unknown entries, for example w, x, y and z for a 2 x 2 matrix, and solving \(MM^{-1}=I\) for those entries.

A matrix that cannot be inverted is termed singular; the determinant of such a square matrix is zero. The significance of invertible matrices will be seen in the section on collinearity when dealing with regression analysis.
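A short sketch (Python with numpy, hypothetical matrices) shows the defining property of the inverse and the behaviour of a singular matrix.

```python
import numpy as np

M = np.array([[1.0, -2.0],
              [-2.0, 1.0]])

# Multiplying a matrix by its inverse gives the identity matrix
M_inv = np.linalg.inv(M)
print(np.allclose(M @ M_inv, np.eye(2)))   # True

# A singular matrix has determinant zero and cannot be inverted
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.isclose(np.linalg.det(S), 0.0))   # True
```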

A sparse matrix is a matrix populated mostly by zeros. In a network sense, it implies a lack of cohesion in the network. Inverting a sparse matrix is challenging because of the reduced rank associated with this type of matrix. The solutions require various forms of penalisation (penalised regression is discussed later under Regression).
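As a rough sketch of the idea of penalisation (a generic ridge-type penalty chosen only for illustration, not the specific penalised methods discussed later), adding a small multiple of the identity matrix makes a rank-deficient matrix invertible:

```python
import numpy as np

# A rank-deficient matrix: the second column of X is twice the first
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
G = X.T @ X
print(np.linalg.matrix_rank(G))     # 1: G cannot be inverted directly

# Ridge-type penalisation: add a small multiple of the identity before inverting
lam = 0.1
print(np.linalg.inv(G + lam * np.eye(2)))   # now invertible
```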

The null space of a matrix can be considered as the set of solutions to the homogeneous system of linear equations \(Mx = 0\).

The null space of a matrix is also termed the kernel of that matrix; for example, a vector such as (3, -9, 1) that satisfies \(Mx = 0\) belongs to the kernel of M. The null space always contains the zero vector. In the homogeneous system, the right-hand side of the augmented matrix \([M \mid 0]\) is a column of zeros. The null space is a subspace of the domain of the matrix.
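A sketch of computing a null-space (kernel) basis with numpy, using a hypothetical matrix with dependent rows; the right singular vectors whose singular values are (near) zero span the kernel.

```python
import numpy as np

# Hypothetical 2 x 3 matrix whose second row is twice the first
M = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

# Right singular vectors with (near-)zero singular values span the null space
U, s, Vt = np.linalg.svd(M)
tol = max(M.shape) * np.finfo(float).eps * s.max()
rank = int((s > tol).sum())
null_basis = Vt[rank:].T                  # columns form a basis of the kernel

print(null_basis.shape)                   # (3, 2): the kernel is two-dimensional here
print(np.allclose(M @ null_basis, 0.0))   # True: each basis vector solves Mx = 0
```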

The rank of a matrix can be considered as a measure of the ‘non-singularness’ of the matrix. For example, a 3 x 4 matrix with 2 independent rows has rank 2. In other words, the rank describes the number of linearly independent columns (or rows) of a matrix, or equivalently the number of its non-zero singular values. It also describes the dimension of the image of the linear transformation represented by the matrix. The expression that a matrix is of full rank (contains independent rows and columns of data) is often used in regression to indicate that the matrix is invertible and the data are non-collinear. An example of a rank-deficient (collinear) matrix is shown here: \(\left[\begin{array}{cc}3 & 9\\2 & 6\end{array}\right]\)

Observe that the second column is three times the first column. The determinant of this matrix is 3 x 6 - 2 x 9 = 0; the matrix is therefore not invertible and is rank deficient. The interpretation for systems of linear equations is that when the rank of the augmented matrix is equal to that of the coefficient matrix, the system has a solution; when the rank of the augmented matrix is larger than that of the coefficient matrix, the system is inconsistent and has no solution. A special form of multivariate regression uses the rank of the matrix (reduced rank regression). This type of regression is useful in the case of a minority class distribution, where the majority of the data are at one end of the spectrum.
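The rank statements above can be verified with numpy; the second matrix below is a hypothetical full-rank comparison.

```python
import numpy as np

# The collinear matrix above: the second column is three times the first
A = np.array([[3.0, 9.0],
              [2.0, 6.0]])
print(np.linalg.matrix_rank(A))   # 1: rank deficient, not invertible

# A matrix with independent columns for comparison (hypothetical)
B = np.array([[3.0, 1.0],
              [2.0, 6.0]])
print(np.linalg.matrix_rank(B))   # 2: full rank, invertible
```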

A positive definite matrix is invertible and has full rank; a positive semi-definite matrix has non-negative eigenvalues but may be singular. A positive semi-definite matrix can be obtained by multiplying a matrix by its transpose (denoted by a superscript T), as in \(X^TX\).

Such a matrix is symmetrical. Examples include correlation and covariance matrices.
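A brief numerical check (hypothetical data) that a product of a matrix and its transpose is symmetric with non-negative eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))       # hypothetical data: 20 observations, 3 variables

G = X.T @ X                        # product of a matrix and its transpose
print(np.allclose(G, G.T))                       # True: symmetric
print(np.all(np.linalg.eigvalsh(G) >= -1e-10))   # True: eigenvalues are non-negative
```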

13.2 Regression

In the section below, a matrix approach to regression is provided because it helps explain issues encountered when performing regression analyses; in particular, collinearity is easier to explain using matrices. Examples of performing regression analyses and the steps required to check for errors are given. This section deals with univariable and multivariable analyses; the next chapter discusses the different multivariate analyses.

13.2.1 Linear regression

A matrix description of parameter estimation is given here because it makes multiple regression easier to describe (Draper and Smith 1998): \(Y= \beta_1X_1+ \beta_2X_2+\dots+\beta_jX_j\). A matrix is an array of data arranged as rows and columns. For imaging data, each column represents an individual voxel and each row a patient. A vector is a matrix with a single column.

In matrix form, the multiple regression equation takes the form \(Y= X\beta + E\), where X is the predictor matrix, Y is the dependent (response) vector, β is the vector of regression coefficients and E is the error term (not the intercept). X has one row per observation and one column per predictor; Y and E are column vectors with one element per observation.

Algebraic manipulation of the regression equation shows that the solution for β is \((X^TX)^{-1}X^TY\). \(X^T\) is the transpose of X, in which the columns of X are written as rows. The correlation matrix of X is given by \(X^TX\). The solution for β is possible if the matrix \(X^TX\) can be inverted. The inverse of a matrix can be found if the matrix is square (the numbers of columns and rows are equal) and its determinant is nonzero, that is, the matrix is not singular (Draper and Smith 1998). A defining property of the matrix inverse is that when a matrix is multiplied by its inverse, the result is the identity matrix: a square matrix whose diagonal elements are ones and whose remaining elements are zeros. For a rectangular matrix, the Moore-Penrose pseudo-inverse plays the role of the inverse; for a matrix with full column rank, multiplying the pseudo-inverse by the matrix gives the identity matrix.

The terms in the inverse of \(X^TX\) are divided by the determinant of \(X^TX\). For simplicity, the determinant of a 2 x 2 matrix \(A=\left[\begin{array}{cc}a & b\\c & d\end{array}\right]\) is given by \(ad-bc\), and the inverse of A is given by \(\frac{1}{ad-bc}\left[\begin{array}{cc}d & -b\\-c & a\end{array}\right]\). From this equation, it can be seen that the inverse exists only if the determinant \(ad-bc\) is nonzero. For a matrix \(X^TX\) of the form \(\left[\begin{array}{cc}n & nX_*\\nX_* & nX_*^2\end{array}\right]\), the determinant \(n \times nX_*^2 -nX_* \times nX_*\) is zero, and hence there is no unique solution for β. If near collinearity exists and the determinant approaches zero, there are infinitely many combinations of coefficients that give essentially the same least squares fit. When the matrix is singular, the columns of X are linearly related to each other (collinearity). In this case the regression coefficients are unstable and their variances are large; further, small changes in the dependent variable lead to large fluctuations in the regression solution.
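A sketch of the normal-equation solution \(\beta=(X^TX)^{-1}X^TY\) on simulated data, followed by a near-collinear design to show how \(X^TX\) approaches singularity; all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# Hypothetical data: intercept plus two predictors, true coefficients (1, 2, -1)
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
Y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

# Normal-equation solution: beta = (X^T X)^-1 X^T Y
beta = np.linalg.inv(X.T @ X) @ X.T @ Y
print(beta)                          # approximately [1, 2, -1]

# Near collinearity: x3 is almost a copy of x1, so X^T X is nearly singular
x3 = x1 + rng.normal(scale=1e-6, size=n)
X_c = np.column_stack([np.ones(n), x1, x3])
print(np.linalg.cond(X_c.T @ X_c))   # a very large condition number
```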

13.2.1.1 Least squares

The least squares solution for the parameter β refers to fitting the line (intercept and slopes for the variables in X) such that the Euclidean distance between the observed values Y and the expected or predicted values is as small as possible. The metric for the fit is the sum of squared errors, SSE (also called the residual sum of squares), given by \(SSE=(Y-X\beta)^T(Y-X\beta)\). The variance-covariance matrix of \(\beta\) is given by \(Var(\beta)=\sigma^2(X^TX)^{-1}\), where \(\sigma^2\) is the residual variance.
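A sketch of the SSE and of \(Var(\beta)=\sigma^2(X^TX)^{-1}\) on simulated data, where the residual variance \(\sigma^2\) is estimated from the residuals; all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one predictor
Y = 2.0 + 3.0 * X[:, 1] + rng.normal(size=n)

beta = np.linalg.inv(X.T @ X) @ X.T @ Y
residuals = Y - X @ beta

# Sum of squared errors and the variance-covariance matrix of the coefficients
SSE = residuals @ residuals
sigma2 = SSE / (n - X.shape[1])                # estimate of the residual variance
var_beta = sigma2 * np.linalg.inv(X.T @ X)     # Var(beta) = sigma^2 (X^T X)^-1

print(SSE)
print(np.sqrt(np.diag(var_beta)))              # standard errors of the coefficients
```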

13.2.1.2 Collinearity

Collinearity, or relatedness among the predictors, is often overlooked in analyses. It can lead to instability of the regression coefficients. There are several tests for collinearity, including the variance inflation factor and the condition index. The variance inflation factor (VIF) for a predictor is given by \(VIF = \frac{1}{1-R^2}\), where \(R^2\) is obtained by regressing that predictor on the remaining predictors. As the predictors become strongly correlated, \(R^2\) approaches 1 and the VIF approaches infinity. Collinearity is present if VIF > 10 (Kleinbaum, Kupper, and Muller 1978). Collinearity can also be assessed by measuring the condition index (Phan et al. 2006). This is given by the square root of the ratio between the largest eigenvalue and each eigenvalue of \(X^TX\): \(CI_i=\sqrt{\frac{\lambda_{max}}{\lambda_i}}\). Collinearity is present when the condition index is > 30 (Kleinbaum, Kupper, and Muller 1978).
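A sketch of both diagnostics on simulated data with two deliberately collinear predictors: the VIF is computed by regressing each predictor on the others, and the condition index from the eigenvalues of \(X^TX\). The data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)     # deliberately collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Variance inflation factor: regress each predictor on the others, VIF = 1/(1 - R^2)
def vif(X, j):
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1.0 - resid.var() / X[:, j].var()
    return 1.0 / (1.0 - r2)

print([round(vif(X, j), 1) for j in range(X.shape[1])])   # very large for x1 and x2

# Condition index: sqrt(lambda_max / lambda_i) from the eigenvalues of X^T X
eigvals = np.linalg.eigvalsh(X.T @ X)
print(np.sqrt(eigvals.max() / eigvals))   # the largest value is well above 30 here
```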

13.2.1.3 Weighted least squares

In the section above it was not stated explicitly, but the least squares regression model is appropriate when the error variances are uniform (Draper and Smith 1998). In that case the variance of the error matrix is a diagonal matrix with equal diagonal elements. When there are unreliable data or errors in measurement of some of the data, the variance of the error matrix contains unequal diagonal elements. The consequence is that the least squares formula leads to instability of the parameter estimates. Weighted least squares regression is similar to least squares regression except that the observations are weighted \(w\) according to the variance of their errors: \(\beta=(X^TV^{-1}X)^{-1}X^TV^{-1}Y\). The diagonal matrix \(V\) contains the error variances along its diagonal, so that \(V^{-1}\) carries weights of the form 1/w. These weights downplay the importance of the regions where noise occurs and give appropriate importance to the regions of reliable data. The result is a reduction in the variance of the regression coefficients and hence stability in their estimates. Weighted least squares is introduced here because weighted PLS is used in the PLS-PLR model.
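A sketch of the weighted least squares estimator on simulated heteroscedastic data; the sample size, noise levels and weights are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Heteroscedastic errors: the second half of the observations is much noisier
noise_sd = np.where(np.arange(n) < n // 2, 0.5, 5.0)
Y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=noise_sd)

# V holds the error variances; V^-1 downweights the noisy observations (1/w)
V_inv = np.diag(1.0 / noise_sd**2)
beta_wls = np.linalg.inv(X.T @ V_inv @ X) @ X.T @ V_inv @ Y

beta_ols = np.linalg.inv(X.T @ X) @ X.T @ Y
print(beta_ols)   # ordinary least squares estimate of (1, 2)
print(beta_wls)   # weighted least squares: a lower-variance estimate of (1, 2)
```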

13.2.2 Logistic regression

In logistic regression, there is no closed-form algebraic solution for the parameter estimates (β coefficients), so a numerical method (an iterative approach) such as maximum likelihood estimation is used. The General Linear Model \(Y=X\beta\) is modified to the form of the Generalised Linear Model (GLIM) by adding a non-linear link function \(g(\mu)\); the equation now resembles \(Y=g(X\beta)\).

Consider the binary response (1, 0) in terms of proportions: \(p\) is the probability of an event and \(1-p\) is the probability of the event not occurring. The odds are given by \(\frac{p}{1-p}\). A logit transformation takes the form \(\eta_i=\mathrm{logit}(p_i)=\ln\!\left(\frac{p_i}{1-p_i}\right)=\sum_j X_{ij}\beta_j\). The logistic equation takes the form \(p_i=\frac{e^{\sum_j X_{ij}\beta_j}}{1+e^{\sum_j X_{ij}\beta_j}}\).
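A sketch of the logit link and of maximum likelihood fitting by Newton-Raphson (iteratively reweighted least squares); the simulated data and true coefficients are assumptions for illustration, and this is a generic numerical routine rather than any specific software implementation.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one predictor
true_beta = np.array([-0.5, 1.5])                       # hypothetical true values
p_true = 1.0 / (1.0 + np.exp(-X @ true_beta))           # logistic (inverse logit)
y = rng.binomial(1, p_true)                             # binary response (1, 0)

# Maximum likelihood by Newton-Raphson (iteratively reweighted least squares)
beta = np.zeros(2)
for _ in range(25):
    eta = X @ beta                       # linear predictor: the logit of p
    p = 1.0 / (1.0 + np.exp(-eta))       # p = exp(eta) / (1 + exp(eta))
    W = p * (1.0 - p)                    # weights from the current fit
    gradient = X.T @ (y - p)
    hessian = X.T @ (X * W[:, None])
    beta = beta + np.linalg.solve(hessian, gradient)

print(beta)    # close to the true coefficients (-0.5, 1.5)
```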

References

Kleinbaum, David G., Lawrence L. Kupper, and Keith E. Muller. 1978. Applied Regression Analysis and Other Multivariable Methods.

Phan, T. G., G. A. Donnan, M. Koga, L. A. Mitchell, M. Molan, G. Fitt, W. Chong, M. Holt, and D. C. Reutens. 2006. “The ASPECTS template is weighted in favor of the striatocapsular region.” Neuroimage 31 (2): 477–81.

Draper, Norman R., and Harry Smith. 1998. Applied Regression Analysis. 3rd ed. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc.