Eigen Decomposition and PCA
Role of eigenvalues and eigenvectors in Principal Component Analysis (PCA)

Often, we encounter situations where an outcome or a decision depends not on a single factor (predictor) but on many factors, which complicates the decision-making process. Working with a very large number of features brings its own problems, commonly referred to as the curse of dimensionality, so it is often critical to reduce the features to a few important ones before arriving at the right decision. This process is called dimensionality reduction. In machine learning, the problem of high dimensionality is dealt with in two ways:
1. Feature selection — carefully selecting the important features by filtering out the irrelevant ones
2. Feature extraction — creating new, more relevant features from the original features
Principal Component Analysis (PCA) is one of the key techniques of feature extraction.
The intuition behind PCA and when to use it
Data almost always comes with useful information, redundancy, and noise. We aim to extract the maximum possible information from the data while reducing the noise and ignoring the redundant information. This is the goal of PCA.
Let's say we want to capture a picture of a large group of friends in a single frame. We would try to find the best possible angle or rearrange the group so that everybody can be captured in a single frame instead of having to take multiple pictures. PCA, in a similar fashion, transforms the correlated features in the data into linearly independent (orthogonal) components so that all the important information in the data is captured while reducing its dimensionality.
PCA can be used when we want to:
- Reduce the number of features but cannot identify the unimportant ones that can be ignored, and
- Ensure that the features of the data are independent of one another even if the features become less interpretable
High school concepts used in PCA
Matrix decomposition… which school-going kid would have thought that the concepts of linear algebra would find such importance in machine learning!?
Well, matrix decomposition is about the factorization of a matrix into a product of matrices. It breaks down a matrix into constituent parts to make certain operations on the matrix easier to perform.
Of the many matrix decompositions, PCA uses eigendecomposition. ‘Eigen’ is a German word that means ‘own’. Here, a matrix (A) is decomposed into:
- A diagonal matrix formed from eigenvalues of matrix-A
- And a matrix formed by the eigenvectors of matrix-A
An n x n square matrix has n eigenvalues (counted with multiplicity), and each eigenvalue has a corresponding eigenvector. For example, a 4x4 matrix will have 4 eigenvalues.
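As a quick illustration, here is a minimal numpy sketch on a small, made-up matrix: `np.linalg.eig` returns exactly these two pieces, and multiplying them back together recovers the original matrix.

```python
import numpy as np

# A small, made-up square matrix to illustrate eigendecomposition
A = np.array([[4.0, 2.0],
              [1.0, 3.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose
# columns are the corresponding (unit-length) eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print(eigenvalues)    # one eigenvalue per dimension of A
print(eigenvectors)   # eigenvectors stored as columns

# Reconstruct A from its eigendecomposition: A = Q * diag(eigenvalues) * Q^-1
Q = eigenvectors
A_reconstructed = Q @ np.diag(eigenvalues) @ np.linalg.inv(Q)
print(np.allclose(A, A_reconstructed))  # True
```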
How does PCA use eigendecomposition?
Say we have a dataset with 'n' predictor variables. We center the predictors on their respective means and then compute an n x n covariance matrix. This covariance matrix is then decomposed into its eigenvalues and eigenvectors.
A covariance matrix (also called a dispersion matrix or variance-covariance matrix) is a matrix whose element in the (i, j) position is the covariance between the i-th and j-th elements (features) of a random vector (a random vector is a random variable with multiple dimensions).
From the properties of covariance and a covariance matrix we know that:
- Covariance of a random variable (a predictor) with itself is simply its variance
- Each element on the principal diagonal of a covariance matrix is the variance of one of the random variables
- Every covariance matrix is symmetric
So, a covariance matrix has variances (covariance of a predictor with itself) and covariances (between predictors).
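These properties are easy to verify with numpy; below is a minimal sketch with two made-up predictors.

```python
import numpy as np

# Two made-up predictors, four observations each
x = np.array([2.1, 2.5, 3.6, 4.0])
y = np.array([8.0, 10.0, 12.0, 14.0])

C = np.cov(x, y)   # 2x2 covariance matrix
print(C)

print(np.isclose(C[0, 0], np.var(x, ddof=1)))  # diagonal element = variance of x
print(np.isclose(C[0, 1], C[1, 0]))            # symmetry: cov(x, y) = cov(y, x)
```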
Eigenvectors of a matrix are the vectors whose direction does not change when the matrix is applied to them; eigendecomposition routines typically return them as unit vectors, with length or magnitude equal to 1. They are often referred to as right eigenvectors, which simply means column vectors.
Eigenvalues are the scalars paired with the eigenvectors: if v is an eigenvector of A with eigenvalue λ, then A·v = λ·v, so λ tells us by how much v is stretched or shrunk.
So, PCA is a method that:
- Measures how the variables are associated with one another using a covariance matrix
- Understands the directions of the spread of our data using Eigenvectors
- Brings out the relative importance of these directions using Eigenvalues
PCA on a data set
The PCA method can be described and implemented using the tools of linear algebra with the numpy package in Python (without using the ready-made implementation from the sklearn package).
Let's say we have data like this:

We can represent this data as a 4x3 matrix and call it ‘A’.

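For the sketch below, assume a hypothetical 4x3 numpy array A (4 observations, 3 correlated features); the values are made up for illustration and stand in for the example data.

```python
import numpy as np

# Hypothetical 4x3 data matrix: 4 observations, 3 (strongly correlated) features.
# These values are made up and stand in for the article's example data.
A = np.array([[1.0, 2.0,  3.0],
              [2.0, 4.0,  6.1],
              [3.0, 6.0,  9.2],
              [4.0, 8.0, 12.1]])
print(A.shape)  # (4, 3)
```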
Now, for this matrix, we center the features to their means, calculate the covariance matrix, and implement eigendecomposition as below:

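A minimal numpy sketch of those three steps, continuing with the hypothetical matrix A from the snippet above:

```python
# Step 1: center each feature (column) at its mean
A_centered = A - A.mean(axis=0)

# Step 2: covariance matrix of the features (3x3);
# rowvar=False treats columns as variables and rows as observations
cov_matrix = np.cov(A_centered, rowvar=False)

# Step 3: eigendecomposition of the covariance matrix;
# eigh suits symmetric matrices and returns real eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

print(eigenvalues)    # explained variances of the components
print(eigenvectors)   # principal directions, one eigenvector per column
```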
In the above output, the eigenvectors give the PCA components (the principal directions) and the eigenvalues give the explained variances of those components. As we have 3 predictors here, we get 3 eigenvalues.
The eigenvectors can now be sorted by the eigenvalues in descending order to provide a ranking of the components or axes of the new subspace for matrix A.
If there are eigenvalues close to zero, they represent components that may be discarded.
A total of ‘n’ (here 3) or fewer components must be selected to comprise the chosen subspace. Ideally, we would select k (< n) eigenvectors, called principal components, that have the k largest eigenvalues.
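Continuing the sketch, the sorting and selection step might look like this (eigenvalues and eigenvectors come from the previous snippet; k = 1 is chosen purely for illustration):

```python
# Rank the components: sort eigenvalues (and matching eigenvectors) in descending order
order = np.argsort(eigenvalues)[::-1]
eigenvalues_sorted = eigenvalues[order]
eigenvectors_sorted = eigenvectors[:, order]

# Keep the k eigenvectors with the largest eigenvalues (here k = 1)
k = 1
top_k_vectors = eigenvectors_sorted[:, :k]   # shape (3, 1)
```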
Let’s check the explained variance ratio of the first component as:
explained variance of 1st component / (total of all explained variances)

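With the sorted eigenvalues from the sketch above, this ratio is a one-liner (the exact number depends on the data; the 99% figure quoted below refers to the original example):

```python
# Share of the total variance explained by each component
explained_variance_ratio = eigenvalues_sorted / eigenvalues_sorted.sum()
print(explained_variance_ratio[0])   # ratio for the 1st (largest) component
```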
We see that the first component alone explains about 99% of the variance in the data. So, we can now project our data onto that component and work with a 4x1 matrix instead of a 4x3 matrix, thereby reducing the dimensionality of the data, of course with a minor loss of information.
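Continuing the sketch, that projection is a single matrix product of the centered data with the selected eigenvector:

```python
# Project the centered 4x3 data onto the first principal component -> 4x1
A_projected = A_centered @ top_k_vectors
print(A_projected.shape)   # (4, 1)
```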
Note: All these steps have a straightforward implementation in Python's sklearn package for PCA. We have implemented them with numpy only to bring out the role of eigendecomposition in PCA.