Detailed Steps in Principal Component Analysis
The detailed steps in Principal Component Analysis are as follows:
Step 1: Standardization
The purpose of this step is to standardize the ranges of the variables in the input dataset so that each of them contributes to the analysis on a comparable scale.
More specifically, the data must be standardized before applying PCA because PCA is very sensitive to the variances of the initial variables. If there are large differences between the ranges of the initial variables, the variables with larger ranges will dominate those with smaller ranges (e.g., a variable ranging from 0 to 100 will dominate one ranging from 0 to 1), which leads to biased principal components.
Transforming the data to comparable scales avoids this problem. Mathematically, this step is accomplished by subtracting each variable's mean from its values and dividing by its standard deviation. Once standardization is complete, every variable has mean 0 and standard deviation 1.
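As a minimal NumPy sketch of this step (the dataset below is a made-up toy example with two variables on very different scales):

```python
import numpy as np

# Toy dataset (hypothetical): 4 samples, 2 variables with very different ranges.
X = np.array([[10.0, 0.2],
              [20.0, 0.4],
              [30.0, 0.6],
              [40.0, 0.8]])

# Standardize each column: subtract its mean, divide by its standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # each column now has mean ~0
print(X_std.std(axis=0))   # and standard deviation 1
```

After this transformation both variables carry equal weight, regardless of their original units.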
Step 2: Covariance Matrix Calculation
The purpose of this step is to understand how the variables in the input dataset vary relative to their means, or in other words, to see whether there is any relationship between them. Variables are sometimes so highly correlated with each other that they contain redundant information. To identify these correlations, we compute the covariance matrix.
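A short sketch of this step, assuming the data have already been standardized (the random data here are purely illustrative):

```python
import numpy as np

# Hypothetical standardized data: 5 samples, 3 variables.
rng = np.random.default_rng(0)
X_std = rng.standard_normal((5, 3))
X_std = (X_std - X_std.mean(axis=0)) / X_std.std(axis=0)

# Covariance matrix: entry C[i, j] measures how variables i and j vary together.
C = np.cov(X_std, rowvar=False)

print(C.shape)  # (3, 3): one row/column per variable; the matrix is symmetric
```

The diagonal holds each variable's variance, and the off-diagonal entries are the pairwise covariances whose size signals redundancy between variables.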
Step 3: Calculate the eigenvectors and eigenvalues of the covariance matrix to identify the principal components
Eigenvectors and eigenvalues are linear algebra concepts that need to be calculated from the covariance matrix in order to identify the principal components of the data. Before we start explaining these concepts, let us first understand the meaning of principal components.
Principal components are new variables constructed as linear combinations, or mixtures, of the initial variables. These new variables (i.e., the principal components) are uncorrelated with each other, and most of the information in the initial variables is compressed into the first components.
So, 10-dimensional data will show 10 principal components, but PCA tries to get as much information as possible in the first component, then as much of the remaining information as possible in the second component, and so on.
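The eigen-decomposition this step describes can be sketched as follows (the data are again illustrative; `np.linalg.eigh` is used because a covariance matrix is symmetric):

```python
import numpy as np

# Illustrative standardized data: 50 samples, 4 variables.
rng = np.random.default_rng(1)
X_std = rng.standard_normal((50, 4))
C = np.cov(X_std, rowvar=False)

# eigh handles symmetric matrices; it returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(C)

# Reorder descending so the first component carries the most variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)  # largest eigenvalue (most variance) first
```

Each eigenvector gives the direction of a principal component, and its eigenvalue gives the amount of variance captured along that direction.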
Step 4: Form the Feature Vector
As we saw in the previous step, computing the eigenvectors and arranging them in descending order of their eigenvalues lets us rank the principal components in order of significance. What we do in this step is choose whether to keep all of these components or to discard the less significant ones (those with low eigenvalues) and form a matrix from the remaining ones, which we call the feature vector.
The feature vector is thus simply a matrix whose columns are the eigenvectors of the components we decide to keep. This makes it the first step toward dimensionality reduction, because if we choose to keep only p of the n eigenvectors (components), the final dataset will have only p dimensions.
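A sketch of this selection, continuing from an eigen-decomposition like the one above (the data and the choice p = 2 are assumptions for illustration):

```python
import numpy as np

# Illustrative setup: covariance of standardized data, eigen-decomposition.
rng = np.random.default_rng(2)
X_std = rng.standard_normal((50, 4))
C = np.cov(X_std, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the p components with the largest eigenvalues; discard the rest.
p = 2
feature_vector = eigvecs[:, :p]

print(feature_vector.shape)  # (4, 2): n=4 original variables, p=2 kept components
```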
Step 5: Re-plot the data along the principal component axes
In the previous steps, aside from the standardization, you did not modify the data at all; you only selected the principal components and formed the feature vector. The input dataset itself has remained expressed in terms of the original axes (i.e., the initial variables).
The goal of this final step is to use the feature vector formed from the eigenvectors of the covariance matrix to reorient the data from the original axes to the axes defined by the principal components (hence the name principal component analysis). This is accomplished by multiplying the transpose of the feature vector by the transpose of the standardized original dataset.
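A sketch of this projection under the same illustrative setup as before; with samples stored as rows, the product FeatureVectorᵀ · Dataᵀ is equivalent to the simpler `X_std @ feature_vector`:

```python
import numpy as np

# Illustrative pipeline: standardize, decompose the covariance, keep p=2 components.
rng = np.random.default_rng(3)
X_std = rng.standard_normal((50, 4))
X_std = (X_std - X_std.mean(axis=0)) / X_std.std(axis=0)

C = np.cov(X_std, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
feature_vector = eigvecs[:, order][:, :2]

# Project the standardized data onto the principal-component axes.
final_data = X_std @ feature_vector

print(final_data.shape)  # (50, 2): 50 samples now described by 2 components
```

The first column of `final_data` has the largest variance, the second the next largest, matching the ordering of the eigenvalues.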
Application of Principal Component Analysis in Mathematical Modeling: Detailed Steps
Steps of analysis:
Compute the correlation coefficient matrix;
Apply a series of orthogonal transformations so that the off-diagonal entries become 0 and the variance is concentrated on the main diagonal;
Obtain the eigenroots (i.e., the variance explained by the corresponding principal component) and arrange them in order from largest to smallest;
Obtain the eigenvector corresponding to each eigenroot;
Calculate the contribution rate Vi of each eigenroot xi with the formula Vi = xi / (x1 + x2 + ...);
Interpret the physical meaning of the principal components based on the eigenroots and their eigenvectors.
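The contribution-rate formula above can be sketched numerically (the eigenroot values here are invented for illustration and assumed already sorted from largest to smallest):

```python
import numpy as np

# Hypothetical eigenroots (eigenvalues), sorted from largest to smallest.
eigvals = np.array([4.0, 2.0, 1.0, 1.0])

# Contribution rate of each eigenroot: V_i = x_i / (x_1 + x_2 + ...).
contribution = eigvals / eigvals.sum()

print(contribution)        # [0.5, 0.25, 0.125, 0.125]
print(contribution.sum())  # the rates always sum to 1
```

Here the first component alone explains 50% of the total variance, which is the kind of reading used when interpreting the physical meaning of the components.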
Principal component analysis aims to use the idea of dimensionality reduction to transform many indicators into a few composite indicators. In the study of practical problems, comprehensively and systematically analyzing a problem requires considering numerous influencing factors. These factors are generally called indicators, which are also called variables in multivariate statistical analysis. Because each variable reflects, to varying degrees, certain information about the problem under study, and because the indicators are correlated with one another, the information in the resulting statistical data overlaps to some extent. When statistical methods are used to study multivariate problems, too many variables increase both the amount of computation and the complexity of the analysis; in quantitative analysis, we therefore want to involve fewer variables while obtaining as much information as possible.