WOLFRAM|DEMONSTRATIONS PROJECT

Loading Plot of a Principal Component Analysis (PCA)

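[Interactive figure: sliders for phase ϕ, frequency, and amplitude; a top-left panel with the correlation coefficients between pairs of data rows (0.38, -0.04, and 0.91 in this snapshot); a top-right panel plotting the three rows sin(x), cos(x), and a sinusoid with adjustable phase ϕ, frequency, and amplitude; and the loading plot itself, reporting a percentage of variance of 100%.]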

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert data with possibly correlated variables into a set of linearly uncorrelated variables, the principal components. It is analogous to the principal-axis transformation in mechanics.
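The claim that PCA produces linearly uncorrelated variables can be checked numerically. The following is a minimal sketch, assuming NumPy and a synthetic random dataset (not the data used in this Demonstration); the function name `pca_scores` is hypothetical. It projects centered data onto the eigenvectors of its covariance matrix, and the covariance matrix of the resulting scores comes out diagonal up to rounding error.

```python
import numpy as np

def pca_scores(data: np.ndarray) -> np.ndarray:
    """Project centered data onto the eigenvectors of its covariance matrix.

    Hypothetical helper: `data` has one variable per row and one observation
    per column. Returns one principal component per row, ordered by
    decreasing variance.
    """
    centered = data - data.mean(axis=1, keepdims=True)
    cov = np.cov(centered)                   # rows are variables (np.cov default)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    return eigvecs[:, order].T @ centered

# Synthetic example: rows 0 and 1 are strongly correlated, row 2 is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
data = np.vstack([a, a + 0.3 * rng.normal(size=500), rng.normal(size=500)])

scores = pca_scores(data)
print(np.round(np.corrcoef(data), 2))  # original variables: clearly correlated
print(np.round(np.cov(scores), 6))     # principal components: off-diagonal entries ~0
```

The covariance of the scores equals the diagonal matrix of sorted eigenvalues, which is exactly what "linearly uncorrelated" means here.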

This Demonstration shows the loading plot in the space of the principal components (PCs), computed from a dataset of three rows. The rows come from three periodic functions: two are fixed and mutually uncorrelated, and the third is described by the parameters phase, frequency, and amplitude. The data are shown at the top right and the correlation coefficients at the top left.
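A dataset of this kind, and its correlation panel, can be sketched as follows. This assumes NumPy, samples over one full period of sin(x) and cos(x), and takes the third row to be amplitude * sin(frequency * x + phase); that functional form and the helper name `make_rows` are assumptions for illustration, since the exact expression is not recoverable from the page.

```python
import numpy as np

def make_rows(phase: float, frequency: float, amplitude: float, n: int = 200) -> np.ndarray:
    """Hypothetical helper: build a 3 x n data matrix with rows sin(x), cos(x),
    and amplitude * sin(frequency * x + phase) (assumed form of the variable row)."""
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)  # one full period, evenly sampled
    return np.vstack([
        np.sin(x),
        np.cos(x),
        amplitude * np.sin(frequency * x + phase),
    ])

rows = make_rows(phase=0.5, frequency=1.0, amplitude=2.0)
corr = np.corrcoef(rows)  # each row is treated as one variable
print(np.round(corr, 2))
```

Over a full period, sin(x) and cos(x) have zero correlation. With frequency 1 the third row correlates with them as cos(phase) and sin(phase), and the amplitude has no effect at all, because a correlation coefficient is unchanged by rescaling a variable.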

Computing the PCs first requires standardizing the data (centering each variable and scaling it to unit variance), so that the covariance matrix of the standardized data is the correlation matrix. The eigenvectors of the correlation matrix form the eigenvector matrix, which, when multiplied by the standardized data, gives the PC matrix; its first two columns are the new coordinates in the PC space (PC1, PC2) [1].
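The steps above can be sketched in NumPy as follows. In this sketch, observations are stored in rows and variables in columns, so that "the first two columns" of the PC matrix are PC1 and PC2. The function name `pca_via_correlation` and the third variable 2 sin(x + 0.5) are illustrative assumptions, not the Demonstration's own code or settings. The `loadings` line uses one common convention (eigenvector entries scaled by the square root of the eigenvalue); other conventions exist.

```python
import numpy as np

def pca_via_correlation(data: np.ndarray):
    """Hypothetical helper: PCA of `data` (observations in rows, variables in columns)."""
    # 1. Standardize each variable (column) to zero mean and unit variance.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # 2. The covariance matrix of standardized data is the correlation matrix.
    corr = np.cov(z, rowvar=False)
    # 3. Eigen-decomposition of the symmetric matrix, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. PC matrix: standardized data multiplied by the eigenvector matrix.
    pcs = z @ eigvecs
    return eigvals, eigvecs, pcs

x = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
data = np.column_stack([np.sin(x), np.cos(x), 2.0 * np.sin(x + 0.5)])  # illustrative variables

eigvals, eigvecs, pcs = pca_via_correlation(data)
pc1_pc2 = pcs[:, :2]                               # coordinates in the (PC1, PC2) plane
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])   # one row per variable in the loading plot
print(np.round(eigvals, 6))
print(np.round(loadings, 3))
```

With this particular third variable the eigenvalues come out as approximately 2, 1, and 0, because 2 sin(x + 0.5) is a linear combination of sin(x) and cos(x). As a result, each variable's loading vector has unit length in the (PC1, PC2) plane. Eigenvector signs are arbitrary, so the plot may appear mirrored relative to another implementation.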

The percentage of variance explained by this model is calculated from the eigenvalues. In PCA, each eigenvalue measures how much of the variance is explained by its associated eigenvector; therefore, the largest eigenvalue indicates the direction in which the data show the greatest variance. The contribution of a single component to the variance is calculated by first summing all the eigenvalues and then dividing that component's eigenvalue by this sum. The percentage reported for the model is the sum of the contributions of the components it retains.
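In formula terms, component i contributes 100 * λ_i / (λ_1 + λ_2 + λ_3) percent of the variance. The sketch below, assuming NumPy and the same hypothetical sin(x), cos(x), amplitude * sin(frequency * x + phase) dataset used for illustration (the helper names are invented here), computes the variance captured by PC1 and PC2. When the frequency matches that of the fixed rows, sin(x + ϕ) = sin(x) cos(ϕ) + cos(x) sin(ϕ) is a linear combination of them, the third eigenvalue vanishes, and two components account for 100% of the variance. This is one way the 100% value in the snapshot can arise, although the snapshot's actual parameter values are not stated.

```python
import numpy as np

def explained_variance(eigvals: np.ndarray) -> np.ndarray:
    """Percentage of total variance per component: 100 * lambda_i / sum(lambda), descending."""
    eigvals = np.sort(eigvals)[::-1]
    return 100.0 * eigvals / eigvals.sum()

def two_pc_variance(phase: float, frequency: float, amplitude: float, n: int = 200) -> float:
    """Hypothetical helper: variance (%) captured by PC1 and PC2 for the illustrative dataset."""
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    data = np.vstack([np.sin(x), np.cos(x), amplitude * np.sin(frequency * x + phase)])
    eigvals = np.linalg.eigvalsh(np.corrcoef(data))  # eigenvalues of the 3 x 3 correlation matrix
    return float(explained_variance(eigvals)[:2].sum())

print(round(two_pc_variance(phase=0.5, frequency=1.0, amplitude=2.0), 6))  # 100.0
print(round(two_pc_variance(phase=0.5, frequency=3.0, amplitude=2.0), 6))  # 66.666667
```

With frequency 3 the third row is uncorrelated with both fixed rows over a full period, so all three eigenvalues equal 1 and each component carries one third of the variance.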

PCA is generally applied to large datasets, where it is a powerful tool for identifying correlations among subsets of variables. Here, only three variables are used so that this type of data representation is easier to understand.