What is Dimension Reduction?
Many analyses involve more than two or three predictor variables. Visualizing only a couple of variables at a time does not tell the whole story and may introduce bias.
Dimension reduction is a machine learning technique that compresses a large number of variables into fewer dimensions while approximately preserving the attributes' original properties (e.g., the distances between observations). This makes it possible to visualize data sets with many variables in a single plot.
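As a minimal sketch of this idea (using made-up data, not the data set discussed below), principal component analysis compresses a 4-variable data set into two dimensions; because the synthetic data is nearly two-dimensional to begin with, the pairwise distances between observations are approximately preserved in the compressed view:

```python
import numpy as np

# Hypothetical example: 6 observations of 4 correlated predictor variables,
# generated from an underlying 2-D structure plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(6, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing + 0.01 * rng.normal(size=(6, 4))

# PCA via singular value decomposition of the centered data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T  # each observation's coordinates in the first 2 PCs

def pdist(A):
    """Matrix of pairwise Euclidean distances between rows of A."""
    return np.linalg.norm(A[:, None] - A[None, :], axis=-1)

# Distances in the 2-D view closely match distances in the full 4-D data.
print(np.allclose(pdist(Xc), pdist(scores), atol=0.1))
```

This is the sense in which the compressed dimensions "hold the original properties": observations that are far apart in the full variable space remain far apart in the plot.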
Dimension Reduction Demonstration
The example to the right uses the publicly available "metabo" data set, which has 136 observations and 425 predictor variables. The observations are grouped by serum sample type: tuberculin skin test negative (NEG), tuberculin skin test positive (POS), and clinical tuberculosis (TB). This data set demonstrates how a dimension reduction tool lets the user view all 425 variables simultaneously in a single graph.
Questions to Ask:
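The workflow behind such a plot can be sketched as follows. Here random numbers stand in for the actual metabo measurements (the group sizes and the separation of the TB group are invented for illustration); the key step is projecting all 425 variables onto two principal components so that every sample becomes one point in a single scatter plot:

```python
import numpy as np

# Stand-in for the metabo data: 136 serum samples x 425 metabolite
# variables, labeled NEG / POS / TB (group sizes are hypothetical).
rng = np.random.default_rng(2)
groups = np.repeat(["NEG", "POS", "TB"], [50, 50, 36])
X = rng.normal(size=(136, 425)) + (groups == "TB")[:, None] * 0.5

# Project all 425 variables onto the first two principal components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T  # one (PC1, PC2) point per sample

# These per-group centers are what you would color-code in the plot.
for g in ("NEG", "POS", "TB"):
    pc1, pc2 = scores[groups == g].mean(axis=0)
    print(f"{g}: mean PC1={pc1:+.2f}, mean PC2={pc2:+.2f}")
```

Plotting `scores` colored by `groups` then gives a single two-dimensional view of all 425 variables at once.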

Graph Interpretations

Dimension Reduction Plot

Proportion of Variance Plot
Cumulative Proportion of Variance Explained Plot
The plot to the left shows how much of the variation in the data is explained as the number of principal components (i.e., the number of dimensions) increases. For example, the point at three principal components can be interpreted as, "Viewing the data in three dimensions captures 20.3% of the variation in the data set."
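The values behind such a plot can be computed directly from the singular values of the centered data matrix. The sketch below uses random data with the same shape as the metabo set (136 x 425); the resulting numbers are illustrative only, not the 20.3% figure from the actual data:

```python
import numpy as np

# Random stand-in data with the metabo set's shape (136 obs x 425 vars).
rng = np.random.default_rng(1)
X = rng.normal(size=(136, 425))

# Singular values of the centered data give the variance along each PC.
Xc = X - X.mean(axis=0)
S = np.linalg.svd(Xc, compute_uv=False)
var = S**2 / (len(X) - 1)

# Cumulative proportion of variance explained by the first k components.
cum = np.cumsum(var) / var.sum()

# cum[2] is the share of variation captured by the first 3 PCs;
# plotting cum against k reproduces the cumulative-proportion plot.
print(round(float(cum[2]), 3))
```

By construction the curve is non-decreasing and reaches 1.0 once every component is included, which is why it is read as "how much of the data is explained so far."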