Exploring Multivariate Data
This Demonstration explores two five-dimensional datasets. Basic multivariate numerical summaries are provided, but the emphasis is on interactive graphical exploration using three common displays. The "simulation" dataset has 1000 five-dimensional observations and is generated in real time using built-in Mathematica functions. Its five variables are distributed as uniform(0, 10), triangular(0, 10), Poisson(3), standard normal, and beta(5, 3). The "pollen" dataset has 3848 six-dimensional observations; it is a famous dataset used in a statistical analysis competition at the 1986 Joint Meetings of the American Statistical Association. Its first five variables are "ridge", "nub", "crack", "weight", and "density". The sixth variable is just an index number and is not used, so only five dimensions are explored. A careful exploration of the pollen data will reveal some surprising results.
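For readers without Mathematica, a dataset with the same shape as the "simulation" data can be sketched in Python. This is only an illustration under stated assumptions, not the Demonstration's own code: the names `poisson`, `simulate`, and `means` are hypothetical, the seed is arbitrary, the triangular(0, 10) variable is assumed to be symmetric with its mode at 5, and the Poisson draw uses Knuth's multiplication method because Python's standard `random` module has no Poisson generator.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    # Count how many extra uniforms are needed before the running
    # product drops to exp(-lam) or below.
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate(n=1000, seed=0):
    """Return n five-dimensional observations as a list of tuples."""
    rng = random.Random(seed)  # arbitrary seed, only for reproducibility
    return [
        (
            rng.uniform(0, 10),     # uniform(0, 10)
            rng.triangular(0, 10),  # triangular(0, 10); mode defaults to 5 (assumed)
            poisson(3, rng),        # Poisson(3)
            rng.gauss(0, 1),        # standard normal
            rng.betavariate(5, 3),  # beta(5, 3)
        )
        for _ in range(n)
    ]


data = simulate()
columns = list(zip(*data))  # one tuple per variable
means = [statistics.fmean(col) for col in columns]
print([round(m, 3) for m in means])
```

The sample means should land near the theoretical values 5, 5, 3, 0, and 0.625, which gives a quick sanity check before moving on to graphical exploration.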