WOLFRAM|DEMONSTRATIONS PROJECT

Exploring Multivariate Data

dataset: simulation data, pollen data
display: histogram, 2D (x, y), 3D (x, y, z), statistics
x for 3D scatter plot: uniform
y for 3D scatter plot: triangular
z for 3D scatter plot: Poisson
3D point size
3D point cloud controls: zoom point cloud, helicopter flights (viewpoint) v1, v2, v3
This Demonstration explores two datasets, each analyzed in five dimensions. Basic multivariate numerical summaries are provided, but the emphasis is on interactive graphical exploration using three common displays: a histogram, a 2D (x, y) plot, and a 3D (x, y, z) scatter plot.

The "simulation" dataset has 1000 five-dimensional observations and is generated in real time using built-in Mathematica functions. Its variables are uniform(0, 10), triangular(0, 10), Poisson(3), standard normal, and beta(5, 3).

The "pollen" dataset has 3848 observations of six variables. It is a famous dataset that was used in a statistical analysis competition at the 1986 Joint Meetings of the American Statistical Association. The first five variables are "ridge", "nub", "crack", "weight", and "density"; the sixth is just an index number and is not used. A careful exploration of the pollen data will reveal some surprising results.
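For readers without Mathematica, the simulation dataset described above can be approximated with a short Python sketch that uses only the standard library. This is not the Demonstration's own code. The function names `simulate` and `poisson`, the fixed seed, and the choice of a symmetric triangular distribution (mode at 5) are assumptions made for illustration. The Poisson draw uses Knuth's multiplication method because the standard library has no built-in Poisson sampler.

```python
import math
import random
import statistics

def poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate(n=1000, seed=0):
    """Return n five-dimensional observations (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    return [
        (
            rng.uniform(0, 10),     # uniform(0, 10)
            rng.triangular(0, 10),  # triangular(0, 10); mode defaults to the midpoint
            poisson(rng, 3),        # Poisson(3)
            rng.gauss(0, 1),        # standard normal
            rng.betavariate(5, 3),  # beta(5, 3)
        )
        for _ in range(n)
    ]

names = ["uniform", "triangular", "Poisson", "normal", "beta"]
data = simulate()
columns = list(zip(*data))  # one tuple per variable

# Basic per-variable summaries: sample mean and standard deviation.
for name, column in zip(names, columns):
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    print(f"{name:>10}: mean={mean:.3f}  sd={sd:.3f}")
```

The sample means should land near the theoretical values of 5, 5, 3, 0, and 0.625, which gives a quick sanity check before moving on to histograms or scatter plots of the columns.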