Choosing a Data Transformation with the Box-Whisker Plot
This Demonstration shows the effect of a power data transformation,

$$y^{(\lambda)} = \begin{cases} y^{\lambda}, & \lambda \neq 0 \\ \log(y), & \lambda = 0 \end{cases}$$

on data, $y$, from simulated samples of size $n = 50$ or $n = 999$ from normal, exponential, lognormal, inverse Gaussian, or Weibull distributions for $\lambda \in (-2, 2)$.
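As a minimal sketch of the transformation itself, the following Python function applies the power for nonzero λ and the logarithm for λ = 0. The function name `power_transform` and the check that all observations are positive are assumptions of this sketch, not part of the Demonstration.

```python
import math

def power_transform(y, lam):
    """Return y**lam for lam != 0 and log(y) for lam == 0, elementwise.

    Hypothetical helper for illustration; it assumes strictly positive
    data so that logs and fractional or negative powers are defined.
    """
    if any(v <= 0 for v in y):
        raise ValueError("all observations must be positive")
    if lam == 0:
        return [math.log(v) for v in y]
    return [v ** lam for v in y]

# The four transformations most often used in practice
data = [1.0, 4.0, 9.0]
reciprocal = power_transform(data, -1)
logged = power_transform(data, 0)
square_root = power_transform(data, 0.5)
unchanged = power_transform(data, 1)
```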
In practice, a suitable power transformation can be selected by examining its effect on a box-and-whisker plot of the data. The simplest power transformation that makes the data approximately symmetric is selected. With actual data, the choice is often $\lambda = -1, 0, 0.5, 1$, corresponding to the reciprocal, log, square root, or no transformation, respectively.
Two skewness statistics, the usual Pearson skewness, $g_3$, and the Bowley skewness, $B$, are displayed for comparison with the plot.
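The two statistics can be sketched in a few lines of Python. The formulas below are the standard textbook definitions (the third central moment divided by the 1.5 power of the second, and the quartile-based Bowley coefficient). The quartile interpolation rule is an assumption here, since the Demonstration does not say which one it uses, and the function names are illustrative.

```python
import math
import statistics

def moment_skewness(y):
    """Moment skewness: m3 / m2**1.5, using central moments with divisor n."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n
    m3 = sum((v - mean) ** 3 for v in y) / n
    return m3 / m2 ** 1.5

def bowley_skewness(y):
    """Bowley skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1).

    The 'inclusive' quartile method is an assumed choice.
    """
    q1, q2, q3 = statistics.quantiles(y, n=4, method="inclusive")
    return (q3 + q1 - 2 * q2) / (q3 - q1)

# Illustrative data: a geometric sequence is right-skewed, but its logs
# are evenly spaced and therefore symmetric.
y = [1.0, 2.0, 4.0, 8.0, 16.0]
for lam in (-1, 0, 0.5, 1):
    z = [math.log(v) if lam == 0 else v ** lam for v in y]
    print(lam, moment_skewness(z), bowley_skewness(z))
```

Both statistics are zero for perfectly symmetric data, so values near zero after transformation agree with a box-and-whisker plot that looks balanced about the median.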
Another method for choosing $\lambda$ treats $\lambda$ as a parameter and makes the assumption that for some value of $\lambda$, the transformed data is normally distributed. Under this assumption, the likelihood function $\mathcal{L}(\lambda)$ may be obtained and numerically maximized to give the maximum likelihood estimate for $\lambda$, $\hat{\lambda} = \operatorname{argmax} \mathcal{L}(\lambda)$. A range of plausible values for $\lambda$ is given by all $\lambda$ for which $R(\lambda) > 1\%$, where $R(\lambda) = \mathcal{L}(\lambda) / \mathcal{L}(\hat{\lambda})$.
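The likelihood approach can be sketched as follows. This is one possible implementation under stated assumptions, not the Demonstration's own code: the mean and variance of the transformed data are profiled out at their maximum likelihood values, the Jacobian of the transformation is included so that likelihoods for different λ are comparable, and the maximization is a simple grid search over (-2, 2) in steps of 0.01. The function names and the test data are hypothetical.

```python
import math
from statistics import NormalDist

def profile_loglik(y, lam):
    """Profile log-likelihood of lam, up to an additive constant, assuming
    the transformed data are normal with unknown mean and variance."""
    n = len(y)
    log_y = [math.log(v) for v in y]
    if lam == 0:
        z = log_y
        log_jacobian = -sum(log_y)                 # d log(y)/dy = 1/y
    else:
        z = [v ** lam for v in y]
        # d y**lam / dy = lam * y**(lam - 1)
        log_jacobian = n * math.log(abs(lam)) + (lam - 1) * sum(log_y)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n      # MLE of the variance
    return -0.5 * n * math.log(var) + log_jacobian

def mle_and_plausible_range(y, cutoff=0.01):
    """Grid-search estimate of lam and the range where R(lam) > cutoff."""
    grid = [i / 100 for i in range(-199, 200)]     # assumed grid on (-2, 2)
    loglik = [profile_loglik(y, lam) for lam in grid]
    best = max(loglik)
    lam_hat = grid[loglik.index(best)]
    # R(lam) = L(lam) / L(lam_hat), computed on the log scale
    plausible = [lam for lam, ll in zip(grid, loglik)
                 if math.exp(ll - best) > cutoff]
    # Reporting min and max assumes the plausible set is a single interval
    return lam_hat, min(plausible), max(plausible)

# Illustrative data: n = 50 values whose logs are standard normal quantiles,
# so the log transformation (lam = 0) should fit well.
y = [math.exp(NormalDist().inv_cdf((i + 0.5) / 50)) for i in range(50)]
lam_hat, lo, hi = mle_and_plausible_range(y)
print(lam_hat, lo, hi)
```

A grid search is used here only because it is easy to verify by eye; a one-dimensional numerical optimizer would serve equally well once the profile log-likelihood is defined.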
Try experimenting with different sample sizes $n$ and different distributions.
In actual applications, real data (not simulated data) would be used. Using a suitable power transformation often simplifies the statistical analysis.