WOLFRAM|DEMONSTRATIONS PROJECT

Choosing a Data Transformation with the Box-Whisker Plot

​
This Demonstration shows the effect of a power data transformation, $y^{(\lambda)} = (y^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$ and $y^{(\lambda)} = \log(y)$ for $\lambda = 0$, on data, $y$, from simulated samples of size $n = 50$ or $n = 999$ from normal, exponential, lognormal, inverse Gaussian, or Weibull distributions for $\lambda \in (-2, 2)$.
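As a minimal sketch of the transformation defined above, the following Python function applies it to a list of values. The function name `power_transform` is hypothetical, and the restriction to strictly positive data is an assumption made here so that both the logarithm and fractional or negative powers are well defined; it is not taken from the Demonstration's source code.

```python
import math


def power_transform(y, lam):
    """Return the power-transformed values of y for a given lambda.

    Assumption: every value in y is strictly positive.
    """
    if any(v <= 0 for v in y):
        raise ValueError("the power transformation requires positive data")
    if lam == 0:
        # Limiting case of (y**lam - 1) / lam as lam -> 0.
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


# Example: the square-root-type transformation (lambda = 0.5).
transformed = power_transform([1.0, 4.0, 9.0], 0.5)
```

Subtracting 1 and dividing by $\lambda$ only shifts and rescales the data for a fixed $\lambda$, so the shape of the box-and-whisker plot is determined by the power $y^{\lambda}$ itself; for negative $\lambda$ the division also preserves the ordering of the data.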
In practice, a suitable power transformation can be selected by examining the effect of the transformation using a box-and-whisker plot. The simplest power transformation that makes the data approximately symmetric is selected. With actual data, the choice is often $\lambda = -1$, $0$, $0.5$, or $1$, corresponding to the reciprocal, log, square root, or no transformation, respectively.
Two skewness statistics, the usual Pearson skewness, $g_3$, and the Bowley skewness, $B$, are displayed for comparison with the plot.
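The two statistics can be sketched in Python as follows. This is an illustration under stated assumptions, not the Demonstration's own code: the Pearson skewness is taken to be the moment coefficient $m_3 / m_2^{3/2}$ (central moments with divisor $n$), the Bowley skewness is taken to be $(Q_3 + Q_1 - 2Q_2)/(Q_3 - Q_1)$, and the quartiles use the "inclusive" convention of `statistics.quantiles`, which may differ from the convention used in the Demonstration. The function names are hypothetical.

```python
import statistics


def moment_skewness(y):
    """Moment coefficient of skewness, m3 / m2**1.5 (assumed definition)."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in y) / n  # third central moment
    return m3 / m2 ** 1.5


def bowley_skewness(y):
    """Quartile-based skewness, (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    # Assumption: "inclusive" quartiles; other conventions give slightly
    # different values for small samples.
    q1, q2, q3 = statistics.quantiles(y, n=4, method="inclusive")
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)
```

Both statistics are zero for perfectly symmetric data and positive for right-skewed data. The Bowley skewness depends only on the quartiles, so it describes the asymmetry of the box in the plot and is insensitive to extreme values, whereas the moment-based statistic is strongly affected by them.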
Another method for choosing $\lambda$ treats $\lambda$ as a parameter and assumes that, for some value of $\lambda$, the transformed data is normally distributed. Under this assumption, the likelihood function $\mathcal{L}(\lambda)$ can be obtained and numerically maximized to give the maximum likelihood estimate of $\lambda$, $\hat{\lambda} = \operatorname{argmax}_{\lambda} \mathcal{L}(\lambda)$. A range of plausible values for $\lambda$ is given by all $\lambda$ for which $R(\lambda) > 1\%$, where $R(\lambda) = \mathcal{L}(\lambda)/\mathcal{L}(\hat{\lambda})$.
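The likelihood approach can be sketched as below. Several choices here are assumptions rather than details from the source: $\mathcal{L}(\lambda)$ is taken to be the profile likelihood obtained by replacing the normal mean and variance with their maximum likelihood estimates, so that, up to an additive constant, $\log \mathcal{L}(\lambda) = -\tfrac{n}{2}\log \hat{\sigma}^2(\lambda) + (\lambda - 1)\sum_i \log y_i$; the maximization is done by a simple grid search over $(-2, 2)$ rather than a numerical optimizer; and the function names, the grid spacing, and the seed are arbitrary illustrative choices.

```python
import math
import random


def profile_loglik(y, lam):
    """Profile log-likelihood of lambda (additive constants dropped).

    Assumption: y is strictly positive and the transformed data is normal.
    """
    n = len(y)
    z = [math.log(v) if lam == 0 else (v ** lam - 1.0) / lam for v in y]
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n  # MLE of the variance
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in y)


def relative_likelihood(y, grid):
    """Return the grid maximizer lam_hat and R(lam) for every grid point."""
    loglik = [profile_loglik(y, lam) for lam in grid]
    best = max(loglik)
    lam_hat = grid[loglik.index(best)]
    rel = [math.exp(v - best) for v in loglik]  # R(lam) = L(lam) / L(lam_hat)
    return lam_hat, rel


# Simulated lognormal sample of size 50 (the seed is arbitrary).
rng = random.Random(1)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(50)]

# Grid over the open interval (-2, 2) in steps of 0.01.
grid = [k / 100 for k in range(-199, 200)]
lam_hat, rel = relative_likelihood(sample, grid)

# Plausible values: all grid points with relative likelihood above 1%.
plausible = [lam for lam, r in zip(grid, rel) if r > 0.01]
print(lam_hat, min(plausible), max(plausible))
```

Working with differences of log-likelihoods and exponentiating only at the end avoids overflow or underflow in $\mathcal{L}(\lambda)$ itself. For lognormal data the log transformation yields exactly normal data, so the estimate would be expected to fall near $\lambda = 0$, with a narrower plausible range for larger $n$.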
Try experimenting with different sample sizes $n$ and different distributions.
In actual applications, real data (not simulated data) would be used. Using a suitable power transformation often simplifies the statistical analysis.