WOLFRAM|DEMONSTRATIONS PROJECT

Nonparametric Curve Estimation by Kernel Smoothers: Efficiency of Unbiased Risk Estimate and GCV Selectors

[Interactive panel: controls select the true function underlying the data (e.g. "bell": 64 (1-x)^3 x^3), the noise level σ (0.32), the seed for data generation (321), the dataset size (16, 32, 64, 128, 256, 512, 1024, 2048, 4096, or 8192), and the trial-bandwidth (0.8). Three tabbed views show: the data with the curve estimate (red) for the trial-bandwidth; the data with the C_L curve estimate (purple) and the GCV curve estimate (dashed purple); and, in addition, the true curve (blue) and the ASE-optimal curve (green).]
This Demonstration considers one of the simplest nonparametric-regression problems: let f be a smooth real-valued function over the interval [0,1]; recover (or estimate) f when one only knows n approximate values y_i, for i = 1, 2, ..., n, that satisfy the model y_i = f(x_i) + σ ϵ_i, where x_i = i/n and the ϵ_i are independent, standard normal random variables. Sometimes one assumes that the noise level σ is known. Such a curve-estimation problem is also called a signal-denoising problem.
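To make the setting concrete, here is a minimal sketch in Python/NumPy (the Demonstration itself is implemented in the Wolfram Language) that simulates data from this model; the function name make_data is hypothetical, and the "bell" curve f(x) = 64 x^3 (1-x)^3, noise level 0.32, and seed 321 simply mirror the panel defaults above.

import numpy as np

def make_data(n, sigma=0.32, seed=321):
    # Simulate y_i = f(x_i) + sigma * eps_i on the equispaced design x_i = i/n.
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1) / n              # x_i = i/n for i = 1, ..., n
    f_true = 64 * x**3 * (1 - x)**3          # the "bell" example of the true curve f
    y = f_true + sigma * rng.standard_normal(n)
    return x, f_true, y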
Perhaps the simplest "solution" to this problem is the classical and widely used kernel-smoothing method, which is a particular case of the Loess method (locally weighted scatterplot smoothing); see the Details below.
In the kernel-smoothing method (as in any similar nonparametric method, e.g. the smoothing-spline method), one or several smoothing parameters have to be appropriately chosen to obtain a "good" estimate of the true curve f(·). For the kernel method, it is known that choosing a good value for the famous bandwidth parameter (see Details) is crucial, much more so than the choice of the class of kernels, which is fixed here.
The curve estimate corresponding to a given value h of the bandwidth parameter is then denoted f_{n,h}. Notice that f_{n,h} depends on f(·), σ, and the ϵ_i's only through the data y_i.
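As an illustration of how such an estimate f_{n,h} can be computed, the following sketch evaluates a simple kernel (Nadaraya-Watson) smoother. A Gaussian kernel is used here only as a stand-in: the Demonstration fixes its own kernel class and assumes periodic end conditions, neither of which this sketch reproduces.

import numpy as np

def kernel_estimate(x_eval, x, y, h):
    # Evaluate the kernel smoother f_{n,h} with bandwidth h at the points x_eval.
    u = (np.asarray(x_eval)[:, None] - np.asarray(x)[None, :]) / h
    w = np.exp(-0.5 * u**2)          # Gaussian kernel weights (illustrative choice)
    return (w @ y) / w.sum(axis=1)   # weighted average of the observed y_i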
Three very popular methods are available for choosing h: cross-validation (also called the "leave-one-out" principle; see PRESS statistic), generalized cross-validation (GCV), and Mallows' C_L (also sometimes denoted C_p, or UBR for unbiased risk estimate). In fact, cross-validation and GCV coincide in our context, where periodic end conditions are assumed. See [1] for a review, and see [2] for the definition and an analysis of C_L. Notice that GCV (in contrast to the C_L method) does not require that σ be known.
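Because the kernel smoother is linear in the data, f_{n,h} = A(h) y for a smoother ("hat") matrix A(h), and both criteria can be written in terms of the residual sum of squares and the trace of A(h). The sketch below uses the standard textbook forms of GCV and of Mallows' C_L (unbiased risk estimate); it does not reproduce the Demonstration's exact periodic implementation, and the function names are only for illustration.

import numpy as np

def smoother_matrix(x, h):
    # Hat matrix A(h) of the Gaussian kernel smoother: f_hat = A(h) y.
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)
    return w / w.sum(axis=1, keepdims=True)

def gcv_score(y, A):
    # GCV(h) = (1/n) ||y - A(h) y||^2 / (1 - tr(A(h))/n)^2 ; sigma is not needed.
    n = len(y)
    rss = np.sum((y - A @ y) ** 2)
    return (rss / n) / (1.0 - np.trace(A) / n) ** 2

def cl_score(y, A, sigma):
    # Mallows' C_L (UBR): (1/n) ||y - A(h) y||^2 + (2 sigma^2 / n) tr(A(h)) - sigma^2.
    n = len(y)
    rss = np.sum((y - A @ y) ** 2)
    return rss / n + 2.0 * sigma**2 * np.trace(A) / n - sigma**2

def select_bandwidth(x, y, grid, sigma=None):
    # Minimize C_L over a grid of trial bandwidths when sigma is known, otherwise GCV.
    def score(h):
        A = smoother_matrix(x, h)
        return cl_score(y, A, sigma) if sigma is not None else gcv_score(y, A)
    return min(grid, key=score)

For example, select_bandwidth(x, y, np.linspace(0.02, 0.5, 50)) would return the GCV bandwidth, while additionally passing sigma=0.32 would return the C_L bandwidth.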
This Demonstration provides interactive assessments of the statistical efficiency of the GCV and C_L smoothing-parameter selectors, and this can be done for rather large n (here, n can be taken as large as 2^13 = 8192 with reasonably fast interactivity on a current personal computer).
Here six examples for f(·) (the "true curve") can be tried; f is plotted (in blue) in the third of the three possible views selected by the tabs:
1. The data and the curve estimate f_{n,h}, where you choose h with the trial-bandwidth slider, show that the choice of the bandwidth is crucial.
2. The two curve estimates given by GCV and C_L are very often so similar that they cannot be distinguished.
3. The third tab displays quantities related to the "truth": the ASE-optimal choice yields the green curve; this is the h value that minimizes a global discrepancy between f_{n,h} and f, defined by
ASE(h) = (1/n) ∑_{i=1}^{n} ( f(x_i) - f_{n,h}(x_i) )^2,
where ASE stands for the average of the squared errors. (Note that this ASE-optimal choice is a target that seems to be unattainable in practice, since we do not know the true curve f.)
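Continuing the illustrative Python sketch above, ASE(h) can be computed directly whenever the true curve is available, as it is in a simulation; the "ASE-optimal" bandwidth is then simply the grid minimizer of this quantity.

import numpy as np

def ase(f_true, x, y, h):
    # ASE(h) = (1/n) * sum_i (f(x_i) - f_{n,h}(x_i))^2,
    # using the kernel_estimate function sketched earlier.
    fit = kernel_estimate(x, x, y, h)
    return float(np.mean((f_true - fit) ** 2))

# The (in practice unattainable) ASE-optimal bandwidth over a grid of trial values:
# h_ase = min(grid, key=lambda h: ase(f_true, x, y, h))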
In the third view, the curve estimate associated with the automatic C_L choice is again plotted (purple curve): we can then assess the typical efficiency of C_L (or of the quite close GCV choice) by comparing this curve estimate with the targeted ASE-optimal curve.
This third view also displays the two associated ASE values, in addition to the one associated with the bandwidth chosen by eye. This shows that it is difficult to choose by eye, in the first view (thus without the help of the true-curve plot), a trial-bandwidth that turns out to be at least as good as the GCV or C_L choice, where "better" still means a lower ASE value.
Staying in the third view and only increasing the size of the dataset, one typically observes that the C_L curve estimate and the ASE-optimal curve generally become very close, and the two associated ASE distances often become relatively similar. This agrees with the known asymptotic theory; see [3] and the references therein.
The rate at which the relative difference between the C_L (or GCV) bandwidth and the optimal bandwidth converges to zero can be assessed in the additional panel at the top right of the third view. This panel shows the two kernels associated with the two bandwidths (precisely, only the right-hand part of each kernel, since they are even functions): the difference between the two bandwidths can thus be scrutinized with much better precision than by looking only at the two associated curve estimates.
By varying the seed that generates the data, you can also observe that there sometimes remain non-negligible differences between the C_L (or GCV) choice and the optimal bandwidth.