# Kernel Density Estimation

Histograms are a useful but limited way to estimate or visualize the underlying density of observed data whose distribution is unknown. A histogram is essentially a discontinuous step function. So, if you believe that the observed data is generated by a continuous density, or even a differentiable one, then another histogram-like estimation procedure might be preferable.

A kernel histogram is a generalization of the usual histogram. It associates with each data point a function (called a kernel function) centered at that point. The kernel histogram is the (properly renormalized) sum of these functions. Kernel functions typically depend on a parameter, usually called the bandwidth, that strongly affects the smoothness of the resulting kernel histogram: a small bandwidth gives a rough, spiky estimate, and a large bandwidth gives a smooth one. (Somewhat confusingly, kernel functions are themselves density functions.)
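As a rough illustration of this construction, the following Python sketch places one kernel on each data point, sums them, and divides by the number of points times the bandwidth. The Gaussian kernel, the four sample points, and the names `gaussian_kernel` and `kernel_density` are assumptions chosen for the example, not part of the original description.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density; like every kernel here, it integrates to 1."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kernel_density(x, data, bandwidth, kernel=gaussian_kernel):
    """Evaluate (1 / (n * h)) * sum_i K((x - x_i) / h) at each point of x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    # u[j, i] is the scaled distance from evaluation point j to data point i.
    u = (x[:, None] - data[None, :]) / bandwidth
    return kernel(u).sum(axis=1) / (data.size * bandwidth)

# Hypothetical observations, used only to show the effect of the bandwidth.
data = np.array([-1.0, 0.0, 0.5, 2.0])
grid = np.linspace(-8.0, 8.0, 1601)

narrow = kernel_density(grid, data, bandwidth=0.2)  # rough, one bump per point
wide = kernel_density(grid, data, bandwidth=1.0)    # smooth, bumps merged
```

Dividing by `n * h` is the renormalization mentioned above: it makes the sum of kernels integrate to one, so the result is itself a density for any bandwidth.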

Choose a target distribution from which to generate random data, as well as a type of kernel function. For example, the Epanechnikov kernel is a highly desirable choice because it minimizes the asymptotic mean integrated squared error among nonnegative kernels, whereas a uniform kernel gives a kernel histogram very much like the usual histogram. Add more random realizations from the target distribution and watch the kernel histogram converge to the true, underlying density.
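The experiment described above can be imitated numerically. The sketch below assumes a standard normal target distribution and compares Epanechnikov and uniform kernel estimates against the true density as the sample grows. The bandwidth `2.0 * n ** (-0.2)` is an illustrative choice that shrinks with the sample size, not a recommended rule, and all function names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def uniform(u):
    """K(u) = 0.5 on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 0.5, 0.0)

def kernel_density(x, data, bandwidth, kernel):
    """Evaluate (1 / (n * h)) * sum_i K((x - x_i) / h) at each point of x."""
    u = (x[:, None] - data[None, :]) / bandwidth
    return kernel(u).sum(axis=1) / (data.size * bandwidth)

def normal_pdf(x):
    """True density of the assumed target distribution."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
grid = np.linspace(-4.0, 4.0, 401)
truth = normal_pdf(grid)

errors = {}
for n in (50, 500, 5000):
    sample = rng.standard_normal(n)
    h = 2.0 * n ** (-0.2)  # illustrative bandwidth that shrinks as n grows
    for name, kernel in (("epanechnikov", epanechnikov), ("uniform", uniform)):
        estimate = kernel_density(grid, sample, h, kernel)
        # Mean squared difference between estimate and truth over the grid.
        errors[name, n] = np.mean((estimate - truth) ** 2)

for key, value in sorted(errors.items()):
    print(key, value)
```

The uniform kernel produces a piecewise-constant estimate, which is why it resembles an ordinary histogram, while the Epanechnikov estimate is continuous. With this setup the error for both kernels should fall as more realizations are added.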