A kernel density estimate is a way to estimate and visualize the probability density function of a random variable from a random sample. Unlike a histogram, it gives a smooth estimate, with the degree of smoothing controlled by a parameter called the bandwidth, here denoted by h. With a well-chosen bandwidth, the important features of the distribution are visible; a bandwidth that is too small undersmooths and one that is too large oversmooths, and either can obscure those features.
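To make the role of h concrete, here is a minimal sketch of a kernel density estimate in plain Python. It assumes a Gaussian kernel, which the text above does not specify, and it uses a synthetic two-component sample rather than the real data. The names gaussian_kde and count_modes are hypothetical helpers written for this illustration, not functions from any library.

```python
import math
import random


def gaussian_kde(sample, h):
    """Return a function x -> estimated density at x.

    Assumes a Gaussian kernel K, so the estimate is
    f_h(x) = 1 / (n * h) * sum_i K((x - x_i) / h).
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return density


def count_modes(density, lo, hi, steps=400):
    """Count local maxima of the density on an evenly spaced grid over [lo, hi]."""
    ys = [density(lo + (hi - lo) * i / steps) for i in range(steps + 1)]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])


# Synthetic bimodal sample for illustration only; this is NOT the faithful data.
rng = random.Random(0)
sample = [rng.gauss(55, 6) for _ in range(100)] + [rng.gauss(80, 6) for _ in range(170)]

# Compare a small, a moderate, and a large bandwidth on the same sample.
for h in (0.5, 4.0, 20.0):
    f = gaussian_kde(sample, h)
    print(f"h = {h}: {count_modes(f, 30, 110)} local maxima on the grid")
```

Counting local maxima is only a rough stand-in for inspecting the plotted curve, but it shows the trade-off: a very small h tends to produce many spurious bumps, while a very large h can merge the two groups into a single peak.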
Here we see a histogram and three kernel density estimates for a sample of waiting times in minutes between eruptions of Old Faithful Geyser in Yellowstone National Park, taken from R’s faithful dataset. The data follow a bimodal distribution; short eruptions are followed by a wait time averaging about 55 minutes, and long eruptions by a wait time averaging about 80 minutes. In recent years, wait times have been increasing, possibly due to the effects of earthquakes on the geyser’s geohydrology.