John Firebaugh

Open Source, Ruby, Rubinius, RubySpec, Rails.

Kernel Density Estimation With Protovis

A kernel density estimate provides a means of estimating and visualizing the probability distribution function of a random variable based on a random sample. In contrast to a histogram, a kernel density estimate provides a smooth estimate, via the effect of a smoothing parameter called the bandwidth, here denoted by h. With the correct choice of bandwidth, important features of the distribution can be seen; an incorrect choice will result in undersmoothing or oversmoothing and obscure those features.

Here we see a histogram and three kernel density estimates for a sample of waiting times in minutes between eruptions of Old Faithful Geyser in Yellowstone National Park, taken from R’s faithful dataset. The data follow a bimodal distribution; short eruptions are followed by a wait time averaging about 55 minutes, and long eruptions by a wait time averaging about 80 minutes. In recent years, wait times have been increasing, possibly due to the effects of earthquakes on the geyser’s geohydrology.