Warm tip: This article is reproduced from stackoverflow.com, please click
kernel-density r

Gaussian kernel density estimation in R

发布于 2021-01-28 19:03:16

I am having trouble understanding how to implement a Gaussian kernel density estimation of the following dataset in R. I appreciate if you can help me understand the mechanism of how to do it. I am currently trying to get a formula for the bell shaped curves at the bottom of the following picture. As you can see there is one bell shaped curve for each data point. (Note the picture does not represent the data I am using.)

enter image description here

This is my data:

x<-c(4.09, 4.46, 4.61, 4.30, 4.03, 5.22, 4.21, 4.07, 4.02, 4.58, 4.66, 4.05, 4.23, 5.51, 4.03, 4.72, 4.47, 4.50, 5.80, 4.30, 4.09, 4.78, 4.18, 4.45, 4.40, 5.60, 4.37, 4.42, 4.88, 4.20, 4.45, 4.10, 4.43, 4.58, 4.40, 4.38) (x has 36 elements)

This is the kernel density estimator:

enter image description here

(If you can't see the image, it's from this page http://sfb649.wiwi.hu-berlin.de/fedc_homepage/xplore/tutorials/xlghtmlnode33.html)

where K(u)= enter image description here

is the Gaussian kernel function and h=.1516 is the bandwidth selected by Scott.

So, plugging in we get f hat (x) = 1/(36*.1516) (1/sqrt(2pi))[e^(-1/2 ((4.09-x)/.1516)^2 + e^(-1/2 ((4.46-x)/.1516)^2 + ... + e^(-1/2 ((4.38-x)/.1516)^2]

Ok. So we have a function of x. But how do we get the equation of each of the bell shaped curves in the above diagram? If we plug in, for example, 4.09, into f hat (x) we get a number, not a curve/function/distribution. Can someone help me understand the procedure to find the equation for the bell shaped curve/kernel density estimate?

Questioner
Stacker
Viewed
0
MrFlick 2020-10-07 09:30

Here's a function that will return your fhat function given your x values and h value

get_fhat <- function(x, h) {
  Vectorize(function(z) 1/length(x)/h*sum(dnorm((x-z)/h)))  
}

This function returns a function that we can use to get values. We Vectorize it so we can pass in multiple values at once to the function.

We can get a single value or plot it with

fhat <- get_fhat(x, .1516)
fhat(4.09)
# [1] 0.9121099
curve(fhat, from=min(x), to=max(x))

enter image description here