Warm tip: This article is reproduced from stackoverflow.com, please click
matplotlib python scipy

rayleigh distribution curve on histogram

发布于 2020-04-18 09:54:48

I have an array of velocity data in directions V_x and V_y. I've plotted a histogram for the velocity norm using the code below,

plt.hist(V_norm_hist, bins=60, density=True, rwidth=0.95)

which gives the following figure: Histograph for velocity norm

Now I also want to add a Rayleigh Distribution curve on top of this, but I can't get it to work. I've been trying different combinations using scipy.stats.rayleigh but the scipy homepage isn't really intuative so I can't get it to function properly... What exactly does the lines

mean, var, skew, kurt = rayleigh.stats(moments='mvsk')

and

x = np.linspace(rayleigh.ppf(0.01),rayleigh.ppf(0.99), 100)
ax.plot(x, rayleigh.pdf(x),'r-', lw=5, alpha=0.6, label='rayleigh pdf')

do?

Questioner
Birks
Viewed
13
JohanC 2020-02-05 03:09

You might need to first follow the link to rv_continuous, from which rayleigh is subclassed. And from there to the ppf to find out that ppf is the 'Percent point function'. x0 = ppf(0.01) tells at which spot everything less than x0 has accumulated 1% of its total 'weight' and similarly x1 = ppf(0.99) is where 99% of the 'weight' is accumulated. np.linspace(x0, x1, 100) divides the space from x0 to x1 in 100 short intervals. As a continuous distribution can be infinite, these x0 and x1 limits are needed to only show the interesting interval.

rayleigh.pdf(x) gives the pdf at x. So, an indication of how probable each x is.

rayleigh.stats(moments='mvsk') where moments is composed of letters [‘mvsk’] defines which moments to compute: ‘m’ = mean, ‘v’ = variance, ‘s’ = (Fisher’s) skew, ‘k’ = (Fisher’s) kurtosis.

To plot both the histogram and the distribution on the same plot, we need to know the parameters of Raleigh that correspond to your sample (loc and scale). Furthermore, both the pdf and the histogram would need the same x and same y. For the x we can take the limits of the histogram bins. For the y, we can scale up the pdf, knowing that the total area of the pdf is supposed to be 1. And the histogram bins are proportional to the number of entries.

If you do know the loc is 0 but don't know the scale, the wikipedia article gives a formula that connects the scale to the mean of your samples:

estimated_rayleigh_scale = samples.mean() / np.sqrt(np.pi / 2)

Supposing a loc of 0 and a scale of 0.08 the code would look like:

from matplotlib import pyplot as plt
import numpy as np
from scipy.stats import rayleigh

N = 1000
# V = np.random.uniform(0, 0.1, 2*N).reshape((N,2))
# V_norm = (np.linalg.norm(V, axis=1))
scale = 0.08
V_norm_hist = scale * np.sqrt( -2* np.log (np.random.uniform(0, 1, N)))
fig, ax = plt.subplots(1, 1)

num_bins = 60
_binvalues, bins, _patches = plt.hist(V_norm_hist, bins=num_bins, density=False, rwidth=1, ec='white', label='Histogram')
x = np.linspace(bins[0], bins[-1], 100)
binwidth = (bins[-1] - bins[0]) / num_bins

scale = V_norm_hist.mean() / np.sqrt(np.pi / 2)

plt.plot(x, rayleigh(loc=0, scale=scale).pdf(x)*len(V_norm_hist)*binwidth, lw=5, alpha=0.6, label=f'Rayleigh pdf (s={scale:.3f})')
plt.legend()
plt.show()

example plot