Poisson distribution in Python

A Poisson distribution is the probability distribution of the number of independent occurrences of an event in a fixed interval. It is used for count data where events happen at a known average rate and independently of the time since the last event. For example, if the average number of cars that cross a particular street in a day is 25, then you can find the probability of 28 cars passing the street using the Poisson formula:

$$P(x) = \frac{e^{-\mu}\mu^{x}}{x!}$$

where:

e is the base of natural logarithms (approximately 2.7183)
μ is the mean number of occurrences (25 in this case)
x is the number of occurrences in question (28 in this case)

On any given day we can see 0, 1, 2, 3, …, 25, …, 30, … cars on the street, with an average of around 25 cars. So to find the probability of exactly 28 cars we would have to calculate

$$P(28) = \frac{e^{-25}\,25^{28}}{28!}$$

With the Poisson function, we define the mean value, which is 25 cars. The Python function gives the probability that exactly 28 cars will pass the street, which is around 0.0632 (about 6%).
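As a minimal sketch of the car example, the probability can be computed with scipy.stats.poisson.pmf, assuming SciPy is installed. The variable names are illustrative and do not come from the original text.

```python
from scipy.stats import poisson

mean_cars = 25      # average number of cars per day (mu)
observed_cars = 28  # number of cars we ask about (x)

# Probability of seeing exactly 28 cars when the daily average is 25
probability = poisson.pmf(observed_cars, mu=mean_cars)
print(f"P(X = {observed_cars}) = {probability:.4f}")
```

The printed value should agree with the figure of roughly 0.0632 quoted above.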

The formula may seem complicated to solve by hand, but with Python libraries it's a piece of cake.

In this article, we will see how we can create a Poisson probability mass function plot in Python. In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume. 

In order to plot the Poisson distribution, we will use the scipy module. SciPy is a free and open-source Python library used for scientific and technical computing. It contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.

numpy.random.poisson(lam=1.0, size=None)

Parameters:

lam : float or array_like of floats

Expected number of events occurring in a fixed-time interval, must be >= 0. A sequence must be broadcastable over the requested size.

size : int or tuple of ints, optional

Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if lam is a scalar. Otherwise, np.array(lam).size samples are drawn.

Returns:

out : ndarray or scalar

Drawn samples from the parameterized Poisson distribution.

See also

random.Generator.poisson, which should be used for new code.

Notes

The probability mass function of the Poisson distribution is

\[f(k; \lambda)=\frac{\lambda^k e^{-\lambda}}{k!}\]

For events with an expected separation \(\lambda\) the Poisson distribution \(f(k; \lambda)\) describes the probability of \(k\) events occurring within the observed interval \(\lambda\).

Because the output is limited to the range of the C int64 type, a ValueError is raised when lam is within 10 sigma of the maximum representable value.
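The parameters described above can be illustrated with a short sketch. It uses the Generator interface (np.random.default_rng) rather than the legacy function; the seed and the lam values are arbitrary choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed chosen only for reproducibility

# A single draw: lam is a scalar and size is left as None
single = rng.poisson(lam=5)

# A (2, 3) array of draws, so 2 * 3 = 6 samples in total
grid = rng.poisson(lam=5, size=(2, 3))

# lam may also be a sequence; one sample is drawn per entry
per_rate = rng.poisson(lam=[1, 10, 100])

print(single, grid.shape, per_rate.shape)
```

The shapes follow the rules for size given above: a tuple fixes the output shape, and a sequence for lam with no size yields one sample per entry.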


A Poisson distribution shows the likely number of times that an event will occur within a pre-determined period of time. It is used for independent events which occur at a constant rate within a given interval of time. The Poisson distribution is discrete: it counts how many times the event occurs, so the variable can only take whole-number values.

We use the seaborn Python library, which has built-in functions to create such probability distribution graphs. The scipy package helps in generating samples from the Poisson distribution.

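from scipy.stats import poisson
import seaborn as sb
import matplotlib.pyplot as plt

# Draw 10,000 samples from a Poisson distribution with mean 4
data_poisson = poisson.rvs(mu=4, size=10000)

# distplot is deprecated in recent seaborn releases; histplot with
# discrete=True draws one bar per integer count
ax = sb.histplot(data_poisson,
                 kde=True,
                 discrete=True,
                 color='green')
ax.set(xlabel='Poisson', ylabel='Frequency')
plt.show()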

Its output is a histogram of the sampled counts, with the tallest bars around the mean of 4.

To continue following this tutorial we will need the following Python libraries: scipy, numpy, and matplotlib.

If you don’t have them installed, please open “Command Prompt” (on Windows) and install them using the following commands:


pip install scipy
pip install numpy
pip install matplotlib

What is a Poisson process

A Poisson point process (or simply, Poisson process) is a collection of points randomly located in mathematical space.

Because of its properties, the Poisson process is often defined on the real line, where it can be considered a random (stochastic) process in one dimension. This makes it possible to build mathematical models and study events that occur in a random manner.

One of its important properties is that each point of the process is stochastically independent from other points in the process.

As an example, consider a case where such a process can be observed in real life. Suppose you are studying the historical frequencies of hurricanes. This is a random process, since the number of hurricanes this year is independent of the number of hurricanes last year, and so on. However, over time you may observe some trends, an average frequency, and more.

Mathematically speaking, in this case the point process depends on a parameter that may be constant, such as the average rate of events (the average number of customers calling, for example).

A Poisson process is tied to the Poisson distribution: the number of points that fall in any fixed interval follows a Poisson distribution.
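The link between the process and the distribution can be sketched with a small simulation. It relies on a standard property of a constant-rate Poisson process on the real line, namely that the gaps between consecutive points are exponentially distributed. The rate of 7 events per unit of time, the seed, and the number of intervals are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
rate = 7.0            # assumed average number of events per unit of time
n_intervals = 10_000  # number of unit-length intervals to observe

# Gaps between consecutive points are exponential with mean 1 / rate.
# Draw about 20% more gaps than needed so the points cover all intervals.
gaps = rng.exponential(scale=1.0 / rate, size=int(rate * n_intervals * 1.2))
arrival_times = np.cumsum(gaps)
arrival_times = arrival_times[arrival_times < n_intervals]

# Count how many points fall into each unit-length interval
counts = np.bincount(arrival_times.astype(int), minlength=n_intervals)

# For Poisson-distributed counts, mean and variance are both close to the rate
print(counts.mean(), counts.var())
```

Both printed values should be close to the chosen rate, which is what a Poisson distribution with that mean predicts.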


What is a Poisson distribution?

A Poisson distribution is a discrete probability distribution of a number of events occurring in a fixed interval of time given two conditions:

  1. Events occur with some constant mean rate.
  2. Events are independent of each other and independent of time.

To put this in some context, consider our example of frequencies of hurricanes from the previous section.

Assume that we have data from observing hurricanes over a period of 20 years. We find that the average number of hurricanes per year is 7. Each year is independent of previous years, which means that if we observed 8 hurricanes this year, it doesn’t mean we will observe 8 next year.

The PMF (probability mass function) of a Poisson distribution is given by:

$$p(k, \lambda) = \frac{\lambda^{k}e^{-\lambda}}{k!}$$

where:

  • \(\lambda\) is a real positive number given by \(\lambda = E(X) = \mu\)
  • \(k\) is the number of occurrences
  • \(e \approx 2.71828\) is the base of natural logarithms

The value \(p(k, \lambda) = Pr(X=k)\) can be read as: Poisson probability of k events in an interval.

And the CDF (cumulative distribution function) of a Poisson distribution is given by:

$$F(k, \lambda) = \sum^{k}_{i=0} \frac{\lambda^{i}e^{-\lambda}}{i!}$$
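The relationship between the two formulas can be checked numerically. This is a minimal sketch assuming SciPy and NumPy are available; the value of 7 for the rate and the range of k are taken from the hurricane example and are otherwise arbitrary.

```python
import numpy as np
from scipy.stats import poisson

lam = 7
k = np.arange(0, 17)

pmf = poisson.pmf(k, mu=lam)
cdf_from_sum = np.cumsum(pmf)        # F(k) = sum of p(i) for i = 0..k
cdf_direct = poisson.cdf(k, mu=lam)  # SciPy's own CDF

# Both routes should agree up to floating-point error
print(np.allclose(cdf_from_sum, cdf_direct))
```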


Poisson distribution example

Now that we know some formulas to work with, let’s go through an example in detail.

Recall the hurricane data we mentioned in the previous sections. We know that the historical frequency of hurricanes is 7 per year. This is the rate, \(\mu\), and it forms our \(\lambda\) value (since \(\lambda=\mu\)):

$$\lambda = 7$$

The question we can ask is: what is the probability of observing exactly 5 hurricanes this year? This gives our \(k\) value:

$$k = 5$$

Using the formula from the previous section, we can calculate the Poisson probability:

$$p(5, 7) = \frac{(7^{5})(e^{-7})}{5!} = 0.12772 \approx 12.77\%$$

Therefore, the probability of observing exactly 5 hurricanes this year is equal to 12.77%.
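The same arithmetic can be reproduced with the Python standard library alone. This is only a sketch of the formula evaluation; the variable names are illustrative.

```python
import math

lam = 7  # average number of hurricanes per year
k = 5    # number of hurricanes we ask about

# p(k, lam) = lam^k * e^(-lam) / k!
probability = (lam ** k) * math.exp(-lam) / math.factorial(k)
print(round(probability, 5))
```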

Naturally, we are curious about the probabilities of other frequencies.


Poisson PMF (probability mass function)

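Consider the table below which shows the Poisson probability of hurricane frequencies (0-16):

| \(k\) | \(p(k, \lambda)\) | % |
|---|---|---|
| 0 | 0.00091 | 0.09% |
| 1 | 0.00638 | 0.64% |
| 2 | 0.02234 | 2.23% |
| 3 | 0.05213 | 5.21% |
| 4 | 0.09123 | 9.12% |
| 5 | 0.12772 | 12.77% |
| 6 | 0.14900 | 14.9% |
| 7 | 0.14900 | 14.9% |
| 8 | 0.13038 | 13.04% |
| 9 | 0.10140 | 10.14% |
| 10 | 0.07098 | 7.1% |
| 11 | 0.04517 | 4.52% |
| 12 | 0.02635 | 2.64% |
| 13 | 0.01419 | 1.42% |
| 14 | 0.00709 | 0.71% |
| 15 | 0.00331 | 0.33% |
| 16 | 0.00145 | 0.15% |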

Using the above table we can create the following visualization of the Poisson probability mass function for this example:

Poisson PMF


Poisson CDF (cumulative distribution function)

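Consider the table below which shows the Poisson cumulative probability of hurricane frequencies (0-16):

| \(k\) | \(F(k, \lambda)\) | % |
|---|---|---|
| 0 | 0.00091 | 0.09% |
| 1 | 0.00730 | 0.73% |
| 2 | 0.02964 | 2.96% |
| 3 | 0.08177 | 8.18% |
| 4 | 0.17299 | 17.3% |
| 5 | 0.30071 | 30.07% |
| 6 | 0.44971 | 44.97% |
| 7 | 0.59871 | 59.87% |
| 8 | 0.72909 | 72.91% |
| 9 | 0.83050 | 83.05% |
| 10 | 0.90148 | 90.15% |
| 11 | 0.94665 | 94.67% |
| 12 | 0.97300 | 97.3% |
| 13 | 0.98719 | 98.72% |
| 14 | 0.99428 | 99.43% |
| 15 | 0.99759 | 99.76% |
| 16 | 0.99904 | 99.9% |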

Using the above table we can create the following visualization of the Poisson cumulative distribution function for this example:

Poisson CDF

The table also allows us to answer some interesting questions.

For example, if we want to find the probability of seeing up to 5 hurricanes (mathematically: \(k\leq5\)), we can read off that it’s \(F(5,7) = 0.30071\) or \(30.07\%\).

On the other hand, we may be interested in the probability of observing more than 5 hurricanes (mathematically: \(k>5\)), which would be \(1-F(5,7) = 1-0.30071 = 0.69929\) or \(69.93\%\).
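Both questions can also be answered directly in code. This is a minimal sketch assuming SciPy is installed; it uses the cdf method for "up to 5" and the sf (survival function) method for "more than 5".

```python
from scipy.stats import poisson

lam = 7  # average number of hurricanes per year

at_most_5 = poisson.cdf(5, mu=lam)   # P(X <= 5)
more_than_5 = poisson.sf(5, mu=lam)  # P(X > 5) = 1 - P(X <= 5)

print(round(at_most_5, 5), round(more_than_5, 5))
```

Using sf avoids subtracting from 1 by hand, although either approach gives the same result here.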


Poisson distribution example in Python

In the previous section we computed probability mass function and cumulative distribution function by hand. In this section, we will reproduce the same results using Python.

We will begin with importing the required dependencies:


import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

Next we will need an array of the \(k\) values for which we will compute the Poisson PMF. In the previous section, we calculated it for 17 values of \(k\) from 0 to 16, so let’s create an array with these values:


k = np.arange(0, 17)

print(k)

You should get:

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16]

In the following sections we will focus on calculating the PMF and CDF using Python.


Poisson PMF (probability mass function) in Python

In order to calculate the Poisson PMF using Python, we will use the .pmf() method of the scipy.stats.poisson distribution object. It will need two parameters:

  1. \(k\) value (the k array that we created)
  2. \(\mu\) value (which we will set to 7 as in our example)

And now we can create an array with Poisson probability values:


pmf = poisson.pmf(k, mu=7)
pmf = np.round(pmf, 5)

print(pmf)

And you should get:

[0.00091 0.00638 0.02234 0.05213 0.09123 0.12772 0.149   0.149   0.13038 0.1014  0.07098 0.04517 0.02635 0.01419 0.00709 0.00331 0.00145]

Note:

If you want to print it in a nicer way with each \(k\) value and the corresponding probability:


for val, prob in zip(k,pmf):
    print(f"k-value {val} has probability = {prob}")

And you should get:

k-value 0 has probability = 0.00091
k-value 1 has probability = 0.00638
k-value 2 has probability = 0.02234
k-value 3 has probability = 0.05213
k-value 4 has probability = 0.09123
k-value 5 has probability = 0.12772
k-value 6 has probability = 0.149
k-value 7 has probability = 0.149
k-value 8 has probability = 0.13038
k-value 9 has probability = 0.1014
k-value 10 has probability = 0.07098
k-value 11 has probability = 0.04517
k-value 12 has probability = 0.02635
k-value 13 has probability = 0.01419
k-value 14 has probability = 0.00709
k-value 15 has probability = 0.00331
k-value 16 has probability = 0.00145

which is exactly the same as we saw in the example where we calculated probabilities by hand.


Plot Poisson PMF using Python

We will need the k values array that we created earlier as well as the pmf values array in this step.

Using the matplotlib library, we can easily plot the Poisson PMF using Python:


plt.plot(k, pmf, marker='o')
plt.xlabel('k')
plt.ylabel('Probability')

plt.show()

And you should get:

Poisson PMF using Python


Poisson CDF (cumulative distribution function) in Python

In order to calculate the Poisson CDF using Python, we will use the .cdf() method of the scipy.stats.poisson distribution object. It will need two parameters:

  1. \(k\) value (the k array that we created)
  2. \(\mu\) value (which we will set to 7 as in our example)

And now we can create an array with Poisson cumulative probability values:


cdf = poisson.cdf(k, mu=7)
cdf = np.round(cdf, 3)

print(cdf)

And you should get:


[0.001 0.007 0.03  0.082 0.173 0.301 0.45  0.599 0.729 0.83  0.901 0.947 0.973 0.987 0.994 0.998 0.999]

Note:

If you want to print it in a nicer way with each \(k\) value and the corresponding cumulative probability:


for val, prob in zip(k,cdf):
    print(f"k-value {val} has probability = {prob}")

And you should get:


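k-value 0 has probability = 0.001
k-value 1 has probability = 0.007
k-value 2 has probability = 0.03
k-value 3 has probability = 0.082
k-value 4 has probability = 0.173
k-value 5 has probability = 0.301
k-value 6 has probability = 0.45
k-value 7 has probability = 0.599
k-value 8 has probability = 0.729
k-value 9 has probability = 0.83
k-value 10 has probability = 0.901
k-value 11 has probability = 0.947
k-value 12 has probability = 0.973
k-value 13 has probability = 0.987
k-value 14 has probability = 0.994
k-value 15 has probability = 0.998
k-value 16 has probability = 0.999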

which matches the cumulative probabilities we calculated by hand, rounded to three decimal places.


Plot Poisson CDF using Python

We will need the k values array that we created earlier as well as the cdf values array in this step.

Using the matplotlib library, we can easily plot the Poisson CDF using Python:


plt.plot(k, cdf, marker='o')
plt.xlabel('k')
plt.ylabel('Cumulative Probability')

plt.show()

And you should get:

Poisson CDF using Python


Conclusion

In this article we explored the Poisson distribution and the Poisson process, as well as how to create and plot a Poisson distribution in Python.


How do you write a Poisson distribution in Python?

$$P(X=r) = \frac{e^{-\lambda}\lambda^{r}}{r!}$$

In the above formula, \(\lambda\) represents the mean number of occurrences and \(r\) represents the different values of the random variable \(X\).

How do you check if data follows a Poisson distribution in Python?

  1. Figure out which distribution you want to compare against.
  2. For that distribution, identify the parameters that completely describe it.
  3. Use your own data to estimate those parameters.
  4. Compare the values generated from the Poisson distribution with the values of your actual data.
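One possible way to carry out these steps is a chi-square goodness-of-fit test, sketched below. The test itself is a choice made for this illustration and is not prescribed by the steps above; the data array is simulated as a stand-in for real observations, and the cut-off k_max used to pool the tail is arbitrary.

```python
import numpy as np
from scipy.stats import poisson, chisquare

rng = np.random.default_rng(seed=1)
data = rng.poisson(lam=4, size=2000)  # stand-in for real observations

# The Poisson distribution has one parameter; estimate it by the sample mean
lam_hat = data.mean()

# Observed counts for 0..k_max-1, with everything >= k_max pooled in a last bin
k_max = 10
observed = np.bincount(np.minimum(data, k_max), minlength=k_max + 1)

# Expected counts under Poisson(lam_hat), using the same pooled last bin
probs = poisson.pmf(np.arange(k_max), lam_hat)
probs = np.append(probs, poisson.sf(k_max - 1, lam_hat))
expected = probs * data.size

# ddof=1 because one parameter (lam_hat) was estimated from the data
stat, p_value = chisquare(observed, expected, ddof=1)
print(f"lambda estimate = {lam_hat:.3f}, chi2 = {stat:.2f}, p = {p_value:.3f}")
```

A small p-value would suggest that the data are not well described by a Poisson distribution; a large one means the test found no evidence against it.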

How to generate random numbers with Poisson distribution in Python?

With the numpy.random.poisson() method, we can draw random samples from a Poisson distribution. Return: the random samples as a numpy array.

How do you find the probability distribution in Python?

Binomial distribution in Python: you can generate a binomially distributed discrete random variable using the scipy.stats module's binom.rvs() method, which takes \(n\) (number of trials) and \(p\) (probability of success) as shape parameters. To shift the distribution, use the loc parameter.
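A short sketch of the binom.rvs() call described above, assuming SciPy is installed. The values of n, p, and loc, the sample size, and the random_state are illustrative choices only.

```python
from scipy.stats import binom

n, p = 10, 0.3  # number of trials and probability of success (illustrative)

# 1,000 draws from a binomial distribution; values lie between 0 and n
samples = binom.rvs(n=n, p=p, size=1000, random_state=0)

# The same distribution shifted by loc=5; values lie between 5 and n + 5
shifted = binom.rvs(n=n, p=p, loc=5, size=1000, random_state=0)

print(samples.min(), samples.max(), shifted.min(), shifted.max())
```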