Geometric Distribution With Python

March 19, 2018

2 min read

Geometric Distribution With Python

This is the first part of a series of articles about different types of distributions. There is a lot of different distributions out there, but we will cover the most popular ones, those more often used to be approximate versions of real data.

The geometric distribution is the distribution of the number of trials needed to get the first success in repeated Bernoulli trials. Repeated Bernoulli trials mean that all trials are independent and each result have two possible outcomes. Random variable *X *represents the number of trials needed to get the first success. In order to the first success occur on the xth trial the first *x-1 *trials must be failures and the xth trial must be a success. The value x can be any integer from 0, it doesn’t have an upper bound.

probability of getting the first success in the x-th trial
probability of getting the first success in the x-th trial

Let’s take a look at the example. Imagine that we have the event with probability to occur equal to* 0.2*. What is the probability that this event occurs on the third attempt?

Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
view raw pt_11_1.ipynb hosted with ❤ by GitHub

As you can see from the bar chart the minimum value of the attempt is 1 and there is no maximum value.

Let’s take a look at the characteristics of the geometric distribution. The mean and the variance:

the mean and the variance
the mean and the variance

There is also some interesting properties of the geometric distribution. Let’s find the probability that event occurs before the fourth attempt:

def probability_that_event_occur_before(attempt, probability):
    return 1 - (1 - probability)**(attempt - 1)

p = 0.2
probability_that_event_occur_before(4, p)
# 0.488