Ch. 3 - Discrete Distributions
Class: STAT-211
Notes:
Outline
- Random Variables
- Discrete Probability Distribution
- Cumulative Distribution Function
- Expectation and Variance
- Binomial Distribution
Random Variables
Random Variables
A real-valued random variable is a quantitative variable which takes its values depending on the outcome of a random experiment.
- It is a variable because different numerical values are possible.
- It is random because the observed value of the variable depends on the outcome of the random experiment.
Typically, we denote random variables by upper-case letters like X, Y, Z, etc., and their realizations when the random experiment is performed by lower-case letters like x, y, z, etc.
However, the lower-case letter used for a realization need not always match the upper-case letter naming the random variable.
Example
- Consider the random experiment of flipping a fair coin thrice. Here the sample space S is given by
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.
- Let X denote the number of heads that appear. Observe that
- X = 0 if and only if {TTT} occurs
- No heads appear.
- X = 1 if and only if {HTT, THT, TTH} occurs
- Exactly one head appears.
- X = 2 if and only if {HHT, HTH, THH} occurs
- Exactly two heads appear.
- X = 3 if and only if {HHH} occurs
- Exactly three heads appear.
Clearly, X is a numerical variable taking the values 0,1,2,3, depending on the outcome of the random experiment. Hence, X defined as above is a random variable.
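To make this concrete, here is a minimal Python sketch (an illustration added to these notes, not part of the course material) that lists the sample space of three flips and the value of X assigned to each outcome:

```python
from itertools import product

# Sample space of three coin flips: all 8 sequences of 'H'/'T'.
sample_space = ["".join(outcome) for outcome in product("HT", repeat=3)]

# The random variable X maps each outcome to a number: the count of heads.
X = {outcome: outcome.count("H") for outcome in sample_space}

print(X)  # {'HHH': 3, 'HHT': 2, 'HTH': 2, 'HTT': 1, 'THH': 2, 'THT': 1, 'TTH': 1, 'TTT': 0}
```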
- Note that by definition a qualitative/categorical variable can never be a random variable
Discrete Random Variables
A random variable is discrete if it takes on only countably many values (finitely or countably infinitely many).
- Number of heads in 10 flips of a fair coin
- Sum of two faces when a six–faced fair die is rolled twice
- Number of occurrences of car accidents at an intersection
- Number of data packets arriving in a network device
Continuous Random Variable
A random variable is continuous if it assumes its values within an
interval (bounded or unbounded).
- Arrival time of a message on a communication network
- Lifetime of an electrical or electronic device
- Strength of a concrete weld
Discrete Probability Distribution
Distribution of a Random Variable
The occurrence of any particular value (or set of values) of a random variable is itself an event of the underlying random experiment.
- As we discussed, a random variable assumes its different values depending on the outcome of the experiment.
- Each value will depend on one or more of the elementary outcomes.
- The occurrences of various values of a random variable X can therefore be considered as events, and you can attach probabilities to these events.
- That means you are interested in the likelihood of the occurrence of such events.
Therefore, we can attach probabilities to find the chances or the
likelihoods of the occurrences of such events.
Probability Distribution
The probability distribution of a random variable X describes how the total probability 1 is distributed over the possible values of X.
If we know the probability distribution of X, we can compute the probability of any event defined in terms of X. Probability distributions come in three broad types:
- Discrete Distributions
- Continuous Distributions
- Mixture Distributions (beyond the scope of this course)
Discrete Probability Distribution
Probability Mass Function
Let X be a discrete random variable. The probability mass function (pmf) of X is the function
f(x) = P(X = x),
that must satisfy the following two properties:
- The probability of each possible outcome is between 0 and 1: 0 ≤ f(x) ≤ 1 for every possible value x.
- The sum of these individual probabilities must be 1: Σ f(x) = 1, where the sum is taken over all possible values x.
Informally, the pmf says: “if you tell me a possible value x, I will tell you the probability that X takes that value.”
For example, f(2) = P(X = 2).
Write down the probability of the following events in terms of the pmf f:
- P(3 ≤ X ≤ 6)
- {3 ≤ X ≤ 6} = {X = 3} OR {X = 4} OR {X = 5} OR {X = 6}
- They are all disjoint; they cannot happen at the same time.
- The probability of their union is equal to the sum of their individual probabilities.
- P(3 ≤ X ≤ 6) = P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6) = f(3) + f(4) + f(5) + f(6)
- P(3 < X ≤ 6)
- {3 < X ≤ 6} = {X = 4} OR {X = 5} OR {X = 6}
- P(3 < X ≤ 6) = f(4) + f(5) + f(6)
- P(3 < X < 6)
- {3 < X < 6} = {X = 4} OR {X = 5}, so P(3 < X < 6) = f(4) + f(5)
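As a quick numerical check of these formulas, here is a small Python sketch; the pmf used (X = the face shown by a fair six-sided die, so f(x) = 1/6 for x = 1, ..., 6) is an assumed example chosen only for illustration:

```python
from fractions import Fraction

# Assumed pmf for illustration: X is the face of a fair six-sided die.
f = {x: Fraction(1, 6) for x in range(1, 7)}

p1 = sum(f[x] for x in f if 3 <= x <= 6)   # P(3 <= X <= 6) = f(3)+f(4)+f(5)+f(6)
p2 = sum(f[x] for x in f if 3 < x <= 6)    # P(3 < X <= 6)  = f(4)+f(5)+f(6)
p3 = sum(f[x] for x in f if 3 < x < 6)     # P(3 < X < 6)   = f(4)+f(5)

print(p1, p2, p3)  # 2/3 1/2 1/3
```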
Example
Suppose you flip a fair coin thrice. Here, the sample space is S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}, with 8 equally likely outcomes.
Let X denote the number of heads that appear. Then X ∈{0,1,2,3}.
- X = 0 ⇔ {TTT} occurs ⇒ P(X = 0) = 1/8 > 0
- X = 1 ⇔ {HTT, THT, TTH} occurs ⇒ P(X = 1) = 3/8 > 0
- X = 2 ⇔ {HHT, HTH, THH} occurs ⇒ P(X = 2) = 3/8 > 0
- X = 3 ⇔ {HHH} occurs ⇒ P(X = 3) = 1/8 > 0
- You can apply the classical definition of probability to enumerate these probabilities.
The pmf of X in tabular form:

| x | 0 | 1 | 2 | 3 |
| --- | --- | --- | --- | --- |
| f(x) | 1/8 | 3/8 | 3/8 | 1/8 |
Example (Contd.)
- What is the probability of observing exactly one head?
Answer: P(X = 1) = f(1) = 3/8
- What is the probability of observing at most one head?
Answer:
- Observe that {X ≤ 1} = {X = 0} ∪ {X = 1}.
- The events {X = 0} and {X = 1} are disjoint.
- P(X ≤ 1) = P(X = 0) + P(X = 1) = 1/8 + 3/8 = 4/8 = 1/2
- What is the probability of observing at least one head?
Answer:
- {X ≥ 1} = {X = 1} ∪ {X = 2} ∪ {X = 3}.
- {X = 1}, {X = 2}, and {X = 3} are disjoint.
- P(X ≥ 1) = P(X = 1) + P(X = 2) + P(X = 3) = 3/8 + 3/8 + 1/8 = 7/8
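A short Python sketch (added for illustration) that reproduces these three answers directly from the pmf of X:

```python
from fractions import Fraction

# pmf of X = number of heads in three flips of a fair coin (values from the notes).
f = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

print(f[1])                            # P(X = 1)  -> 3/8
print(f[0] + f[1])                     # P(X <= 1) -> 1/2
print(sum(f[x] for x in f if x >= 1))  # P(X >= 1) -> 7/8
```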
Example
Suppose we roll two six-faced fair dice simultaneously. Then the sample space S comprises the 36 equally likely ordered pairs (i, j), where i and j each range over 1, 2, ..., 6.
- Observe that the sample space here is finite because it contains 36 elementary outcomes, and all of these elementary outcomes are equally likely to occur.
Let X denote the sum of the two faces obtained. Then X takes the values 2,3,4,...,11,12, depending on the outcome of the experiment.
Observe that
- X = 2 if and only if (1,1) occurs
- X = 3 if and only if (1,2) or (2,1) occurs
- X = 4 if and only if (1,3), (2,2) or (3,1) occurs
- X = 5 if and only if (1,4), (2,3), (3,2) or (4,1) occurs
- X = 6 if and only if (1,5), (2,4), (3,3), (4,2) or (5,1) occurs
- X = 7 if and only if (1,6), (2,5), (3,4), (4,3), (5,2) or (6,1) occurs
- X = 8 if and only if (2,6), (3,5), (4,4), (5,3) or (6,2) occurs
- X = 9 if and only if (3,6), (4,5), (5,4) or (6,3) occurs
- X = 10 if and only if (4,6), (5,5) or (6,4) occurs
- X = 11 if and only if (5,6) or (6,5) occurs
- X = 12 if and only if (6,6) occurs
Now our next aim is to find the probability mass function of X.
- Here the probability mass function of X is
- f(x) = P(X = x),
- where x = 2, 3, ..., 12.
- 0 ≤ f(x) ≤ 1, for all x.
- Σ f(x) = 1, where the sum runs over x = 2, 3, ..., 12 (the probabilities must sum to 1).
For example, computing f(2), f(3), and f(4):
- f(2) = P(X=2) = P({(1,1)}) = 1/36
- Follows from the classical definition of probability
- f(3) = P(X=3) = P({(1,2), (2,1)}) = 2/36
- f(4) = P(X=4) = P({(1,3), (2,2), (3,1)}) = 3/36
The pmf of X in tabular form:

| x | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| f(x) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |
- This is the probability mass function table for the discrete random variable X.
- In the first row you can see the different possible values of X, arranged in ascending order from left to right.
- In the second row you will find the corresponding values of the probability mass function, i.e. the corresponding probabilities of occurrence.
- What is the probability that the sum X is an even number?
Since the above events are mutually exclusive, using Axiom 3, we obtain
P(X is even)
= P({X = 2} ∪ {X = 4} ∪ {X = 6} ∪ {X = 8} ∪ {X = 10} ∪ {X = 12})
= P(X = 2) + P(X = 4) +···+ P(X = 12)
= 1/36 + 3/36 + 5/36 + 5/36 + 3/36 + 1/36 = 18/36 = 1/2.
- P(X ≤ 4) = P(X = 2) + P(X = 3) + P(X = 4) = 6/36 = 1/6
- Observe that X ≤ 4 occurs exactly when X is 2, 3, or 4.
- These are 3 disjoint events
- Follows from Axiom 3 of probability
- P(X > 4) = 1 − P(X ≤ 4) = 1 − 1/6 = 5/6
- Note that the event X > 4 is nothing but the complement of the event X ≤ 4.
- {X > 4} = {X ≤ 4}ᶜ
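These dice probabilities can also be checked by brute-force enumeration. The following Python sketch (added for illustration) builds the pmf of the sum of two fair dice and reproduces the values above:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely ordered pairs and tally the sums.
counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))
f = {x: Fraction(c, 36) for x, c in counts.items()}  # pmf of X = sum of the two faces

print(f[2], f[3], f[4])                    # 1/36 1/18 1/12 (Fractions reduce, e.g. 2/36 -> 1/18)
print(sum(f.values()))                     # 1   (the pmf sums to 1)
print(sum(f[x] for x in f if x % 2 == 0))  # 1/2 -> P(X is even)
print(sum(f[x] for x in f if x <= 4))      # 1/6 -> P(X <= 4)
print(1 - sum(f[x] for x in f if x <= 4))  # 5/6 -> P(X > 4)
```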
Cumulative Distribution Function
Cumulative Distribution Function
The cumulative distribution function (cdf) of a discrete random variable X with pmf f is defined as
F(x) = P(X ≤ x) = Σ_{y ≤ x} f(y).
- The sum is taken over all those values y (mass points of X) such that y is less than or equal to x.
- The cumulative distribution function is defined for any arbitrary real number x.
- The cdf is the cumulative probability that the random variable X is at most x.
- This means the value of the cumulative distribution function must always lie between zero and one, both inclusive, for any real number x.
For any real number x, the cdf must satisfy the following properties:
- 0 ≤ F(x) ≤ 1 for any real number x
- F(x) → 0 as x → −∞, and F(x) → 1 as x → +∞
- It is non-decreasing: if x₁ ≤ x₂, then F(x₁) ≤ F(x₂)
- F is right-continuous everywhere: F(t) → F(x) as t → x from the right, for all x
Note that if you know the cdf of a discrete random variable X, then for any two numbers x < y you can obtain P(x < X ≤ y) = F(y) − F(x).
Remarks
- The pmf can be obtained from the cdf as f(xᵢ) = F(xᵢ) − F(xᵢ₋₁), where x₁ < x₂ < ··· are the (ordered) mass points of X.
- f: PMF
- F: CDF
- The probability of events of the form {a < X ≤ b} is given in terms of the pmf as P(a < X ≤ b) = Σ_{a < x ≤ b} f(x).
- Note this sum can also be written as (for example): Σ_{x ≤ b} f(x) − Σ_{x ≤ a} f(x).
- Writing the sum as the difference of these two sums, and using the cdf, gives P(a < X ≤ b) = F(b) − F(a).
Important Remark: If X is discrete with cdf F, then the equality P(X < x) = P(X ≤ x) = F(x) need not be true, because P(X = x) may be strictly positive at a mass point.
- It may hold for some values of x, but not in general.
- However we will see later on that if X is a continuous random variable, this equality will actually always be true.
Example:
- X : discrete rv
- f : pmf of X
- F : cdf of X
- f(i) > 0 for i = 1, 2, ..., 10
- These points are called "mass points"
- P(3 < X ≤ 6) = P(X = 4) + P(X = 5) + P(X = 6)
- Equivalently, P(3 < X ≤ 6) = P(X ≤ 6) − P(X ≤ 3)
- Now what is the first term?
- Nothing but P(X ≤ 6)
- The second term?
- P(X ≤ 3)
- You can write this probability as
- F(6) − F(3)
- Since
- F(6) = P(X ≤ 6), and
- F(3) = P(X ≤ 3)
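A small Python sketch of the pmf-to-cdf relationship. The notes leave the values f(1), ..., f(10) unspecified, so a uniform pmf over the mass points 1, ..., 10 is assumed here purely for illustration:

```python
from fractions import Fraction

# Assumed pmf for illustration: uniform over the mass points 1, ..., 10.
f = {i: Fraction(1, 10) for i in range(1, 11)}

def F(x):
    """cdf: F(x) = P(X <= x) = sum of f(y) over all mass points y <= x."""
    return sum(p for y, p in f.items() if y <= x)

# P(3 < X <= 6) computed two ways: directly from the pmf, and as F(6) - F(3).
direct = f[4] + f[5] + f[6]
via_cdf = F(6) - F(3)
print(direct, via_cdf)  # 3/10 3/10
```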
Expectation and Variance
Expectation of a Discrete Random Variable
Suppose X is a discrete random variable taking the values x₁, x₂, ..., with pmf f. Then the expectation (expected value, or mean) of X is
E(X) = Σ_x x f(x),
provided the sum is finite.
(In this course, we shall always assume that this sum is finite, so that E(X) exists.)
Important:
If X takes infinitely many values, the sum defining E(X) may fail to be finite, in which case the expectation does not exist.
(In this course, we shall always assume that the expectation exists.)
Long-Term Interpretation of E(X)
The expected value is often referred to as the ‘long-term’ average or
mean. This means:
"If the random experiment is replicated a large number of times under identical conditions (a hypothetical situation), then the average of the values observed in those replications of the random experiment will be approximately E(X)."
- Replicate the same random experiment under identical conditions a sufficiently large number of times, say, N many times, where N is a large positive integer.
- Record the values of X observed in those N identical replications of the random experiment, say, x₁, x₂, ..., x_N.
- Then, (x₁ + x₂ + ··· + x_N)/N ≈ E(X), provided N is sufficiently large.
- Also applicable for continuous random variables
Note: The expectation of X is nothing but a measure of the center of a given probability distribution.
- Note that the expected value of X need not be among the possible values of X.
- It may or may not belong to the set of possible X values.
Example 1
Suppose a fair coin is flipped thrice. Let X be the number of heads that appear.
Then X takes the values 0, 1, 2, and 3, with probabilities 1/8, 3/8, 3/8, and 1/8, respectively.
| x | f(x) | x · f(x) |
| --- | --- | --- |
| 0 | 1/8 | 0 |
| 1 | 3/8 | 3/8 |
| 2 | 3/8 | 6/8 |
| 3 | 1/8 | 3/8 |
| Sum | 1 | 12/8 = 1.5 |
- For instance:
- The first entry in the third column is obtained by multiplying the value 0 by its corresponding probability of occurrence, 1/8.
- Likewise, the second entry in the third column, 3/8, is obtained by multiplying the value 1 in the first column by its corresponding probability of occurrence, 3/8, in the second column.
- Once enumerated, then you can simply take the sum of all of them
- The expected value of X would be equal to 1.5
Hence, E(X) = 0 · (1/8) + 1 · (3/8) + 2 · (3/8) + 3 · (1/8) = 12/8 = 1.5.
- This can be interpreted with the help of the long-term frequency interpretation of expected values.
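As an added illustration, the following Python sketch computes E(X) from the pmf and then checks the long-run interpretation by simulating the experiment; the number of replications N = 100,000 is an arbitrary choice:

```python
import random
from fractions import Fraction

# pmf of X = number of heads in three flips of a fair coin (values from the notes).
f = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# Expectation: E(X) = sum of x * f(x).
print(sum(x * p for x, p in f.items()))  # 3/2

# Long-run interpretation: average X over many simulated replications.
N = 100_000
values = [sum(random.random() < 0.5 for _ in range(3)) for _ in range(N)]
print(sum(values) / N)  # approximately 1.5
```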
Example 2
Let the random variable X denote the sum of the two faces when two six-faced fair dice are rolled simultaneously.
The pmf of X and the products x · f(x) in tabular form:

| x | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| f(x) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |
| x · f(x) | 2/36 | 6/36 | 12/36 | 20/36 | 30/36 | 42/36 | 40/36 | 36/36 | 30/36 | 22/36 | 12/36 |
- Hence, E(X) = (2 + 6 + 12 + 20 + 30 + 42 + 40 + 36 + 30 + 22 + 12)/36 = 252/36 = 7, which is one of the possible values of X (unlike in Example 1).
Variance of discrete Random Variable
The most common measure of spread of a r.v. is its variance.
Suppose X is a discrete random variable with pmf f and mean μ = E(X).
Then the variance of X is
Var(X) = E[(X − μ)²] = Σ_x (x − μ)² f(x),
provided the sum is finite.
The standard deviation is SD(X) = √Var(X).
Important: Again, these are population level quantities, not to be confused with the sample variance or sample standard deviation.
- These are the population variance and the population standard deviation of the random variable X; this is the most accurate definition of these quantities.
- They are not to be confused with the sample variance and the sample standard deviation computed from data.
Fact:
- The variance of a discrete rv X can be written as Var(X) = E(X²) − [E(X)]², i.e. the expected value of X² minus the square of the expected value of X.
- This holds true for any probability distribution, either discrete or continuous.
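A short Python sketch (added for illustration) that computes Var(X) both from the definition and from the shortcut formula, using the coin-flip pmf of Example 1:

```python
from fractions import Fraction

# pmf of X = number of heads in three flips of a fair coin (values from the notes).
f = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mu = sum(x * p for x, p in f.items())  # E(X) = 3/2

var_definition = sum((x - mu) ** 2 * p for x, p in f.items())    # E[(X - mu)^2]
var_shortcut = sum(x ** 2 * p for x, p in f.items()) - mu ** 2   # E(X^2) - [E(X)]^2

print(var_definition, var_shortcut)  # 3/4 3/4
```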
Binomial Distribution
Bernoulli Trial
A random experiment whose outcomes can be classified into one of two
mutually complementary categories, either a ‘success’ or a ‘failure’, is
called a Bernoulli trial.
- Flipping a coin once
- ‘Occurrence of a Head’ = ‘Success’
- ‘Occurrence of a Tail’ = ‘Failure’
- Rolling a six faced die once
- ‘Occurrence of an even number’ = ‘Success’
- ‘Occurrence of an odd number’ = ‘Failure’
- Drawing a card at random from a well-shuffled deck of 52 cards
- ‘Drawing a Spade’ = ‘Success’
- ‘Not drawing a Spade’ = ‘Failure’
- Drawing a ball at random from an urn containing 6 blue and 4 red balls
- ‘Drawing of a blue ball’ = ‘Success’
- ‘Drawing of a red ball’ = ‘Failure’
For each of these examples you can see that the outcomes of the underlying random experiment can be classified into one of two mutually complementary categories, namely a "success" and a "failure".
Binomial Experiment
Consider a random experiment consisting of a sequence of Bernoulli trials
such that
- the number of trials is fixed, say, n, before performing the experiment;
- the trials are independent; and
- (this means the outcome of one trial should not have any influence on the outcomes of the remaining trials)
- the probability of ‘success’, say, p, remains fixed across the trials.
Such an experiment is referred to as a Binomial experiment with parameters n and p.
Examples
- A coin is flipped 10 times independently and under identical conditions, and the number of heads is recorded: Yes − Binomial
- Observe that when a coin is flipped once, then either a head or a tail appears.
- Define head as "success" and tail as "failure"
- One single flip of the coin can be regarded as a Bernoulli Trial
- There are 10 such Bernoulli Trials in total, that means the number of trials here is fixed.
- The trials are of course independent, the outcome of one flip of the coin does not have any influence on the outcome of the remaining flips
- The probability of success is fixed since the coin is flipped under identical conditions 10 times.
- Cards are drawn at random, one by one and with replacement, from a well-shuffled deck of 52 cards until we get 5 jacks: No-Number of trials is not fixed
- The drawing of each card can be regarded as a Bernoulli Trial
- The outcomes are dichotomous, either we get a jack or not
- Are the drawings independent? - yes
- Since this is done with "replacement", it means the outcome of one drawing will not have any influence on the remaining drawings
- The probability of drawing a jack remains the same across all these Bernoulli Trials.
- But here we do not know, before the experiment, how many trials will be needed to get five jacks!
- Observe that in order to get 5 jacks you need at least 5 draws.
- Theoretically, the number of draws can go up to infinity.
- 8 cards are drawn at random, one by one and with replacement, from a well-shuffled deck of 52 cards, and the number of spades is recorded: Yes − Binomial
- Is it a series of Bernoulli Trials?
- Yes, drawing of each card can be regarded as a Bernoulli Trial
- Is the number of trials defined?
- Yes, it is fixed before the random experiment is performed
- Are the trials independent?
- Yes, done at random, one by one and with replacement
- What is the probability of success in each trial?
- 13/52 = 1/4; it remains fixed across these 8 independent Bernoulli Trials.
- 5 individuals are drawn at random, one by one and without replacement, from a group of 22 males and 15 females to form a committee, and the number of females selected is recorded: No−Trials are dependent
- Is it a series of Bernoulli Trials?
- Observe that here the drawing of each committee member may result in one of two outcomes: either a male or a female is selected.
- one is success, the other is failure.
- Is the number of trials defined?
- Since 5 committee members are to be chosen, therefore there are 5 such Bernoulli Trials
- Are the trials independent?
- No, done at random, one by one, but without replacement.
- The outcome of each draw depends on the outcomes of the previous draws; as a result, the trials are not independent.
Binomial Distribution
Consider a Binomial experiment with parameters n and p:
- A finite sequence of n independent Bernoulli Trials, such that the probability of success p remains fixed across those n Bernoulli Trials.
Let X denote the number of successes in those n trials. Then X follows a Binomial distribution with parameters n and p, written X ~ Bin(n, p).
- The ~ sign means "follows".
The probability mass function (p.m.f.) of X is
f(x) = P(X = x) = C(n, x) p^x (1 − p)^(n − x), for x = 0, 1, ..., n.
- Observe that since the random variable X here is the number of successes out of n binary trials, the possible values are 0, 1, ..., n.
- This means there will be n + 1 many possible values.
For each x, the constant C(n, x) counts the number of ways of choosing which x of the n trials are the successes.
- Note we can read C(n, x) as "n choose x".
- This notation is also sometimes found as nCx, or as a binomial coefficient written with n over x in parentheses.
- The formula for evaluating this constant is the following:
- C(n, x) = n! / (x! (n − x)!),
- where n! = n · (n − 1) ··· 2 · 1 (the product of the first n natural numbers), x! = x · (x − 1) ··· 2 · 1 (the product of the first x natural numbers), and 0! = 1 (by convention).
The mean and the variance of X ~ Bin(n, p) are
- Population mean: E(X) = np
- Population variance: Var(X) = np(1 − p) = npq,
where q = 1 − p.
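As a sanity check (added to these notes), the following Python sketch evaluates the Binomial pmf with math.comb and confirms that the probabilities sum to 1 and that the mean and variance come out to np and np(1 − p); the values n = 10 and p = 0.5 are assumed only for illustration:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

print(sum(pmf))                                             # ~1.0 (probabilities sum to 1)
print(sum(x * q for x, q in enumerate(pmf)))                # ~5.0 (mean = n*p)
print(sum(x**2 * q for x, q in enumerate(pmf)) - (n*p)**2)  # ~2.5 (variance = n*p*(1-p))
```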
Examples
Suppose we toss a fair coin independently 10 times. What would be the distribution of X, the number of heads?
- There are 10 such trials.
- The trials are independent of each other.
- Is the probability of success fixed? Yes, p = 1/2 since the coin is fair.
- Thus we have a Binomial experiment, and X ~ Bin(10, 0.5).
- What is the probability of exactly 4 heads?
- P(X = 4) = 0.20508
- What is the probability of at most 3 heads?
- P(X <= 3) = 0.17188
- What is the probability of seeing between 5 and 7 heads?
- P(5 <= X <= 7)
- Note we can also write it as P(4 < X <= 7)
- And remember that P(a < X <= b) = F(b) - F(a) for any discrete rv.
- So let's rewrite it as P(X <= 7) - P(X <= 4).
- Now we can use the calculator!
- P(X <= 7) = 0.94531
- P(X <= 4) = 0.37695
- 0.94531 - 0.37695 = 0.56836
- What is the probability of seeing 2 tails?
- How can you express this event in terms of the rv X? X is the number of heads out of those 10 Bernoulli Trials.
- Then note that 10 − X is the number of tails out of those 10 Bernoulli Trials.
- So you need to find P(10 − X = 2), i.e. P(X = 8).
- P(X = 8) = 0.04395
Use this Online Binomial Distribution Calculator.
- We can use it on exams and homeworks!
- Insert n, the number of trials.
- Insert p, the probability of success.
- Insert x, the given value.
- Select between the =, <=, or >= relation.
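If you would rather compute these values offline, SciPy's binom gives the same answers (assuming SciPy is installed; this is an optional alternative to the online calculator, not the tool prescribed by the course):

```python
from scipy.stats import binom  # assumes SciPy is installed

n, p = 10, 0.5  # X ~ Bin(10, 0.5): number of heads in 10 fair coin flips

print(binom.pmf(4, n, p))                       # P(X = 4)       ~ 0.20508
print(binom.cdf(3, n, p))                       # P(X <= 3)      ~ 0.17188
print(binom.cdf(7, n, p) - binom.cdf(4, n, p))  # P(5 <= X <= 7) ~ 0.56836
print(binom.pmf(8, n, p))                       # P(X = 8)       ~ 0.04395 (i.e. 2 tails)
```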
Example
Air-USA has a policy of booking as many as 25 persons on a small
airplane that can seat only 22 passengers. Past studies have revealed
that only 85% of the booked passengers actually arrive for the flight.
Find the probability that if Air-USA books 25 seats, not enough seats will
be available.
- Assume that the tickets were purchased independently of each other, which means the passengers arrive (or not) independently of each other.
Let the r.v. X denote the number of booked passengers who actually arrive for the flight.
Then, X ~ Bin(25, 0.85):
- For each passenger there are 2 possibilities
- Either the passenger shows up ("success")
- Or the passenger doesn't show up ("failure")
- There are 25 such Bernoulli Trials, which are independent according to our assumption.
- The probability of success remains fixed because past studies have revealed that only 85% of the booked passengers actually arrive for the flight, that means the probability of "success" is 0.85, and it remains the same across all 25 passengers.
- X is nothing but the number of successes out of those 25 Bernoulli Trials.
Passenger capacity of the flight is 22, and 25 seats were booked. Hence,
not enough seats will be available if and only if X ≥ 23.
Required probability: P(X ≥ 23).
- Use the Online Binomial Distribution Calculator.
- P(X >= 23) = 0.25374
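The same probability can be reproduced offline with SciPy (again an optional alternative to the online calculator):

```python
from scipy.stats import binom  # assumes SciPy is installed

n, p = 25, 0.85  # X ~ Bin(25, 0.85): number of booked passengers who show up

# Not enough seats <=> X >= 23, i.e. 1 - P(X <= 22).
print(1 - binom.cdf(22, n, p))  # ~ 0.25374
```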