Bernoulli Trials
The theory of Bernoulli trials describes the probability \( P(k;n) \) of observing exactly \( k \) successes in a sequence of \( n \) independent trials, where each trial has only two possible outcomes, success or failure: $$ P(k;n) = \binom{n}{k} p^k (1 - p)^{n-k} $$ In this expression, \( p \) represents the probability of success, while \( q = 1 - p \) represents the probability of failure in each individual trial.
In order to apply Bernoulli trials correctly, three fundamental conditions must be satisfied:
- The trials are mutually independent
- Each trial admits exactly two possible outcomes, success or failure
- The probability of success \( p \) is constant across all trials
When these conditions are met, the model yields the probability that exactly \( k \) successes occur in \( n \) trials.
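The formula can be turned into a few lines of code. The following is a minimal Python sketch, not a reference implementation; the function name `bernoulli_pmf` is an illustrative choice, and the three conditions above are assumed to hold.

```python
from math import comb


def bernoulli_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials.

    Assumes a constant success probability p on every trial.
    """
    if not 0 <= k <= n:
        return 0.0  # impossible outcome: fewer than 0 or more than n successes
    q = 1 - p  # probability of failure on a single trial
    return comb(n, k) * p**k * q ** (n - k)


# Fair coin, 3 heads in 5 tosses: C(5,3) * 0.5^5 = 10 / 32
print(bernoulli_pmf(3, 5, 0.5))  # 0.3125
```

Summing `bernoulli_pmf(k, n, p)` over all `k` from 0 to `n` should give 1 up to floating-point rounding, which is a quick sanity check on any implementation.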
Note. Bernoulli trials are a cornerstone of probability theory. They provide a rigorous mathematical framework for modeling real-world situations in which a binary experiment is repeated under identical conditions. From a theoretical standpoint, they form the foundation of the binomial distribution.
An example
Suppose a machine produces a defective item with probability \( p = 0.05 \).
Each item can therefore be classified as:
- defective, with probability \( p = 0.05 \)
- non-defective, with probability \( q = 1 - p = 0.95 \)
Consider a sample consisting of \( n = 10 \) items.
What is the probability of obtaining $ k=1 $ defective item?
The probability of obtaining exactly one defective item, that is $ k=1 $, is given by
$$ P(1;10) = \binom{10}{1} (0.05)^1 (0.95)^9 $$
Since the corresponding binomial coefficient is \( \binom{10}{1} = 10 \), we obtain
$$ P(1;10) = 10 \cdot 0.05 \cdot 0.6302 \approx 0.3151 $$
Thus, the probability of finding exactly one defective item in a sample of 10 is approximately 31.5%.
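A result like this can also be checked empirically. The sketch below is a simple Monte Carlo simulation in Python, offered as an illustration only: the function name, the number of simulated samples, and the fixed seed are all arbitrary assumptions, and the estimate fluctuates slightly around the exact value.

```python
import random


def simulate_exactly_k(k: int, n: int, p: float, samples: int, seed: int = 0) -> float:
    """Estimate P(k;n) by drawing many simulated samples of n items."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(samples):
        # Each item is defective independently with probability p.
        defects = sum(rng.random() < p for _ in range(n))
        if defects == k:
            hits += 1
    return hits / samples


estimate = simulate_exactly_k(k=1, n=10, p=0.05, samples=100_000)
print(f"simulated P(1;10) = {estimate:.3f}")
```

With 100,000 simulated samples the estimate should typically land within a few thousandths of the value computed from the formula.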
What is the probability of obtaining $ k=2 $ defective items?
To compute the probability that exactly 2 items are defective in a sample of \( n = 10 \), the Bernoulli trials model is applied once again.
$$ P(2;10) = \binom{10}{2} (0.05)^2 (0.95)^8 $$
In this case, the binomial coefficient is
$$ \binom{10}{2} = 45 $$
This result indicates that there are 45 distinct combinations that lead to exactly 2 defective items in 10 trials.
Note. The binomial coefficient is defined by the formula $$ \binom{n}{k} = \frac{n!}{k! \cdot (n - k)!} $$ In the present case $$ \binom{10}{2} = \frac{10!}{2! \cdot 8!} $$ $$ \require{cancel} \binom{10}{2} = \frac{10 \cdot 9 \cdot \cancel{ 8! }}{2 \cdot 1 \cdot \cancel{ 8! }} $$ $$ \binom{10}{2} = \frac{10 \cdot 9}{2} $$ $$ \binom{10}{2} = \frac{90}{2} = 45 $$ The key observation is that the factorial terms in common \(( 8! )\) cancel immediately, leaving only the relevant factors to be evaluated.
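The same cancellation can be exploited in code. The following Python sketch (the name `binom_cancelled` is hypothetical) multiplies only the factors that survive the cancellation instead of evaluating three full factorials. In practice, the standard library function `math.comb` already does this job.

```python
from math import comb


def binom_cancelled(n: int, k: int) -> int:
    """Binomial coefficient computed without building full factorials."""
    if not 0 <= k <= n:
        return 0
    k = min(k, n - k)  # symmetry: C(n, k) == C(n, n - k)
    result = 1
    for i in range(1, k + 1):
        # After step i, result equals C(n - k + i, i), so the division is exact.
        result = result * (n - k + i) // i
    return result


print(binom_cancelled(10, 2))  # 45
print(binom_cancelled(10, 2) == comb(10, 2))  # True
```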
The probability expression therefore becomes
$$ P(2;10) = 45 \cdot 0.0025 \cdot 0.6634 \approx 0.0747 $$
Hence, the probability that exactly 2 out of 10 items are defective is approximately 7.47%.
Note. Although there are more possible combinations corresponding to 2 defective items \(( \binom{10}{2} = 45 )\) than to a single defective item \(( \binom{10}{1} = 10 )\), the overall probability is lower. $$ 0.05^2 = 0.0025 \ll 0.05 $$ This is because two rare events must occur in the same sample. When \( p \) is small, the number of admissible combinations grows with \( k \) (up to \( k = n/2 \)), but the probability \( p^k q^{n-k} \) of each individual configuration shrinks much more rapidly. As a consequence, the total probability decreases as the number of rare successes increases.
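One way to make this comparison precise is to divide two consecutive probabilities. The binomial coefficients simplify, leaving
$$ \frac{P(k+1;n)}{P(k;n)} = \frac{n-k}{k+1} \cdot \frac{p}{q} $$
With \( n = 10 \), \( p = 0.05 \), \( q = 0.95 \) and \( k = 1 \), this gives
$$ \frac{P(2;10)}{P(1;10)} = \frac{9}{2} \cdot \frac{0.05}{0.95} \approx 0.237 $$
so the probability of two defective items is a little less than a quarter of the probability of one, in agreement with \( 0.0747 / 0.3151 \approx 0.237 \). The factor \( \frac{n-k}{k+1} = 4.5 \) reflects the larger number of combinations, but it is outweighed by \( \frac{p}{q} \approx 0.053 \).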
More generally, the table below illustrates how the probability varies as a function of the number of successes $ k $.
| k (defective items) | \( \binom{10}{k} \) | Probability \( P(k;10) \) |
|---|---|---|
| 0 | 1 | 0.5987 |
| 1 | 10 | 0.3151 |
| 2 | 45 | 0.0747 |
| 3 | 120 | 0.0105 |
| 4 | 210 | 0.0010 |
| 5 | 252 | 0.00006 |
| 6 | 210 | 0.0000027 |
| 7 | 120 | 0.00000008 |
| 8 | 45 | 1.6 × 10⁻⁹ |
| 9 | 10 | 1.9 × 10⁻¹¹ |
| 10 | 1 | 9.8 × 10⁻¹⁴ |
As the table shows, with a defect probability of only 5% per item, the probability falls off steeply as $ k $ increases, because the factor $ p^k $ rapidly becomes extremely small.
For instance, the probability $ P(10;10) $ of obtaining a sample in which all items are defective (\( k = 10 \)) is effectively zero, on the order of one in ten trillion.
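The entries of such a table are easy to recompute. The short Python sketch below is one possible way to do it; the variable names and the output format are illustrative choices, and the values are printed in scientific notation so that the very small ones remain readable.

```python
from math import comb

n, p = 10, 0.05  # sample size and defect probability from the example
rows = []
for k in range(n + 1):
    coeff = comb(n, k)
    prob = coeff * p**k * (1 - p) ** (n - k)
    rows.append((k, coeff, prob))

print(" k  C(10,k)  P(k;10)")
for k, coeff, prob in rows:
    print(f"{k:>2}  {coeff:>7}  {prob:.4e}")
```

Since the eleven outcomes are exhaustive and mutually exclusive, the printed probabilities should add up to 1 apart from rounding.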
Cumulative probability
When the objective is to compute the probability of obtaining at most \( k \) successes, one must sum the probabilities corresponding to all outcomes from 0 up to \( k \):
$$ P(\le k;n) = \sum_{i=0}^{k} P(i;n) $$
This quantity represents the cumulative probability of all outcomes with a number of successes not exceeding $ k $.
Example
Consider again the previous situation with $ n=10 $ trials and probability \( p = 0.05 \) that an item is defective.
The goal is to compute the probability of obtaining no more than two defective items.
In this case, the relevant probabilities are those corresponding to zero, one, or two defective items.
$$ P(\le 2;10) = P(0;10) + P(1;10) + P(2;10) $$
These three probabilities were computed above; carried to five decimal places, they can be substituted directly.
$$ P(\le 2;10) \approx 0.59874 + 0.31512 + 0.07463 \approx 0.9885 $$
Therefore, the probability of obtaining at most 2 defective items out of 10 is approximately 98.85%.
In practical terms, almost all samples will contain zero, one, or two defects. In this example, outcomes involving three or more defects occur in only about 1.15% of samples.
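The cumulative sum translates directly into code. Here is a minimal Python sketch, assuming the same model as above; the helper names `pmf` and `cdf` are illustrative and not part of any particular library.

```python
from math import comb


def pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k successes: the sum of pmf(i) for i = 0..k."""
    return sum(pmf(i, n, p) for i in range(k + 1))


print(f"{cdf(2, 10, 0.05):.4f}")  # 0.9885
```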
Note. When computing the probability of obtaining at least \( k \) successes, it is often advantageous to work with the complement of the cumulative probability. $$ P(\ge k;n) = 1 - \sum_{i=0}^{k-1} P(i;n) $$ The rationale is that the complementary approach typically reduces the computational burden. Rather than summing a large number of favorable outcomes, one evaluates only a small number of unfavorable outcomes, from 0 up to \( k-1 \). The same idea applies in the other direction. For example, suppose we wish to compute the probability that there are at most 8 defective items. $$ P(\le 8;10) $$ Explicitly, $$ P(\le 8;10) = P(0;10) + P(1;10) + P(2;10) + \ldots + P(7;10) + P(8;10) $$ In this setting, using the complement is more efficient, because it is far simpler to evaluate two rare cases than to sum nine terms. The probability of having no more than 8 defective items out of 10 is the complement of the probability of having 9 or 10 defective items. Accordingly, we write $$ P(\le 8;10) = 1 - P(\ge 9;10) $$ Expanding the right-hand side yields $$ P(\ge 9;10) = P(9;10) + P(10;10) $$ and therefore $$ P(\le 8;10) = 1 - P(9;10) - P(10;10) $$ In this way, it suffices to evaluate the formula with \( n = 10 \) and \( p = 0.05 \) for just these two cases: $$ P(9;10) = \binom{10}{9} (0.05)^9 (0.95) \approx 1.9 \times 10^{-11} $$ $$ P(10;10) = \binom{10}{10} (0.05)^{10} \approx 9.8 \times 10^{-14} $$ These probabilities are exceedingly small. Adding them gives $$ P(9;10) + P(10;10) \approx 0.00000000002 $$ Hence, $$ P(\le 8;10) \approx 1 - 0.00000000002 = 0.99999999998 $$ In practical terms, the probability of observing at most 8 defective items out of 10 is about 99.999999998%, and can therefore be regarded as virtually certain. This example shows why complements are such a useful technique in probability theory.
In general, when an event has very high probability, it is preferable to compute the probability of its complement, which typically involves only a few cases with extremely small probabilities, thereby greatly simplifying the calculations.
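The complement technique can be sketched in Python as follows. This is an illustration under the same assumptions as the example (\( n = 10 \), \( p = 0.05 \)); the helper name `pmf` and the variable names are hypothetical.

```python
from math import comb


def pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.05

# Only the two rare cases are evaluated: 9 or 10 defective items.
upper_tail = pmf(9, n, p) + pmf(10, n, p)
at_most_8 = 1 - upper_tail

# Direct summation of the nine terms, for comparison.
direct_sum = sum(pmf(i, n, p) for i in range(9))

print(f"P(>=9;10) = {upper_tail:.3e}")
print(f"P(<=8;10) = {at_most_8:.11f}")
print(f"difference from direct sum = {abs(at_most_8 - direct_sum):.1e}")
```

Both routes agree up to floating-point rounding, but the complement requires two terms instead of nine.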
