Probability
The probability of an event E is defined as the ratio of the number of favorable outcomes (f) to the total number of possible outcomes (n), under the assumption that all outcomes are equally likely. $$ p(E) = \frac{f}{n} $$
This interpretation is known as the classical definition of probability, first articulated by Pierre-Simon Laplace.
This ratio offers a precise measure of the likelihood that the event will occur.
The probability of a random event always lies between 0 and 1.
$$ 0 \le p(E) = \frac{f}{n} \le 1 $$
A certain event has probability 1, while an impossible event has probability 0.
- If no outcomes are favorable $ f=0 $, the event is impossible and its probability is $ P(E)=0 $.
- If $ f=n $, the event is certain and its probability is $ P(E)=1 $.
Whenever the probability falls strictly between 0 and 1, the event is neither certain nor impossible. It is simply uncertain in outcome and can only be described in probabilistic terms.
Probabilities are sometimes expressed as percentages, where 0% corresponds to 0 (an impossible event) and 100% corresponds to 1 (a certain event).
$$ 0 \% \le p(E) \le 100 \% $$
The meaning is identical in both representations. Only the numerical format changes.
Note: There are two main interpretations of probability. The frequentist interpretation treats probability as an objective property of an outcome, where the probability converges to a fixed value after many repetitions of the experiment. The subjectivist interpretation, on the other hand, views probability as a subjective estimate by the observer, meaning different analysts can assign different probabilities to the same outcome.
The set of all possible outcomes of an event is called the sample space (or universal set) and is often denoted by S, U, or Ω.
The set of favorable outcomes (F) is a subset of the sample space S, containing only the outcomes that favor event E.
Since F is a subset of U, the number of favorable outcomes is always less than or equal to the total number of possible outcomes.
$$ |F| \le |U| $$
Note: If the set of favorable outcomes is empty, F=Ø, the event is impossible. Conversely, if F matches U, meaning F=U, the event is certain.
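The classical definition above can be sketched as a short Python function. The names `classical_probability`, `favorable`, and `sample_space` are illustrative, not from any standard library; using `Fraction` keeps the ratio exact.

```python
from fractions import Fraction

def classical_probability(favorable: set, sample_space: set) -> Fraction:
    """Classical probability p(E) = f/n, assuming equally likely outcomes."""
    if not favorable <= sample_space:
        raise ValueError("F must be a subset of the sample space")
    return Fraction(len(favorable), len(sample_space))

U = {1, 2, 3, 4, 5, 6}
# Impossible event: F = Ø gives p = 0; certain event: F = U gives p = 1.
print(classical_probability(set(), U))   # 0
print(classical_probability(U, U))       # 1
```

Any other subset of U yields a value strictly between 0 and 1, matching the bound $0 \le p(E) \le 1$.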
A Practical Example
When rolling a die, there are several possible outcomes.
Since the die has six faces, there are six possible outcomes.
The sample space consists of n=6 elements (six possible outcomes).
$$ S = \{1,2,3,4,5,6 \} $$
To calculate the probability of rolling a 3, you need to find the ratio between the number of favorable outcomes (f) and the total number of possible outcomes (n).
In this case, the set of favorable outcomes (F) contains only one favorable outcome (f=1), which is rolling a 3.
$$ F=\{ 3 \} $$
Thus, the probability of rolling a 3 (A="rolling a 3") is approximately 0.17.
$$ P(A) = \frac{|F|}{|S|} = \frac{f}{n} = \frac{1}{6} \approx 0.167 $$
In percentage terms, the probability of this event is approximately 16.7%.
Example 2
What is the probability of rolling an even number?
In this case, event E is "rolling an even number."
The sample space is the same, with n=6 possible outcomes.
$$ S = \{1,2,3,4,5,6 \} $$
The set of favorable outcomes contains f=3 outcomes.
$$ F = \{ 2 , 4, 6 \} $$
So, the probability of rolling an even number (A="rolling an even number") is 0.5
$$ P(A) = \frac{|F|}{|S|} = \frac{f}{n} = \frac{3}{6} = 0.5 $$
In percentage terms, the probability of this event is 50%.
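Both die examples can be checked with a few lines of Python. This is a minimal sketch; `Fraction` is used so the ratios stay exact rather than rounded floats.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}           # sample space of a fair die

# Example 1: rolling a 3
F1 = {3}
p1 = Fraction(len(F1), len(S))   # 1/6, approximately 0.167

# Example 2: rolling an even number
F2 = {x for x in S if x % 2 == 0}
p2 = Fraction(len(F2), len(S))   # 3/6 = 1/2

print(p1, p2)                    # 1/6 1/2
```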
Example 3
What is the probability of drawing a spade from a standard 52-card deck?
Because all outcomes are equally likely, the total number of possible outcomes is $ n = 52 $.
The deck contains thirteen spades, so the number of favorable outcomes is $ f = 13 $.
The probability of drawing a spade is 0.25.
$$ P(E) = \frac{f}{n} = \frac{13}{52} = 0.25 $$
Expressed as a percentage, the probability of this event is 25%.
Example 4
What is the probability of drawing a red card from a standard 52-card deck?
In such a deck, $ n = 52 $ and there are $ f = 26 $ red cards in total, consisting of 13 hearts and 13 diamonds.
Therefore, the probability of drawing a red card is 0.5.
$$ P(E) = \frac{f}{n} = \frac{26}{52} = 0.5 $$
In percentage terms, the probability of this event is 50%.
Example 5
Consider the experiment of tossing three coins simultaneously.
We want to find the probability of obtaining exactly two tails (C) and one head (T).
$$ E = \text{two tails and one head} $$
The total number of ordered outcomes with repetition is given by the formula $ D_{n,k} = n^k $, where $ n = 2 $ possible outcomes per toss and $ k = 3 $ independent trials:
$$ D_{2,3} = 2^3 = 8 $$
The complete sample space is:
$$ CCC, CCT, CTC, TCC, TTT, TTC, TCT, CTT $$
To count the number of favorable outcomes, we use the standard formula for permutations of a multiset, $ P_n^{(r,s)} = \frac{n!}{r!s!} $. Here $ n = 3 $ total positions, with $ r = 2 $ identical elements (C) and $ s = 1 $ element (T):
$$ P_3^{(2,1)} = \frac{3!}{2!1!} = \frac{6}{2} = 3 $$
Therefore, the probability of event E is
$$ P(E) = \frac{3}{8} $$
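The count above can also be verified by brute-force enumeration of the sample space, a sketch using `itertools.product`:

```python
from itertools import product

# Enumerate all ordered outcomes of three coin tosses (C = tails, T = heads)
outcomes = list(product("CT", repeat=3))
assert len(outcomes) == 8                    # D_{2,3} = 2^3 = 8

# Favorable: exactly two C's (and therefore exactly one T)
favorable = [o for o in outcomes if o.count("C") == 2]
print(len(favorable), "/", len(outcomes))    # 3 / 8
```

The three favorable outcomes are CCT, CTC, and TCC, in agreement with the multiset-permutation count.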
Understanding A Priori and A Posteriori Probability
Probability can be calculated in two ways, depending on the nature of the random phenomenon or event.
- A Priori (Theoretical) Probability
A priori probability is calculated without performing any real-world experiments (empirical data). It is based on a theoretical model built using classical probability theory. In this case, probability is defined as the ratio between the number of favorable outcomes (F) and the number of possible outcomes (N). $$ p = \frac{F}{N} $$ Where F is a theoretical value derived from analyzing the problem.
Example: A common example is a fair six-sided die. Each face has an equal chance of landing, so no estimation is needed to calculate the probability. One simply needs to know that each face has a probability of 1/6.
- A Posteriori (Statistical) Probability
A posteriori probability is determined through the collection of real-world empirical data, based on repeated trials, following the empirical law of probability. This method uses direct observation to provide a more accurate and representative estimate of an event’s probability. It is mainly used in inferential statistics. In practice, N trials are performed, and the number of favorable outcomes is recorded. Statistical probability is the ratio of favorable outcomes (F) to the total number of trials (N). $$ p = \frac{F}{N} $$ Where F represents the number of favorable outcomes observed in the trials.
Example: When a die is biased, the theoretical probability no longer accurately reflects the real-world outcomes. In such cases, a posteriori probability is used, where the probability of the event is measured through the relative frequency of favorable outcomes in repeated trials or experiments. However, many trials are needed for a reliable estimate. According to the empirical law of probability, the more observations are made, the closer the relative frequency (F/N) comes to the actual probability (P) of the event.
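A simulation illustrates the contrast. Below is a sketch of a hypothetical biased die: the weights are invented for illustration and are not taken from the text. The a priori model would assign 1/6 to every face, while the a posteriori estimate, the relative frequency F/N over many trials, recovers the actual bias.

```python
import random

random.seed(42)

# Hypothetical biased die: face 6 is twice as likely as each other face,
# so its true probability is 2/7 rather than the theoretical 1/6.
faces = [1, 2, 3, 4, 5, 6]
weights = [1, 1, 1, 1, 1, 2]     # illustrative bias, chosen for this sketch

N = 100_000
rolls = random.choices(faces, weights=weights, k=N)
F = rolls.count(6)               # favorable outcomes observed

p_posteriori = F / N             # statistical probability, close to 2/7
p_priori = 1 / 6                 # what the symmetric model would claim
print(p_posteriori, p_priori)
```

With N this large, the relative frequency lands near 2/7 ≈ 0.286, far from the a priori value of 1/6 ≈ 0.167.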
Which Should You Use?
A priori (theoretical) probability is easier and faster to calculate, but it can't always be applied to every random event.
A posteriori (statistical) probability is useful when the theoretical probability can't be determined, but it requires more data and time to be reliable.
For example, if a die is fair, a priori probability is reliable. However, if the die is biased, a posteriori probability must be used.
In general, the choice depends on the context and the problem at hand.
Theoretical probability may be less precise in some situations, but statistical probability is more time-consuming, as it requires extensive data collection to be accurate.
Often, the decision is also influenced by the level of risk you're willing to accept, making it a subjective choice.
For instance, in situations where you have limited data, a tight deadline, and the events are already somewhat predictable, a priori (theoretical) probability might be an acceptable estimate. It's easier and faster to calculate, provided the margin of error is low and the consequences of a mistake aren't too significant. In these cases, a careful cost-benefit analysis is crucial.
The Frequentist Interpretation of Probability
In the frequentist perspective, the probability of an event \( E \) is defined as the limit of its relative frequency as the number of trials \( n \) increases without bound: \[ P(E) = \lim_{n \to \infty} \frac{m}{n} \] Here \( m \) is the number of times the event occurs, \( n \) is the total number of trials performed under identical conditions, and \( \frac{m}{n} \) is the observed relative frequency.
Unlike the classical interpretation, which rests on symmetry assumptions and a priori reasoning, the frequentist view interprets probability as a measurable quantity grounded in repeated empirical observation.
In this framework, probability emerges directly from data rather than from theoretical assumptions. It is based on the empirical law of probability.
When is this interpretation most effective?
The frequentist approach is particularly useful when the underlying probability of an event is unknown and must be estimated empirically by repeating the same experiment many times.
As the number of observations increases, the estimated probability becomes more stable and more representative of the underlying process.
What are the limitations of the frequentist view?
This framework requires that the experiment be repeatable under identical or sufficiently controlled conditions. Only then can relative frequencies provide meaningful probability estimates.
Note. The frequentist interpretation cannot be applied to single, non-repeatable events because it treats probability as an objective quantity that must be inferred from repeated trials. Precise estimation demands a sufficiently large number of observations. Unlike subjective Bayesian approaches, this interpretation incorporates no personal beliefs. Probability is determined exclusively by empirical data.
An observed relative frequency of zero, \( f(E)=0 \), does not mean the event is impossible, only that it did not occur within the sample examined.
Similarly, a relative frequency of \( f(E)=1 \) does not imply that the event is logically guaranteed.
Example
Suppose we want to estimate the probability that a production line yields a defective part.
We examine a batch of \( n = 1000 \) items and identify \( m = 27 \) defective units. The observed relative frequency is:
\[ f(E) = \frac{27}{1000} = 0.027 \]
This provides the following empirical estimate of the probability:
\[ P(E) \approx 0.027 = 2.7\% \]
To improve the reliability of the estimate, it is advisable to repeat the measurement multiple times under comparable conditions.
If two additional batches of 1000 items yield frequencies of \( 0.030 \) and \( 0.025 \), these fluctuations simply reflect sampling variability, which is inherent to any empirical process.
To obtain a more stable estimate, we can compute the mean relative frequency across all trials:
\[ P(E) \approx \frac{f_1 + f_2 + \dots + f_k}{k} \]
With three measurements, the mean is:
\[ P(E) \approx \frac{0.027 + 0.030 + 0.025}{3} = 0.0273\ldots \]
Thus, our refined estimate becomes:
\[ P(E) \approx 0.0273 = 2.73\% \]
The averaged frequency provides a more robust approximation of the true probability of producing a defective item.
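The averaging step is a one-line calculation; here is the arithmetic for the three batch frequencies from the example:

```python
# Relative frequencies observed in three batches of 1000 items each
frequencies = [0.027, 0.030, 0.025]

# Mean relative frequency across the k = 3 trials
p_estimate = sum(frequencies) / len(frequencies)
print(round(p_estimate, 4))   # 0.0273
```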
Increasing the number of trials
As the number of repetitions and the sample size \( n \) increase, the relative frequency tends to stabilize around a constant value.
For example, if instead of 1000 items we inspect a batch of \( n = 10000 \), the resulting estimate will typically be more precise.
As \( n \) grows, random fluctuations diminish, and the empirical frequency converges toward the true probability of the event:
\[ \frac{m}{n} \to P(E) \]
This phenomenon is formalized by the Law of Large Numbers, which ensures that relative frequencies converge to their theoretical probabilities in the long run.
In practice, however, increasing the number of trials is not always feasible. Many experiments involve non-trivial financial or operational costs.
For example, if defect detection requires destructive testing, each trial destroys one item, raising production costs. In such situations, practitioners must balance statistical precision with the practical constraints of the testing procedure.
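The stabilization of the relative frequency can be seen in a small simulation. The defect rate below is assumed for illustration; each item is modeled as defective independently with that probability, and the fluctuations of m/n shrink as n grows.

```python
import random

random.seed(0)
p_true = 0.027          # assumed true defect rate (illustrative)

freq = {}
for n in [100, 1_000, 10_000, 100_000]:
    # m counts defective items in a simulated batch of size n
    m = sum(random.random() < p_true for _ in range(n))
    freq[n] = m / n
    print(n, freq[n])   # relative frequency m/n approaches p_true
```

At n = 100, the estimate can easily miss by a large margin; by n = 100,000 it typically sits within a few thousandths of the true value, as the Law of Large Numbers predicts.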
Subjective probability
Subjective probability captures the degree of personal belief an individual assigns to the likelihood of an event. It quantifies a private judgement rather than any objective, data driven measure.
It does not arise from statistical evidence or long run frequencies. Instead, it reflects the agent’s own assessment based on the information they hold, their prior experience, and their interpretive judgement.
The subjective probability of an event (E) can be represented by the ratio:
\[ p(E)=\frac{P}{V} \]
Where
- \( P \) is the maximum amount the agent is willing to pay to enter a bet that pays out only if the event occurs;
- \( V \) is the payoff the agent would receive in the event of success.
The ratio (P/V) translates an intuitive belief into a number between 0 and 1. It serves as the agent’s subjective estimate of the probability of \( E \).
In practical terms, subjective probability expresses the price the agent considers fair to “purchase” the bet on the event.
Note. The core idea is that willingness to pay reveals the strength of belief. If I regard an event as highly likely, I will be prepared to pay an amount that is large relative to the potential gain. If I consider it unlikely, my willingness to pay will be much smaller.
For this judgement to count as rational, it must satisfy the standard coherence condition:
Anyone willing to pay \( P \) now in order to receive \( V \) if the event occurs must also be willing to receive \( P \) now and commit to paying \( V \) should the event occur.
In other words, the agent must be willing to adopt the counterparty’s perspective and accept the reversed bet. The subjective probability \( p(E)=\frac{P}{V} \) remains the same in both positions, so a coherent assessment cannot endorse one contract while rejecting the other.
If this symmetry is not accepted, the agent’s evaluation is incoherent.
Example
Suppose an individual considers the event \( E \) (“it will rain tomorrow”) reasonably likely.
This person is willing to pay 2 euros ( \( P=2 \) ) to enter a bet that pays 10 euros ( \( V=10 \) ) if it rains.
Their subjective probability of rain is therefore 0.2, or 20 percent.
\[ p(E)=\frac{P}{V}=\frac{2}{10}=0.2 \]
This does not imply that the true, physical probability of rain is 20 percent. It simply reflects the individual’s degree of belief given their information and prior assumptions.
Note (coherence condition). If this person regards it as rational to pay 2 euros now for the chance to receive 10 euros if it rains, they must also regard the symmetric position as rational: receiving 2 euros now while agreeing to pay 10 euros in the event of rain. They must be willing to accept the reversed bet. In both cases the ratio (P/V) remains \( \frac{2}{10} \), so the subjective probability stays at \( 0.2 \). Refusing the reversed arrangement would violate the coherence requirement, since the same probability judgement must apply from either side of the transaction.
Subjective probability does not describe an empirical feature of the world. It expresses the agent’s informational state and beliefs. It requires neither repeated trials nor large statistical samples. It requires only an internally consistent evaluation.
Different agents may assign different probabilities to the same event, and any given agent may revise their estimate as new evidence or insights emerge.
Subjective probability is especially valuable when objective data are limited or when the event in question cannot be replicated.
Axiomatic definition of probability
In probability theory, a probability measure is a function \( p \) that assigns a real number to each event \( E \) in the event space.
In the axiomatic approach, every possible outcome of a random experiment, known as a sample point, belongs to the sample space \( U \), which collects all outcomes that may occur in the experiment.
Each event \( E \) is a subset of the sample space.
The set of all subsets of \( U \), known as the event space, contains every event that can be formed from the sample space.
Note. An elementary event contains exactly one sample point, while a composite event contains multiple points. The certain event is the entire sample space \( U \), and the impossible event is the empty set \( \emptyset \).
The function \( p \) must satisfy three foundational conditions, known collectively as the Kolmogorov axioms, which establish the modern mathematical framework of probability.
The probability axioms
- Axiom 1: Non-negativity. \( p(E) \ge 0 \). No event can have a negative probability.
- Axiom 2: Normalization. \( p(U) = 1 \). The probability of the entire sample space is one, meaning some outcome is certain to occur.
- Axiom 3: Additivity. If two events \( E_1 \) and \( E_2 \) are mutually exclusive, so that \( E_1 \cap E_2 = \varnothing \), then the probability of their union equals the sum of their individual probabilities: \[ p(E_1 \cup E_2) = p(E_1) + p(E_2). \]
Together, these axioms provide a complete, internally consistent foundation for probability theory. Every other theorem or property of probability can be derived from them.
Note. This axiomatic structure integrates probability with the fundamental operations of set theory such as union, intersection, and complement, as well as with logical operations including negation, conjunction, and disjunction.
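For a finite sample space, the axioms can be checked mechanically. The sketch below is illustrative: `p` maps each sample point to its probability, and event probabilities are understood as sums over sample points, so additivity holds by construction.

```python
from fractions import Fraction

def is_probability_measure(p: dict) -> bool:
    """Check the Kolmogorov axioms for a finite sample space.

    `p` maps each sample point to its probability. Axiom 3 (additivity)
    holds automatically when event probabilities are defined as sums
    of the probabilities of their sample points.
    """
    non_negative = all(v >= 0 for v in p.values())   # Axiom 1
    normalized = sum(p.values()) == 1                # Axiom 2
    return non_negative and normalized

fair_die = {i: Fraction(1, 6) for i in range(1, 7)}
print(is_probability_measure(fair_die))   # True
```

An assignment that fails normalization, such as giving only one face probability 1/2 and nothing else, would be rejected by the same check.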
Example
Consider a standard experiment: rolling a fair six sided die.
The sample space is:
\[ U = \{1,2,3,4,5,6\} \]
Each element represents one of the possible outcomes.
Define the event \( E \):
\[ E = \text{“an even number is rolled”} = \{2,4,6\}. \]
This is a composite event consisting of three sample points. Since the die is fair, each outcome has probability \( \frac{1}{6} \):
\[ p(\{1\}) = p(\{2\}) = \dots = p(\{6\}) = \frac{1}{6}. \]
Because the event \( E \) is the union of three mutually exclusive elementary events, the additivity axiom gives:
\[ p(E) = p(\{2\}) + p(\{4\}) + p(\{6\}). \]
Substituting:
\[ p(E) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = \frac{1}{2}. \]
Thus, the probability of rolling an even number is \( \frac{1}{2} \), that is, fifty percent.
We now verify that the axioms hold in this setting.
- Axiom 1: Non-negativity. Every probability must be non-negative. The computed value \( p(E) = \frac{1}{2} \) clearly satisfies this requirement.
- Axiom 2: Normalization. Since all six outcomes are equally likely, the total probability across the sample space must be one:
\[ p(U) = p(\{1\}) + p(\{2\}) + p(\{3\}) + p(\{4\}) + p(\{5\}) + p(\{6\}) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{6}{6} = 1. \]
- Axiom 3: Additivity.
Let
\[ E_1 = \{2\}, \quad E_2 = \{4\}. \]
These events are mutually exclusive because
\[ E_1 \cap E_2 = \varnothing. \]
Their union is
\[ E_1 \cup E_2 = \{2,4\}. \]
By the additivity axiom,
\[ p(\{2,4\}) = p(\{2\}) + p(\{4\}). \]
Since the die is fair,
\[ p(\{2,4\}) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}. \]
Hence, the probability of rolling either 2 or 4 is one third.
This example shows how the axioms structure probability theory and ensure its coherence.
In this example, the event space is the power set of \( U \), meaning the set of all possible subsets of the sample space:
\[ \mathcal{P}(U) = \text{all subsets of } U. \]
It includes:
- the impossible event \[ \emptyset \]
- the certain event \[ U = \{1,2,3,4,5,6\} \]
- the six elementary events \[ \{1\}, \{2\}, \{3\}, \{4\}, \{5\}, \{6\} \]
- all composite events such as \[ \{1,2\}, \{2,4,6\}, \{1,3,5\}, \{1,2,3,4\}, \dots \]
If the sample space has six elements, its event space contains \( 2^6 = 64 \) distinct events.
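The event space can be built explicitly with a short sketch using `itertools.combinations`, confirming the count of \( 2^6 = 64 \) events:

```python
from itertools import combinations

U = {1, 2, 3, 4, 5, 6}

# Build the event space: every subset of U, from Ø up to U itself.
events = [set(c) for r in range(len(U) + 1)
          for c in combinations(sorted(U), r)]

print(len(events))        # 64, i.e. 2^6
print(set() in events)    # True: the impossible event is included
print(U in events)        # True: the certain event is included
```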
