Probability Spaces
Many systems are very complex, for example the motion of water molecules or the behavior of the stock market. In such cases it can be impossible to describe the system in enough detail to predict specific outcomes. Instead, we use probabilistic models to describe the system. These probabilistic models follow the rules of probability theory.
Another example of a complex system is quantum theory. For most people, the name alone is enough to make them shiver. But in 1900 Max Planck came up with building blocks for describing such systems using probability theory, and this was the birth of quantum theory. It was highly controversial at the time because physicists were used to deterministic models, i.e. models that predict outcomes with certainty. Einstein was one of the most vocal critics of quantum theory. He famously said “God does not play dice with the universe”, and one of his life goals was to find a deterministic model for quantum theory. So far, no one has been able to do so.
These probabilistic models are based on the concept of a probability space, a mathematical model that describes the possible outcomes of a random experiment and the probability of each outcome.
Random Experiments
First of all, we need to define what a random experiment is. A random experiment, also called a trial, is an experiment that can be repeated arbitrarily often and leads to a mutually exclusive and exhaustive set of outcomes.
What does this mean? Mutually exclusive means that only one of the outcomes can happen at a time; two outcomes cannot happen at the same time. Exhaustive means that one of the outcomes must happen; there are no other outcomes.
Importantly, the outcome of a random experiment is not predictable with certainty beforehand, there is always some element of randomness involved. This is why it is called a random experiment and is the reason why we need probability theory to describe the system.
Some common examples of random experiments are flipping a coin or rolling a die (assuming the coin and the die are fair and there is no cheating involved).

With this knowledge we can already informally describe a probabilistic model of a coin flip. We could say that there are two possible outcomes, heads or tails, and that each outcome has probability 0.5. A different probabilistic model for the same random experiment could also include the outcome of the coin landing on its side and assign it a very small probability. We could also model a biased coin that has a higher probability of landing on heads than on tails.
It turns out that flipping a real coin isn’t actually perfectly fair, i.e. the probabilities of heads and tails aren’t exactly 0.5 each. You can watch this video by Numberphile to learn more.
Law of Large Numbers
The interesting thing about random experiments is that a single performance of the experiment is unpredictable, but the aggregate behavior over many repetitions is not. For example, if you flip a coin once you cannot predict whether it will land on heads or tails, but if you flip the coin 1000 times you can predict that it will land on heads roughly 500 times and on tails roughly 500 times. In other words, the relative frequency of an outcome approaches its probability as the number of repetitions grows. This is called the law of large numbers.
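To see the law of large numbers in action, here is a minimal Python sketch (an illustration, not part of the theory) that simulates repeated flips of a fair coin and prints the running relative frequency of heads:

```python
import random

# A minimal sketch: flip a simulated fair coin many times and watch the
# relative frequency of heads settle near the modeled probability of 0.5.
random.seed(42)

heads = 0
for n in range(1, 100_001):
    heads += random.random() < 0.5  # one simulated flip; True counts as 1
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"flips = {n:>6}, relative frequency of heads = {heads / n:.4f}")
```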
Sample Space
If we want to model a random experiment we first need a set of all the possible outcomes of the random experiment. This set is called the sample space and is denoted by \(\Omega\). The elements of the sample space are the mutually exclusive and exhaustive outcomes of the random experiment which are denoted by \(\omega \in \Omega\) and are also often called elementary events, elementary experiments or states.
We’ve seen that mutually exclusive means the outcomes cannot occur simultaneously; for example, a coin cannot land on heads and tails at the same time. So we can formally write our sample space for a coin flip as:
\[\Omega = \{\text{Heads}, \text{Tails}\} \]We’ve also seen that our model should be exhaustive, i.e. the sample space should contain all possible outcomes. The number of outcomes in the sample space can vary depending on the random experiment, but we will see more about this later.
Events
An event is a set of outcomes. In other words, an event is a subset of the sample space \(A \subseteq \Omega\). We can then get the set of all possible events by taking the power set of the sample space \(\mathscr{P}(\Omega)\).
Say we then perform a random experiment and the experiment results in the outcome \(\omega\) so \(\omega \in \Omega\).
- If \(\omega \in A\), we say that the event \(A\) has occurred.
- If \(\omega \notin A\) then the event \(A\) has not occurred.
This then leads to two special events that can easily be defined and interpreted:
- The impossible event is the empty set \(\emptyset\). This corresponds to an event that will never occur.
- The certain/sure/guaranteed event is the sample space \(\Omega\). This corresponds to an event that will always occur.
For rolling a six-sided die, the sample space is \(\Omega = \{1,2,3,4,5,6\}\). We can now construct the following events, i.e. subsets of the sample space:
- Rolling an even number: \(A=\{2,4,6\}\)
- Rolling a number divisible by 3: \(A=\{3,6\}\)
- Rolling a number greater than 2: \(A=\{3,4,5,6\}\)
- Rolling a seven: \(A=\emptyset\), i.e. the impossible event, since our die only has the numbers 1 to 6.
- Rolling a number between 1 and 6: \(A=\Omega\), i.e. the certain event, since we are guaranteed to roll a number from 1 to 6.
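Because events are just subsets of the sample space, they translate directly into sets in code. The following minimal Python sketch mirrors the die example above and checks which events have occurred for one particular outcome:

```python
# A small sketch using Python sets to mirror the die example above.
omega = {1, 2, 3, 4, 5, 6}          # sample space
even = {2, 4, 6}                    # event "rolling an even number"
div_by_3 = {3, 6}                   # event "rolling a number divisible by 3"
impossible = set()                  # event "rolling a seven"

outcome = 4                         # suppose the experiment produced a 4

# An event A "has occurred" exactly when the outcome is an element of A.
print(outcome in even)        # True  -> the event "even number" occurred
print(outcome in div_by_3)    # False -> the event did not occur
print(outcome in impossible)  # False -> the impossible event never occurs
print(outcome in omega)       # True  -> the certain event always occurs
```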
Sigma Algebra
In some cases we might not want to consider all possible events, for example when the sample space \(\Omega\) is very large. In such cases we can define a sigma algebra, denoted by \(\F\), which is a specific collection of events that we are interested in. To make sure that all the properties of probability theory hold, the sigma algebra needs to satisfy some specific properties.
Firstly the certain event must be in the sigma algebra so:
\[\Omega \in \F \]Secondly if an event is in the sigma algebra then the complement of that event must also be in the sigma algebra:
\[A \in \F \Rightarrow A^c \in \F \]Lastly if we have some events in the sigma algebra then the union of these events must also be in the sigma algebra:
\[A_1,A_2,... \in \F \Rightarrow \bigcup_{i=1}^{\infty} A_i \in \F \]These properties ensure that the sigma algebra is closed under the operations of complement and union. This is important because we want to be able to calculate the probability of complex events by breaking them down into their elementary parts which we will see later.
If we take our running example of rolling a six-sided die we can define different sigma algebras. The following are some valid sigma algebras:
- \(\F = \{\emptyset, \{1,2,3,4,5,6\}\}\), the smallest sigma algebra
- \(\F = \mathscr{P}(\Omega)\), i.e. the power set of the sample space with \(|\F|=2^6=64\), the largest sigma algebra
- \(\F = \{\emptyset, \{1,2\}, \{3,4,5,6\}, \{1,2,3,4,5,6\}\}\)
- \(\F = \{\emptyset, \{1,2\}, \{3,4\}, \{5,6\}, \{1,2,3,4\}, \{1,2,5,6\}, \{3,4,5,6\}, \{1,2,3,4,5,6\}\}\)
The above sigma algebras are all valid because they satisfy the properties of a sigma algebra. The following are some invalid sigma algebras:
- \(\F = \{\emptyset, \{1,2\}, \{3,4,5,6\}\}\), the certain event is not in the sigma algebra
- \(\F = \{\emptyset, \{1,2\}, \{3,4\}, \{5,6\}\}\), the union of the events \(\{3,4\}\) and \(\{5,6\}\) is not in the sigma algebra
- \(\F = \{\{1,2,4,5,6\}\}\), the complement of the event \(\{1,2,4,5,6\}\) is not in the sigma algebra
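On a finite sample space these properties can be checked mechanically. Here is a minimal Python sketch (the helper `is_sigma_algebra` is my own, written for illustration) that tests one valid and one invalid example from above; for a finite collection it is enough to check pairwise unions:

```python
from itertools import combinations

# A minimal sketch that checks the three sigma-algebra properties on a finite
# sample space: the family must contain Omega, be closed under complements,
# and be closed under unions (pairwise suffices for a finite collection).
def is_sigma_algebra(omega, family):
    if omega not in family:
        return False
    if any((omega - a) not in family for a in family):
        return False
    return all((a | b) in family for a, b in combinations(family, 2))

omega = frozenset({1, 2, 3, 4, 5, 6})
valid = {frozenset(), frozenset({1, 2}), frozenset({3, 4, 5, 6}), omega}
invalid = {frozenset(), frozenset({1, 2}), frozenset({3, 4, 5, 6})}  # Omega missing

print(is_sigma_algebra(omega, valid))    # True
print(is_sigma_algebra(omega, invalid))  # False
```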
Properties of Events
The above definition of a sigma algebra also has some consequences for events; these apply in particular to the power set of the sample space, since it is itself a sigma algebra.
Because the certain event has to be in the sigma algebra and the complement of an event has to be in the sigma algebra we can see that the sigma algebra must therefore also contain the impossible event.
\[\Omega \in \F \Rightarrow \emptyset \in \F \text{ because } \Omega^c = \emptyset \]We have seen in the definition of a sigma algebra that the infinite union of events must also be in the sigma algebra. This general case can also be applied to finite unions of events: for example, if we have two events \(A\) and \(B\) in the sigma algebra, then their union must also be in the sigma algebra.
\[A,B \in \F \Rightarrow A \cup B \in \F \]The proof is rather simple, since we already have the general case and we know that the empty set is in the sigma algebra. The idea is to pad the union of the two events with the empty set infinitely many times, bringing us back to the general case; taking the union with the empty set leaves any set unchanged. So we can write the union of two events as:
\[A \cup B = A \cup B \cup \emptyset \cup \emptyset \cup \emptyset \cup \ldots \]So we know that the union of two events is in the sigma algebra but what about the intersection of two events? We can see that the intersection of two events is the complement of the union of the complements of the two events. To see this I recommend drawing a Venn diagram.
\[A \cap B = (A^c \cup B^c)^c \]Because the complement of an event and the union of two events must be in the sigma algebra, and therefore so must the complement of that union, the intersection of two events is also in the sigma algebra.
\[A,B \in \F \Rightarrow A \cap B \in \F \]Using the general case for the infinite union of events we can also see that the intersection of infinitely many events must also be in the sigma algebra.
\[A_1,A_2,... \in \F \Rightarrow \bigcap_{i=1}^{\infty} A_i \in \F \text{ because } \bigcap_{i=1}^{\infty} A_i = \left(\bigcup_{i=1}^{\infty} A_i^c\right)^c \]
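To make the identity \(A \cap B = (A^c \cup B^c)^c\) concrete, here is a minimal Python sketch that checks it on two events from the die example (an illustration, not a proof):

```python
# A minimal sketch verifying, on plain Python sets, the identity used above:
# the intersection of two events equals the complement of the union of
# their complements (De Morgan).
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # rolling an even number
B = {3, 6}      # rolling a number divisible by 3

lhs = A & B                                # A ∩ B
rhs = omega - ((omega - A) | (omega - B))  # (A^c ∪ B^c)^c
print(lhs, rhs, lhs == rhs)                # {6} {6} True
```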
Interpretations of Events
We have seen that events are sets of outcomes and that we can perform operations on these events such as unions, intersections and complements. Many of these operations can be visualized and interpreted in natural language, and the same goes for the relations between events.
Probability Measure
So far we actually haven’t seen any probabilities. We have only defined what experiments are and what outcomes or events can occur; we haven’t defined with what probability these outcomes or events occur. This is where the probability measure comes in. The probability measure is a map that assigns a probability to each event in the sigma algebra. More formally, we can define the probability measure on a sample space \(\Omega\) with a sigma algebra \(\F\) as:
\[\begin{align*} \P: \F &\to [0,1] \\ A &\mapsto \P(A) \end{align*} \]This map must satisfy certain properties, called the Kolmogorov axioms, which were introduced by the Russian mathematician Andrey Kolmogorov in 1933. Each event \(A\) is assigned a probability \(\P(A)\), a number between 0 and 1, where 1 means the event is certain to happen and 0 means the event is impossible. This leads to the first property that the probability measure must satisfy.
\[\P(\Omega) = 1 \]so the probability of the certain event is 1 as it will always occur. The next property is called countable-additivity or sometimes also \(\sigma\)-additivity. This property states that the probability of the union of mutually exclusive events is equal to the sum of the probabilities of the individual events.
\[\P(A) = \sum_{i=1}^{\infty} \P(A_i) \text{ where } A = \bigcup_{i=1}^{\infty} A_i \text{ and } A_i \cap A_j = \emptyset \text{ for } i \neq j \text{ so a disjoint union} \]This property is very important because it allows us to calculate the probability of complex events by breaking them down into their disjoint parts, which at the lowest level are the elementary events or outcomes.
We can now define the probability measure for the sample space of rolling a six-sided die. We can say that each outcome is equally likely so the probability of each outcome is \(\frac{1}{6}\). So the probability measure for the sample space is:
\[\P(\{1\}) = \frac{1}{6}, \P(\{2\}) = \frac{1}{6}, \ldots, \P(\{6\}) = \frac{1}{6} \]We can now calculate the probability of more complex events by breaking them down into their disjoint parts. For example, the probability of the event “rolling an even number” is:
\[\P(\{2,4,6\}) = \P(\{2\}) + \P(\{4\}) + \P(\{6\}) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2} \]This matches our intuition that the probability of rolling an even number is \(\frac{1}{2}\).
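The same computation can be written as a small Python sketch: assign each elementary event the probability \(\frac{1}{6}\) and obtain the probability of any event by summing over its disjoint elementary parts (exact fractions avoid rounding):

```python
from fractions import Fraction

# A minimal sketch of the probability measure on the die's sample space:
# assign 1/6 to each elementary event and compute P(A) for any event A
# by summing over its (disjoint) elementary parts.
p = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

def prob(event):
    """Probability of an event, computed by additivity over outcomes."""
    return sum((p[outcome] for outcome in event), Fraction(0))

print(prob({2, 4, 6}))           # 1/2, rolling an even number
print(prob(set()))               # 0, the impossible event
print(prob({1, 2, 3, 4, 5, 6}))  # 1, the certain event
```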
Laplace Experiments
If all elementary events in a random experiment have the same probability of occurring, meaning all outcomes are equally likely, we speak of a Laplace experiment. This is also called a uniform distribution.
For a sample space of size \(|\Omega|=m\) with \(m\) equally likely elementary events, we have a Laplace space. Each elementary event \(\omega_i\) has the same probability, given by the counting density:
\[P(\{\omega_i\}) = p(\omega_i)= \frac{1}{m} \text{ with } i=1,2,...,m \]Thus, the probability of an event \(A\) is defined as:
\[P(A) = \sum_{\omega_i \in A}{p(\omega_i)} = |A| \cdot \frac{1}{m} = \frac{|A|}{m} \]When rolling a die, all 6 outcomes are equally likely, making it a Laplace experiment.
For each elementary event, \(p(\omega_i) = \frac{1}{6}\)
For the event “even number,” \(A = \{2,4,6\}\), the probability is:
\[P(A)=\frac{3}{6} = \frac{1}{2} = 50\% \]
Why use Sigma Algebras?
The answer depends on the kind of sample space we are dealing with, so we distinguish the following cases.
Finite Sample Space
This is the simplest case. The sample space contains only a finite number of outcomes:
\[\Omega = \{\omega_1,\omega_2,\omega_3,...\omega_n\} \]Where \(n\) is the number of possible outcomes and \(n \in \mathbb{N}\).
Suppose we are playing Dungeons & Dragons and roll a 20-sided die. Then the sample space is:
\[\Omega = \{1,2,3, \ldots, 20\} \]
Countable Sample Space
The sample space contains infinitely many elementary events, but they can be numbered like the natural numbers.
We roll a die until we get a 6 for the first time. Theoretically, this could take forever, but we can count the number of rolls until the first 6 appears.
Thus, we have \(\omega_i = i\) for \(i=1,2,\ldots\) and
\[\Omega = \{1,2,3,...\} \]
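A minimal Python sketch of this experiment: it rolls a fair die until the first 6 appears and returns the number of rolls needed, which can in principle be any natural number:

```python
import random

# A minimal sketch of the countable example above: roll a fair die until
# the first 6 appears and record how many rolls it took. Any positive
# integer is a possible outcome, so the sample space is countably infinite.
random.seed(0)

def rolls_until_six():
    rolls = 0
    while True:
        rolls += 1
        if random.randint(1, 6) == 6:
            return rolls

samples = [rolls_until_six() for _ in range(10)]
print(samples)  # e.g. [3, 7, 1, ...] -- each entry is some element of Omega
```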
Uncountable Sample Space
Here the sample space contains uncountably many outcomes, for example when the outcome is a real number such as an exact waiting time, so that \(\Omega\) is an interval like \([0,1]\) or all of \(\mathbb{R}\). In this case it is in general no longer possible to consistently assign a probability to every subset of \(\Omega\), which is the main reason we restrict ourselves to a sigma algebra of events.
Conditional Probability
For two events \(A\) and \(B\) with \(P(A) > 0\), the conditional probability of \(B\) given \(A\) is defined as:
\[P(B | A)= \frac{P(A \cap B)}{P(A)} \]
Multiplication Rule
Rearranging the conditional probability formula gives the multiplication rule:
\[P(A \cap B)=P(A) \cdot P(B|A) \] \[P(A) \cdot P(B|A) = P(B) \cdot P(A|B) \]
Law of Total Probability
The total probability of an event \(B\), where the \(A_i\) are the possible intermediate events leading to \(B\) and form a partition of the sample space (mutually exclusive and exhaustive), is given by:
\[P(B)= \sum_{i=1}^{n}{P(A_i)\cdot P(B|A_i)} \]A good video explanation can be found here and here.
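As a quick numeric check, this minimal Python sketch applies the definition of conditional probability and the law of total probability to the fair die, using the partition of \(\Omega\) into even and odd outcomes (the event names are mine, chosen for illustration):

```python
from fractions import Fraction

# A minimal sketch checking the conditional-probability definition and the
# law of total probability on the fair die (uniform probabilities of 1/6).
omega = frozenset(range(1, 7))

def prob(event):
    return Fraction(len(event & omega), len(omega))

def cond(b, a):
    """P(B | A) = P(A ∩ B) / P(A), assuming P(A) > 0."""
    return prob(a & b) / prob(a)

B = frozenset({4, 5, 6})                                # "rolling more than 3"
even, odd = frozenset({2, 4, 6}), frozenset({1, 3, 5})  # a partition of Omega

total = prob(even) * cond(B, even) + prob(odd) * cond(B, odd)
print(cond(B, even))   # 2/3
print(total, prob(B))  # 1/2 1/2 -> both sides of the law agree
```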
Bayes’ Theorem
Given that event \(B\) has already occurred, the probability that it happened via intermediate event \(A_j\) is given by Bayes’ theorem:
\[P(A_j|B)= {P(A_j \cap B) \over P(B)} = {P(A_j) \cdot P(B | A_j) \over P(B)} \]A good video explanation can be found here and here.
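Continuing the die example from the previous sketch, here is a minimal Python check of Bayes’ theorem for the intermediate event “even”:

```python
from fractions import Fraction

# A minimal sketch of Bayes' theorem, continuing the die example above:
# given that "more than 3" (B) occurred, how likely is it that the roll
# was even? The uniform die probabilities are assumed, as before.
P_even = Fraction(1, 2)          # prior P(A_j) for A_j = "even"
P_B_given_even = Fraction(2, 3)  # P(B | even), from the previous sketch
P_B = Fraction(1, 2)             # P(B), from the law of total probability

P_even_given_B = P_even * P_B_given_even / P_B
print(P_even_given_B)  # 2/3
```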
De Morgan’s Laws
De Morgan’s laws also apply to events:
\[\begin{align*} (A \cup B)^c &= A^c \cap B^c \\ (A \cap B)^c &= A^c \cup B^c \end{align*} \]
Stochastic Independence
It is possible that the probability of an event \(B\) depends on another event \(A\); this is exactly what the conditional probability introduced above captures.
However, if this is not the case, meaning that the events do not depend on each other, we refer to such events as stochastically independent. In this case, the following holds:
\[P(A | B) = P(A) \text{ and } P(B | A) = P(B) \]From the multiplication rule, we then get:
\[P(A \cap B)=P(A) \cdot P(B|A)=P(A) \cdot P(B) \]Thus, we can define that two events are stochastically independent if:
\[P(A \cap B)= P(A) \cdot P(B) \]A coin is tossed three times, and we consider the following events:
- \(A=\) Heads on the 1st toss
- \(B=\) Heads on the 2nd toss
- \(C=\) Tails on the 3rd toss
They are all stochastically independent since they do not influence each other.
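Independence here can be verified directly by enumerating the eight equally likely outcomes of three fair tosses. A minimal Python sketch for the events \(A\) and \(B\):

```python
from fractions import Fraction
from itertools import product

# A minimal sketch verifying P(A ∩ B) = P(A) * P(B) for the three-coin
# example by enumerating all 8 equally likely outcomes ('H' or 'T' per toss).
omega = list(product("HT", repeat=3))

def prob(event):
    favorable = [w for w in omega if event(w)]
    return Fraction(len(favorable), len(omega))

A = lambda w: w[0] == "H"   # heads on the 1st toss
B = lambda w: w[1] == "H"   # heads on the 2nd toss
both = lambda w: A(w) and B(w)

print(prob(both), prob(A) * prob(B))  # 1/4 1/4 -> independent
```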
Multi-Stage Random Experiments
In a multi-stage random experiment, multiple random experiments are conducted sequentially. These are often represented using tree diagrams (event trees), distinguishing between final outcomes and intermediate outcomes.
We define the following rules:
- The probabilities along a path are multiplied together.
- If multiple paths lead to the same final outcome, their probabilities are added.
A good video explanation can be found here.
An urn contains 6 balls: 2 white (\(W\)) and 4 black (\(S\)). We randomly draw 2 balls one after the other without replacement, i.e. in 2 stages. We ask: what is the probability of drawing 2 balls of the same color (event \(A\)) or 2 balls of different colors (event \(B\))?
Stage 1:
- \(P(W) = {2 \over 6} = {1 \over 3}\)
- \(P(S) = {4 \over 6} = {2 \over 3}\)
Stage 2: After the first draw, only 5 balls remain. If a white ball was drawn:
- \(P(W|W) = {1 \over 5}\)
- \(P(S|W) = {4 \over 5}\)
If a black ball was drawn:
- \(P(W|S) = {2 \over 5}\)
- \(P(S|S) = {3 \over 5}\)
The results are:
For same-colored balls:
\[P(A)=P(WW) + P(SS) = {1 \over 3} \cdot {1 \over 5} + {2 \over 3} \cdot {3 \over 5} = {7 \over 15} \]For differently-colored balls:
\[P(B)=P(WS) + P(SW) = {1 \over 3} \cdot {4 \over 5} + {2 \over 3} \cdot {2 \over 5} = {8 \over 15} \]
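The same probabilities can be found by brute-force enumeration, since every ordered pair of distinct balls is equally likely. A minimal Python sketch:

```python
from fractions import Fraction
from itertools import permutations

# A minimal sketch of the two-stage urn experiment: enumerate all ordered
# draws of 2 distinct balls from 2 white ('W') and 4 black ('S') balls,
# which are all equally likely, and count the color combinations.
balls = ["W", "W", "S", "S", "S", "S"]
draws = list(permutations(range(6), 2))  # 30 ordered pairs of distinct balls

same = sum(1 for i, j in draws if balls[i] == balls[j])
diff = len(draws) - same
print(Fraction(same, len(draws)), Fraction(diff, len(draws)))  # 7/15 8/15
```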
Birthday Paradox
The birthday paradox is an example of how certain probabilities are often misestimated intuitively.
We ask: “What is the probability that at least two people in a group of \(k\) people share the same birthday?”
To answer this, we first consider the probability that no two people share a birthday:
For 2 people: \({365 \over 365} \cdot {364 \over 365}\)
For 3 people: \({365 \over 365} \cdot {364 \over 365} \cdot {363 \over 365}\)
etc.
This probability approaches 0 as \(k\) increases. Thus, we can answer our question as follows:
\[P(A) = P(\text{same}) = 1-P(\text{different}) = 1- \frac{365 \cdot (365-1)\cdot \ldots \cdot (365-k+1)}{365^k} \]
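A minimal Python sketch that evaluates this formula shows how quickly the probability grows with the group size \(k\); the classic result is that 23 people are already enough for a better-than-even chance:

```python
from math import prod

# A minimal sketch computing the birthday-paradox probability for a group
# of k people (365 equally likely birthdays, ignoring leap years).
def p_shared_birthday(k):
    p_all_different = prod((365 - i) / 365 for i in range(k))
    return 1 - p_all_different

for k in (10, 23, 50):
    print(k, round(p_shared_birthday(k), 3))
# 10 0.117
# 23 0.507  -> already more likely than not with only 23 people
# 50 0.97
```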
Bernoulli Experiment
A Bernoulli experiment is a random experiment with exactly two possible outcomes: success or failure.
A common example is rolling a die. We are only interested in whether we roll a 6. That is, rolling a 6 is considered a success, while all other outcomes are grouped together as a failure.
Unlike a Laplace experiment, the probabilities of the outcomes do not necessarily have to be equal. In the example above, the probability of success is \(\frac{1}{6}\).
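Finally, here is a minimal Python simulation of this Bernoulli experiment with success probability \(\frac{1}{6}\):

```python
import random

# A minimal sketch of the Bernoulli experiment "rolling a 6": success with
# probability 1/6, failure otherwise. Over many repetitions, the observed
# success rate should be close to 1/6 (about 0.167).
random.seed(1)

def bernoulli_trial(p=1 / 6):
    return random.random() < p  # True = success, False = failure

trials = [bernoulli_trial() for _ in range(100_000)]
print(sum(trials) / len(trials))  # roughly 0.167
```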