2. Definition of Probability

2.1 Introduction from examples

When studying probability theory, it is very important to consider the perspective we take when investigating problems. As engineers or scientists, we are used to obtaining solution values that predict exactly when or exactly where some event will occur, i.e. deterministic solutions. In probability we do not have such solutions or problems; rather, we define the likelihood of outcomes. This begins with how we define our variables; namely, we define a random variable (RV) as a number whose value depends on the outcome of a random experiment. The key point here is that the outcomes of the experiment are random and not deterministic. A good example is the lottery: the odds say it is extremely unlikely to win, but that does not mean you will not win. We do not know the outcome until the experiment of drawing the lottery numbers is conducted.

There are, generally speaking, two “kinds” of variables: discrete variables and continuous variables. One of the simplest illustrations of the difference between them comes from a typical classroom situation: the number of students in the class is a discrete variable, while the time of the class is a continuous variable. A discrete variable is one whose possible values are countable. For example, no matter how large the class is, the number of students is countable! We also notice that the number of students is a finite discrete value, identified by a positive integer: as new students enter the class there is either 1 student or 2 students, but no in-between value such as half of a student. On the other hand, a continuous variable takes its values from an infinite continuum. For example, time is an infinite continuum. A student of theoretical physics would be very interested to discuss the matter of time as a variable and as an observable measurement. However, to keep the idea simple, let us just look at one interesting property of the real numbers: between any two real numbers there is always at least one more value. Hence, any interval on the real number line contains an infinite number of values. This idea can be illustrated by considering just two moments in time. Let us take a starting time of t = 1 second and an ending time of t = 2 seconds; doing so, we see that there is a halfway point:

[latex]\frac{1}{2}\left( {1 + 2} \right) = 1.5[/latex]

Repeating this process using the original starting time of t=1 second but a new ending time of t=1.5 seconds we see that there is a new halfway point:

[latex]\frac{1}{2}\left( {1 + 1.5} \right) = 1.25[/latex]

Repeating this process once more using again the original starting time of t=1 second but a new ending time of t = 1.25 seconds we see that there is a new halfway point:

[latex]\frac{1}{2}\left( {1 + 1.25} \right) = 1.125[/latex]

As you can see, this process could go on indefinitely, hence proving that between any two values of a continuous variable there are infinitely many points.
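To make the halving process above concrete, here is a minimal Python sketch (our own illustration, not part of the formal development) that repeats the midpoint computation a few times; the loop could in principle continue forever.

```python
# Repeatedly halve the interval between t = 1 and the current end time,
# printing each new halfway point.  Illustrative sketch only.
start, end = 1.0, 2.0
for _ in range(5):
    midpoint = 0.5 * (start + end)   # halfway point, e.g. 0.5 * (1 + 2) = 1.5
    print(midpoint)                  # 1.5, 1.25, 1.125, 1.0625, 1.03125
    end = midpoint                   # the halfway point becomes the new end time
```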

The prior result is very interesting from a purely mathematical, number-theoretic point of view alone, but it also yields one very interesting probability result for us to take note of: the probability of our RV being any one single value is exactly equal to zero. While this will be developed more formally later on, we can see the idea as follows by applying the classical definition of probability when the sample space is any interval of the real number line:

[latex]P\left( A \right) = \frac{{number\;of\;elements\;in\;the\;event\;A}}{{size\;of\;the\;sample\;space\;\Omega }} = \frac{1}{\infty } = 0.[/latex]

2.2 Theoretical vs experimental definition

Prior to beginning our formalization of probability, we must first summarize some key terminology.

Definition 2.2.1 – A simple event of an experiment under consideration in an application of probability

a single outcome of the experiment under consideration.

 

Definition 2.2.2 – An event of an experiment under consideration in an application of probability

a collection of one or more simple events.

For example, if you were considering the experiment of drawing a card from a deck of 52 cards, a simple event would be drawing the ace of spades, as that is a single outcome (card). However, an event, which is a collection of one or more simple events, could be drawing an ace, as the outcome of an ace consists of four simple events.

Definition 2.2.3 – The sample space of an experiment under consideration in an application of probability

The set of all possible outcomes of the experiment. The symbol Ω is often utilized to identify the sample space.

Now that we have defined the events and sample space, we can proceed to formalize our definition of probability!

Definition 2.2.4 – The theoretical definition of probability related to an experiment under consideration

[latex]P\left( {event} \right) = \frac{{number\;of\;favorable\;outcomes}}{{total\;number\;of\;outcomes\;in\;sample\;space}}[/latex]

Which is often written as

[latex]P\left( {event} \right) = \frac{{number\;of\;simple\;events\;in\;E}}{{size\;of\;\Omega }}[/latex]

For example, if we were playing a game of cards and wanted to find the probability of drawing an ace on a random trial, we would identify that there are 4 simple events in E, and the size of the sample space is 52. Hence, we can find

[latex]P\left( {event} \right) = \frac{{number\;of\;simple\;events\;in\;E}}{{size\;of\;\Omega }} = \frac{4}{{52}} = 0.0769..\;\;\left( {7\% } \right).[/latex]

It is very important to note here that we should not round the solution up. A common practice is to cut, not round, solutions at the second decimal place and report the solution as a whole percentage. It may be desirable to keep more accuracy, perhaps 7.6% or 7.69%, etc., but we should never round the solution up! While rounding 7.69% up to 7.7% (or perhaps rounding 7.6% up to 8%) may seem like an insignificant detail, in some real-world applications such details can have serious consequences! Thus, for consistency in this text we will always follow the rule of “being a conservative statistician,” and always report our solutions as whole percentages, cut, not rounded, at the second decimal of the numerical results obtained.
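As a quick illustration of the “cut, not round” convention, the following Python sketch (our own helper function, with names chosen only for illustration) computes the ace probability from the theoretical definition and truncates it at the second decimal place:

```python
import math

def truncate_probability(p, decimals=2):
    """Cut (do not round) a probability at the given decimal place."""
    factor = 10 ** decimals
    return math.floor(p * factor) / factor

p_ace = 4 / 52                       # favorable outcomes / size of the sample space
print(p_ace)                         # 0.07692..., which we report as 7%
print(truncate_probability(p_ace))   # 0.07 -- cut, not rounded
```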

Definition 2.2.5 – The empirical (or experimental) definition of probability related to an experiment under consideration

[latex]P\left( {event} \right) = \frac{{number\;of\;successes}}{{number\;of\;trials\;}}[/latex]

Which, counting over the outcomes actually observed, is often written as

[latex]P\left( {event} \right) = \frac{{number\;of\;simple\;events\;in\;E}}{{size\;of\;\Omega }}[/latex]

For example, suppose we were playing a game of cards with four players that exhausted the deck, so each player got 13 cards, and we got the hand: ace of hearts, jack of spades, 9 of hearts, 8 of clubs, 7 of spades, 6 of hearts, 5 of clubs, 4 of spades, 3 of hearts, 2 of clubs, king of spades, queen of spades, and the ace of spades. Then we could compute the probability of getting an ace as

[latex]P\left( {event} \right) = \frac{{number\;of\;simple\;events\;in\;E}}{{size\;of\;\Omega }} = \frac{2}{{13}} = 0.153..\;\;\left( {15\% } \right).[/latex]

Now, at this point students often find these results a bit confusing. A common question that arises is “why are they not the same?” The general answer is that the theoretical probability tells you what should happen, while the empirical probability tells you what did occur. Usually they are close, but the bottom line is that we cannot predict the future and there is always something left to chance; hence the reason why you often see people buying lottery tickets at the gas station. While the probability of actually winning is minuscule, there is still a chance. The point here is that the outcome of each event is random! The theoretical definition of probability tells us that for every 52 cards we should get 4 aces, i.e. for every 13 cards we should get 1. However, that does not tell us this will happen; it just defines a likelihood of it occurring. The truth here – and the important point which separates probability theory from applications of deterministic mathematical models such as differential equations – is that the outcomes of experiments under consideration in probability theory are random: a person could play this game of cards all day and never actually get an ace, while another person could play this game of cards once and get a great hand, like the one outlined here with two aces. However, there is one important “fine print” to keep in mind: the law of large numbers states that as the number of trials gets larger, the empirical probability will be approximately the same as the theoretical probability. For example, if we draw 13 cards and get 2 aces, it is a big deal that we got one extra ace than expected. However, if we draw 1300 cards and get 101 aces, it is NOT a big deal that we got one extra ace than expected.
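The law of large numbers can be seen directly by simulation. The sketch below (our own Monte Carlo illustration; the deck model and trial counts are chosen only for demonstration) draws a single card from a full deck many times and compares the empirical proportion of aces with the theoretical value 4/52:

```python
import random

# Empirical vs. theoretical probability of drawing an ace from a full deck.
deck = ["ace"] * 4 + ["other"] * 48
theoretical = 4 / 52

for trials in (13, 1300, 130000):
    successes = sum(random.choice(deck) == "ace" for _ in range(trials))
    empirical = successes / trials   # number of successes / number of trials
    print(f"{trials} trials: empirical {empirical:.4f} vs theoretical {theoretical:.4f}")
```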

In general, we will use the theoretical definition of probability when working out application problems here, and the results are often presented in a “PDF” (probability distribution function) chart. For example, the outcomes of a single card drawn could be presented as

| X | P(X) |
|---|------|
| Ace | 4/52 ≈ 0.07 |
| Other Face Card (King, Queen or Jack) | 12/52 ≈ 0.23 |
| Other Card (regular number card) | 36/52 ≈ 0.69 |

It is very important to note that not every probability chart is valid. In order for a probability chart (or the function used to create the probabilities) to be valid, it must satisfy the following:

Definition 2.2.6 – Valid Probability Function Requirements

  • (i) All P(X) values must be valid probability values: 0 ≤ P(X) ≤ 1
  • (ii) The sum of all P(X) values must be 100%: ∑P(X) = 1

It is worth noting that in many applications the sum in condition (ii) will not be exactly one, but it should be extremely close. For example, looking at our probability chart for the game of cards, the sum would be ≈ 1, as 0.07 + 0.23 + 0.69 = 0.99, but this is of course just due to truncating.
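The two requirements of Definition 2.2.6 are easy to check mechanically. Below is a small Python sketch (a helper of our own invention) that validates a probability chart, allowing a small tolerance on the sum to account for truncation as in the card example:

```python
def is_valid_probability_chart(chart, tolerance=0.02):
    """Check conditions (i) and (ii) of Definition 2.2.6 for a chart {outcome: P(X)}."""
    all_in_range = all(0 <= p <= 1 for p in chart.values())      # condition (i)
    sums_to_one = abs(sum(chart.values()) - 1) <= tolerance      # condition (ii)
    return all_in_range and sums_to_one

card_chart = {"Ace": 0.07, "Other Face Card": 0.23, "Other Card": 0.69}
print(is_valid_probability_chart(card_chart))   # True: 0.99 is off only by truncation
```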

2.3 Mathematical formalisms

Suppose that [latex]\Omega = \left\{ {{e_1},{e_2}, \ldots ,{e_N}} \right\}[/latex] is a finite sample space, and let [latex]{P_k}[/latex] denote the probability of the simple event [latex]{e_k}[/latex]. Then the probability of an event E is the sum of the probabilities of all of the simple events contained in E. In symbols, this yields the definition

Definition 2.3.1 – The formal theoretical definition of probability related to an experiment under consideration

[latex]P\left( E \right) = \mathop \sum \limits_{{e_k}\;in\;E}^{} \;\;\;\;{P_k}[/latex]

Now, it is interesting to notice that if all of the simple events [latex]{e_k}[/latex] are equally likely, then every [latex]{P_k}[/latex] is the same, namely [latex]{P_k} = \frac{1}{N}.[/latex]

Thus, in this case the definition becomes

[latex]\mathop \sum \limits_{{e_k}\;in\;E}^{} \;\;\;\;{P_k} = \mathop \sum \limits_{{e_k}\;in\;E}^{} \;\;\;\frac{1}{N}[/latex]

which, since the sum contains one term of 1/N for each simple event in E, yields

[latex]P\left( E \right) = \frac{{number\;of\;simple\;events\;in\;E}}{N}.[/latex]

It is interesting to compare this definition to our prior definitions from the last section, the main difference being that this last result only applies if all of the simple events are equally likely to occur. Moreover, in that equally likely case, if we notice that N is the same as the size of the sample space, then Definition 2.3.1 reduces exactly to the definitions from the prior section. However, if the simple events are not equally likely, then only the summation form of Definition 2.3.1 can be applied, which will be illustrated in the next two examples.
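To see the summation form of Definition 2.3.1 in action when the simple events are not equally likely, here is a small Python sketch using a loaded die of our own invention (the probabilities are made up purely for illustration):

```python
# Simple events e_1, ..., e_6 with unequal probabilities P_k (a made-up loaded die).
p = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}

# Event E: rolling an even number.  P(E) is the sum of P_k over the e_k in E.
E = {2, 4, 6}
print(sum(p[e] for e in E))   # 0.7
```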

Example 2.3.1

Use proper mathematical notation to set up the corresponding formulas and compute the probability of drawing a heart card and, separately, a club card into a two-card hand, where each draw is taken from a separate 52-card deck, hence we can assume independence.

To solve this, we need to first define the events. Let us call the first event, the event of drawing a heart card, e1. Doing so, we can identify p1 to be 1/4, which is equivalent to 13/52, or the number of heart cards divided by N. Likewise, we call the second event, the event of drawing a club card, e2. Doing so, we can identify p2 to be 1/4, which is equivalent to 13/52, or the number of club cards divided by N. Now, the formal definition yields

[latex]\mathop \sum \limits_{{e_k}\;in\;E}^{} \;\;\;\;{P_k} = {P_1} + {P_2} = \frac{1}{4} + \frac{1}{4} = 0.5\;\left( {50\% } \right)[/latex]

Alternatively, this could have been computed using the prior definition

[latex]P\left( {event} \right) = \frac{{number\;of\;hearts + number\;of\;clubs\;}}{{size\;of\;\Omega }} = \frac{{26}}{{52}} = 0.5\;\left( {50\% } \right).[/latex]
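As a quick numerical check of Example 2.3.1 (our own verification, not part of the text), both forms can be computed directly:

```python
p_heart = 13 / 52            # P_1: a heart from the first deck
p_club = 13 / 52             # P_2: a club from the second deck
print(p_heart + p_club)      # 0.5, the summation form

print(26 / 52)               # 0.5, counting hearts and clubs over one 52-card deck
```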

Example 2.3.2

Use proper mathematical notation to set up the corresponding formulas and compute the probability of drawing a blackjack on the second card, assuming you are holding a King, and then, separately, winning in a single roll on a 38-slot roulette wheel.

To solve this, we must first define the events. Let us call the first event, the event of winning the blackjack, e1. This would actually be to draw an Ace from the remaining 51 cards, hence we can identify p1 to be 4/51. Now, we call the second event, the event of winning on the roulette wheel, e2. Doing so, we can identify p2 to be 1/38, since the only way to win is if the ball falls into the single slot chosen. Now, the formal definition yields

[latex]\mathop \sum \limits_{{e_k}\;in\;E}^{} \;\;\;\;{P_k} = {P_1} + {P_2} = \frac{4}{{51}} + \frac{1}{{38}} \approx 0.1\;\left( {10\% } \right)[/latex]

This cannot be rewritten using the prior definition, as the two events are not equally likely; moreover, it is important to note that what is computed here is not the “and probability.” The interpretation of this result is just the sum of the two probabilities, and it does not represent the probability that someone would win both games in sequence. While we will not be covering the definitions in full detail, nor discussing examples, the following formulas can be used to calculate the probability of combined events given that the probability of the first event, which we will call A, is known in addition to the probability of the second event, which we will call B.

Definition 2.3.2 – Addition Rule and Multiplication Rule

The addition and multiplication rules state that the probability of event A or event B occurring is found to be

[latex]P\left( {A\;or\;B} \right) = P\left( A \right) + P\left( B \right) - P\left( {A\;and\;B} \right)[/latex]

while, under the assumption of independence, the probability of event A and event B in sequence is found to be

[latex]P\left( {A\;and\;B} \right) = P\left( A \right) \cdot P\left( B \right)[/latex]

It is worthy to note that in some examples the union and intersection notations are utilized; hence the probability of event A or event B occurring is often rewritten as

[latex]P\left( {A\; \cup \;B} \right) = P\left( A \right) + P\left( B \right) - P\left( {A\; \cap \;B} \right)[/latex]

likewise the probability of event A and event B in sequence, under the assumption that the events are independent, is written as

[latex]P\left( {A\; \cap \;B} \right)=P\left( A \right) \cdot P\left( B \right)[/latex]

Also, it is often common to see the notation

[latex]P\left( {A'} \right)[/latex]

which refers to the complement of A, i.e.

[latex]P\left( {A'} \right) = 1 - P\left( A \right)[/latex]

For example, if event A is drawing an ace from a standard deck of cards then

[latex]P\left( {A'} \right) = 1 - P\left( A \right) = 1 - \frac{4}{{52}} = \frac{{48}}{{52}}[/latex]

which is the probability of drawing any card other than an ace from a standard deck of cards.
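The addition rule, the multiplication rule for independent events, and the complement rule can all be illustrated with a few lines of Python (a sketch of our own; the events are chosen only for illustration):

```python
# One card drawn from a 52-card deck, plus one fair coin flip.
p_ace = 4 / 52
p_heart = 13 / 52
p_ace_and_heart = 1 / 52                   # only the ace of hearts

# Addition rule: P(ace or heart)
print(p_ace + p_heart - p_ace_and_heart)   # 16/52 = 0.3076...

# Multiplication rule for independent events: P(heart and heads)
p_heads = 1 / 2
print(p_heart * p_heads)                   # 0.125

# Complement rule: P(not an ace)
print(1 - p_ace)                           # 48/52 = 0.923...
```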

Now, while at this point we will not dive into the details, it is worth noting that the formula given above for the “AND” probability is only valid under the assumption that event A is independent of event B, i.e. the outcome of one of the events does not have any impact on the outcome of the other. This is not always the case. For example, consider the case of drawing two cards from a deck of 52 cards. Let’s call the first draw event A and the second draw event B. Say we wanted to find the probability that the second card was a King; it is reasonable to conclude that

[latex]P\left( B \right) = \frac{4}{{51}}[/latex]

since there would only be 51 cards remaining, but this would be on the assumption that we knew the first draw was not a king. If the first draw was a king then we would conclude that

[latex]P\left( B \right) = \frac{3}{{51}}[/latex]

The situation under consideration here is referred to as conditional probability, and the truth is that it is not really practical to define the probability of B until we know what happened on the first draw. However, we define the conditional probability

[latex]P\left( {B\;|\;A} \right)[/latex]

as the probability of B occurring given that A already did. Moreover, in the case of dependent probabilities we redefine the multiplication rule to be

[latex]P\left( {A\;and\;B} \right) = P\left( A \right) \cdot P\left( B\ |\ A \right)[/latex]

This of course reverts to the prior rule if event B is truly independent of event A, as in that case the conditional probability [latex]P\left( {B\;|\;A} \right)[/latex] would just be the same as [latex]P\left( B \right)[/latex], due to the fact that event B would not have any dependence on event A.
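The dependent form of the multiplication rule is easy to check by simulation. The sketch below (our own illustration; the trial count is arbitrary) computes the probability that two cards drawn from one deck are both kings, both exactly and by Monte Carlo:

```python
import random

# Exact value: P(A and B) = P(A) * P(B | A)
p_first_king = 4 / 52
p_second_king_given_first = 3 / 51         # one king already removed from the deck
print(p_first_king * p_second_king_given_first)   # 0.00452...

# Monte Carlo check: draw two cards without replacement many times.
deck = ["K"] * 4 + ["x"] * 48
trials = 200_000
hits = sum(1 for _ in range(trials) if random.sample(deck, 2) == ["K", "K"])
print(hits / trials)                       # should be close to the exact value
```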

It is common to see the dependent conditional probabilities formula rewritten as

[latex]P\;\left( {B\;|\;A} \right) = \frac{{P\;\left( {A\;and\;B} \right)}}{{P\left( A \right)}}[/latex]

as often in applications, the conditional probability value is obtained by counting outcomes, which has the same form as the right-hand side here. Moreover, an interesting result, known as Bayes’ theorem, comes from noticing that A and B are just labels. Now, by reversing them, the above could be rewritten as

[latex]P\;\left( {A\;|\;B} \right) = \frac{{P\;\left( {B\;and\;A} \right)}}{{P\left( B \right)}}[/latex]

It is logical to conclude that the probability of A and B is exactly the same as the probability of B and A, so solving the first equation we find

[latex]P(A \space and \space B)=P(A) \cdot P(B|A)[/latex]

Likewise, solving the second equation we find

[latex]P(B \space and \space A)=P(B) \cdot P(A|B)[/latex]

And, equating the two and then solving for the conditional probability [latex]P\;\left( {A\;|\;B} \right)[/latex], we obtain the famous Bayes’ theorem

[latex]P(A|B)=\frac {P(A) \cdot P(B|A)} {P(B)}[/latex]

The above formula has many applications, especially in business, as it allows one to reverse the direction of a conditional probability when the other direction is known. For example, consider the situation where event A is the event that the future economy will be good, so the market will go up, while event B is the event that an economist gives a good forecast. The conditional probability [latex]P\left( {B\;|\;A} \right)[/latex] could be obtained from prior data, namely the probability that the economist gave a good forecast when the economy was good. However, the conditional probability [latex]P\left( {A\;|\;B} \right)[/latex] would not be possible to obtain directly, as nobody knows what the future of the economy will be; the trick is that by using Bayes’ theorem one can approximate this value!
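As a concrete sketch of how Bayes’ theorem would be used here, the Python below plugs in made-up illustrative numbers (the text supplies no data, so every value is a hypothetical assumption):

```python
# Hypothetical inputs -- chosen only for illustration, not taken from the text.
p_good_economy = 0.60          # P(A): belief that the economy will be good
p_forecast_given_good = 0.80   # P(B | A): forecaster said "good" when it was good
p_good_forecast = 0.70         # P(B): overall rate of "good" forecasts

# Bayes' theorem: P(A | B) = P(A) * P(B | A) / P(B)
p_good_given_forecast = p_good_economy * p_forecast_given_good / p_good_forecast
print(p_good_given_forecast)   # 0.6857... -> reported as 68% under our convention
```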

Chapter 2 Exercises

  1. Jamie is joining a movie club. As part of her introductory package, she can choose from 12 action selections, 10 comedy selections, 7 fantasy selections and 5 horror selections. If Jamie chooses one selection from each category, how many ways can she choose her introductory package?
  2. How many different four-letter secret codes can be formed if the first letter must be an A or B?
  3. In a contest in which 15 contestants are entered, in how many ways can the 4 distinct prizes be awarded? (Meaning there is a different prize for 1st, 2nd, 3rd, and 4th.)
  4. For the following problems, consider a group of 50 students. There are 8 Computer Engineering (CE) majors, 12 Computer Science (CS) majors, 20 Electrical Engineering (EE) majors, and 10 Software Engineering (SE) majors. There are no dual major students.
    • The department chair will pay for 16 students to go to a conference. In how many ways can the 16 students be selected if exactly 4 are selected from each major?
    • 8 of the students are lined up from left to right. In how many ways can this be done when we consider their individual names, not their majors?
    • 8 of the students are lined up from left to right. In how many ways can this be done if we consider only their majors, and not their names?
  5. Amy, Jean, Keith, Tom, Susan, and Dave have all been invited to a birthday party. They arrive randomly and each person arrives at a different time. Find the probability that Jean will arrive first and Keith will arrive last.
  6. A committee consisting of 6 people is to be selected from eight parents and four teachers. Find the probability that the selected group will consist of all parents.
  7. You are dealt one card from a 52-card deck. Find the probability that you are NOT dealt a jack.
  8. The physics department of a college has 15 male professors, 11 female professors, 7 male teaching assistants, and 5 female teaching assistants. If a person is selected at random from the group, find the probability that the selected person is a teaching assistant or a female.
  9. A card is drawn from a 52-card deck and a fair coin is flipped. What is the probability of drawing a heart and flipping heads?
  10. There are 45 chocolates in a box, all identically shaped. There are 16 filled with nuts, 15 with caramel, and 14 are solid chocolate. You randomly select one piece, eat it, and then select a second piece. Find the probability of selecting 2 solid chocolates in a row.
  11. Numbered disks are placed in a box and one disk is selected at random. If there are 8 red disks numbered 1 through 8, and 2 yellow disks numbered 9 through 10, find the probability of selecting a red disk, given that an even-numbered disk is selected.
  12. The two-way frequency table below shows the preference of sports to watch among males and females of a sample of 150 people.

| | Hockey (H) | Basketball (B) | Tennis (T) | Total |
|---|---|---|---|---|
| Male (M) | 41 | 23 | 15 | 79 |
| Female (F) | 10 | 16 | 45 | 71 |
| Total | 51 | 39 | 60 | 150 |

Find the following probabilities. Write each answer as a simplified fraction:

    • [latex]{\rm{P}}\left( {\rm{T}} \right){\rm{\;}} =[/latex]
    • [latex]{\rm{P}}\left( {\rm{F}} \right){\rm{\;}} = {\rm{\;}}[/latex]
    • [latex]{\rm{P}}\left( {{\rm{F}} \cap {\rm{T}}} \right) =[/latex]
    • [latex]{\rm{P}}\left( {{\rm{F}} \cup {\rm{T}}} \right) =[/latex]

License


A Self-Contained Course in Mathematical Theory of Probability Copyright © 2024 by Tim Smith and Shannon Levesque is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.