Random Variables and Great Expectations
Some problems involving probability can seem really messy. For example, if we randomly put 100 letters into 100 addressed envelopes, on average how many letters will end up in their correct envelopes? Or, on average, in a class of size m, how many pairs of people will share the same birthday?
To solve such problems, we can turn to a formidable tool in probability called "linearity of expectation." To use this tool, we have to understand the concept of a random variable.
Random Variables
We begin with the notion of a finite probability distribution

, which consists of a finite set

of elements, or samples, where each

has a weight, or probability,

.
Example
Let the probability of heads coming up be

. For HH, the probability is

; for TT, it is

; for TH, it is

; and for HT, it is

.
Recall that an event is a subset of the sample space. So, if a sample space consists of seven samples with the following probabilities

and the event

is the subset of elements with the following probabilities

, then

.
Running example
We throw a white die and a black die. For this example, we have the following sample space of all 36 possible outcomes:
So, for a uniform distribution, we have

,

.
Suppose that the event in which we are interested is all outcomes that produce a sum of 3 or less:

= event that sum equals 3 or less.

proportion of

in

.
A random variable is a (real-valued) function on

.
So, a random variable can be used to describe the process of rolling a fair die and the possible outcomes

. Another random variable might describe the possible outcomes of picking a random person and measuring his or her height.
Unlike the usual practice with other mathematical variables, a random variable cannot be assigned a value; a random variable does not describe the actual outcome of a particular experiment, but rather describes the possible, as-yet-undetermined outcomes in terms of real numbers.
There are many possible random variables. Consider the event of tossing a coin. Thus,

, where

is the event in which the coin falls head and

is the event in which it falls tails. Let

be the number of tails in the experiment. Then

is a random variable.
Examples
We throw a white die and a black die. In this sample space, we have many possible choices of random variable.
Let the function

be the value of the white die.

, and so on.
Let the function

be the sum of the values of the two dice.

, and so on.
Let the function

be the value of the white die taken to the power of the value of the black die.

, and so on.
Let

be the function such that it equals 1 if the two dice are equal, 0 otherwise.

, and so on.
Example
Suppose that we toss a fair coin

times. The sample space

consists of all sequences of

. We also have a uniform distribution

on

so

.
Let the random variable

be the number of heads.
For

.
Let the random variable

be 1 if the number of heads equals the number of tails, 0 otherwise.
For

.
In terms of notation, we will use letters such as

and

for events and letters such as

and

for random variables.
We can think of a random variable as a function on the sample space

to the reals

. In other words, we can think of the input to the function as random. Or we can think of a random variable in terms of the induced distribution on

: a variable with a probability distribution on its values. In effect, the randomness is "pushed" to the values of the function.
For two tossed coins, for example, the function

counts the number of heads.
We can also consider it as a distribution on

.

and

. 2 is associated with a probability 1/4, 0 with a probability 1/4, and 1 with a probability 1/2.
Example
Again, we have two dice. The function

is the sum of both dice.
We can also picture it as a probability distribution of

, with the sums as the horizontal axis and the probabilities as the vertical axis.
From Random Variables to Events
For any random variable

and value

, we can define the event

that

.
Examples
For two tossed coins, the function

counts the number of heads.
Suppose that we want the probability that one head and one tail will come up. In this case,

.
For two dice, the function

gives the sum. We want the probability of obtaining 6.
For any event

, there is a simple random variable called the indicator random variable of

, whose value tells us whether or not

has occurred. We define the indicator random variable for

as follows:

if

and

if
Expected Value
The expected value (or mathematical expectation) of a random variable is the sum of the probability of each possible outcome multiplied by its value. In effect, it represents the average amount one "expects" to win per bet if bets with identical odds are repeated many times
The expectation, or expected value, of a random variable

is a weighted average of the possible values of

by their corresponding probabilities. It is written as

, and is given by the expression:
In this expression,

is a function on the sample space

and it also has a distribution on its values.
Example
For two tossed coins, we can write the following expression for the expected value.
What if we flip a coin three times? What is the expected number of heads?
Clearly, however, we cannot obtain 1.5 heads. So,

. Note that the value itself may not be expected in the general sense; it may be unlikely or even impossible to obtain. Moral: Don't always expect the expected:

may be 0!
For an indicator random variable for event

(defined above),

.
Linearity of Expectation
We can add random variables. If

and

are random variables on the same set

, then

is also a random variable.
For instance, we can roll one die and let

be the random variable giving the die's value. We can roll a second die and let

be the random variable giving this die's value. We can then define the random variable

to be the sum of the two dice

.
Example
Consider picking a random person in the world. Let

length of the person's left arm in inches;

length of the person's right arm in inches. Let

.

measures the combined arm lengths.
Formally,

people in world

,

uniform distribution on

.
Two random variables

and

are said to be independent if and only if the value of

has no influence on the value of

and vice versa. Hence. two random variables

and

are independent if for every

the events

and

are independent.
Expected values obey a simple, very helpful rule called linearity of expectation. In its simplest form, it says that the expected value of a sum of random variables is the sum of the expected values of the variables.
If

, then
The great thing about linearity of expectation is that no independence is required. This can be very useful. We often need to work with random variables that are not independent.
Proof
Also, we have

for any constant

.
Exercises
Consider the example of tossing a coin twice. Let

be the first toss and

be the second toss. Let

total number of heads. What is

?

?

?
Or we can let

be at least one coin heads and

to be both coins heads. Are

and

independent? What is

?

?

?
Multiple Variables
Using induction, we can extend the idea of linearity of expectation to encompass

random variables.
In general, the expectation of the sum is the sum of the expectations.
Example
If we randomly put 100 letters into 100 addressed envelopes, on average how many letters will end up in their correct envelopes?
Let

be the event the

letter ends up in its correct envelope. Let

be the indicator random variable for

.

if

occurs. 0 otherwise.
Let

. We are asking for

.
What is

?

.
What is

?

.
So, in expectation, one card will be in the same position as it started. Note that it doesn't depend on how many letters we have! Were the

independent? No. Consider what happens in the case

.
Here is the general approach that we can use for solving such problems.
- View the thing that we care about as the expected value of some random variable.
- Write this random variable as the sum of simpler random variables (typically indicator random variables).
- Solve for their expectations and add them up!
Example
We flip

coins of bias

. What is the expected number of heads?
We could do this by summing:
But now we know a better way.
Let

number of heads when

independent coins of bias

are flipped.
Break

into

simpler random variables.

if the

coin is tails.

if the

coin is heads.
Product of Expectations
What about products of random variables? If

, then

? No!
For example, if

is the indicator random variable for "first flip is heads" and

is the indicator random variable for "1st flip is tails"

.
But it is true if the random variables are independent.
Proof
Suppose that we flip two fair coins, and we let

indicator random variable for first coin being heads,

indicator random variable for second coin being heads, and

indicator random variable for "both are heads."
Note that

and

, so

. In fact,

is called the variance of

.
Most of the time, however, power will come from using sums, mainly because linearity of expectations holds even if random variables are not independent.
Example
On average, in a class of size

, how many pairs of people will have the same birthday?
Suppose that we have

people, each with a uniformly chosen birthday from 1 to 366.
Let

number of pairs of people with the same birthday.
We need to find

.
We use

indicator variables, one for each pair of people.

if person

and person

have the same birthday.

otherwise.
Note that there are many dependencies among the indicator variables. For example,

and

and

are dependent. But when using linearity of expectations, it doesn't matter.
Example
You pick a number

. You roll three dice. If any match

, you win 1. Otherwise, you pay me 1. Want to play?
Let

event that

die matches. Let

indicator random variable for

. Expected number of dice that match:
But this is not the same as Pr(at least one die matches). Pr(at least one die matches) = 1 - Pr(none match) = 1 -

.
In general, be careful that you are modeling the problem correctly. And watch out for carnival games.