Derivation of Hypergeometric Distribution's Expectation and Variance

Let \(X\) denote the number of white balls selected when \(n\) balls are chosen at random from an urn containing \(N\) balls, \(K\) of which are white. Then \(X\) follows the hypergeometric distribution. You can find a detailed description at Wikipedia, but the derivation of the expectation and variance is omitted there. I'll show the derivation here from two different angles.

First, Just Do It!
$$\begin{aligned}
P\{X=i\} &= \frac{{K \choose i}{N-K \choose n-i}}{{N \choose n}} \\
N &\in \{0, 1, 2, \dots\} \\
K &\in \{0, 1, 2, \dots, N\} \\
n &\in \{0, 1, 2, \dots, N\} \\
i &\in \{\max(0, n+K-N), \dots, \min(n, K)\}
\end{aligned}$$
$$\begin{aligned}
E[X] &= \sum iP\{X=i\} = \sum \frac{i{K \choose i}{N-K \choose n-i}}{{N \choose n}} \\
&= K\sum \frac{{K-1 \choose i-1}{N-K \choose n-i}}{{N \choose n}}
\end{aligned}$$
The sum on the right closely resembles the total probability of a hypergeometric random variable (which equals 1). Using the identity \({N \choose n} = \frac{N}{n}{N-1 \choose n-1}\), we transform it and get,
$$\begin{aligned}
E[X] &= K\sum \frac{{K-1 \choose i-1}{(N-1)-(K-1) \choose (n-1)-(i-1)}}{\frac{N}{n}{N-1 \choose n-1}} \\
&= \frac{Kn}{N} \sum \frac{{K-1 \choose i-1}{(N-1)-(K-1) \choose (n-1)-(i-1)}}{{N-1 \choose n-1}} \\
&= \frac{Kn}{N}
\end{aligned}$$
Applying the same trick twice gives,
$$\begin{aligned}
E[X(X-1)] &= \frac{K(K-1)n(n-1)}{N(N-1)} \sum \frac{{K-2 \choose i-2}{(N-2)-(K-2) \choose (n-2)-(i-2)}}{{N-2 \choose n-2}} \\
&= \frac{K(K-1)n(n-1)}{N(N-1)}
\end{aligned}$$

$$\begin{aligned}
Var(X) &= E[X^2] - E^2[X] = E[X(X-1)] + E[X] - E^2[X]\\
&= \frac{Kn(N-K)(N-n)}{N^2(N-1)}
\end{aligned}$$
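As a quick sanity check, we can sum over the support of the pmf and compare against the two closed forms we just derived (a minimal Python sketch; the helper names `hypergeom_pmf` and `moments` are my own):

```python
from math import comb

def hypergeom_pmf(i, N, K, n):
    """P{X = i} for the hypergeometric distribution."""
    return comb(K, i) * comb(N - K, n - i) / comb(N, n)

def moments(N, K, n):
    """Return (E[X], Var(X)) by direct summation over the support."""
    lo, hi = max(0, n + K - N), min(n, K)
    mean = sum(i * hypergeom_pmf(i, N, K, n) for i in range(lo, hi + 1))
    second = sum(i * i * hypergeom_pmf(i, N, K, n) for i in range(lo, hi + 1))
    return mean, second - mean ** 2

# Arbitrary example parameters
N, K, n = 20, 7, 5
mean, var = moments(N, K, n)
assert abs(mean - K * n / N) < 1e-9
assert abs(var - K * n * (N - K) * (N - n) / (N ** 2 * (N - 1))) < 1e-9
```

The asserts pass for any valid \((N, K, n)\), since the summation computes the moments exactly up to floating-point rounding.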

This approach is tedious. Though hardcore, it works. Next I want to introduce a smarter way.

Second, Divide and Conquer!

Let \(X_i\) be the indicator for the \(i\)th of the \(N\) balls: \(X_i = 1\) if it is white and \(0\) otherwise. By symmetry, \(P(X_i = 1) = \frac{K}{N}\), independent of the position \(i\) and of whether drawn balls are returned to the urn. So we get,
$$\begin{aligned}
E[X_i] &= \frac{K}{N}\\
Var(X_i) &= \frac{K(N-K)}{N^2}
\end{aligned}$$
And we can easily get the expectation,
$$ E[X] = \sum_{i=1}^{n}E[X_i] = \frac{Kn}{N} $$
Though all the \(X_i\) obey the same distribution, they are not independent. To calculate the variance, we first have to compute their covariance.
$$\begin{aligned}
Cov(X_i, X_j) &= E[X_i X_j] - E[X_i]E[X_j] \\
&= P\{X_i = 1, X_j = 1\} - \frac{K^2}{N^2}\\
&= P\{X_i = 1 | X_j = 1\}P\{X_j = 1\} - \frac{K^2}{N^2}\\
&= \frac{K-1}{N-1} \cdot \frac{K}{N} - \frac{K^2}{N^2}\\
&= -\frac{K(N-K)}{N^2(N-1)}
\end{aligned}$$
$$\begin{aligned}
Var(X) &= \sum Var(X_i) + \sum_{i\neq j}Cov(X_i, X_j) \\
&= n\frac{K(N-K)}{N^2} - n(n-1)\frac{K(N-K)}{N^2(N-1)} \\
&= \frac{Kn(N-K)(N-n)}{N^2(N-1)}
\end{aligned}$$
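The covariance formula can be verified exactly on a small urn by enumerating every ordering of the balls (a brute-force sketch; the values of `N`, `K`, and the two positions are arbitrary choices of mine):

```python
from fractions import Fraction
from itertools import permutations

# Brute-force check of the covariance formula on a small urn.
# 1 = white, 0 = black; every ordering of the N balls is equally likely.
N, K = 6, 2
orders = list(permutations([1] * K + [0] * (N - K)))

def E(f):
    """Exact expectation of f(ordering) over the uniform orderings."""
    return Fraction(sum(f(o) for o in orders), len(orders))

i, j = 0, 3                 # any two distinct positions
cov = E(lambda o: o[i] * o[j]) - E(lambda o: o[i]) * E(lambda o: o[j])

assert E(lambda o: o[0]) == Fraction(K, N)            # marginal is K/N
assert cov == Fraction(-K * (N - K), N**2 * (N - 1))  # matches the formula
```

Using `Fraction` keeps the arithmetic exact, so the comparison against the closed-form covariance is an equality, not an approximation.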

A Little More 🙂

Since all the \(X_i\) obey the same distribution, independent of the position \(i\) and of whether drawn balls are returned to the urn, we can also conclude that drawing lots is a fair game: regardless of the order in which you draw, you win with equal probability.
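The fairness claim can likewise be checked by exhaustive enumeration (a small sketch with arbitrary `N` and `K`): every draw position wins with probability exactly \(\frac{K}{N}\).

```python
from fractions import Fraction
from itertools import permutations

# Exhaustively check that drawing lots is fair: with K winning lots
# among N, every draw position wins with probability K/N.
N, K = 5, 2
orders = list(permutations([1] * K + [0] * (N - K)))  # all N! draw orders

win_prob = [Fraction(sum(o[pos] for o in orders), len(orders))
            for pos in range(N)]
assert all(p == Fraction(K, N) for p in win_prob)
```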
