We may model the problem as a roulette problem, where we have $N$ balls inside a box in which one of them is special.

The simplest model is the geometric model --- we put the non-special ball back and retry the whole process. It follows a geometric distribution with expectation $\frac{1}{p}$. But this is far too slow, and it has a large variance with $p$ is small.

The second simplest one is the Russian roulette model --- we do not put the non-special ball that we got back to the ball. In such a case for each $k = 1,...,N$, $P(X=k) = \frac{1}{N}$ and we can get

$E(X) = \sum \frac{k}{N} = \frac{N+1}{2}$ and $V(X) = \frac{N^2-1}{12}$

The problem with these methods is that they have a very long tail and has a large variance ($O(n^2)$). (The case for geometric model should be $\frac{1-p}{p^2}\approx \frac{1}{p^2}$ when $n = \frac{1}{p}$ is large) So instead of simply dealing with the balls we have taken out we add another measure. We add another special ball to the pool for every time we didn't get the special ball.

It is clear that doing in this way is much faster because the probability grows approximately linear in the beginning, that allows us to capture the balls quickly rather than compensating the chance at the later stages.

Consider the case where we replace a non-special ball by a special ball every time, then the chance is given by

$$P(X_n = k) = \frac{k^2 n!}{n^{k+1}(n-k)!}$$

It is pretty amazing that the expectation of this random variable is asymptotically extremely close to $\sqrt{n}$. It has a linear variance as well. In fact if you do a linear regression on $E(X_n)^2$ you will get a R-squared value of over 0.999998! It has a linear variance that guarantees is pick in a very narrow range as well. For example this is the graph when $n = 81$:

Another way is to put the non-special ball back but also add another special ball into the ball, then the chance is given by:

$$P(Y_n = k) = \frac{k^2 (n-1)^k (n-2)!}{(n+k-1)!}$$

One may observe that they operates almost the same, especially when $k$ is small, therefore the following questions come naturally:

1) Are there any closed formula for $E(X)$? As it involves factorial and exponential summation this does not look obvious.

2) Is $\lim _{n\to \infty} \frac{E(X_n)^2}{n} = \lim _{n\to \infty} \frac{E(Y_n)^2}{n}$? If yes we can ask a strong question, does $\lim _{n\to \infty} \frac{E(X_n)^2-E(Y_n)^2}{\sqrt{n}}$ converges? Empirically both $\frac{E(X_n)^2}{n}$ and $\frac{E(Y_n)^2}{n}$ hovers around 1.57 and $\frac{E(X_n)^2-E(Y_n)^2}{\sqrt{n}}$ 'looks' converging to 1.66.

Of course in practical sense the first one ($X_n$) is better because the range is finite [promised to be] and it has a smaller variance, but it is the nearly-exact behavior that attracted me to compute such bits.

## No comments:

## Post a comment