## Sunday, 30 March 2014

### Sequence of random variables of smaller variance in average

Define the problem as follows.

Suppose we have a sequence of random variables $X_1, X_2,...$ that takes value $\left\{ 0,1\right\}$ with $\lim _{n\rightarrow \infty} \frac{\sum_{i=1}^n X_i}{n} = 0.5$. i.e. In long term this is a fair coin toss. Design an mechanism so that for some $n$ and $k$, the distribution of $\sum_{i=n}^{n+k-1} X_i$ has a smaller variance.

It is very clear that if the variables are independent then the distribution is always fixed. Therefore we focus on dependent variables, in particular we consider creating a Markov Chain, that $X_{n+k}$ depends on $\frac{\sum_{i=n}^{n+k-1} X_i}{k}$. We call this dependence function be $f: [0,1]\rightarrow [0,1]$, which is rotational symmetric around $(0.5,0.5)$.

In order to investigate the effectiveness of our mechanism we have to go deep into the calculations because the mean calculations won't work (we need variance!)

Consider $k=2$ with $f(x) = 0.9 - 0.8x$ and the four states are $00,01,10,11$. Then the transition matrix is given by:

$P = \begin{bmatrix} 0.1&0&0.5&0\\0.9&0&0.5&0\\0&0.5&0&0.9\\0&0.5&0&0.1\end{bmatrix}$

Where the transition matrix assuming independence equal to

$P = \begin{bmatrix} 0.5&0&0.5&0\\0.5&0&0.5&0\\0&0.5&0&0.5\\0&0.5&0&0.5\end{bmatrix}$

Since $P$ represents a regular Markov chain it's clear that it has an eigenvalue 1 with the steady state vector $\frac{5}{28}(1,1.8,1.8,1)^T$. Comparing with $(0.25,0.25,0.25,0.25)^T$ this one has a high tendency at the central.

What about higher k? Consider the case $k=3$ with 8 states $000,001,010,011,100,101,110,111$, where

$P =\begin{bmatrix} \frac{1}{10}&0&0&0&\frac{11}{30}&0&0&0\\ \frac{9}{30}&0&0&0&\frac{19}{30}&0&0&0\\ 0&\frac{11}{30}&0&0&0&\frac{19}{30}&0&0\\ 0&\frac{19}{30}&0&0&0&\frac{11}{30}&0&0\\ 0&0&\frac{11}{30}&0&0&0&\frac{19}{30}&0\\ 0&0&\frac{19}{30}&0&0&0&\frac{11}{30}&0\\ 0&0&0&\frac{19}{30}&0&0&0&\frac{9}{10}\\ 0&0&0&\frac{11}{30}&0&0&0&\frac{1}{10}\\\end{bmatrix}$

With steady state vector, according to mathematica, $c(0.161905, 0.397403, 0.397403, 0.397403, 0.397403, 0.397403, 0.397403, 0.161905)^T$ (where $c$ is the constant to make the sum of the probabilities 1.) Surely there is a higher tendency to stay at the centre, but it looks less centre-located than $k=2$. Of course it's very hard to determine the variance through the steady state vector from the Markov chain as every states represents a series of random variables, but we can plot the distribution for it as follows. Here we sample 10000 times for $\sum_{i=1}^{100} X_i$.

Here we see the black line as the standard binomial distribution, red as $k=2$, green as $k = 8$, blue as $k=25$ and mathematically the line tends to the standard binomial distribution as $n \rightarrow \infty$. If we take a function that is closer to $f(x) = 0.5$ (take the distance in inner product space, if you like) it will, of course, closer to the original distribution.

For the above graph, red refers to $y = 0.9-0.8x$, green is $y = 0.8-0.6x$ while blue is $0.7-0.4x$, and the difference is obvious.

Convexity also affect the ability of 'forcing' the RV back to the mean using the same idea (distance between functions). The variance is naturally smaller if $f(x)$ is larger approaching to $0.5^-$ and smaller from $0.5^+$. If you do some Fourier analysis we can 'force' the random variable sequence into a perfect alternating 0 1 0 1 0 1 0 1!!!

The red line refers to $f(x) = 0.5 + 0.4((1-x)^5-x^5))$, the green line being $f(x) = 0.9-0.8x$ and the blue line is $f(x) = 0.5+0.4\cos (\pi x)$.

Let's demonstrate the Fourier series correlation here. For the function $f: [0,1]\rightarrow [0,1]$ where $f(x) = 1$ for $x\in [0,0.5]$ and $f(x) = 0$ for $x\in (0.5,1]$ we have $f(x) = \frac{1}{2} + \sum_{k=1}^{\infty}\frac{2\sin ((4n-2)\pi x)}{(2n-1)\pi}$. Turncating the values that exceeds 1 and below 0 we have the following distribution by taking $1,2,3$ terms with $k=15$:

We put a large graph here because it takes great effort for us to separate the three lines. Zooming gives

We see (once more) that Fourier series converges very quick despite the occurrence of Gibbs' phenomenon here. There's an interesting question that why does the distribution converges to something non-trivial instead of going to the impulse $\delta (\frac{n}{2})$, where the sequence is perfectly 1 0 1 0 ......? I'll leave this question to the reader as the solution is quite simple.

I don't intend to 'teach' something here but creating a sequence of dependent random variables is a very practical topic and the variance could play an important role. For instance in some round-based RPG battle you certainly don't want your game to be dominated by randomness, so it makes sense to restrict the randomness so that it's about the same for both sides over most if not all short periods. The faster converging speed is what we want to tweek the variance.

Foods for thought.

1) Why does the distribution using Fourier series as $f(x)$ does not converge to an impulse, where the sequence of distribution has no variance? You may want to change the following variables to see what happens (a) $n$, the length of sequence simulated (b) $k$ the length of correlation (c) the number of terms in the Fourier series or the turncation (d) numerical precision.

2) It is possible to calculate the variance directly from the Markov chain steady state probability?

3) By inverting  the idea we can't create sequence of RVs of larger variance directly, for instance, $f(x) = 0.5-0.4\cos (\pi x)$ doesn't give a sequence of larger variance. Think about if it is possible, then see if there are practical applications for it.

4) For the Markov Chain case $k=3$, why does the steady state probability for all 6 states except $000,111$ are equal? (i.e., That looks 'unusual'.)

R code, for $f(x)$:

f  = function(n,k,g) {
s = NULL
for(i in 1:k) {
s = c(s,rbinom(1,1,0.5))
}
for(i in (k+1):n) {
prev_mean = mean(s[(i-k):(i-1)])
s = c(s,rbinom(1,1,g(prev_mean)))
}
return(sum(s))
}