Wednesday, 26 June 2024

DP and Markov Chain

Decided to give projecteuler a little push. It has been almost 10 years since I actively solved problems there. Many new problems have been added, but the average difficulty has dropped a bit compared to P200-300. This is good: the site does not necessarily exist to ask harder and harder questions to defeat the best math programmers, but to be a site most people can enjoy.

But the core idea has not changed: it's mostly about reducing complexity using maths plus some search optimization. The maths behind it is the same as ever: everything about number theory plus a bit of geometry and combinatorics. On the searching side nothing has changed either: it's always dynamic programming -- and if it isn't, then the maths alone probably suffices and no search is needed.

In contrast to those purely mathematical problems, there are some purely computational ones too. Well, maybe not quite as computational as P345 "Matrix Sum", where you find a maximum trace (sort of) in a large matrix, but there are numerical problems with zero mathematical insight involved.

I wanted to use the example of cyclic numbers, aka the friends of 142857, but there is still minimal maths behind them, like their relation to the numbers $10^k-1$ (which is also related to questions concerning repeating periods). But we can go much less mathematical. How about numbers that use all 9 digits? Primes with a particular appearance (not a mathematical property)? Fortunately there is a framework for solving those.

I solved about 20 problems recently, and half of them were done using the same framework. It's so handy that it's worth a write-up, and perhaps it will guide you to solving those questions as well.

*

It is all about dynamic programming. When we first encounter DP we usually learn to solve backwards, like in all those shortest path problems. But it is also possible to solve recursively from the front. Mathematically there is a fancier name for this: the Markov chain.

The idea is the same: we recursively calculate the state at each iteration until we reach the end. We first define the initial state $A_0$. At each iteration we calculate $A_{k+1}$ from $A_k$ and extract information from the state if necessary. Terminating at $A_n$ gives the answer immediately.
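In code the whole framework is just one loop. A minimal sketch (the state representation and the `transition` rule are placeholders to be filled in per problem):

```python
def run_chain(initial, transition, n):
    A = initial                # A_0, typically a dict mapping state -> mass
    for _ in range(n):
        A = transition(A)      # build A_{k+1} from A_k
    return A                   # read the answer off A_n
```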

Example 1. Consider a simple random walk problem where an ant starts from position zero and walks by +2, +3 or -1 each turn with equal probability. There are traps at every odd (positive) prime, where the ant gets stuck and stops moving. Can you give the expected position of the stuck ants after 10000 turns?

A typical absorbing Markov chain. What's not typical is the randomly assigned "odd prime traps". Without the traps it's actually easy to find the chance for an ant to reach position $k$ at turn $n$: we have the equations $2a + 3b - c = k$ and $a+b+c = n$ with a solution space of dimension 1. Combining with the non-negativity condition gives a set of easily summable "ways", and summing across $n$ gives the chance of reaching the position on or before turn $n$... that is, without the traps.

Instead, we set up a dictionary $X$ with a single entry $X[0] = 1$. In each iteration we read $X[i] = p_i$; if $i$ is not an odd prime, we add the probability to a temporary dictionary: $X'[i-1], X'[i+2], X'[i+3] += p_i/3$. Of course there are technicalities to take care of, like not altering $X$ while looping over it, but those are routine.
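A direct transcription in Python might look like this (a sketch: the helper names are mine, reading "expected position of stuck ants" as a conditional expectation over the trapped mass is my interpretation, and the full 10000 turns would want some vectorization to run fast):

```python
def is_odd_prime(n):
    """Trial-division check for odd primes (3, 5, 7, ...)."""
    if n < 3 or n % 2 == 0:
        return False
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def expected_trapped_position(turns):
    X = {0: 1.0}                                  # X[i] = P(ant is at position i)
    for _ in range(turns):
        Xn = {}
        for i, p in X.items():
            if is_odd_prime(i):                   # trapped: the mass stays put
                Xn[i] = Xn.get(i, 0.0) + p
            else:                                 # split the mass over the 3 moves
                for s in (-1, 2, 3):
                    Xn[i + s] = Xn.get(i + s, 0.0) + p / 3
        X = Xn
    trapped = {i: p for i, p in X.items() if is_odd_prime(i)}
    total = sum(trapped.values())
    return sum(i * p for i, p in trapped.items()) / total
```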

Example 2. The same as example 1, except that the ant has a protective mechanism that allows it to land on a trap once and keep going.

In that case, the state must store one more piece of information besides the position: the number of traps the ant has stepped on. From each $X[(i,j)] = p_{ij}$, where $i$ is not an odd prime or $j \leq 1$, we send mass to $X'[(i',j')]$, where $j' = j+1$ if $i'$ is an odd prime.
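The change to the code is tiny (a sketch reusing `is_odd_prime` from above; start from `X = {(0, 0): 1.0}` and iterate as before):

```python
def step_with_protection(X):
    """One transition of example 2; X maps (position, traps stepped on) -> probability."""
    Xn = {}
    for (i, j), p in X.items():
        if is_odd_prime(i) and j > 1:              # second trap: stuck for good
            Xn[(i, j)] = Xn.get((i, j), 0.0) + p
            continue
        for s in (-1, 2, 3):
            i2 = i + s
            j2 = j + 1 if is_odd_prime(i2) else j  # j' = j+1 when landing on a trap
            Xn[(i2, j2)] = Xn.get((i2, j2), 0.0) + p / 3
    return Xn
```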

Example 3. Consider a random binary array of length $n$. Compute, up to a certain precision, the probability $p(n,k)$ that the array contains a monotone (i.e. all 0 or all 1) subarray of length $k$ (call that a streak of length $k$).

Again a problem that seems amenable to direct calculation: a fixed subarray of length $k$ is a streak with probability $2^{1-k}$, suggesting something like $(n-k+1)2^{1-k}$, but you will face a wall of inclusion-exclusion considerations, plus the magnitude of the numbers you work with increases exponentially with $n$ and $k$.

Instead, taking a cue from example 2, we only need two pieces of information: the last bit (which is also the bit of the current streak) and the current streak length.

There will be $2k-2$ states of the form $(i,j)$, where $i \in \left\{ 0, 1\right\}$ and $1\leq j \leq k-1$. For each $X[(i,j)] = p_{ij}$, assign $p_{ij}/2$ to $X'[(1-i, 1)]$ and the other half to $X'[(i,j+1)]$; if $j+1 = k$, send that half to the absorbing state instead.
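Putting example 3 together (a sketch assuming $k \geq 2$; `streak_probability` is my own naming, and the absorbing state is kept as a plain accumulator):

```python
def streak_probability(n, k):
    """P(a uniform random bit array of length n contains a monotone run of length k)."""
    X = {(0, 1): 0.5, (1, 1): 0.5}       # (last bit, streak length) after the 1st bit
    absorbed = 0.0                       # mass that has already seen a k-streak
    for _ in range(n - 1):               # append the remaining n - 1 bits
        Xn = {}
        for (i, j), p in X.items():
            # next bit differs: the streak restarts at length 1
            Xn[(1 - i, 1)] = Xn.get((1 - i, 1), 0.0) + p / 2
            # next bit matches: the streak extends, possibly completing
            if j + 1 == k:
                absorbed += p / 2
            else:
                Xn[(i, j + 1)] = Xn.get((i, j + 1), 0.0) + p / 2
        X = Xn
    return absorbed
```

By symmetry the two bit values behave identically, so one could even drop $i$ and keep only the streak length, halving the state space.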

In this way we can get an arbitrarily accurate answer if we run every iteration explicitly, but $n$ can easily be far too large for that -- which is why we are merely asked for the probability up to a certain precision. We need an approximation.

Notice that a streak of length $k$ appears at a fixed location with chance $2^{-k}$. Even with the freedom of appearing anywhere in the array, the range of $k$ for which $p(n,k)$ is non-negligible (i.e. not rounding to 0 or 1 within the error allowance) is around $c\log _2(n)$. That is, any sensible $k$ is very small compared to $n$ for reasonably large $n$. Then the chance of a $k$-streak appearing in an array of $2n$ bits is roughly the chance of the streak appearing in either the first $n$ or the last $n$ bits. As a result we approximate $p(n, k)$ by $1-(1-p(\alpha,k))^{n/\alpha}$.

Here $\alpha$ is a threshold up to which we are willing to simulate in acceptable time. Take a large enough threshold and we should be done... right? Not yet: the approximations aren't precise enough. We can do, say, 5 million iterations at best within reasonable time, but that does not stop an error of order $10^{-6}$ from creeping in. We need the fix below:

Since this is an absorbing Markov chain, the probability is strictly increasing (or decreasing, depending on how you view it) in $n$, and it moves at an exponentially decaying rate (you can verify this). We can exploit that for an ad-hoc improvement: set a few checkpoints along the iteration towards $p(\alpha, k)$ to estimate the convergence rate, then apply a geometric sum to correct the probability in one go. The remaining error is likely an order lower, but who cares?
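A sketch of that fix, reusing `streak_probability` from above (the checkpoint spacing `delta` is an arbitrary knob of mine, and a real implementation would record both checkpoints within a single run instead of running the chain twice):

```python
def p_extrapolated(n, k, alpha, delta=1000):
    # assumes n > alpha and that 1 - p(m, k) decays like C * lam**m for large m
    s1 = 1.0 - streak_probability(alpha - delta, k)   # survival at the checkpoint
    s2 = 1.0 - streak_probability(alpha, k)           # survival at the threshold
    lam = (s2 / s1) ** (1.0 / delta)                  # estimated per-step decay rate
    return 1.0 - s2 * lam ** (n - alpha)              # geometric extrapolation to n
```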

And of course, we can apply the same idea to the dual of probability: success trials versus total trials. The same iteration method applies, except that we add counts instead of splitting chances.

Example 4. How many $20$-digit numbers (no leading zeros) are there in which no digit occurs more than 3 times?

Well, partitions and multinomials would serve better, especially if we went way beyond such parameters. But what can we do without assuming those?

The states this time are tuples $(a_0,...,a_9)$ where $a_i \leq 3$. In each iteration, $X[(a_0,...,a_9)]$ is added to every state with one of the $a_i$ raised by 1 (and still not exceeding 3). There are 9 initial states at the beginning, representing the 9 one-digit numbers.
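This is precisely the counting flavour mentioned above: the dictionary holds counts and we add instead of splitting. A sketch (function name and the `cap` parameter are mine):

```python
def count_capped_digit_numbers(num_digits=20, cap=3):
    X = {}
    for d in range(1, 10):                  # the 9 one-digit numbers (no leading zero)
        X[tuple(1 if i == d else 0 for i in range(10))] = 1
    for _ in range(num_digits - 1):         # append the remaining digits
        Xn = {}
        for counts, ways in X.items():
            for d in range(10):
                if counts[d] < cap:         # digit d may appear at most `cap` times
                    nxt = counts[:d] + (counts[d] + 1,) + counts[d + 1:]
                    Xn[nxt] = Xn.get(nxt, 0) + ways
        X = Xn
    return sum(X.values())
```

Since the transition treats all ten digits identically, one could probably shrink the state space further by storing sorted count profiles, but the plain tuple version already runs.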

*

And that's it! I said at the beginning that the technique is widely applicable to many questions on projecteuler and other sites, and that's not a bluff. Check the range P151-200 if you want a few easy scores ;).

Markov chains and DP aside, I just came across the concept of semiconvergents when approximating $\sqrt{n}$ by fractions. Either I forgot everything about them from lessons or they're genuinely new to me. I knew the convergents are the best approximations, but those marginal non-semiconvergents are so nasty!

Suppose that $p_{k}/q_{k}$ are the convergents of $\sqrt{n}$, and let $a_k$ be the corresponding continued fraction coefficients. It turns out that for any $[(a_k+1)/2] < r < a_k$, the fractions $(p_{k-1}+rp_k)/(q_{k-1}+rq_k)$, called semiconvergents, are better approximations than $p_k/q_k$ (though with a bigger denominator).

More interestingly, when $r = [(a_k+1)/2]$ the resulting fraction may or may not be a semiconvergent! I was so annoyed that these marginal non-semiconvergents are so close to the previous convergent that the usual decimal calculation fails to distinguish them. I finally took the following approach:

Consider two fractions $p/q$ and $p'/q'$, and assume $p/q$ is the larger one. To avoid sign shenanigans I compare $\sqrt{n}$ against their average $(pq'+p'q)/(2qq')$: if $\sqrt{n}$ is larger then it's closer to the larger fraction, and vice versa. I don't want square roots either, so I compare the squares instead, i.e. compute $(pq' + p'q)^2 - 4q^2q'^2n$. It turns out that if we compare the convergent against the next potential semiconvergent $(p_{k-1}+rp_k)/(q_{k-1}+rq_k)$ where $r = [(a_k+1)/2]$, that difference is always $\pm 1$!!! If you still remember that $p_kq_{k+1}-q_kp_{k+1}$ is also $\pm 1$, you know these two fractions are really as close as they can get.
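The whole comparison is a few lines of integer arithmetic (a sketch; `closer_to_first` is my naming, and the equidistant case $d = 0$ is left unhandled):

```python
def closer_to_first(n, p, q, p2, q2):
    # True iff p/q is strictly closer to sqrt(n) than p2/q2 (n not a perfect square)
    # sign of d = sign of ((p/q + p2/q2)/2)^2 - n, i.e. of (average - sqrt(n))
    d = (p * q2 + p2 * q) ** 2 - 4 * q * q * q2 * q2 * n
    first_is_larger = p * q2 > p2 * q
    # sqrt(n) above the average (d < 0)  <=>  closer to the larger fraction
    return (d < 0) == first_is_larger
```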

After a few hours of number theory revision plus a few hours of coding I finally got the right answer, only to find that Sage has a full package to automate all of this...
