Wednesday 26 June 2024

DP and Markov Chain

Decided to give a little push on projecteuler. It has been almost 10 years since I actively solved problems. Many new problems have been added, but the average difficulty has dropped a bit compared to P200-300. This is fine, as the site does not necessarily serve the purpose of asking harder and harder questions to defeat the best math programmers, but rather aims to be a site that most can enjoy.

But the core idea has not changed: it's most likely about reducing complexity using maths plus some search optimization. The math behind it has been the same: everything about number theory plus a bit of geometry and combinatorics. On the searching side nothing has changed either: it's always dynamic programming -- and if it isn't, then the math part alone probably suffices and no search is needed.

Opposite to those purely mathematical problems, there are some purely computational problems too. Well, maybe not as computational as P345 "Matrix Sum", where you find the maximum trace (sort of) in a large matrix, but there are numerical problems with zero mathematical insight involved.

I wanted to use the example of cyclic numbers, aka friends of 142857, but there is still some minimal maths behind them, like their relation to the numbers $10^k-1$ (which also relates to questions concerning repeating periods). But we can go much less mathematical. How about numbers that use all 9 digits? Primes with a particular appearance (not a mathematical property)? Fortunately there is a framework that solves those.

I solved about 20 problems recently and half of them were done using the same framework. It's so handy that it's worth talking about, and perhaps it will guide you to solve those questions as well.

*

It is all about dynamic programming. When we first encounter DP we usually learn to solve backwards, like in all those shortest path problems. But it is also possible to solve recursively from the front. Mathematically there is a fancier name: Markov chain.

The idea is the same: we recursively calculate the state at each iteration until we reach the end. We first define the initial state $A_0$. For each iteration we calculate $A_{k+1}$ from $A_k$ and extract information from the state if necessary. Terminating at $A_n$ gives the answer immediately.

Example 1. Consider a simple random walk where an ant starts at position zero and moves by +2, +3 or -1 each turn with equal probability. There is a trap at each odd (positive) prime where the ant gets stuck and stops moving. Can you give the expected position of stuck ants after 10000 turns?

A typical absorbing Markov chain. What's not typical is the arbitrarily assigned "odd prime traps". Without those traps it's actually easy to find the chance for an ant to reach position $k$ at turn $n$: we have the equations $2a + 3b - c = k$ and $a+b+c = n$ with a solution space of dimension 1. Combining with the non-negativity condition gives a set of easily summable "ways". Adding across turns gives the chance of reaching the position on or before turn $n$... that is, without the traps.

Instead, we set up a dictionary $X$ with a single entry: $X[0] = 1$. In each iteration we read each $X[i] = p_i$. If $i$ is not an odd prime, we add the probability to a temporary dictionary: $X'[i-1], X'[i+2], X'[i+3] \mathrel{+}= p_i/3$. Of course there are technical details to take care of, like iterating over a copy since we cannot alter $X$ inside the loop, but that is routine.
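
In Python the whole thing is barely a dozen lines. A minimal sketch (the function names are mine; the sum at the end is the unconditional expectation over trapped ants, so divide by the total trapped probability if a conditional expectation is wanted):

```python
from collections import defaultdict

def is_trap(pos):
    # traps sit at the odd (positive) primes: 3, 5, 7, 11, ...
    if pos < 3 or pos % 2 == 0:
        return False
    return all(pos % d for d in range(3, int(pos**0.5) + 1, 2))

def expected_trapped_position(turns=10000):
    X = {0: 1.0}                               # state: position -> probability
    for _ in range(turns):
        X2 = defaultdict(float)
        for pos, p in X.items():
            if is_trap(pos):                   # absorbed: the ant stays put
                X2[pos] += p
            else:
                for step in (-1, 2, 3):        # the three equally likely moves
                    X2[pos + step] += p / 3
        X = dict(X2)
    # unconditional expected position over ants that ended up stuck
    return sum(pos * p for pos, p in X.items() if is_trap(pos))
```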

Example 2. The same as Example 1, except that the ant has a protective mechanism that allows it to land on a trap once and keep moving.

In that case, the state should store another piece of information besides the position: the number of traps the ant has stepped on. From each non-absorbed state $X[(i,j)] = p_{ij}$ (that is, $i$ is not an odd prime, or $j \leq 1$) we send probability to $X'[(i',j')]$ with $i' \in \{i-1, i+2, i+3\}$, where $j' = j+1$ if $i'$ is an odd prime and $j' = j$ otherwise.

Example 3. Consider a random binary array of length $n$. Compute, up to a certain precision, the probability $p(n,k)$ that the array contains a monotone (i.e. all 0 or all 1) subarray of length $k$ (call that a streak of length $k$).

Again a problem seemingly possible by direct calculation: fixing a window of length $k$ gives a probability of $2^{1-k}$ of it being a streak, and summing over the $n-k+1$ positions gives a first-order estimate, but you will face a wall of inclusion-exclusion considerations, plus the magnitude of the numbers you work with grows exponentially with $n$ and $k$.

Instead, taking a hint from Example 2, we only need two pieces of information: the last bit (which is also the bit of the current streak) and the current streak length.

There will be $2k-2$ states of the form $(i,j)$ where $i \in \left\{ 0, 1\right\}$ and $1\leq j \leq k-1$, plus an absorbing state. For each $X[(i,j)] = p_{ij}$, assign $p_{ij}/2$ chance to $X'[(1-i, 1)]$ and the other half to $X'[(i,j+1)]$; if $j+1 = k$, send that half to the absorbing state instead.
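
A sketch of this chain for $k \geq 2$ (again the names are mine):

```python
from collections import defaultdict

def streak_prob(n, k):
    # probability that a uniform random bit array of length n contains a
    # streak of k equal bits, via the (last bit, streak length) chain
    X = {(0, 1): 0.5, (1, 1): 0.5}             # state after the first bit
    absorbed = 0.0
    for _ in range(n - 1):
        X2 = defaultdict(float)
        for (bit, run), p in X.items():
            X2[(1 - bit, 1)] += p / 2          # next bit differs: streak resets
            if run + 1 == k:
                absorbed += p / 2              # streak of length k completed
            else:
                X2[(bit, run + 1)] += p / 2    # next bit equal: streak extends
        X = dict(X2)
    return absorbed
```

As a sanity check, `streak_prob(3, 2)` returns 0.75, matching the direct count of length-3 arrays with two adjacent equal bits.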

In that way we can get an arbitrarily accurate answer if we run every iteration, but $n$ can easily be far too large for that -- which is why we are merely asked for the probability up to a certain precision. We need an approximation.

Notice that a streak of length $k$ at a fixed position appears with a chance of about $2^{-k}$. Even with the freedom of appearing anywhere in the array, the range of $k$ for which $p(n,k)$ is non-negligible (i.e. not 0 or 1 after rounding within the error allowance) should be around $c\log _2(n)$. That is, any sensible $k$ is very small compared to $n$ for reasonably large $n$. Then the chance of a $k$-streak appearing in an array of $2n$ bits is roughly the chance of it appearing in either the first $n$ or the last $n$ bits. As a result we approximate $p(n, k)$ by $1-(1-p(\alpha,k))^{n/\alpha}$.

Here $\alpha$ is a threshold up to which we actually run the iteration, chosen so that it finishes in acceptable time. Take a large enough threshold and we should be done... right? Not yet. It seems the approximation isn't precise enough. We can do, say, 5 million iterations at best within reasonable time, but that does not stop an error of order $10^{-6}$ from creeping in. We need the fix below:

Since this is an absorbing Markov chain, the probability is strictly increasing (or decreasing, depending on how you view it) in $n$, and it moves at an exponentially decaying rate (you can verify this). We can exploit that for an ad-hoc improvement: set a few checkpoints along the iteration up to $p(\alpha, k)$ to estimate the convergence rate, then apply a geometric sum to fix the probability in one go. The remaining error is likely an order lower, but who cares?
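
Here is a hedged sketch of one way to read that fix, assuming that $1-p(m,k)$ eventually decays geometrically in $m$ (my interpretation of "moving at an exponentially decaying rate", not necessarily the exact bookkeeping used); it reuses `streak_prob` from the sketch above:

```python
def corrected_streak_prob(n, k, alpha=5_000_000, gap=1000):
    # Estimate the per-step geometric decay ratio of q(m) = 1 - p(m, k) from
    # checkpoints near alpha, then extrapolate to n in one go.  In practice a
    # single run up to alpha records all three checkpoints; calling
    # streak_prob three times here is only for clarity.
    q1 = 1 - streak_prob(alpha - 2 * gap, k)
    q2 = 1 - streak_prob(alpha - gap, k)
    q3 = 1 - streak_prob(alpha, k)
    r = (q3 / q2) ** (1.0 / gap)          # estimated per-step decay ratio
    # (q2 / q1) ** (1 / gap) should agree with r; compare them as a sanity check
    return 1 - q3 * r ** (n - alpha)      # geometric extrapolation beyond alpha
```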

And of course, we can apply the same idea to the dual of probability: counting successful trials versus total trials. The same iteration applies, except we add up counts instead of splitting probabilities.

Example 4. How many $20$-digit numbers (no leading zeros) are there in which no digit occurs more than 3 times?

Well, partitions and multinomials would serve better, especially if we go way beyond such parameters. But what can we do without assuming those?

The states this time are tuples $(a_0,\ldots,a_9)$ where $a_i \leq 3$, with $a_i$ counting occurrences of the digit $i$. In each iteration, $X[(a_0,\ldots,a_9)]$ is added to the states with one of the $a_i$ raised by 1 (and still not exceeding 3). There are 9 initial states representing the 9 single-digit numbers (leading zeros excluded).
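
This is exactly the counting flavour mentioned above: the same loop, adding integer counts instead of splitting probabilities. A minimal sketch (names are mine):

```python
from collections import defaultdict

def count_constrained(length=20, max_repeat=3):
    # count `length`-digit numbers (no leading zero) in which no digit
    # occurs more than `max_repeat` times; states are digit-count tuples
    X = defaultdict(int)
    for d in range(1, 10):                     # first digit cannot be 0
        counts = [0] * 10
        counts[d] = 1
        X[tuple(counts)] = 1
    for _ in range(length - 1):
        X2 = defaultdict(int)
        for counts, ways in X.items():
            for d in range(10):
                if counts[d] < max_repeat:     # append digit d
                    nxt = list(counts)
                    nxt[d] += 1
                    X2[tuple(nxt)] += ways
        X = X2
    return sum(X.values())
```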

*

And that's it! I said at the beginning that the technique is widely applicable to many questions on projecteuler or other sites and that's not a bluff. Check the range P151-200 if you want a few easy scores ;).

Markov chains and DP aside, I just came across the concept of semiconvergents when approximating $\sqrt{n}$ by fractions. Either I forgot everything about them from lessons or they are genuinely new to me. I knew the convergents are the best approximations, but those marginal non-semiconvergents are so nasty!

Suppose that $p_{k}/q_{k}$ are the convergents for $\sqrt{n}$ and let $a_k$ be the corresponding continued fraction coefficients. It turns out that for any $[(a_k+1)/2] < r < a_k$, the fractions $(p_{k-1}+rp_k)/(q_{k-1}+rq_k)$ are better approximations than $p_k/q_k$ (but with a bigger denominator), called semiconvergents.

More interestingly, when $r = [(a_k+1)/2]$ the resulting fraction may or may not be a semiconvergent! What annoyed me is that such marginal non-semiconvergents are so close to the previous convergent that usual decimal calculation fails to distinguish them. I finally took the following approach:

Consider two fractions $p/q$ and $p'/q'$, and assume that $p/q$ is larger. To avoid sign shenanigans I compare $\sqrt{n}$ to their average $(pq'+p'q)/(2qq')$: if $\sqrt{n}$ is larger then it's closer to the larger fraction, and vice versa. I don't want any square roots either, so I compare the squares instead. That is, I compute the sign of $(pq' + p'q)^2 - 4q^2q'^2n$. It turns out that if we compare the convergent with the next potential semiconvergent $(p_{k-1}+rp_k)/(q_{k-1}+rq_k)$ where $r = [(a_k+1)/2]$, that difference is always $\pm 1$!!! If you still remember that $p_kq_{k+1}-q_kp_{k+1}$ is also $\pm 1$, you know these two fractions are really as close as they can get.
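
A small integer-only helper along these lines, assuming everything is positive (the function name is mine):

```python
def closer_to_sqrt(p, q, pp, qq, n):
    # return whichever of p/q and pp/qq lies closer to sqrt(n), using integers
    # only: compare sqrt(n) with the midpoint (p*qq + pp*q)/(2*q*qq) by
    # comparing (p*qq + pp*q)**2 with 4*q*q*qq*qq*n
    if p * qq < pp * q:                        # ensure p/q is the larger fraction
        p, q, pp, qq = pp, qq, p, q
    diff = (p * qq + pp * q) ** 2 - 4 * q * q * qq * qq * n
    # diff > 0: the midpoint exceeds sqrt(n), so sqrt(n) is closer to the
    # smaller fraction; diff == 0 means genuinely equidistant
    return (pp, qq) if diff > 0 else (p, q)
```

For $\sqrt{2}$, comparing the convergent $3/2$ with the candidate $4/3$ gives `closer_to_sqrt(3, 2, 4, 3, 2) == (4, 3)` with an internal difference of exactly $1$, matching the observation above.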

After a few hours of number theory revision plus a few hours of coding I finally got the right answer, only to find that Sage has the full package to automate all of this...

Friday 21 June 2024

Geometric sum as functional inverse (2): the backstory

My latest contribution about using geometric sums as inverses was surprisingly well received, so I decided to give it a sequel.

The immediate question is: are we reinventing the Maclaurin series? Is this even possible without differentiation? Well, not really. If we decompose $f$ into a power series we will see precisely what happened. If $f = \sum a_k x^k$ then $a_i = D^i[\sum a_k x^k](0)/i!$, where $D$ is the differential operator (I am too lazy to write the fraction form). We can recover each term by calculating $M^i[D^i[\sum a_k x^k](0)]$:
$M^i[D^i[\sum a_k x^k](0)] = M^i[i!a_i] = a_ix^i$.
In other words, we are representing the $x^i$-term as the monomial $a_ix^i$ recovered from the constant $i!a_i$, instead of reading off the constant $a_i$ directly.

Let us recall the definition of the (properly defined) operator $M$: $Mf(x) = \int ^x_0 f(u)du$.

The miracle of the exponential series actually happens because $M^i[1]$ is precisely the $x^i$-term of the exponential series. Clearly this is not guaranteed because $M^i[g] = a_ix^i$ for all $i$ iff $g$ is a constant. Sadly $g$ is pre-determined by the function $f$ we use.

Consider $f = \log (1+x)$. It looks bad further away from zero, but it should be fine in a neighborhood of zero. Suppose we wave our hands and say that from $(I-M)f = g$ we obtain $f = (\sum M^k)g$; we want to check what $M^kg$ is exactly.

First we compute $g = (I-M)f = x(1-\log(1+x))$. Not good-looking but okay. Next we compute $Mg$:
$Mg = \frac{1}{2}(x^2+\frac{1}{2}x(x-2)-(x^2-1)\log(1+x))$
according to WolframAlpha. Soon you realize that $M^kg$ will never be the $k$-th term of the Maclaurin series because the log term never goes away.
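
A quick sympy check of both computations, with the obvious implementation of $M$:

```python
import sympy as sp

x, u = sp.symbols('x u', positive=True)

def M(f):
    # the integral operator Mf(x) = \int_0^x f(u) du
    return sp.integrate(f.subs(x, u), (u, 0, x))

f = sp.log(1 + x)
g = sp.simplify(f - M(f))          # (I - M)f = x*(1 - log(1 + x))
print(g)
print(sp.simplify(M(g)))           # the expression above: the log term survives
```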

Everything we argued in $L^{\infty}$ still works, even for this nasty $f$! Each $M^kg$ is a proper function and the sum indeed converges to $f$ uniformly if we restrict the domain to $[-\varepsilon, \varepsilon]$ for some small $\varepsilon \in (0,1)$. However, it simply does not coincide with the Maclaurin series because $g$ is not a constant. When is $g$ a constant, or rather, when is $(I-M)f$ a constant? The differential equation $y-y' = 0$ has solution $ce^x$, and that is the only instance that fits all our speculation.

What a pity.

You might then ask: what about the trig example at the end? Why do we recover the Maclaurin series there even though the theory has already crumbled?

To clarify let us repeat the above again: $g = (I-M)[\sin] = \sin x + \cos x -1$. 
$Mg = \sin x - \cos x + 1 -x$.
$M^2g = -\sin x - \cos x + 1 + x - x^2/2$.

Well... an oscillation? Nope. This is clear if you plot them on a graph (red is $g$, blue is $Mg$ and green is $M^2g$): as predicted by the norm of $M$, the $L^{\infty}$-norms do shrink exponentially, which gives us the convergence.

We can also prove convergence analytically: note that when $k \equiv 3 \pmod 4$, all the trigonometric terms in $(I+\ldots + M^k)g$ vanish, leaving the Maclaurin polynomial of $\sin x$ up to degree $k$ (the partial sum equals $\sin x - M^{k+1}[\sin]$, and $M^{4m}[\sin]$ is $\sin x$ minus its Maclaurin polynomial of degree $4m-1$). The other $k$'s can be sandwiched using the fact that $\| M^k g\|_{\infty}$ shrinks exponentially within the interval $[-1,1]$ (or something slightly better than $(-1,1)$, since $g$ is not a constant, but that is no big deal). Again, we wanted more than convergence, and $\sin x$ is not good enough for that because $g$ is not a constant.
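
The cancellation is easy to watch in sympy (same operator $M$ as before):

```python
import sympy as sp

x, u = sp.symbols('x u')

def M(f):
    # the integral operator Mf(x) = \int_0^x f(u) du
    return sp.integrate(f.subs(x, u), (u, 0, x))

g = sp.sin(x) + sp.cos(x) - 1      # g = (I - M)[sin]
term, partial = g, g
for k in range(1, 8):
    term = M(term)                 # term = M^k g
    partial = sp.expand(partial + term)
    if k % 4 == 3:                 # all trig terms cancel at these k
        print(k, partial)          # k=3: x - x**3/6;  k=7: ... - x**7/5040
```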

The last hope lingers on the fact that $(\sum (M^4)^k)[x - x^3/6] = \sin x$. Anything special about the trig functions that allows such representation? I will truly leave that to the readers this time. Hint: what is the relation between $\sin x$ and $\exp x$?

Sunday 16 June 2024

Use of geometric sum as inverse in function spaces

When I teach first year linear algebra, one of the main shows is the use of diagonalization to simplify matrix computation and then to link that with various applications. This is the point where you really feel the use of these concepts beyond their geometric meaning: Cayley-Hamilton, characteristic polynomials for recurrence sequences, Markov chains and so on.

To start with, consider a matrix $A$. If $\sum A^k = I + A + A^2 + \ldots $ is well-defined (aka converges) then by the geometric series we know it is equal to $(I-A)^{-1}$. If we go the other way round and compute the geometric series of $I-A$, we recover the inverse $(I-(I-A))^{-1} = A^{-1}$. The problem is... does the geometric sum converge? This is where the eigenvalues kick in.

For a diagonal matrix $D$, $\sum D^k$ converges iff all diagonal entries (aka eigenvalues) have absolute value strictly less than 1. If $A$ is diagonalizable we can write $A = PDP^{-1}$, then $(I-A) = P(I-D)P^{-1}$, so we want the diagonal entries of $(I-D)$ to be in $(-1,1)$. Or rather, we want all eigenvalues of $A$ to be in $(0,2)$. But this is a very limited result. Can we do better? Yes!

Notice that we can tweak the eigenvalues by multiplying by a constant. Let $c \in \mathbb{R}$ be non-zero such that all eigenvalues of $A/c$ are in $(0,2)$; then the same procedure goes through.

Consider the matrix $A = \begin{bmatrix}5&4\\2&3\end{bmatrix}$ with eigenvalues 1 and 7, we take $c = 8$. Write $A = PDP^{-1}$ then $I - D/c$ has diagonal entries 7/8 and 1/8. Geometric sum gives $\sum (I-D/c)^k = \begin{bmatrix}8&0\\0&8/7\end{bmatrix}$. 

Plugging back we have $P(\sum (I-D/c)^k)P^{-1} = \frac{8}{7}\begin{bmatrix}3&-4\\-2&5\end{bmatrix}$, which is precisely $cA^{-1}$. Don't forget that $(A/c)^{-1} =cA^{-1}\neq A^{-1}/c$!
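
A quick numerical check of the whole computation (plain numpy, partial sum truncated after a few hundred terms):

```python
import numpy as np

A = np.array([[5.0, 4.0], [2.0, 3.0]])     # eigenvalues 1 and 7
c = 8.0                                     # so the eigenvalues of A/c lie in (0, 2)
B = np.eye(2) - A / c                       # eigenvalues 7/8 and 1/8

S = np.zeros((2, 2))                        # partial sum I + B + B^2 + ...
P = np.eye(2)
for _ in range(300):
    S += P
    P = P @ B

print(S / c)                                # sum B^k = (A/c)^{-1} = c*A^{-1}, so S/c ~ A^{-1}
print(np.linalg.inv(A))                     # [[ 3/7, -4/7], [-2/7, 5/7]]
```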

*

Now let us forget about linear algebra and take a step forward: how about using geometric sums in function spaces?

We start with a very vague idea. Let $M$ be the (indefinite) integral operator; then we obtain $(I-M)[\exp] = -C$ where $C$ is the constant arising from the integral. What if $(I-M)$ has inverse $(I+M+M^2+\ldots)$? Say $M[-C] = -Cx + C_1$, then $M^2[-C] = -Cx^2/2 + C_1x + C_2$, and so on. One obtains $(\sum M^k)[-C] = (-C+\sum C_i)(1 + x + x^2/2 + \ldots)$, and checking $x = 0$ gives $-C + \sum C_i = 1$. This is such a lucid dream, because we just proved the series $e^x = \sum x^k/k!$ without differentiation at all!!

But wait, there are so many technical problems in the above argument that if I read it in a student's assignment I would tear the paper into pieces. The gaps are not fatal, but we have to be very, very careful.

The biggest problem of all is that the integral operator, in that arbitrary form, is likely unbounded. We must sacrifice our dream of working over the reals and work on $[0,N]$ instead. Since $N$ is arbitrary we can extend the result to essentially the whole real line, at least pointwise.

On a bounded interval the $L^p$ spaces work similarly, but the best is $L^{\infty}$ because we can immediately write down the norm of the integral operator. Define $M_0:L^{\infty}([0,N]) \to L^{\infty}([0,N])$ by $M_0f(x) = \int ^x_0f(u)du$. This is well defined because $|M_0f(x)| \leq N\|f\|_{\infty}$, and it also gives $\|M_0\| = N$ (the equality follows by taking $f = 1$).

We solved unboundedness, but we need the operator norm to be less than 1 for the geometric sum to work. What should we do now? Well, just copy what we did with matrices. Define $M = M_0/(N+1)$ and try again: $M[e^x] = (e^x-1)/(N+1)$ but $I[e^x] = e^x$... they don't add up. It turns out that tweaking $M$ alone won't suffice; a tweak to the function is necessary as well.

Let $f = e^{x/(N+1)}$, then $Mf = (e^{x/(N+1)} - 1)$ and so $(I-M)f = 1$. Since $\|M\| < 1$, $\sum M^k$ is a well-defined operator and is the inverse of $(I-M)$. Therefore $f = (\sum M^k)[1]$. Induction gives $M^k[1](x) = (x/(N+1))^k/k!$, so $e^{x/(N+1)} = \sum (x/(N+1))^k/k!$ uniformly in $[0,N]$. A substitution recovers the desired series for $e^x$.
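
A short sympy sanity check of this construction, with a small $N$ just for illustration:

```python
import sympy as sp

x, u = sp.symbols('x u')
N = 4                                      # work on [0, N]; any N works

def M(f):
    # the rescaled operator M = M_0 / (N + 1)
    return sp.integrate(f.subs(x, u), (u, 0, x)) / (N + 1)

term = sp.Integer(1)
partial = sp.Integer(0)
for k in range(25):
    partial += term                        # partial sum of (sum_k M^k)[1]
    term = M(term)                         # M^k[1] = (x/(N+1))**k / k!

# compare with e^{x/(N+1)} at the right endpoint, where the error is largest
print(float(partial.subs(x, N)), float(sp.exp(sp.Rational(N, N + 1))))
```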

Many would think the job is done here, but that is a false sense of security. The gap is not an easy one to spot -- I missed it when I first drafted this. We established the equality in $L^{\infty}$ (essentially uniformly), but in the pointwise sense that only holds almost everywhere. The good news is that we no longer have to worry about the circularity of generating the series using the series itself, so many more tools are now available, like continuity. Since both the exponential function and the series are continuous everywhere, the two expressions are indeed equal everywhere.

*

I believe this is the first time I have written about functional analysis here, but it is actually my expertise. This is absolutely not what one would encounter in research, but it is great fun to share with undergraduate students at all levels.

The idea of $\sum A^k = (I-A)^{-1}$ is the cornerstone of spectral theory, because when we look at the spectrum we want to see the set where $A-\lambda I$ is invertible, or equivalently where $I - A/\lambda$ is invertible. The fact that the geometric series converges says that when $A$ is small enough (in norm) then $(I-A)$ is always invertible, which gives us lots of information about the spectrum.

Of course, knowing that $(I-A)$ is invertible for small $A$ says nothing about any other operators in the space. In particular, it says nothing about the invertibility of operators that fail the condition for the geometric sum.

This is particularly clear when we revisit the matrix example. I claim that the above method applies to all diagonalizable matrices whose eigenvalues are all of the same sign (and non-zero) (proof: left to the reader). But that leaves out all the diagonalizable matrices whose eigenvalues carry both signs. Does that mean they are not invertible? Absolutely not! The moment they are diagonalizable with non-zero eigenvalues they are straightaway invertible; it is just that this particular approach doesn't apply.

One problem for the readers: my proof above relies on the fact that $e^0 = 1$. Can you recreate the proof for $x < 0$?

We finish with some extra food for thought: one computes $M^4[\sin](x) = \sin x - x + x^3/6$, so $(I-M^4)[\sin](x) = x - x^3/6$. You already see the pattern: $(\sum (M^4)^k)[x - x^3/6]$ is precisely the series for $\sin x$. (Technical problems like convergence and the domain are easy, as above, so we leave them to the readers as well.) Can we do the same for any function that solves a nice differential equation? Does it depend on the radius of convergence? Or... does it apply to all $C^{\infty}$ functions? How interesting...

Tuesday 11 June 2024

DDR, the 10th year

Another "10th year" among many others that I have already wrote about on my blog, but DDR is different. This is something that I consistently keep track of, and progress is really made over the time. 

When did I start playing DDR? A local Osu! player lured me into trying it when we were at the arcade where I played Taiko (up to 9* and a few 10*s on the 11th gen cab) and jubeat (level 10 in Copious). That was an X3 vs X2 mix cab with a very beginner-friendly mode where everything is limited to "foot level" 1-5 (the old foot level had already been scrapped).

Like everyone else, I started by failing those easiest levels. But soon I was passing level 8 (I believe) consistently, then level 10, then 12, and a year later 14. This arcade closed around the end of 2015 Q3 -- a story for another day, though you have probably seen mentions of the exact same arcade somewhere else in this blog.

The exact time I started can't be dated precisely, but it is surely well before the second half of 2014, because I have evidence of me playing up to level 13-14 at that time already. On many occasions I wrote the (N-2014)th year in the title, but actually year N is the (N-2014+1)th year of me playing DDR.

I ended up settling in new arcades. The first one had an ITG cab but soon, disgustingly, replaced it with PIU, so I ordered a soft pad and practiced with StepMania at home until another arcade opened, where I stayed for a long time.

Here are some snapshots:
2016, the 2nd year: doing well at level 14, attempts on MAX300, the gateway to advanced levels
2017, the 3rd year: doing well mostly at level 16 and a few 17s, almost passed 嘆きの樹 ESP
2018, the 4th year: doing well at level 17 and a few 18s, 500 credits since DDR2014 (not counting X3 and the 2013 ver.)
2019, the 5th year (in Chinese): doing well at level 18 except OTP and ENDYMION, able to do 18 sets, looking to clear 19
2022, the 8th year: fully cleared 18, yet unable to clear any 19

I was possibly at my best in 2019, when my stamina was clearly ready for 19s. After covid I became better in terms of single attempts, but never bested my 2019 self in terms of readiness for level 19. I had a 750k score on Paranoia Revolution CSP, but that was it.

It is a bit sad to say that my performance peak in 2019 aligned with my physical peak in terms of age. I clearly feel that more warmup is necessary in 2024 than in 2019 for similar performance. That does not imply I am never going to outperform my 2019 self again, but it will require great and consistent effort. In fact I am quite happy with a few scores obtained in the past year. Such scores on a shitty 2nd gen cab mean I can easily get an A on a proper cab.

And I am back ready for the challenge.

The reason behind that? Well, firstly, I am done with Chunithm. Not because the game is poor, but because I had already fully exhausted my potential in reaching 16.00. I can probably go slightly higher if I try really, really hard, but it is not very meaningful. Chunithm is known for its vast song collection, but after playing the game intensively for a year the collection lost its attraction. Yes, SEGA is still adding new songs en masse frequently, but I do not play the game just for the songs, and DDR is still my top choice.

Another reason is... I finally have my hands on a gold cab that I can play on in the long term. Not one I can only touch when I travel to Japan, where I had to time trips across two Golden League seasons just for the scores to unlock those course trials, but a gold cab that I can play on any day I desire.


I do not see myself getting close to my 2022, let alone 2019, performance for now. It takes time to get used to gold cabs and their lowered pads. When I did course trials in the past it was in survival mode, or at least travelling mode, where I didn't try hard for scores; in regular practice, though, that should be something needing only minimal care. Gold cab speed is also significantly different: 450BOOST becomes 400BOOST -- at least now I know I could only read 350BOOST on a 2nd gen cab not because I lost my reading ability.

My first goal is to resume my practice of 7 credits a session, then clear most 18s. Many new gimmicky/almost-19/plain-bullshit level 18 charts have been added over these years, so let us see how my plan goes! We will meet again in this record in a year.

For the record before Konami scraps the site again:

X2-2014 + ITG: ~500 plays without hard evidence
A~A20plus: 1993 plays
A3: 226 plays up to 09/06/2024.