Friday 26 April 2024

Cumulative density of binomial distributions

Not really a math article but more like a diary here, something to share related to estimating binomial distributions.

How would you approach CLT in a stat 101 course? It came up when a friend of mine was prepping the  coming semester. I still prefer the probabilistic approach. That is, from Chebyshev inequality to laws of large numbers(LLN) then to central limit theorem. Counter opinion says that there is a gap between LLN and CLT: LLN says nothing about how the distribution looks like but CLT says the distribution looks normal. You don't see the reason behind until you start looking into the proof that uses moments...and you don't do that in a 101 course.

Instead the other common approach is to demonstrate via binomial distribution making use of its simple additivity (the other great example being Poisson as it's easily additive as well). Using Stirling's approximation you can also observe the tail shrinking like the normal density. Although Stirling's approximation could also be too hard for a 101 course?...I don't know.

But my encounter with binomial distributions does not stop here. I recently helped another friend of mine up to a task related to cumulative density of some binomial distributions. More precisely I am looking for $P_k = P(X_k \geq n-k)$ where $X_k \sim B(n,f(k,p))$, then I am looking into properties of the series $(P_k)$. It turns out that despite a varying probability parameter, $P_k$ looks really similar across $k$ which poses me great difficulty when I try to estimate them.

The first thing that comes to my mind is to use those traditional bound. But soon you realize that bounds like Chernoff bound does not work very well out of the mean. Berry Esseen bounds the error to the order of $O((npq)^{-1/2})$. Worse is that this bound is tight in the sense that the output of binomial distributions are integers, corresponds to a jump in $C' \cdot \sqrt{npq}$ so this is the best such estimate can do. 

The error in the estimation can easily be seen in numerical examples. Suppose we flip 6 fair coins and we want to know the chance of getting 4 or above heads. The answer is surely 22/64 or about 0.344, but what if we try to estimate by normal distribution? $B(6,0.5)$ looks like $X \sim Z(3,1.5)$, and $P(X \geq 4) \approx P(Z \geq 0.816) \approx 0.207$, but this is far from our precise answer. It turns out that you should take 3.5 instead of 4, assuming that the chance of getting 4 heads is about $P(3.5 < Z < 4.5)$ instead of $P(4 < Z < 5)$. 

Why is it 0.5 but not 0.4 or 0.6? Well, a way to realize this is to assume big $n$ so everything work smoothly. In such case even the binomial density is rather linear locally so 0.5 is a fair assumption. Of course that is before we start arguing that even locally the binomial density looks exponential rather than linear as the ratio $f(n,k+1,p) = f(n,k,p) \times \frac{n-k}{k+1} (1-p)/p$ is locally constant if $n$ is large. Is there a second order correction that we can apply for a more estimate? What's the error in that case?

It took me so much time before I truly realize the error from normal density estimation is too big to be useful regardless. The error in a general estimation is probably of the form $O((npq)^{d/2})$, but it is never enough if $|P_k-P_{k-1}| = exp(-O(1)n^2)$...

At the end I just use the good old combinatorial approach. Interpret $X_k$ as a separate coin flip of $kN$ coins with a suitable probability $f'(k,p)$. Then I can argue by counting the relation across $P_k$.

It's an adventure, but nothing too exotic at the end. The most funny thing, I believe, is the discovery of an 150 page PhD thesis improving the Berry-Essen bound on binomial distribution and detailing it's uses! Not going to post the link but it should be easily found on the net ;).

Wednesday 10 April 2024

16.00 dissertation round 2

Remember the 16.00 dissertation I made last time?


Well, I took a break trying to make up my mind on whether going back to DDR or staying with Chunithm. Then I found out DDR had not updated for months possibly waiting for a full version upgrade so the answer is clear. Oh come on Konami...

I had been chasing for rating for the past 2-3 months so I decided to play according to my flavor for a while. I then noticed that I performed much better in most aspects other than just scoring better over the best performed maps. This is clear when you try some older maps with bad score and suddenly grind a superb score out of it -- some even got to the best 30 which we will see below.

Luminous came with new maps and new difficulty ratings. Surprisingly SEGA is pretty generous in their difficulty rating. LAMIA for example is promoted from 13.9 to 14.1, but it never felt like a 14 and surely not 14.1. It's not fast and the patterns are commonly seen among those expert 13.8-13.9. There are downgrades that I disagree (e.g. 蒼穹舞楽 to 14.3) though, too. 

With new songs and generous ratings I got myself to 16.00, this time both on best 30 and recent 10.  Since it's likely that I will never reach 16.25 (?) I decided to write once more to go through maps that newly made into my best 30. Let's start shall we?

#28 Titania (14.8, 1000315, 15.83)
One of the maps with a poor score from the past and a recent play gives 998k+ easily. Once you can cooperate between stream and rapid air notes you should be able to afford some misses in the streams and still end with 1000k+.

#21 Sage (14.6, 1003342, 15.93)
Ohhhh Camelia. This is one of many keyboard heavy new maps and I believe I can score much higher except my motivation leaks out after finding out this is merely a 14.6 -- I need a 1005/6k+ score to gain rating significantly but this is not trivial.

#20 Parousia (14.4, 1005201, 15.94FC)
An eventful song in my osu! career as described in my tweet. This FC is just the result of putting every pieces of the puzzle together in a coincidence. Gotta love that.

#16 祈 -我ら神祖と共に歩む者なり- (14.5, 1004551, 15.95)
When I first played the song its BPM is out of my reach where hitting randomly would be easier than to hit accurately. With better exposure at similar tempo and level this is now routine. 

#15 Vibes 2k20 (14.5, 1004638, 15.96)
A song of my taste! But the rhythm is not 1/3 based (like owl*tree) but 1/4 based, so it's actually very quick and demanding. I would love to use it as a warm up, but 1004k is purely accidental.

#14 Giselle (14.9, 1000794, 15.97)
Another keyboard type high-14+ that I have 'unlocked' out of nowhere. Given the high rating I actually want to grind this hard. If I am to try 16.25 this will definitely be on the list (same with 小悪魔の遊園地, another similar 14.9).

#13 ENDYMION (14.4, 1005410, 15.98FC)
How could you miss that as a DDR veteran? As mentioned above BPM222 became routine so this FC is natural to me. I am also going for 995k+ on master but this is just not enough in terms of rating.

#11 Jade Star (14.2, 1006742, 16.04FC)
The only in the original list that makes it here because this is a great song and the FC. Just like Parousia you know a good score will arrive eventually, just not sure when.

#7 Jakarta PROGRESSION (14.2, 1007025, 16.10FC)
How is this a 14.2? Perhaps 1/8 notes look hard but plays reasonably. Never had a 14.2 where I FC on first try and then got a SSS WITH MISS. If I try harder that's probably a 1009k.


#4 LAMIA (14.1, 1008415, 16.19)
The idea of one hand taking a constant beat and another hand on separate rhythm is nothing new, but this is particularly easy. Maybe the streams at the end is worth a 14.1 but is absolutely easy for a 16.00 player.

With a proper 16.00 right now I think I will truly take a break from Chunithm, partially for a break but also to avoid waves of players appearing out of nowhere just to try out the new version.

I want to mention another song: エータ・ベータ・イータ. The idea of vertical streams (streams of repeated notes) fits so well not only with the created map but also the song itself. This is the second song that I had the urge to map it in Osu! upon listening on the first try after のぼれ!すすめ!高い塔. Considering my Cirno 9th is mapped in 2020 and yet to be pushed for rank in 2024, mapping another map for rank is a distant goal...