Saturday 28 November 2020

Thoughts on Simon Marais 2020

Simon Marais 2020 was held on October 10, which is quite a while ago. Due to a series of coincidences, my affiliated school and I missed the chance to take part in the competition, and I did not have a chance to review the questions until now -- so this is a very brief and admittedly irresponsible review of this year's exam.

A1: a closed curve and a stupid pigeonhole argument.

A2: clear once you figure out how the piles interact with each other, especially depending on whether $k \mid n$ or not.

A3: a nicely formulated question, although the solution reduces drastically to a bound instead of some fancy sets. The beauty of the solution lies in the fact that the sum can be optimized greedily. Locally this is simply first-year differentiation, and you will find that a specific geometric series does the job.

A4: yeah, it sounds fancy and intimidating, but one may as well brute-force all the way with coordinate geometry techniques. I do not see it as more difficult than IMO-level questions that can also be defeated by coordinate-geometry brute force.

B1: this is more like an assignment question... I am not even interested in doing it. Questions of such depth should not appear in these contests.

B2: oh, unit fractions. This is also a nicely written question. The solution rests on the fact that you can order the arrangements in $S_k$ so that raising the sum corresponds to a descent in that order, which means the process cannot go on forever.

B3: this is the kind of question that I do not like, where you either know the one trick that easily solves the question, or you have no chance at all. We have had enough cat-and-mouse questions.

B4: again, (a) is possible and (b) is impossible.

We see that the difficulty has been fairly consistent over the past years, but we have yet to see more abstractly formulated questions as in the Putnam exam -- questions that can be stated rather easily, but whose solutions require deeper thought. Again, I strongly recommend removing Q1 and extending the exam to six questions with a more approachable difficulty ladder.

Oh well, I have had a nice afternoon solving these problems.

Tuesday 17 November 2020





Wednesday 4 November 2020

On the day of the 2020 US Election

Result as of 6:07 ET.

It is now 11 PM ET. Trump is looking good for a second term, and regardless of the final result, the turnout is clearly more in his favor than the mainstream media polls suggested.

The question is, where is the error coming from?

I write lots of things on my blog, including politics. However, in this article I really want to look into the statistical aspect of the problem, rather than who is more correct politically or more capable of making America great again. This article is not meant to be exhaustive -- I am sure there are statisticians and data scientists far more capable of giving precise figures than I am. This is simply a record of my own observations.

The traditional polls

In traditional statistics there are two stages in the process -- you first collect the data, then you do the inference.

When collecting the data there are of course two roles: the data collector and the respondents.

On the data collector's side, things can be biased by how the poll is conducted. This includes designing leading questions, or hinting at a presupposition during the poll. While I have no clear evidence for these -- admittedly I did not look into the polls themselves -- there is another potential sampling error here: sampling from an unrepresentative pool.

Polls typically sample from one of a few populations: all adults, registered voters, or likely voters. Likely voters of course have a (much) higher chance of actually voting and hence are more influential. A number of mainstream media polls sampled adults or registered voters only, and that seems to be biased towards the Dems -- or perhaps they knew that collecting data this way would be biased in their favored direction, and chose the method for exactly that reason?
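To see how much the choice of sampling frame alone can matter, here is a minimal simulation. All the group sizes and turnout propensities below are made-up assumptions for illustration, not real 2020 figures:

```python
import random

random.seed(0)

# Hypothetical electorate as (candidate, will_actually_vote) pairs.
# The group sizes are invented purely to illustrate frame bias.
population = (
    [("R", True)] * 48_000    # likely voters backing R
    + [("D", True)] * 45_000  # likely voters backing D
    + [("D", False)] * 7_000  # registered but unlikely voters backing D
)

def poll(frame, n=1_000):
    """R's share of support in a simple random sample from `frame`."""
    sample = random.sample(frame, n)
    return sum(candidate == "R" for candidate, _ in sample) / n

registered = population                        # everyone on the rolls
likely = [(c, v) for c, v in population if v]  # only those who actually vote

print(f"registered-voter poll: R = {poll(registered):.1%}")  # around 48%
print(f"likely-voter poll:     R = {poll(likely):.1%}")      # around 51.6%
# Both polls are honest random samples, yet the registered-voter frame
# understates R by roughly 3.6 points relative to the people who vote.
```

Nothing here is dishonest polling -- the gap comes entirely from choosing a frame whose composition differs from the electorate that turns out.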

Responses from the crowd certainly heavily affect the outcome. First, the weighting may have shifted from what happened in 2016 or even 2018, especially given the volatile political environment right now. Even assuming it stayed constant, the big problem comes from the 2020 version of the Bradley effect: how many voters are so-called shy Trump voters, unwilling to express their opinion in the polls? People are skeptical about this, while big data and the early results say otherwise. The Bradley effect clearly seems to be in play.

Now we look into the inference part. If we believe that the pollsters follow proper statistical inference practice (a bold assumption, given what has been observed), then what could go wrong?

It is mainly about the margin of error [] -- when you give a confidence interval on the lead instead of on the poll percentage, the margin of error is doubled. That is because whatever does not go to one party -- let us assume the Libertarian Party... or Kanye West are negligible -- goes to the other, creating a doubled difference.
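This is the standard normal-approximation computation; a minimal sketch with illustrative values of p and n:

```python
import math

# Margin of error for a single candidate's share versus the lead
# (standard normal-approximation formulas; p and n are illustrative).
def moe_share(p, n, z=1.96):
    """95% margin of error on one candidate's polled share."""
    return z * math.sqrt(p * (1 - p) / n)

def moe_lead(p, n, z=1.96):
    """Margin of error on the lead p - (1 - p) = 2p - 1.

    The lead is a linear function of p with slope 2, so its standard
    error (and hence its margin of error) is exactly twice as large.
    """
    return 2 * moe_share(p, n, z)

p, n = 0.52, 1_000
print(f"share: \u00b1{moe_share(p, n):.1%}")  # ±3.1%
print(f"lead:  \u00b1{moe_lead(p, n):.1%}")   # ±6.2%
```

So a poll advertised as "±3 points" really allows the lead to swing by about six points -- before any of the sampling problems above are even considered.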

Alternative methods

A few pollsters have tried new methods of investigating shy voters. The Trafalgar Group adopted a mixed method [] to get around the "social desirability gap". The Democracy Institute tries to uncover voters' true preferences by asking extra questions, such as whether they think Trump will win, or who they think their neighbors would vote for. Although Project538 apparently does not like these methods, the traditional methods are off as far as we have observed. This may well open up a whole new area of study: how to obtain true data when interviewees intentionally hide their preferences.

Of course there are models too. These models essentially aim to gauge people's willingness to vote for particular candidates without actually conducting a poll. They include the famous Primary Model []. Others rely on delayed correlations with media noise and so on. Some are also digitized and measured on online platforms.

Interestingly, these are in general more favorable towards Trump -- some even gave predictions that seem too good to be true. For example, the Primary Model predicted 362 electoral votes -- while 300-plus is not impossible, anything above 320 seems very unlikely. The point is that the electoral college is not a smooth scale. Not only do the electoral votes jump in blocks according to the votes assigned to each state, but the states needed to go beyond 320 are all deep blue, and flipping any of them would be far harder than flipping the swing states. While the models are designed to predict who will win, they may not extrapolate well to landslide victories.

Models that are not run state by state certainly suffer similar problems. They fail to distinguish what happens in different states, and that can make a huge difference when it comes to predicting electoral votes.

One final thought

The unusually high turnout is special to this election. Given the same poll result, the outcome under low turnout will certainly differ from the outcome under high turnout. That is because the group of people on the edge of voting or not voting does not scale smoothly either.

This is hugely different from elections in Hong Kong, where we may comfortably assume that pro-government voters turn out in full regardless of the overall willingness to vote. Thus any extra votes will be heavily biased towards the pro-democracy side.

Such an assumption is false for the US election, because the voter composition of both parties is highly complex and dynamic. We can easily give numerous reasons why the extra votes would be biased towards one party or the other.

For example, Biden supporters may say that the extra votes are more likely to go to the Dems, because we are observing a historic high in absentee ballots, which surely contain many more Dem votes.

On the other hand, Trump supporters are justified in believing that the extra votes favor the Reps: the rallies showed that Trump supporters are more active and more motivated to vote. Some have also made the interesting observation that COVID triggered distance learning, which reduced peer pressure from college friends, who generally lean towards the Dems.

If we dig deeper: how does this variation depend on the default stance of each county (or, as shown in the polls)? Or on county population, income, age distribution, and so on?

No matter what the answer is, we will learn much from the exit polls of the current election. Together with all the new polling methods and models, there is far too much for us to investigate, scientifically.

6:00 AM ET 4/11/2020 (yes, 7 hours after I started writing this, because the live feeds are overwhelmingly interesting to watch)

*At the moment WI, NV and AZ are extremely close races. Oh, this election is so interesting...

Sunday 1 November 2020

01/11/2020: Brief review of the fall anime season



Although the number of shows this season is unusually high, there is no clash of titans like Violet Evergarden vs DARLING in the FRANXX vs Pop Team Epic. The giants of this season are DanMachi and The Irregular at Magic High School -- both long-established names and both sequels, so the fans naturally won't fight over them. Below those two there is a whole pile of strong contenders, each with its own unique selling point, so everyone can just enjoy watching without arguing over which is this season's champion. Let me recommend a few interesting shows from this season, along with my impressions after watching two or three episodes.


Hypnosis Mic, 3/12 episodes, 7.5/10






Akudama Drive, 4/12 episodes, 7/10





TONIKAWA: Over the Moon for You, 4/12 episodes, 8.5/10





Moriarty the Patriot, 4/12 episodes, 7.8/10


Following the usual formula, the protagonist is of course tall, rich and handsome, accompanied by one or two assistants (the Watson role), of either gender. If it's a woman, she is inevitably the female lead and the story goes down the romance route, as in Holmes of Kyoto. (As an aside, that work's mobile game is basically Candy Crush, and it drains the battery terribly...)

As for the era, London around the 1900s, then the center of the world, is naturally the best choice. The center of the world in the golden age of European civilization can supply the backdrop for any clash of values -- feudalism versus capitalism, rule of law versus rule of man... Works like The Great Ace Attorney, or even semi-fictional steampunk anime like Princess Principal, use the London of that era as their reference setting.




It is hard to judge a show's plot from the first few episodes, but like Akudama Drive this one puts special effort into its storyboarding and effects; the OP, Tasuku Hatanaka's Dying Wish, is also in the semi-classical operatic style I like, and the protagonist is a somewhat unhinged handsome man. That alone makes it worth following, right?

The Day I Became a God, 4/12 episodes, 8/10

Jun Maeda has finally pulled his attention back from the nurse ladies. This is his first major work after taking in everyone's criticism of Charlotte; whether it is reflection or revenge remains to be seen. Come to think of it, Charlotte itself was the revenge after Angel Beats...



I have to say, Jun Maeda has never disappointed when it comes to light comedy. The nonexistent fourth strike in baseball and the layers of MSG in the ramen had me laughing nonstop, though with the new mahjong yaku my head started to hurt after the laughter. This one-theme-per-episode slice-of-life format works really well, but don't forget that what separated Angel Beats from Charlotte was precisely whether the plot went off the rails once the gut-wrenching part began. Ichirou Ohkouchi's Princess Principal ended well because the scriptwriter was swapped out for the last two episodes, but with Key and Maeda there is of course no swapping anyone out. So when the gut-wrenching comes, and whether it is the good kind or the bad kind, rests entirely on his whim.





01.11.2020, Shiga