Monday, 5 April 2021

r/second: the Reddit April Fool 2021

Every year on the first of April, Reddit hosts a "social experiment", which is really just an inspiring, time-limited minigame. One of the greatest examples is r/place, which remains truly impactful as a precious snapshot of the Internet in April 2017.

Many abandoned the tradition of making something funny for the day in 2020, but fortunately Reddit held on and has again delivered something simple but funny this year. Introducing r/second.


This is a simple game: each round consists of three pictures sampled online, including memes, vintage games, Among Us crewmates or even simple words. People vote for one of the three, but only the one that ranks second is considered victorious.

There are three phases in total. Mid-term results are revealed at the two transitions, giving hints on which one to vote for. Voting later, with more information, of course comes with a penalty.

Those who vote in the first phase get +9 for a correct answer and -3 for an incorrect answer.
Those who vote in the second phase get +6/-2.
Those who vote in the third phase get +3/-1.
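
To make the scoring concrete, here is a minimal sketch of the average points per vote in each phase for a given hit rate. The names `PAYOFF` and `expected_score` are my own, chosen for illustration; only the payoffs come from the rules above.

```python
# Payoffs (points for a correct vote, points for a wrong vote) per phase,
# exactly as listed in the rules above.
PAYOFF = {1: (9, -3), 2: (6, -2), 3: (3, -1)}


def expected_score(phase: int, hit_rate: float) -> float:
    """Average points per vote in `phase` when a fraction `hit_rate` of votes is correct."""
    gain, loss = PAYOFF[phase]
    return hit_rate * gain + (1 - hit_rate) * loss


for phase in PAYOFF:
    # Print the average at a 25% and at a 50% hit rate for each phase.
    print(phase, expected_score(phase, 0.25), expected_score(phase, 0.5))
```

Every phase pays 3:1, so each one breaks even at a 25% hit rate. The phases differ only in how much a vote is worth and in how much information you have when you cast it.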

Unlike those redditors who eat chalk and murmur about tendies (you pretend that you don't but actually you do, especially those of you with diamond hands), you have actually smelled a way to score effectively. What would you do?

The correct answer is: you should find a time machine and travel back 96 hours, because the event has already ended. But let's assume that it hasn't. How can we perform well in this event?

Assume the following. 

1) The three images are indistinguishable. 
2) The population's strategy remains constant.

The first one is actually reasonable even for those who know memes well, because of the dynamics. Even if you can tell which image is more popular, comparing the three is a completely different story.

A quick glance at the leaderboard reveals that top players score around 4.7 points per win. Points gained per game is actually a hidden stat, but assuming that a top player only votes in phase two, he has been correct about 80% of the time. That is to say, his average points per game participated is around 3.7. Can we get close to this number?

Attempt 1: Blind guessing

Well, if you decide to vote in phase one every time, then under our assumptions the only thing you can do is guess blindly. That gives a 33% hit rate and an average of 1 point per game ($\frac{1}{3}\cdot 9 - \frac{2}{3}\cdot 3 = 1$). Not bad, but we can definitely do better.

Attempt 2: Blind guessing, no middle

Someone recorded the first 500 games and found that the middle image came second only 20% of the time, in contrast to 40% each for left and right. If we guess blindly between left and right only, the hit rate rises to 40% and the average score per game becomes 1.8 ($0.4\cdot 9 - 0.6\cdot 3 = 1.8$), still far from optimal.
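
As a sanity check on attempts 1 and 2, the sketch below computes the phase-one hit rate and average exactly from the rounded 40%/20%/40% frequencies quoted above. The function `phase_one_average` and the dictionary layout are illustrative names of mine, not anything from Reddit.

```python
from fractions import Fraction

# Rounded frequencies quoted above for how often each position finishes second.
SECOND_PLACE_FREQ = {
    "left": Fraction(2, 5),
    "middle": Fraction(1, 5),
    "right": Fraction(2, 5),
}
GAIN, LOSS = 9, -3  # phase-one payoffs


def phase_one_average(vote_weights):
    """Return (hit rate, points per game) when voting for each position
    with the probabilities given in `vote_weights`."""
    hit_rate = sum(w * SECOND_PLACE_FREQ[pos] for pos, w in vote_weights.items())
    return hit_rate, hit_rate * GAIN + (1 - hit_rate) * LOSS


uniform = {pos: Fraction(1, 3) for pos in SECOND_PLACE_FREQ}
no_middle = {"left": Fraction(1, 2), "right": Fraction(1, 2)}

print(phase_one_average(uniform))    # hit rate 1/3, average 1
print(phase_one_average(no_middle))  # hit rate 2/5, average 9/5
```

Note that a uniform guess still hits exactly one third of the time regardless of the position bias. The gain comes purely from dropping the middle.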

Don't forget that skipping a round is a viable option, but only when you are not guessing in phase one: under our assumptions, guessing in phase one of this round or of the next makes no difference. But what if we start to take the mid-term results into consideration?

Note that the target average score rules out phase three, where even a perfect record yields only 3 points per game, so we now consider strategies for phase two.
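
The claim about phase three can be checked by solving for the hit rate each phase would need to reach the target. This is only a sketch: `TARGET` is the rough 3.7 estimate from above, and `required_hit_rate` is a helper name I made up.

```python
PAYOFF = {1: (9, -3), 2: (6, -2), 3: (3, -1)}
TARGET = 3.7  # the rough estimate above of a top player's average per game


def required_hit_rate(phase, target):
    """Hit rate p that solves p * gain + (1 - p) * loss = target."""
    gain, loss = PAYOFF[phase]
    return (target - loss) / (gain - loss)


for phase in PAYOFF:
    p = required_hit_rate(phase, TARGET)
    verdict = "reachable in principle" if p <= 1 else "impossible"
    print(f"phase {phase}: needs a hit rate of {p:.1%} ({verdict})")
```

Phase three would need a hit rate above 100%, which confirms it cannot reach the target. Phases one and two need roughly 56% and 71% respectively.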

We observe that when the mid-term results are released, people vote for both the first and the second. They vote for the second for the free win, of course, but they also vote for the first in the hope that the second will overshoot. One thing is certain: no one votes for the third unless the three are close.

Attempt 3: Guessing first or second randomly

That gives a hit rate of almost 50% and an average of about 2 per game ($0.5\cdot 6 - 0.5\cdot 2 = 2$). Can we do even better?

We assumed that the population's reaction to the mid-term results is consistent, which is actually in line with observation. We may therefore apply kernel estimates to build a map $f(x_1,x_2) = p$, meaning that when the mid-term reveals that the first has a share $x_1$ of the votes and the second has a share $x_2$, there is a chance $p$ for the first to win (i.e. for the first to come second at the end).

Note that it is the percentage difference that matters: the dynamics from a 45%-40%-15% distribution should be almost the same as from a 40%-35%-25% distribution. So we may compress the data into a single-parameter map $f(x_1-x_2) = p$: when the percentage difference is $x_1-x_2$, there is a chance $p$ for the first to win.
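
A kernel estimate of the one-parameter map could look like the sketch below, which uses a Gaussian-kernel (Nadaraya-Watson) average. This is one possible choice of estimator, not necessarily the only sensible one. Since no mid-term data was collected, the samples here are SYNTHETIC: the logistic shape, the 0.05 turning point and the bandwidth are all assumptions made up to exercise the code.

```python
import math
import random


def kernel_estimate(samples, bandwidth):
    """Return f(d): a Gaussian-kernel (Nadaraya-Watson) estimate of
    P(mid-term leader finishes second | mid-term lead d).

    `samples` is a list of (lead, outcome) pairs, where outcome is 1 if the
    mid-term leader ended up second and 0 otherwise.
    """
    def f(d):
        weights = [math.exp(-0.5 * ((d - lead) / bandwidth) ** 2) for lead, _ in samples]
        total = sum(weights)
        if total == 0:
            return 0.5  # no nearby data: fall back to a coin flip
        return sum(w * y for w, (_, y) in zip(weights, samples)) / total
    return f


# SYNTHETIC data only: an assumed logistic relation between lead and outcome.
rng = random.Random(0)
samples = []
for _ in range(2000):
    lead = rng.uniform(0.0, 0.30)
    p_true = 1 / (1 + math.exp((lead - 0.05) / 0.02))  # assumed shape, not measured
    samples.append((lead, 1 if rng.random() < p_true else 0))

f = kernel_estimate(samples, bandwidth=0.02)
for d in (0.0, 0.05, 0.10, 0.20):
    print(f"lead {d:.2f}: estimated f = {f(d):.2f}")
```

With real data, `samples` would hold one pair per recorded round. The bandwidth controls how much neighbouring leads are pooled together.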

Attempt 4: KDE prediction + skips

Although I never tracked the numbers seriously, it can be observed that when the first leads by a large margin (say 10%), the second is almost certain to win despite everyone voting for it after the mid-term. We expect the function above to be quite close to 0 or 1 at the tails. We can then maximize the integral

$R(I) = \left ( \int_I g_X(x)\,dx \right )^{-1}\int _I g_X(x)\left(6\max(f(x), 1-f(x))-2\min (f(x), 1-f(x))\right)dx $

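by choosing the interval $I$ properly. Since $\max(f, 1-f) = \frac{1}{2}(1+|1-2f|)$ and $\min(f, 1-f) = \frac{1}{2}(1-|1-2f|)$, the integrand equals $g_X(x)(2+4|1-2f(x)|)$, so the above simplifies into

$R(I) = 2 + 4 \left ( \int_I g_X(x)\,dx \right )^{-1}\int _I g_X(x) |1-2f(x)|\, dx = 2+4E(|1-2f(X)| \mid X\in I) \geq 2$, 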

where $g_X$ is the probability density of the parameter $x_1-x_2$. Since $R(I)\geq 2$, this strategy is never worse than attempt 3, and it is strictly better as soon as $f$ differs from $1/2$ on part of $I$. This is a strong indicator that the mid-term results are extremely useful. If we smooth $f$ and $g_X$ so that they are differentiable (in the computational sense), then this is just a simple exercise in calculus.
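
To see the formula in action, the sketch below evaluates $R(I)$ on an interval with a midpoint rule, once from the max/min integrand and once as $2+4E(|1-2f(X)| \mid X\in I)$. Both `f` and `g` are HYPOTHETICAL stand-ins (a logistic curve and an exponential weight). They are not fitted to any real r/second data.

```python
import math


def f(x):
    """HYPOTHETICAL chance that the mid-term leader finishes second at lead x."""
    return 1 / (1 + math.exp((x - 0.05) / 0.02))


def g(x):
    """HYPOTHETICAL (unnormalised) density of the mid-term lead x."""
    return math.exp(-x / 0.08)


def average_score(a, b, steps=10_000):
    """Midpoint-rule value of R(I) for I = [a, b].

    Returns (value from the max/min integrand, value from the 2 + 4 E|1 - 2f| form).
    The step width cancels between numerator and denominator, so it is omitted.
    """
    h = (b - a) / steps
    mass = direct = spread = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        p = f(x)
        mass += g(x)
        direct += g(x) * (6 * max(p, 1 - p) - 2 * min(p, 1 - p))
        spread += g(x) * abs(1 - 2 * p)
    return direct / mass, 2 + 4 * spread / mass


print(average_score(0.0, 0.30))   # vote at every lead
print(average_score(0.10, 0.30))  # vote only when the lead is large
```

Under these made-up curves, restricting $I$ to large leads pushes the per-vote average towards 6, while the two ways of computing $R(I)$ agree up to rounding.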

There is a problem though. If we take $I$ to be a very slim set, then we win almost every time, but we may only get to vote once every 5 or 10 games, whereas top players win 55% of all games! With a hit rate of about 80%, that means they have probably voted in roughly 70% of the available games (not even taking sleeping time into account).

Instead one may want to maximize the score gained per game, counting skipped games too. This is simply maximizing the total score! Since the expected score per vote is positive for every approach mentioned above, what you should do in this case is participate in every game and always vote in phase two. When $f(x)$ is close to 1 or 0, vote for the indicated one; otherwise vote randomly between the first and the second. (Or, if you trust your function $f$, always follow the indicator, since your hit rate should then always be above 50%.)
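
The trade-off between points per vote and points per game can be illustrated with a small table of mid-term situations. The `BINS` values below are HYPOTHETICAL numbers invented for this sketch, and `averages` is an illustrative helper of mine. The point is only the direction of the effect, not the magnitudes.

```python
# HYPOTHETICAL mid-term situations:
# (share of all games, chance that the mid-term leader finishes second).
BINS = [(0.30, 0.55), (0.25, 0.35), (0.25, 0.15), (0.20, 0.03)]


def averages(min_confidence):
    """Vote in phase two only when max(f, 1 - f) >= min_confidence,
    always following the indicator.

    Returns (share of games voted in, points per vote, points per game).
    """
    voted = points = 0.0
    for share, p in BINS:
        best = max(p, 1 - p)  # hit rate when following the indicator
        if best >= min_confidence:
            voted += share
            points += share * (6 * best - 2 * (1 - best))
    per_vote = points / voted if voted else 0.0
    return voted, per_vote, points


for c in (0.5, 0.6, 0.8, 0.95):
    voted, per_vote, per_game = averages(c)
    print(f"threshold {c:.2f}: voted {voted:.0%}, {per_vote:.2f}/vote, {per_game:.2f}/game")
```

Raising the threshold improves the average per vote but lowers the average per game, because every skipped round had a positive expectation. That is why voting in every game maximizes the total.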

Unfortunately I never collected the mid-term results, and I am not aware of anyone else who did. All that is left are the final results of the rounds, which do not help much. Yet I believe this approach would be good enough for us to match the top rankers.

One last note: the mid-term results are not fixed snapshots. You actually get to see the votes live for a few seconds. I do not think that helps much, however, as the dynamics largely depend on the percentage difference of the round, so nothing significant can be extracted from it.

*

Let's be honest -- there are many more valuable things we could do if we had the chance to travel back to 1/4. r/second is no match for r/place after all. This is merely a little game that expired so fast. A better thing to do on 1/4 would be watching Your Lie in April, in my opinion. Yes?
