Friday, 11 June 2021

Some optimization on the FEH VG predictors

Voting Gauntlets in FEH are always controversial in many ways. In terms of outcome, some hate to see popular characters always triumph over ordinary characters, while the rest complain that the result is unpredictable and favors the chasing side by far too much. In terms of reward, some players are unhappy about the lack of rewards -- well, actually 12 orbs is a lot, and the feathers are also very friendly to new players (and even to me back in the day). In terms of difficulty, some find that playing with 3 random characters is quite fun, but some say they have terrible luck and that facing 3 fallen Edgelords is bullshit.

But today I want to talk about mathematics and not the game mode itself. How can we predict the outcome given the first few/12/24 hours of data? Certainly there are a few attempts already: on Reddit there are a few predictors from the West, and there is also one from Japan. I find the interface of the Japanese predictor pretty nice, even though its prediction is sometimes off.

In the past I have talked about VG as a multiplayer game in the game-theoretic sense, but this article does the complete opposite. We assume that the players' reaction is fixed up to some unknown parameters, and the goal is to build a model out of that.

I do not plan to build a predictor myself. It takes lots of time and does not benefit one much in-game: in terms of ranks it makes no difference if the bonus hour shifts, as everyone has the same bonus time. You can almost always get the highest reward by not missing the bonus hours in the last 20-24 hours, which can be done using VG bots. The last few games, where the final result matters, can be predicted fairly accurately by most models anyway.

The reason I wanted to write this is that I spotted a few relevant things that existing models do not account for, so it serves more as an investigation.

For starters, this is what you need to know (FEH-specific and negligible details are cropped, just to give a sufficient model here):
- Two teams undergo a head-to-head battle over 44 hours.
- Every player has a voting gauge which recovers by 1 vote per hour and is capped at 8.
- Every player has 2000 flags which they can spend over the battles. At most 100 flags may be spent on each vote. A vote with N flags applied has its score multiplied by N. For example, spending 8 votes with 800 flags (100 flags each) gives 800 times the score of a single flagless vote. If no flags are spent, the multiplier is 1 per vote.
- There are two multipliers: the normal multiplier starts from 1.1x and increases by 0.05x per hour. The bonus multiplier starts from 3.4x and increases by 0.2x per hour.
- The score is updated every hour. If a team has 1% more score than the other, the bonus multiplier is triggered for the weaker team during that hour.
- The team with the higher score at the end wins.
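
The rules above can be sketched as a tiny simulation. This is only a minimal sketch under my own assumptions: hours are counted from 0 (so hour 0 uses the starting multipliers), the 1% trigger is treated as a strict inequality, and `activity_a` / `activity_b` are hypothetical lists holding each team's raw hourly input (votes times flags) before any multiplier is applied. None of the function names come from the game or from any existing predictor.

```python
def normal_multiplier(hour):
    """Normal multiplier: 1.1x at hour 0 (assumed), +0.05x per hour."""
    return 1.1 + 0.05 * hour


def bonus_multiplier(hour):
    """Bonus multiplier: 3.4x at hour 0 (assumed), +0.2x per hour."""
    return 3.4 + 0.2 * hour


def bonus_team(score_a, score_b, threshold=0.01):
    """Return 'a' or 'b' for the team receiving the bonus, or None.

    Assumption: the bonus goes to the weaker team when the other team
    is strictly more than `threshold` (1%) ahead.
    """
    if score_a > score_b * (1 + threshold):
        return "b"
    if score_b > score_a * (1 + threshold):
        return "a"
    return None


def simulate(activity_a, activity_b):
    """Accumulate scores hour by hour from raw (unmultiplied) activity."""
    score_a = score_b = 0.0
    history = []
    for hour, (act_a, act_b) in enumerate(zip(activity_a, activity_b)):
        # The bonus is decided from the scores at the hourly update.
        trailing = bonus_team(score_a, score_b)
        mult_a = bonus_multiplier(hour) if trailing == "a" else normal_multiplier(hour)
        mult_b = bonus_multiplier(hour) if trailing == "b" else normal_multiplier(hour)
        score_a += act_a * mult_a
        score_b += act_b * mult_b
        history.append((score_a, score_b, trailing))
    return history
```

For instance, `simulate([100, 100], [50, 50])` puts team b on the bonus multiplier in the second hour, which is already enough for it to overtake team a despite having half the raw activity.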

Score normalization

First of all, we know that the score does not grow linearly, so we need a way to normalize it for a time-invariant comparison. A clear choice would be the direct score ratio between the two teams. This is a very intuitive choice which also ties in with the bonus trigger, but the accumulated score certainly affects how fast this indicator moves over time.

Another choice is to divide the score by the multiplier; whether it's the ordinary or the bonus multiplier does not matter too much because that is (almost) just a constant scaling. The problem is that players' VG activity isn't constant either: higher multipliers are expected at the end, so players prefer to spend flags towards the end. With flags around, the score obtained without spending flags is basically negligible. We need to capture when people spend flags.

With everything being non-linear it is hard to decide the right exponent, so I decided to look at the accumulated score instead. It is natural to assume that players' activity is non-decreasing in general, so their points gained per hour, after being scaled by the linearly growing multiplier, grow at least linearly. As a result, the accumulated points are at least quadratic, i.e., $\Omega (t^2)$.

Assume that the hourly score gain of the players -- or of the teams as a whole -- is of order $O(t^{1+k})$; then the accumulated score will be of order $O(t^{2+k})$. It is not hard to find that the accumulated score is indeed of order $O(t^{2+k})$ for some small $k$ -- so let us just divide everything by $t^2$ before we look into the parameter $k$.
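
To spell out this step, here is a short worked version. It rests on a simplifying assumption of mine, made only for illustration: the hourly score gain (multiplier included) follows an exact power law $g(t) = c\,t^{1+k}$ with some unknown constant $c > 0$, rather than just an order bound.

$$S(t) = \int_0^t c\,s^{1+k}\,ds = \frac{c}{2+k}\,t^{2+k}, \qquad \frac{S(t)}{t^2} = \frac{c}{2+k}\,t^{k}, \qquad \frac{g(t)}{t^2} = c\,t^{k-1}.$$

So after dividing by $t^2$, the accumulated score $S$ grows like $t^k$ while the hourly gain behaves like $t^{k-1}$. With $k = 0$ the normalized accumulated score would be flat.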

Here are the two typical examples taken from VG 2021 June.

Example 1: VG final (F!Corrin vs Klein)

Example 2: VG quarterfinal 4

These are two typical matches in VG: example 1 is when the popularity of one team clearly overwhelms the other, while example 2 happens when one team is significantly more popular, but not as extremely as in example 1.

All the charts are time-normalized by $t^2$ where $t$ is the average of the two multipliers.

In the first chart, the orange and yellow lines indicate the time-normalized boundaries for the bonus multiplier, while the blue line shows the normalized score difference. The second chart shows the normalized score activity of the first team (a positive score difference means that the first team is leading).

We expect the normalized player activity to be of order $O(t^{k-1})$, and from this we can estimate $k$. The spikes occur when the bonus multiplier is active.

We can see that the parameter $k$ clearly varies between situations. We can assume that $k$ is close to 1 in example 1, while $k$ is clearly much smaller than 1 in example 2. In fact, $k=0.2$ is a pretty good estimate there. The $k$ value can also be verified by checking the growth rate of the bonus boundary curve.
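
One way to put a number on $k$, instead of reading it off the chart, is a straight-line fit in log-log space: if the accumulated score behaves like $t^{2+k}$, the slope of $\log S$ against $\log t$ is $2+k$. The sketch below is my own illustration and not what any existing predictor does. `estimate_k`, `times` and `accumulated_scores` are hypothetical names, and it assumes one team's accumulated score sampled at positive values of $t$.

```python
import math


def estimate_k(times, accumulated_scores):
    """Least-squares slope of log(score) vs log(t), minus 2.

    Assumes accumulated_score ~ C * t**(2 + k); all inputs must be positive.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(s) for s in accumulated_scores]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # estimate of 2 + k
    return slope - 2.0


# Synthetic check only (not real VG data): an exact power law with k = 0.2.
synthetic_times = list(range(1, 45))
synthetic_scores = [5.0 * t ** 2.2 for t in synthetic_times]
k_hat = estimate_k(synthetic_times, synthetic_scores)
```

On real data the bonus-hour spikes will pull the fit around, so it is probably safer to fit on a window without too many bonus hours, or to compare the result against the growth of the bonus boundary curve as mentioned above.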

We can explain the correlation by how players anticipate the end of the battle instead of casually spending their flags in the middle. Very interestingly, the parameter $k$ seems to be independent of the ratio of player base sizes: the more dominant a team is, the more bonus hours the opposing team will get. So in theory, if the parameter were decided by the frequency of bonus hours, the parameters for the two teams should be different, but that is not the case here. If we plot the activity of both teams on the same chart for example 1, we can see that the parameters for the two teams are more or less equal.

Calculating the parameter $k$ would be extremely helpful because we can then get normalized data. (And we will cover that in the sequel to this article!)

The three states of players

From the above graphs we see that the activity of the players divides into three categories, which we call the three states.

A team is in an excited state if it receives a long-awaited bonus hour, or a bonus hour that shoots it into the leading position. The activity clearly spikes during this hour.

A team is in a post-excitement state if the bonus hour of its excited state shoots it into the leading position and triggers the bonus for the opponent. Activity is then abnormally low because its players have run out of votes and flags.

A team is in a normal state otherwise. There is more fluctuation within this state, depending on the score situation.
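
The three definitions can be turned into a rough labelling rule for historical data. This is only a sketch with choices of my own: `own[h]` and `opp[h]` are hypothetical lists of the two accumulated scores at the start of hour `h`, "long-awaited" is arbitrarily taken to mean at least `long_wait` hours without a bonus, and the 1% trigger is a strict inequality.

```python
def classify_states(own, opp, long_wait=6, threshold=0.01):
    """Label each hour of one team as excited / post-excitement / normal.

    own[h], opp[h]: accumulated scores at the start of hour h, so both
    lists have one more entry than the number of hours labelled.
    long_wait is a hypothetical cut-off for a "long-awaited" bonus hour.
    """
    states = []
    hours_without_bonus = 0
    for h in range(len(own) - 1):
        has_bonus = opp[h] > own[h] * (1 + threshold)
        opp_has_bonus = own[h] > opp[h] * (1 + threshold)
        takes_lead = own[h + 1] > opp[h + 1]
        if has_bonus and (hours_without_bonus >= long_wait or takes_lead):
            state = "excited"
        elif states and states[-1] == "excited" and opp_has_bonus:
            # The excited hour put this team far enough ahead to hand
            # the bonus to the opponent.
            state = "post-excitement"
        else:
            state = "normal"
        states.append(state)
        hours_without_bonus = 0 if has_bonus else hours_without_bonus + 1
    return states


# Made-up scores, purely to show the three labels.
example_states = classify_states([100, 300, 320, 400], [200, 250, 330, 420])
# example_states == ["excited", "post-excitement", "normal"]
```

A predictor would then need a separate activity level for each state; the labelling alone says nothing about how large the spike or the dip is.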

We can plot the same for example 2 which is a lot more chaotic, but the activity is still clearly divided into the three states as described.

The pattern shows that most players aren't playing to optimize their team's chance of winning. Rushing to overtake the opponent early has little to no effect on the final result but looks good in a team sport, while the increased activity when a team enters a bonus hour is natural as it maximizes score gain (for those who urgently need to spend flags).

I am actually quite surprised that this is not taken into account by most predictors, as it plays an important role at the end, where people react vigorously to the hourly updated results. For example, the JP predictor predicted a constant downward ping-pong at the end of example 1 (i.e., bonus for the second team for 1 hour and no bonus for either team for 1 hour, alternately, with the two together ending up in favor of the second team), but the reaction from the second team at hour 42 should be much more violent as it sends the team into the leading position -- and of course the first team hit them back with 10x the power. This is the nature of VG prediction: players react to the score but not to the team result.

Daily variation on player activity

Clearly players do not stay awake 24/7 for these shitty rewards (well, even if the orbs are worth their monetary cost, that is merely a burger meal, so it's not worth staying awake overnight), so they stop playing when they are asleep. But people around the world live in different time zones and sleep at different times. More importantly, tastes in different parts of the world seem to be different. As a result, we may observe higher activity from one team during daytime and then higher activity from the other team during nighttime (which could be daytime for that part of the world).

Assuming that the main division is Japanese (or Asian) players vs the West, we can divide the players into 3 groups: time-invariant players, Asian players and Western players. We estimate the proportion of each group and how supportive it is of each team. This is used as a scaling factor when we predict future outcomes.

And how do we do that? Well, this is simple linear algebra -- these three groups of players can be modelled as 3 kinds of waves: constant, sine and cosine waves, orthogonal to each other. We can then apply orthogonal projection to estimate the contribution from each of the three.
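
As a minimal sketch of that projection, assume hourly samples of one team's (already normalized) activity covering a whole number of days, so that the constant, sine and cosine waves with a 24-hour period are exactly orthogonal on the sample grid. The name `project_daily_waves` is hypothetical, and which wave corresponds to Asian or Western players depends on where hour 0 falls on the clock, which this sketch does not try to decide.

```python
import math


def project_daily_waves(activity, period=24):
    """Project hourly activity onto constant, sine and cosine waves.

    Requires len(activity) to be a positive multiple of `period` so that
    the three waves are orthogonal over the sampled hours.
    Returns (constant, sine_coefficient, cosine_coefficient).
    """
    n = len(activity)
    if n == 0 or n % period != 0:
        raise ValueError("need a whole number of days of hourly samples")
    omega = 2 * math.pi / period
    constant = sum(activity) / n
    # For full periods the squared norm of each wave is n / 2.
    sine = 2 / n * sum(a * math.sin(omega * h) for h, a in enumerate(activity))
    cosine = 2 / n * sum(a * math.cos(omega * h) for h, a in enumerate(activity))
    return constant, sine, cosine


# Synthetic two-day signal (not real VG data) with known components.
signal = [
    10 + 3 * math.sin(2 * math.pi * h / 24) + 2 * math.cos(2 * math.pi * h / 24)
    for h in range(48)
]
constant, sine, cosine = project_daily_waves(signal)
```

A VG battle lasts 44 hours, which is not a whole number of days, so on real data one would either restrict to a 24-hour window or replace the inner products with an ordinary least-squares fit on the same three waves.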


I really believe that these three factors, together with what we already have, would make for a very accurate VG predictor, but surely no one would waste the time doing that.

To conclude this article, let me show the graph plotted for semifinal 2 of the same VG, which is ping-pong all the way. I stacked (apologies for the poor stacking) the two graphs together so that you can observe the interaction more clearly. Since it's a perfect ping-pong for the first 20 hours or so, you can see that the activity alternates between the excited and post-excitement states without entering the normal state. However, the wave pattern in the first 20 hours, which in fact also occurred in the first two examples, seems to raise more questions...

(To be continued)

Acknowledgements: raw data extracted from the Japanese predictor by @rammtiger_n
