Wednesday 7 January 2015

On card drawing problem

Faithful greetings here and wish everyone to do well in 2015. Before I continue to write about my journey, let's consider a game-related modelling problem here.

Recently there are a series of electronic games mainly on mobile platforms that collects various cards for the game system. There can be a lot of variation on the game like tower defence, card battling, action shooting etc. but these aren't really important. They can all characterized to be a card collection game (CCG).

What elements must a card collection game contain? A free way to obtain cards through events, and a paying way to obtain cards, that we call it an invocation. A basic model would contain a card pool where every time a card is chosen, at a certain probability and given to the player.

In order to attract the players to spend further, we can usually find some alternate option to invoke cards that apparently a better option. A typical way is to guarantee a rarer card when the invocation is done in a larger batch. Without a further assumption in the rate of appearance for the rare cards we ought to see which option is better.

In this entry we will simply our model to two types of cards: common and rare cards. A binary model would allow binomial distribution and make everything easier.

Since the number of cards is finite, and of course countable, the following calculations will be done with respect to discrete distributions.

Order statistics

Suppose we have a distribution $X$. Recall that its CDF $F_X$ is given by $F(x) = P(X\leq x) = \sum _{y\leq x} P(X=y)$.

Furthermore let $X_1,X_2,...$ be an independent process $\sim X$. What is the distribution of $Y = min(X_1,...,X_n)$?

This is easy:

$F_Y(x) = P(Y\leq x) = 1 - P(Y\geq x) = 1 - P(X_1\geq x\cap X_2\geq x \cap ... \cap X_n\geq x) = 1 - (1-F_X(x))^n$.

Similarly we want to find $Z = max(X_1,...,X_n)$:

$F_Z(x) = P(Z\leq x) = P(X_1\leq x \cap X_2\leq x \cap ... \cap X_n\leq x) = \prod P(X_i\leq x) = F_X(x)^n$.

We can now deal with the first typical system: the re-invocation.


The idea is that you first draw from the card pool. If the result is not satisfying you may re-invoke. Without loss of generality (oh, of course) we assume the re-invocation assumes the same rate as the first invocation.

First we handle the single case. Let $X$ be a binomial variable where $P(X=1) = p$ (rare cards) and $P(X=0) = 1-p$ (common cards). Let $X'$ be the consequent re-invocation $\sim X$.

Since we only have two (types of) cards, we only re-invoke only when we get a common card. Then the rate of getting rare cards under a re-invocation system is

$P(rare) = P(X=1)+P(X'=1\mid X=0) = p+p(1-p) = p(2-p) = q$.

As we are comparing with a batch of 10 cards, the expactance over 10 times of single process is given by

$E(X_1+...+X_{10}) = \sum _{k=1}^{10}kC^{10}_kq^k(1-q)^{10-k}$

But since this an independent process, it can be simply written as $E(\sum X_i) = \sum E(X_i) = 10E(X) = 10p(2-p)$.

It is a very smooth quadratic curve and there is nothing too special about it. But what if we consider some 'special offer' that comes in a batch?

Let's consider the following offer: you pay 10 times as a single invocation and get 10 cards, one of them is a guaranteed rare card and the rest is rare at a rate of $p$. The re-invocation is done over the whole batch. Let's call it a mega.

It is hard to calculate the expectance because it is hard to say when we need to re-invoke. A simple way is to find the median so if it is lower than the median and we invoke we have larger than 50% to get a better result. However median is hard to calculate for asymmetric binomial distribution is not easy, so we use the mean instead. If we get under $[9p+1]$ rare cards we will re-invoke.

First we find the expectance over a mega without re-invocation:

$E = \sum _{k=0}^9 (k+1)C^9_kp^k(1-p)^{9-k}$

Then using re-invocation:

$E' = E\sum _{k=0}^{[9p]}C^9_kp^k(1-p)^{9-k}+\sum _{k=[9p+1]}^{9}(k+1)C^9_kp^k(1-p)^{9-k}$

Of course we sum over well-defined indices only.

In some game, such offer gives 11 cards (like, Lovelive) instead of 10. A straightforward reason is that single re-invocation works better! One can follow the code at the bottom to plot the following:

Considering the guaranteed rarity is not that high, $p$ is actually not very low, say $p=0.25$. At this rate, making 10 single invocations is considerably better than doing a mega!

Of course we assumed that the rate $p$ for single and mega are equal here. It is always possible for the developers to adjust such rate according to such calculation.

Super-rare cards

Another problem is that we want a super-rare card instead of a rare card. A non-common card maybe rare or super rare and both can possibly be found in the guaranteed slot.

Let $X$ be a discrete distribution that can be 0 (common), 1 (rare) or 2 (super-rare) for common slot and $Y$ can be 1 or 2 for the guaranteed slot. We further assume that

$P(Y=2) = P(X=2\mid X\geq 1)$

To set up the parameter we let $P(X=1) = p$ and $P(Y=2) = r$. For $P(X=2)=q$ we have $q = \frac{pr}{1-r}$.

Now assume the invocation criteria solely depends on the super-rare card (if there is a rare card that we want besides of the super-rare slot, just count that as super-rare too. This is possible in our model). This is the same for single invocation. Let $s = q(2-q)$, the expectance is:

$E = \sum _{k=1}^{10}kC^{10}_ks^k(1-s)^{10-k}$

For mega let the mean be $m = [r+9q]$ then

$E = r + \sum _{k=1}^9 kC^9_kq^k(1-q)^{9-k}$

$E' = r(E\sum _{k=0}^{m-1}C^9_kq^k(1-q)^{9-k}+\sum _{k=m}^9(k+1)C^9_kq^k(1-q)^{9-k})+(1-r)(E\sum_{k=0}^m C^9_kq^k(1-q)^{9-k}+\sum _{k=m+1}^9 kC^9_kq^k(1-q)^{9-k})$

Now let's assume $p+q = 0.3$ and vary $q$ around $0.01$.

As you would expect, it is just like a portion of the first graph, which is locally linear. Furthermore mega is always better than making successive single invocations.

Again as you would expect, the two lines will intersect as $q$ increases further. With $p+q = 0.3$ they intersect at $q \approx 0.0414$ (you can plot yourself for that). That's how mega invocation works: you spent more at one time to get a better rate for the top cards. We don't call the rarity 4% super-rare, do we?


As a last comment reader shall notice that if we didn't assume $\frac{q}{p} = \frac{r}{1-r}$ and let the parameter flowing freely everything can be arbitrarily set up and the calculation does not make much significance. However it is a pretty sensible assumption that allows us to make models and interpolate, if possible, against the real data.

Attached is a section of simple code that directly calculates the expectation using the above formula without any simplification.

#Assumed C(n,r) the combination coefficient function
def single(x,n):
    y = x*(2-x)
    s = 0
    for k in range(n+1):
        s += k*(y**k)*(1-y)**(n-k)*C(n,k)
    return s
def mega(x,n):
    if x == 0:
        return 1
    elif x == 1:
        return n
    p = int((n-1)*x+1)
    s0 = 0
    for k in range(n):
        s0 += (k+1)*x**k*(1-x)**(n-1-k)*C(n-1,k)
    s1 = 0
    for k in range(p):
        s1 += x**k*(1-x)**(n-1-k)*C(n-1,k)
    s1 *= s0
    for k in range(p,n):
        s1 += (k+1)*x**k*(1-x)**(n-1-k)*C(n-1,k)
    return s1
def megaSR(p,r,n):
    q = p*r/(1-r)
    m = int(r+9*q)
    E0 = r
    for k in range(n):
        E0 += k*C(n-1,k)*q**k*(1-q)**(n-1-k)
    t = 0
    for k in range(m):
        t += C(n-1,k)*q**k*(1-q)**(n-1-k)
    t *= E0
    for k in range(m,n):
        t += (k+1)*C(n-1,k)*q**k*(1-q)**(n-1-k)
    E1 = r*t
    t = 0
    for k in range(m+1):
        t += C(n-1,k)*q**k*(1-q)**(n-1-k)
    t *= E0
    for k in range(m+1,n):
        t += k*C(n-1,k)*q**k*(1-q)**(n-1-k)
    return E1 + (1-r)*t


  1. I found this piece interesting, I am a complete dummy in math, so it takes a while for me to understand.
    do you not plan to publish this as a research paper? this could be a specific topic that mathematicians have yet to explore (i assume)

    1. Thanks for your comment. The problem, in my own opinion, have no research value. It is only slightly complicated than the standard probability questions in the textbook, but it is much more interesting because this is something you'd really want to calculate.

      I used to write entries on these interesting little problems and I'm happy to hear that someone is willing to spend some time to understand them. If you have further problems please feel free to ask.