Thursday, 26 December 2013
Christmas songs
I joined a few musical events before the end of 2013 which include the nutcracker performance (for those who follow don't shoot the pianist, you will know that Nov and Dec are always the time for such performance lol) but I also spent some time on reviewing old and new Christmas songs. By chance I found the game BGM Christmas in the 13th month which is a wonderful piece after all. Now I'm enjoying a succeeder of the song, dancing Christmas in the 13th month's remix from sync. art (I guess it's well known for their touhou remixes) so I hope that you will enjoy them too. Well the above one isn't from sync. art but the best performance among all clips in my opinion that the smooth sound from the viola greatly enhance the dancing that mixes well with the lively going background.
Merry Christmas.
Thursday, 19 December 2013
A rare diary
It's long since my last diary written here but this is an important one. As some sort of researcher in the field of computational complexity, I ought to dig the SIMC problem out and make an in depth analysis on it, in particular the complexity classification, and algorithm solving them.
Allow me to brief the problem here.
In the graph, assume every edge has weight 1 that the distance can be calculated accordingly. We also define the distance between edges and vertices. For $e = (u,v) \in E$, $d(e,w) = 0.5 + min(d(u,w), d(v,w))$ and the distance between edges immediately follows. It is clear that the distance measurement is metric. (why?)
For a graph $G = (V,E)$ with two disjoint sets of vertices $V_s, V_c$, define score to be the number of edges closer to $V_c$. i.e. $d(e, V_c) \leq d(e, V_s)$. If the distances are equal it counts as 0.5.
Let's have a function to describe the score: $f(G, V_s, V_c) =~score$.
Instance: $G = (V,E)$ and two disjoint sets of vertices $V_s, V_c \subseteq V$, with some further constraint specified upon the challenge.
Output: Another set $V_{c'}\subseteq V$ that is mutually disjoint with $V_s, V_c$.
Goal: Maximize $f(G,V_s, V_c \cup V_{c'})$.
Maybe you have already heard it before, that Greedy algorithm works really well for this problem. Is it the best solution? It is clear that the greedy algorithm is a polynomial time algorithm. Then is this problem in P? Is it NP? This will be an exciting review of the old problem.
Allow me to brief the problem here.
In the graph, assume every edge has weight 1 that the distance can be calculated accordingly. We also define the distance between edges and vertices. For $e = (u,v) \in E$, $d(e,w) = 0.5 + min(d(u,w), d(v,w))$ and the distance between edges immediately follows. It is clear that the distance measurement is metric. (why?)
For a graph $G = (V,E)$ with two disjoint sets of vertices $V_s, V_c$, define score to be the number of edges closer to $V_c$. i.e. $d(e, V_c) \leq d(e, V_s)$. If the distances are equal it counts as 0.5.
Let's have a function to describe the score: $f(G, V_s, V_c) =~score$.
Instance: $G = (V,E)$ and two disjoint sets of vertices $V_s, V_c \subseteq V$, with some further constraint specified upon the challenge.
Output: Another set $V_{c'}\subseteq V$ that is mutually disjoint with $V_s, V_c$.
Goal: Maximize $f(G,V_s, V_c \cup V_{c'})$.
Maybe you have already heard it before, that Greedy algorithm works really well for this problem. Is it the best solution? It is clear that the greedy algorithm is a polynomial time algorithm. Then is this problem in P? Is it NP? This will be an exciting review of the old problem.
Sunday, 15 December 2013
GCD and exponents
LCM (least common multiple) and GCD (greatest common divisor) (for integers as a integral domain) is a relatively easy field that indicates similar behavior for some numbers. In the previous entry I have constructed some related identities using prime factorization, and here I would like to introduce a few related problems that we use the properties of GCD and LCM to solve it.
Definition. For natural numbers $a_1, ..., a_n$, let $[a_1,...,a_n]$ be their LCM and $(a_1,...,a_n)$ be their GCD.
Theorem. For $m,n\in \mathbb{N}$, $mn = [m,n](m,n)$.
Q1. (Korea MO 2013, Day2 Q1) Find all $f: \mathbb{N}\rightarrow \mathbb{N}$ so that $f(mn) = [m,n](f(m), f(n))$.
When you see the above theorem a very natural solution is that $f(m) = m$ but we are asked to find all of them. Is this possible?
Substitute $(m,n) = (1,k)$, we have $f(k) = k(f(1),f(k)) = [1,k](f(1),k(f(1),f(k)))$. Let $(f(1),f(k)) = m$ then $f(k) = k(f(1),km) = km$. It is clear that $m \mid f(1)$ so that $f(k) = k(f(1),km) = kmp$ where $p\in \mathbb{N}$. Clearly, $p=1$ so that $(f(1), k) = 1$. Since $k\in \mathbb{N}$ is arbitrary, we have $f(1) = 1$, and $f(k) = k(f(1),km) = k$.
Here we should notice a fact:
Fact 1. $[a_1,...,a_n] \geq max(a_1,...,a_n)$ and $(a_1,..,a_n) \leq min (a_1,...,a_n)$
GCD is something that decrease when you repetitively take it, and if you can take it arbitrarily that gives the same result, the number is probably 1.
In this question, we put the expansion of $f(k)$ to get a looped GCD brackets, but it can be taken alternatively:
Fact 2. For $m,n\in \mathbb{N}$, $(m,n) = (m,(m,n))$.
The next question is about GCD identity on more variables.
Q2. Find all $(a,b,c) \in \mathbb{N} ^3$ so that $[a,b,c](a,b,c) = abc$
We use the proven identities from the last entry:
Theorem. For $a,b,c\in \mathbb{N}$, $[a,b,c]^2 \prod _{cyc} (a,b) = (a,b,c)^2 \prod _{cyc} [a,b]$.
Now $[a,b,c]^2(a,b,c)^2 = (a,b,c)^4 \prod _{cyc} [a,b]/(a,b) = (a,b,c)^4 \prod _{cyc} (ab/(a,b)^2) = (abc)^2 (a,b,c)^4/ \prod _{cyc} (a,b)^2 $ and we have
$[a,b,c](a,b,c) = abc (a,b,c)^2/ \prod_{cyc} (a,b)$
To make the last term 1 it is clear that all numbers must be relatively prime. To see why simply note that $(a,b)^2 \geq (a,b) \geq (a,b,c)$ and equality holds iff $(a,b) = 1$ so that we need $(a,b) = (b,c) = (c,a) = 1$, then $(a,b,c) = 1$.
Clearly all relatively prime triplets satisfies the question.
Sometimes the GCD and LCM notation takes all prime factors into account, but what if we want to consider a particular prime? We can prove the case for a certain prime and by enumerate through the primes we get the desired result as every natural numbers (which is finite) can be factorized into (finite) primes.
Definition. (Highest exponent) For $n\in \mathbb{N}$ and prime $p\in \mathbb{P}$ denote $v_p(n) = k$ if $p^k \mid n$ but $p^{k+1}\nmid n$. If $p\nmid n$ then $v_p(n) = 0$. (In some texts we don't allow 0 but here we just let it be)
Then we can prove GCD results in terms of prime exponents, and the proof is naturally completed by the FTOArithmetic.
Example 1. $v_p([a,b]) = max(v_p(a), v_p(b))$ and $v_p((a,b)) = min (v_p(a), v_p(b))$.
Example 2. Since $max(a,b) + min(a,b) = a+b$, we have $v_p([a,b](a,b)) = v_p([a,b])+v_p((a,b)) = a+b$, which proves the identity $(a,b)[a,b] = ab$.
Example 3. When there are more variables and the equation of interest is cyclic we can WLOG assume the order, here let $v_p(a) =x$, $v_p(b) = y$ and $v_p(c) = z$ and WLOG let $x\geq y\geq z$. Then
$v_p([a,b,c]^2\prod (a,b)) = v_p((a,b,c)^2 \prod [a,b]) = 2x+y+2z$
Here we have an alternate prove for Q2, from mathlinks, using the exponent notation:
Let $v_p(a) =x$, $v_p(b) = y$ and $v_p(c) = z$ and WLOG let $x\geq y\geq z$. The question is equivalent to $x+z = x+y+z$ so that $y = 0 \geq z = 0$, so that $(a,b) = (b,c) = (c,a) = 1$ due to cyclic assumption. Then $(a,b,c) = 1$ and $[a,b,c] = abc$.
Q3. Find all triplets $(a,b,c) \in \mathbb{N}^3$ so that $a+b = (a,b)^2$ in cyclic manner.
We always get trouble when addition steps into questions on GCD. To do this notice that for prime $p\geq 3$, it is impossible that $v_p(a+b) \geq v_p ((a,b))+1$ for all $(a,b) \leftarrow (a,b), (b,c),(c,a)$ (why?) Then we have $(a,b)$ equals to 1 or 4, which allow us to enumerate all cases.
Using the exponent notation we can have a very powerful theorem called the lifting the exponent lemma, that allows us to use the exponent notation to find the highest power of some complicated forms. These can be easily searched so I will not cover it here. It is actually frequently used in current Olympic tournaments!
Three little problems:
1) (Russian MO 1999) Find all bounded sequence $\left\{ a_n \right\} \subseteq \mathbb{N}$ where for all $n\geq 3$, $n\in \mathbb{N}$, $a_n = (a_{n-1}+a_{n-2})/(a_{n-1},a_{n-2})$.
2) (A classical lemma) For $a,m,n\in \mathbb{N}$ and $a\geq 2$ show that $(a^m -1, a^n - 1) = a^{(m,n)}-1$.
3) Describe all the triples $(a,b,c)\in \mathbb{N}^3$ so that $[a,b,c](a,b,c) = \sqrt{abc}$.
Definition. For natural numbers $a_1, ..., a_n$, let $[a_1,...,a_n]$ be their LCM and $(a_1,...,a_n)$ be their GCD.
Theorem. For $m,n\in \mathbb{N}$, $mn = [m,n](m,n)$.
Q1. (Korea MO 2013, Day2 Q1) Find all $f: \mathbb{N}\rightarrow \mathbb{N}$ so that $f(mn) = [m,n](f(m), f(n))$.
When you see the above theorem a very natural solution is that $f(m) = m$ but we are asked to find all of them. Is this possible?
Substitute $(m,n) = (1,k)$, we have $f(k) = k(f(1),f(k)) = [1,k](f(1),k(f(1),f(k)))$. Let $(f(1),f(k)) = m$ then $f(k) = k(f(1),km) = km$. It is clear that $m \mid f(1)$ so that $f(k) = k(f(1),km) = kmp$ where $p\in \mathbb{N}$. Clearly, $p=1$ so that $(f(1), k) = 1$. Since $k\in \mathbb{N}$ is arbitrary, we have $f(1) = 1$, and $f(k) = k(f(1),km) = k$.
Here we should notice a fact:
Fact 1. $[a_1,...,a_n] \geq max(a_1,...,a_n)$ and $(a_1,..,a_n) \leq min (a_1,...,a_n)$
GCD is something that decrease when you repetitively take it, and if you can take it arbitrarily that gives the same result, the number is probably 1.
In this question, we put the expansion of $f(k)$ to get a looped GCD brackets, but it can be taken alternatively:
Fact 2. For $m,n\in \mathbb{N}$, $(m,n) = (m,(m,n))$.
The next question is about GCD identity on more variables.
Q2. Find all $(a,b,c) \in \mathbb{N} ^3$ so that $[a,b,c](a,b,c) = abc$
We use the proven identities from the last entry:
Theorem. For $a,b,c\in \mathbb{N}$, $[a,b,c]^2 \prod _{cyc} (a,b) = (a,b,c)^2 \prod _{cyc} [a,b]$.
Now $[a,b,c]^2(a,b,c)^2 = (a,b,c)^4 \prod _{cyc} [a,b]/(a,b) = (a,b,c)^4 \prod _{cyc} (ab/(a,b)^2) = (abc)^2 (a,b,c)^4/ \prod _{cyc} (a,b)^2 $ and we have
$[a,b,c](a,b,c) = abc (a,b,c)^2/ \prod_{cyc} (a,b)$
To make the last term 1 it is clear that all numbers must be relatively prime. To see why simply note that $(a,b)^2 \geq (a,b) \geq (a,b,c)$ and equality holds iff $(a,b) = 1$ so that we need $(a,b) = (b,c) = (c,a) = 1$, then $(a,b,c) = 1$.
Clearly all relatively prime triplets satisfies the question.
Sometimes the GCD and LCM notation takes all prime factors into account, but what if we want to consider a particular prime? We can prove the case for a certain prime and by enumerate through the primes we get the desired result as every natural numbers (which is finite) can be factorized into (finite) primes.
Definition. (Highest exponent) For $n\in \mathbb{N}$ and prime $p\in \mathbb{P}$ denote $v_p(n) = k$ if $p^k \mid n$ but $p^{k+1}\nmid n$. If $p\nmid n$ then $v_p(n) = 0$. (In some texts we don't allow 0 but here we just let it be)
Then we can prove GCD results in terms of prime exponents, and the proof is naturally completed by the FTOArithmetic.
Example 1. $v_p([a,b]) = max(v_p(a), v_p(b))$ and $v_p((a,b)) = min (v_p(a), v_p(b))$.
Example 2. Since $max(a,b) + min(a,b) = a+b$, we have $v_p([a,b](a,b)) = v_p([a,b])+v_p((a,b)) = a+b$, which proves the identity $(a,b)[a,b] = ab$.
Example 3. When there are more variables and the equation of interest is cyclic we can WLOG assume the order, here let $v_p(a) =x$, $v_p(b) = y$ and $v_p(c) = z$ and WLOG let $x\geq y\geq z$. Then
$v_p([a,b,c]^2\prod (a,b)) = v_p((a,b,c)^2 \prod [a,b]) = 2x+y+2z$
Here we have an alternate prove for Q2, from mathlinks, using the exponent notation:
Let $v_p(a) =x$, $v_p(b) = y$ and $v_p(c) = z$ and WLOG let $x\geq y\geq z$. The question is equivalent to $x+z = x+y+z$ so that $y = 0 \geq z = 0$, so that $(a,b) = (b,c) = (c,a) = 1$ due to cyclic assumption. Then $(a,b,c) = 1$ and $[a,b,c] = abc$.
Q3. Find all triplets $(a,b,c) \in \mathbb{N}^3$ so that $a+b = (a,b)^2$ in cyclic manner.
We always get trouble when addition steps into questions on GCD. To do this notice that for prime $p\geq 3$, it is impossible that $v_p(a+b) \geq v_p ((a,b))+1$ for all $(a,b) \leftarrow (a,b), (b,c),(c,a)$ (why?) Then we have $(a,b)$ equals to 1 or 4, which allow us to enumerate all cases.
Using the exponent notation we can have a very powerful theorem called the lifting the exponent lemma, that allows us to use the exponent notation to find the highest power of some complicated forms. These can be easily searched so I will not cover it here. It is actually frequently used in current Olympic tournaments!
Three little problems:
1) (Russian MO 1999) Find all bounded sequence $\left\{ a_n \right\} \subseteq \mathbb{N}$ where for all $n\geq 3$, $n\in \mathbb{N}$, $a_n = (a_{n-1}+a_{n-2})/(a_{n-1},a_{n-2})$.
2) (A classical lemma) For $a,m,n\in \mathbb{N}$ and $a\geq 2$ show that $(a^m -1, a^n - 1) = a^{(m,n)}-1$.
3) Describe all the triples $(a,b,c)\in \mathbb{N}^3$ so that $[a,b,c](a,b,c) = \sqrt{abc}$.
Sunday, 24 November 2013
A little analysis problem
Recently I've been reading Springer GTM on hyperreals, but I'm not good enough to take any of these into the discussions here. I'm going to discuss a nice little problem here but due to the competitive nature of this question I am not going to disclose the source of it (maybe till the competition being due).
Question. Suppose $\left\{ a_n\right\} \subseteq \mathbb{N}$ strictly increasing. Show that $\sum_{i=1}^{\infty} \frac{a_{i+1}-a_i}{a_i}$ diverges.
A geometric interpretation gives you the conclusion immediately:
Solution. $\sum_{i=1}^{\infty} \frac{a_{i+1}-a_i}{a_i} \geq \sum_{i=1}^{\infty} \sum_{j=a_i}^{a_{i+1}-1} j^{-1} = \sum_{i = a_1}^{\infty} i^{-1}$
That clearly converges as indicated by the comparison test since the last one is a partial harmonic series. Please note that the above expression should be avoided as the $\geq $ sign is vague due to the divergence nature of the series.
Now...
Question 2. Suppose $\left\{ a_n\right\} \subseteq \mathbb{R}^+$ strictly increasing. Will $\sum_{i=1}^{\infty} \frac{a_{i+1}-a_i}{a_i}$ converge?
Extending the above proof to $\mathbb{R}$ seems to be a nice idea. To do this we shall compare the term $\frac{a_{i+1}-a_i}{a_i}$ with the integral $\int ^{a_{i+1}}_{a_i} x^{-1}dx$. We can show that they are usually asymptotically equivalent. Why usually? Let's bound them as the following:
The lower bound is clear (we use that for divergence). Suppose the sequence is unbounded then
$\sum_{i=1}^{\infty} \frac{a_{i+1}-a_i}{a_i} \geq \sum_{i=1}^{\infty} \int ^{a_{i+1}}_{a_i} x^{-1}dx = \int ^{\infty} _{a_1} x^{-1} dx$
Of course if it is bounded then it converges to $k\in \mathbb{R}$ then the series is lowerly bounded by $\int^k_{a_i} x^{-1} dx = \ln (k) - \ln (a_1)$.
What bothers us is the upper bound. Since $\left\{ a_n\right\}$ strictly increasing we use the convention that $k$ being limit of the sequence $\left\{ a_n\right\}$, which is infinity if it diverges.
Define $N_i = \frac{a_{i+1}}{a_i}$. Then if $\left\{ N_i\right\}$ upperly bounded by $N$ then the series is also upperly bounded by $\int ^{ k}_{a_1} Nx^{-1}dx = N(\ln k - \ln 1)$, so if $N_i$ and $a_i$ bounded then the series is bounded as well.
Is it possible that $a_i$ bounded but $N$ being unbounded? The answer is clear: no. Suppose $a_i$ upperly bounded by $A$. Then $N_i = \frac{a_{i+1}}{a_i} \leq \frac{A}{a_1}$ so that it is also upperly bounded.
Solution. The series converges if and only if the sequence $\left\{ a_n\right\}$ converges. i.e. iff the sequence is bounded above.
Define $f(x): \mathbb{R}^+ \rightarrow \mathbb{R}^+$ that $f(x) = \sum_{i: a_i \leq x} \frac{a_{i+1}-a_i}{a_i}$. Then it would be nice to do some asymptotic analysis over the function $f$ given that $a_i$ diverges.
From our previous analysis, if $N_i$ bounded then $f(x) \sim \log x$. What if $N$ unbounded? If we think about how $N_i$ is defined we can make up this example: $a_n = n!$ then $\frac{a_{i+1}-a_i}{a_i} = n$ .
Then it is clear that $N_i$ unbounded as $N_i = i+1$ with $f(n!)\sim n^2$, but if $N_i$ bounded we have $f(n!) \sim \log (n!) \sim n\log n$ by Stirling's formula, which has a different growth rate.
Foods for thought:
Given strictly increasing sequence $\left\{ a_n\right\}$ where $a_n \neq 0 \forall n\in \mathbb{N}$, define $b_i = \frac{a_{i+1}-a_i}{a_i}$.
1) Given a sequence $\left\{ a_n \right\} \in \mathbb{R}/\left\{ 0\right\}$. Give the condition that the series $\sum b_i$ converges.
2) Repeat Q1 if $\left\{ a_i\right\}\subseteq \mathbb{R}^+$ is increasing
3) For $k\in \mathbb{N}$, Define $b_{i,k} = \frac{a_{i+k} - a_i}{a_i}$ where $\left\{ a_i\right\}\subseteq \mathbb{R}^+$ strictly increasing. Determine the condition of convergence for $\sum_i b_{i,k}$.
4) Back to the original question. How fast can $f(x)$ grow? Are there any asymptotical boundings?
Question. Suppose $\left\{ a_n\right\} \subseteq \mathbb{N}$ strictly increasing. Show that $\sum_{i=1}^{\infty} \frac{a_{i+1}-a_i}{a_i}$ diverges.
A geometric interpretation gives you the conclusion immediately:
Solution. $\sum_{i=1}^{\infty} \frac{a_{i+1}-a_i}{a_i} \geq \sum_{i=1}^{\infty} \sum_{j=a_i}^{a_{i+1}-1} j^{-1} = \sum_{i = a_1}^{\infty} i^{-1}$
That clearly converges as indicated by the comparison test since the last one is a partial harmonic series. Please note that the above expression should be avoided as the $\geq $ sign is vague due to the divergence nature of the series.
Now...
Question 2. Suppose $\left\{ a_n\right\} \subseteq \mathbb{R}^+$ strictly increasing. Will $\sum_{i=1}^{\infty} \frac{a_{i+1}-a_i}{a_i}$ converge?
Extending the above proof to $\mathbb{R}$ seems to be a nice idea. To do this we shall compare the term $\frac{a_{i+1}-a_i}{a_i}$ with the integral $\int ^{a_{i+1}}_{a_i} x^{-1}dx$. We can show that they are usually asymptotically equivalent. Why usually? Let's bound them as the following:
The lower bound is clear (we use that for divergence). Suppose the sequence is unbounded then
$\sum_{i=1}^{\infty} \frac{a_{i+1}-a_i}{a_i} \geq \sum_{i=1}^{\infty} \int ^{a_{i+1}}_{a_i} x^{-1}dx = \int ^{\infty} _{a_1} x^{-1} dx$
Of course if it is bounded then it converges to $k\in \mathbb{R}$ then the series is lowerly bounded by $\int^k_{a_i} x^{-1} dx = \ln (k) - \ln (a_1)$.
What bothers us is the upper bound. Since $\left\{ a_n\right\}$ strictly increasing we use the convention that $k$ being limit of the sequence $\left\{ a_n\right\}$, which is infinity if it diverges.
Define $N_i = \frac{a_{i+1}}{a_i}$. Then if $\left\{ N_i\right\}$ upperly bounded by $N$ then the series is also upperly bounded by $\int ^{ k}_{a_1} Nx^{-1}dx = N(\ln k - \ln 1)$, so if $N_i$ and $a_i$ bounded then the series is bounded as well.
Is it possible that $a_i$ bounded but $N$ being unbounded? The answer is clear: no. Suppose $a_i$ upperly bounded by $A$. Then $N_i = \frac{a_{i+1}}{a_i} \leq \frac{A}{a_1}$ so that it is also upperly bounded.
Solution. The series converges if and only if the sequence $\left\{ a_n\right\}$ converges. i.e. iff the sequence is bounded above.
Define $f(x): \mathbb{R}^+ \rightarrow \mathbb{R}^+$ that $f(x) = \sum_{i: a_i \leq x} \frac{a_{i+1}-a_i}{a_i}$. Then it would be nice to do some asymptotic analysis over the function $f$ given that $a_i$ diverges.
From our previous analysis, if $N_i$ bounded then $f(x) \sim \log x$. What if $N$ unbounded? If we think about how $N_i$ is defined we can make up this example: $a_n = n!$ then $\frac{a_{i+1}-a_i}{a_i} = n$ .
Then it is clear that $N_i$ unbounded as $N_i = i+1$ with $f(n!)\sim n^2$, but if $N_i$ bounded we have $f(n!) \sim \log (n!) \sim n\log n$ by Stirling's formula, which has a different growth rate.
Foods for thought:
Given strictly increasing sequence $\left\{ a_n\right\}$ where $a_n \neq 0 \forall n\in \mathbb{N}$, define $b_i = \frac{a_{i+1}-a_i}{a_i}$.
1) Given a sequence $\left\{ a_n \right\} \in \mathbb{R}/\left\{ 0\right\}$. Give the condition that the series $\sum b_i$ converges.
2) Repeat Q1 if $\left\{ a_i\right\}\subseteq \mathbb{R}^+$ is increasing
3) For $k\in \mathbb{N}$, Define $b_{i,k} = \frac{a_{i+k} - a_i}{a_i}$ where $\left\{ a_i\right\}\subseteq \mathbb{R}^+$ strictly increasing. Determine the condition of convergence for $\sum_i b_{i,k}$.
4) Back to the original question. How fast can $f(x)$ grow? Are there any asymptotical boundings?
Sunday, 6 October 2013
An algorithm to weight influences
This is my first few times using MathJax! If you have any troubles in viewing these TeX formula don't hesitate to tell me.
Introduction
There are many types of contests that require subjective judging like diving as well as osu! mapping contests. We are human that it is unavoidable that bias may occur even though there are well-written guidelines to follow. The aim of this entry is to make an attempt on correcting such bias by an algorithm, and to determine the degree of bias occurred in the judgement.
Introduction
There are many types of contests that require subjective judging like diving as well as osu! mapping contests. We are human that it is unavoidable that bias may occur even though there are well-written guidelines to follow. The aim of this entry is to make an attempt on correcting such bias by an algorithm, and to determine the degree of bias occurred in the judgement.
Wednesday, 25 September 2013
Thoughts on minesweeper
Minesweeper has been a classic for those who had been using the Window OS, and those who born in the MSN era will not forget the minesweeper battle (and I wrote a few entries about minesweeping in the past, in Chinese)
For those who don't know what is minesweeper, I seriously recommend you to have a go for it. We can define the minesweeper formally:
Initial status: a rectangular matrix with all grids closed.
State of grids:
1) Closed.
2) Opened: showing a number, which equals to the number of mines in the 8 neighboring grids (5, or 3 if grid is on the boundary).
3) Checked: the grid is marked to contain a mine. (No grids will contain more than one mine) Also, assume wrongly opened grid to be checked.
Objective: Turn all closed grids into either opened or checked CORRECTLY.
Minesweeper sometimes depends on luck, but it depends on logical deduction as well. I won't go deep into the logical stuffs here but here we analysis the principle behind logical deduction.
Just as what we learn from computation complexity, every NP-complete question is as hard as the 3-SAT (satisfibility) question. The minesweeper game is not necessarily a NP-complete question but we can still turn them into SAT question.
For the following question, we use white square or bracketed numbers for closed grids, black square for checked grids, and numbers for opened grids. Assume the boundary of the given grid is the true boundary of the board unless otherwise specified.
In our example, any bracketed number $(n)$ is equivalent to the boolean variable $a_n$, which is true if it has a mine, or zero otherwise. For SAT questions, clauses are "OR" inside and "AND" outside, but we can but too lazy to turn them into a standard SAT format.
Example 1. $\begin{matrix} (1) & (2) \\ 1&1\end{matrix}$
The information given from the two '1' s are equivalent, and it this $a_1 \bigoplus a_2$.
Example 2. $\begin{matrix} (1) & (2) & (3) & (4) \\ \blacksquare & 2 & 3&\blacksquare \end{matrix}$
How do we express "1 out of 3" with binary boolean operator? The solution is "any two out of three has no more than one" AND "it has at least one". To express "2 out of 3", we can use the opposite of "1 out of 3"
The information is: $C_1 \wedge C_2$ where
$C_1 = \neg (a_1 \wedge a_2) \wedge \neg (a_2 \wedge a_3) \wedge \neg (a_1 \wedge a_3) \wedge (a_1 \vee a_2 \vee a_3)$
$C_2 = (a_2 \vee a_3) \wedge (a_3 \vee a_4) \wedge (a_2 \vee a_4) \wedge \neg (a_2 \wedge a_3 \wedge a_4)$
By some common sense, or SAT solvers, we will eventually get $a_1$ being false (no mine) and $a_4$ being true. No solutions for $a_2, a_3$? No worries. As we flip grid 1 and check grid 4, we get definite answers as we opened those grids. It's now reduced to example 1 but with extra clauses that we can solve it. (we CAN because the new clause is a single variable!)
Technically, a minesweeper board with $n$ grid is a SAT question $C_1 \wedge ... \wedge C_n$, where each clause represent the information from a single grid. Of course, if a grid contains a mine, then it gives no information but also automatically satisfied, then the clause is empty.
Of course, we won't have all the clauses on our hands, but let's not forget the fact that clauses are made up based on a fixed board, and hence the clauses can be satisfied. Since the clauses are solvable, if a variable can be solved from the partial clauses then it also satisfies the overall clauses, which enable us to reveal more information and eventually solve the whole board. Instead of determining solvability, our question is: are there MULTIPLE solutions towards the SAT question?
Definitely there are setups with multiple solutions. One stupid example:
Example 3. $\begin{matrix} \blacksquare &\blacksquare &\blacksquare \\ \blacksquare & (1) & \blacksquare \\ \blacksquare & \blacksquare & \blacksquare\end{matrix}$
Information: $()$. This is damn independent of $a_1$ !!!!
Example 4. $\begin{matrix} \blacksquare &2&2&\blacksquare \\ 2&(1)&(2)&2\\2&(3)&(4)&2\\\blacksquare &2&2&\blacksquare\end{matrix}$
The information is $(a_1 \bigoplus a_2) \wedge (a_2 \bigoplus a_4) \wedge (a_3 \bigoplus a_4) \wedge (a_1 \bigoplus a_3)$. After some simplification the answer is $(a_1 \leftrightarrow a_4) \bigoplus (a_2 \leftrightarrow a_3)$, which clearly brings you two solutions!
Our question is, how often that a board will give us situations that can't be deterministically solved? We have to formalize this question though:
Can we ask that if all clauses are provided? No as each empty clause represents a grid to be checked. Instead, we state the following algorithm:
def mssolver(grid, initpos = null):
if initpos not null start from the assigned position, otherwise start on a random grid
while not all grids opened/checked:
Construct clauses from given information
Solve the clauses and open/check ensured grids
if none of the grids are opened in this loop, reveal a grid randomly
return grid
I seldom leave open questions here, but SAT solvers are beyond my scope and I have other topics to deal with, but here are the questions:
Assuming that for a board, each grid has a probability $p$ to contain a mine.
Question 1. Given an infinite board (so that the initial position does not matter), how often do we have to open a grid blindly? (we can count the number of loops taken, which divides the number of times that we open a random grid.) Moreover, how often will we open a grid wrongly?
Question 2. What if we work on finite rectangular grid? Does the initial position matters?
Question 3. What if the minesweeper structure is changed? In particular, in the hexagon board?
It will be nice if there are some simulation for this one. I think that the probability of getting wrong is less than that of a finite grid because we can go further and as we go further we have possible more information so that the chance that all information is used up is smaller.
Are there any further optimization? Surely we can do some probabilistic analysis? Not sure if I have the time to deal with it, but I will try my best. Before I end this entry I'll simply throw an example that demonstrates how probabilistic analysis works. Ask yourself why about that! ;)
Example 5. $ \begin{matrix} \square & 4& \square & \blacksquare \\ \square &\square &\square & \blacksquare \\ \blacksquare & 7(!!) &\square &3\\ \blacksquare & \blacksquare &\square & \square \end{matrix}$
For those who don't know what is minesweeper, I seriously recommend you to have a go for it. We can define the minesweeper formally:
Initial status: a rectangular matrix with all grids closed.
State of grids:
1) Closed.
2) Opened: showing a number, which equals to the number of mines in the 8 neighboring grids (5, or 3 if grid is on the boundary).
3) Checked: the grid is marked to contain a mine. (No grids will contain more than one mine) Also, assume wrongly opened grid to be checked.
Objective: Turn all closed grids into either opened or checked CORRECTLY.
Minesweeper sometimes depends on luck, but it depends on logical deduction as well. I won't go deep into the logical stuffs here but here we analysis the principle behind logical deduction.
Just as what we learn from computation complexity, every NP-complete question is as hard as the 3-SAT (satisfibility) question. The minesweeper game is not necessarily a NP-complete question but we can still turn them into SAT question.
For the following question, we use white square or bracketed numbers for closed grids, black square for checked grids, and numbers for opened grids. Assume the boundary of the given grid is the true boundary of the board unless otherwise specified.
In our example, any bracketed number $(n)$ is equivalent to the boolean variable $a_n$, which is true if it has a mine, or zero otherwise. For SAT questions, clauses are "OR" inside and "AND" outside, but we can but too lazy to turn them into a standard SAT format.
Example 1. $\begin{matrix} (1) & (2) \\ 1&1\end{matrix}$
The information given from the two '1' s are equivalent, and it this $a_1 \bigoplus a_2$.
Example 2. $\begin{matrix} (1) & (2) & (3) & (4) \\ \blacksquare & 2 & 3&\blacksquare \end{matrix}$
How do we express "1 out of 3" with binary boolean operator? The solution is "any two out of three has no more than one" AND "it has at least one". To express "2 out of 3", we can use the opposite of "1 out of 3"
The information is: $C_1 \wedge C_2$ where
$C_1 = \neg (a_1 \wedge a_2) \wedge \neg (a_2 \wedge a_3) \wedge \neg (a_1 \wedge a_3) \wedge (a_1 \vee a_2 \vee a_3)$
$C_2 = (a_2 \vee a_3) \wedge (a_3 \vee a_4) \wedge (a_2 \vee a_4) \wedge \neg (a_2 \wedge a_3 \wedge a_4)$
By some common sense, or SAT solvers, we will eventually get $a_1$ being false (no mine) and $a_4$ being true. No solutions for $a_2, a_3$? No worries. As we flip grid 1 and check grid 4, we get definite answers as we opened those grids. It's now reduced to example 1 but with extra clauses that we can solve it. (we CAN because the new clause is a single variable!)
Technically, a minesweeper board with $n$ grid is a SAT question $C_1 \wedge ... \wedge C_n$, where each clause represent the information from a single grid. Of course, if a grid contains a mine, then it gives no information but also automatically satisfied, then the clause is empty.
Of course, we won't have all the clauses on our hands, but let's not forget the fact that clauses are made up based on a fixed board, and hence the clauses can be satisfied. Since the clauses are solvable, if a variable can be solved from the partial clauses then it also satisfies the overall clauses, which enable us to reveal more information and eventually solve the whole board. Instead of determining solvability, our question is: are there MULTIPLE solutions towards the SAT question?
Definitely there are setups with multiple solutions. One stupid example:
Example 3. $\begin{matrix} \blacksquare &\blacksquare &\blacksquare \\ \blacksquare & (1) & \blacksquare \\ \blacksquare & \blacksquare & \blacksquare\end{matrix}$
Information: $()$. This is damn independent of $a_1$ !!!!
Example 4. $\begin{matrix} \blacksquare &2&2&\blacksquare \\ 2&(1)&(2)&2\\2&(3)&(4)&2\\\blacksquare &2&2&\blacksquare\end{matrix}$
The information is $(a_1 \bigoplus a_2) \wedge (a_2 \bigoplus a_4) \wedge (a_3 \bigoplus a_4) \wedge (a_1 \bigoplus a_3)$. After some simplification the answer is $(a_1 \leftrightarrow a_4) \bigoplus (a_2 \leftrightarrow a_3)$, which clearly brings you two solutions!
Our question is, how often that a board will give us situations that can't be deterministically solved? We have to formalize this question though:
Can we ask that if all clauses are provided? No as each empty clause represents a grid to be checked. Instead, we state the following algorithm:
def mssolver(grid, initpos = null):
if initpos not null start from the assigned position, otherwise start on a random grid
while not all grids opened/checked:
Construct clauses from given information
Solve the clauses and open/check ensured grids
if none of the grids are opened in this loop, reveal a grid randomly
return grid
I seldom leave open questions here, but SAT solvers are beyond my scope and I have other topics to deal with, but here are the questions:
Assuming that for a board, each grid has a probability $p$ to contain a mine.
Question 1. Given an infinite board (so that the initial position does not matter), how often do we have to open a grid blindly? (we can count the number of loops taken, which divides the number of times that we open a random grid.) Moreover, how often will we open a grid wrongly?
Question 2. What if we work on finite rectangular grid? Does the initial position matters?
Question 3. What if the minesweeper structure is changed? In particular, in the hexagon board?
It will be nice if there are some simulation for this one. I think that the probability of getting wrong is less than that of a finite grid because we can go further and as we go further we have possible more information so that the chance that all information is used up is smaller.
Are there any further optimization? Surely we can do some probabilistic analysis? Not sure if I have the time to deal with it, but I will try my best. Before I end this entry I'll simply throw an example that demonstrates how probabilistic analysis works. Ask yourself why about that! ;)
Example 5. $ \begin{matrix} \square & 4& \square & \blacksquare \\ \square &\square &\square & \blacksquare \\ \blacksquare & 7(!!) &\square &3\\ \blacksquare & \blacksquare &\square & \square \end{matrix}$
Thursday, 8 August 2013
Note transfer completed
So, here we go.
Notes are transferred to http://forretrio.t15.org/ . New instructions has been added to the heading of the site.
The junior section are referred to be local post within this blog. Some of them looks messy, and upon request I will repair them, while the senior section are now downloadable.
If there are any further questions comment here. Otherwise, enjoy.
Notes are transferred to http://forretrio.t15.org/ . New instructions has been added to the heading of the site.
The junior section are referred to be local post within this blog. Some of them looks messy, and upon request I will repair them, while the senior section are now downloadable.
If there are any further questions comment here. Otherwise, enjoy.
Wednesday, 10 July 2013
Concerning the notes
Currently my ftp site got demolished without any notice --- that's quite hash for me, and my site.
I'm going to find a new host for them as soon as possible, and I hope this can be done within the entire month. This won't be an easy task --- I have no backup in my current computer and I'm afraid that some of the junior form subject notes has been lost --- I will find them with my best. If they are not found I'll try to compensate it through other means.
FTP hosts are getting less common these days. We have experienced the evolution process of online social platform --- MSN community (late 90s) --- geocities (90s - early 10s) --- forums (00s) --- blogs (00s) --- and facebook. As written in the book Zero, the future of a radical price professionalism declines on the web --- and they evolves into more integrated platform. Individual sites lost its commercial potential when people use facebook --- and now no one supplies free FTP host. This will be a long and sad story but let's keep this piece of entry be an announcement rather than complaints.
I planned to make a full site connecting to this blog with the notes, but that would absolutely be a great challenge for me, and it takes quite a lot of time.
What should you do if you want to retrieve the notes?
1) Take the web version from this blog.
2) Serior form notes are hosted here temperorily: http://forretrio.comli.com/ I'm sorry for the stupid advertisment. I'll find a new host ASAP, as mentioned.
Thanks for supporting my blog, and if you have any enquiries put it in the ask.fm (on the right hand side), or comment anywhere in my site. To make this entry 'sticky' I'll not write any new stuff until the problem is solved.
11/7/2013
I'm going to find a new host for them as soon as possible, and I hope this can be done within the entire month. This won't be an easy task --- I have no backup in my current computer and I'm afraid that some of the junior form subject notes has been lost --- I will find them with my best. If they are not found I'll try to compensate it through other means.
FTP hosts are getting less common these days. We have experienced the evolution process of online social platform --- MSN community (late 90s) --- geocities (90s - early 10s) --- forums (00s) --- blogs (00s) --- and facebook. As written in the book Zero, the future of a radical price professionalism declines on the web --- and they evolves into more integrated platform. Individual sites lost its commercial potential when people use facebook --- and now no one supplies free FTP host. This will be a long and sad story but let's keep this piece of entry be an announcement rather than complaints.
I planned to make a full site connecting to this blog with the notes, but that would absolutely be a great challenge for me, and it takes quite a lot of time.
What should you do if you want to retrieve the notes?
1) Take the web version from this blog.
2) Serior form notes are hosted here temperorily: http://forretrio.comli.com/ I'm sorry for the stupid advertisment. I'll find a new host ASAP, as mentioned.
Thanks for supporting my blog, and if you have any enquiries put it in the ask.fm (on the right hand side), or comment anywhere in my site. To make this entry 'sticky' I'll not write any new stuff until the problem is solved.
11/7/2013
Wednesday, 22 May 2013
A number theory problem from online games Part 2
Using the same notation as last time we want to show that for , (where A is natrual number larger than 1) maximized at the second crossings. Why?
Definition. Let be the largest integer that .
The above claim is equivalent to claiming that is the global maximum of f.
Theorem. .
Proof. , the result follows.
Theorem. iff .
Proof. Straightforward by direct expansion.
Theorem. maximized at .
Proof. This is equivalent to .
which can be shown using modular arithmetic. (It isn't hard to check: there are only finite case to check.)
How about the case ?
Note that and let . Consider the case which is larger than zero for large enough a_k. Now we can summarize our approach to find our optimal solution:
Check the following value of a: if they are a better solution replace the current one.
1) The (usually the best answer)
2) Find smallest . For each find so that is maximized in the range .
Complexity? Something like . The problem is how often the answer falls on and how should we make the algorithm more efficient? I would leave that part open.
What about irrational case? Let's talk about them next them.
Definition. Let be the largest integer that .
The above claim is equivalent to claiming that is the global maximum of f.
Theorem. .
Proof. , the result follows.
Theorem. iff .
Proof. Straightforward by direct expansion.
Theorem. maximized at .
Proof. This is equivalent to .
which can be shown using modular arithmetic. (It isn't hard to check: there are only finite case to check.)
How about the case ?
Note that and let . Consider the case which is larger than zero for large enough a_k. Now we can summarize our approach to find our optimal solution:
Check the following value of a: if they are a better solution replace the current one.
1) The (usually the best answer)
2) Find smallest . For each find so that is maximized in the range .
Complexity? Something like . The problem is how often the answer falls on and how should we make the algorithm more efficient? I would leave that part open.
What about irrational case? Let's talk about them next them.
Sunday, 19 May 2013
A number theory problem from online games Part 1
The situation comes from online game in a very natrual style.
Unlike modern banking system, the statistics (like money) in online games are usually integer, but that makes some problem (to the producer sometimes, and sometimes to players) in particular the statistic conversion.
Like, we are obtaining energy A from energy B at a rate of 36.5%. It is natrual that it takes 365 units of B for us to get 1000 units of A. For what if we want 1 units of A? 3 units?
In the game that I am playing they take the rounding off approach --- of course --- this is beneficial if we goes B->A but not-so-good when we go in another way. For example 1 unit of A is supposed to take 0.365 units of B and now it's rounded off to zero. 3 units of A takes 1 unit of B to convert with. Of course, the system won't let you to obtain 1 unit of A 'with zero unit of B' --- you need at least one unit of input.
The problem comes: in what ratio (in particular, the amount of A I want every time) so that we maximize our gain (i.e. ratio above the supposed rate)? We can now state our problem:
Instance: A real number
Prroblem: Find to maximize .
Well, r causes a bit of trouble when it isn't rational, so we modify the instance a bit:
Instance: A sequence of real number that converges to . It follows that the sequence is an approximation to that and . Moreover, it follows that for all fractions that . If , the sequence constantly equal to the fraction that equals to .
We should also specify that we can't convert to something (larger than zero) using zero resources. So the problem becomes:
Problem: Find to maximize as defined above, with .
Where should we start with? It is clear that when the problem can be easily solved. To see this we shall consider rational instance for the rest of (this part of) the passage with the notation .
Lemma. For , .
Proof. For , there is a multiplicative inverse of p mod q. Let in , it is clear that .
For , .
It's like that the above lemma gives us a possible answer to the problem, but is it the best answer? Let's divide it in two cases:
Theorem. If , then is the optimal answer.
Proof. It is clear that can't be a valid answer, and for all using similar arguement with the lemma. To see that for , notice that so that the integer part of do not increase in the given range, the result immediately follows.
Similarly, if we check all to get the optimal answer. It follows that the optimal answer is at most using the similar arguement.
If we want to be faster, check all crossing and the corresponding value so that is just under the multiple of q, now we should get the answer.
Why 'should'? Think about the situation:
Suppose we get as the 'optimal' answer using the faster way, is a better answer? Is this even possible?... We shall see it shortly...
Unlike modern banking system, the statistics (like money) in online games are usually integer, but that makes some problem (to the producer sometimes, and sometimes to players) in particular the statistic conversion.
Like, we are obtaining energy A from energy B at a rate of 36.5%. It is natrual that it takes 365 units of B for us to get 1000 units of A. For what if we want 1 units of A? 3 units?
In the game that I am playing they take the rounding off approach --- of course --- this is beneficial if we goes B->A but not-so-good when we go in another way. For example 1 unit of A is supposed to take 0.365 units of B and now it's rounded off to zero. 3 units of A takes 1 unit of B to convert with. Of course, the system won't let you to obtain 1 unit of A 'with zero unit of B' --- you need at least one unit of input.
The problem comes: in what ratio (in particular, the amount of A I want every time) so that we maximize our gain (i.e. ratio above the supposed rate)? We can now state our problem:
Instance: A real number
Prroblem: Find to maximize .
Well, r causes a bit of trouble when it isn't rational, so we modify the instance a bit:
Instance: A sequence of real number that converges to . It follows that the sequence is an approximation to that and . Moreover, it follows that for all fractions that . If , the sequence constantly equal to the fraction that equals to .
We should also specify that we can't convert to something (larger than zero) using zero resources. So the problem becomes:
Problem: Find to maximize as defined above, with .
Where should we start with? It is clear that when the problem can be easily solved. To see this we shall consider rational instance for the rest of (this part of) the passage with the notation .
Lemma. For , .
Proof. For , there is a multiplicative inverse of p mod q. Let in , it is clear that .
For , .
It's like that the above lemma gives us a possible answer to the problem, but is it the best answer? Let's divide it in two cases:
Theorem. If , then is the optimal answer.
Proof. It is clear that can't be a valid answer, and for all using similar arguement with the lemma. To see that for , notice that so that the integer part of do not increase in the given range, the result immediately follows.
Similarly, if we check all to get the optimal answer. It follows that the optimal answer is at most using the similar arguement.
If we want to be faster, check all crossing and the corresponding value so that is just under the multiple of q, now we should get the answer.
Why 'should'? Think about the situation:
Suppose we get as the 'optimal' answer using the faster way, is a better answer? Is this even possible?... We shall see it shortly...
Subscribe to:
Posts (Atom)