Sunday, 20 July 2025

IMO 2025 and LLM's misleading claim on gold medals

Hello, time for IMO again!

This is one of the IMO hosted that is closest to my base of New Zealand, but I am not a participant for sure and I am not a team staff anyway, so nothing other than the questions themselves. So let's dive into the questions right now, shall we?

As usual, this is not a full attempt with the proper time limit. This is my instinctive brainstorming and observations, together with comments when I check the discussions.

Q1. Do you call that combinatorics? Or algebra? I think it has bits of everything yet so simple that makes it hard to classify.

For $n=3$ that is pretty simple where you come up with the possibility of $k = 0,1,3$. But what about the general $n$? That must be by induction for sure, and it would be nice if we can reduce the $n$.

My instinct is an idea similar to the reciprocity proof: how many lattice points a line would cross given the first crossed point and the rational slope? That is simply the denominator of the slope at simplest form. That says, sunny lines are inefficient in covering the points whose size grows quadratically. However, this approach is so vague and leaves the cover set in a mess. Thus I changed the covering focus to the boundary instead.

To be more precise, we focus on the colinear lattices $(1,b)$. Two cases to consider: if we include the line $x=1$ that passes through all those lattices we can reduce to lower $n$. If not, that is $n$ distinct lines passing through the $n$ lattices. For that case we ask the question: what lines cover the lattices $(b,n+1-b)$? If they are covered by the line $x+y = n+1$ then we reduce to lower $n$, otherwise again they are again covered by $n$ distinct lines.

Now this is a matching between the $(n-1)$ lattices $(1,1),...,(1,n-1)$ and $(2,n-1),...,(n,1)$ so that it covers all lattices not covered by the line connected to $(1,n)$. We focus on the lattices $(2,1),...,(n-1,1)$. If the line connecting $(1,1)$ does not connect $(n,1)$, all such $(n-2)$ lattices are not covered. They can't be covered by any other line connecting $(1,2),...,(1,n-1)$ either because such line won't be able to connect a lattice on $x+y=n$ then. Thus either $n=3$ or we force the line $y=1$. 

From the above we established that we can always reduce $n$ by arguing at least one of the lines would be the edge of the outermost triangle, so we can reduce $n$ until $n=3$ where we already know the answer.

I thought this is a nice argument until I checked AoPS.

oh.

OOOOHHHHHHH.

Point counting around the outermost triangle with inequality $2n \leq 3n-3$ is much better.

Q2. Geometry outside of Q1/4 is hopeless for me as usual. Interestingly if I had 4.5 hours I can actually hardbash this geometry question by co-geom. It gets even more feasible with the fact that I can actually do Q3 timely this year. It is also simple enough in the sense you do not need to use vector calculus or the ratio gimmick stuff.

Q3. Surprisingly easy really. When you see statements in the type of "$f(a)$ satisfies proposition $P(a,b)$ for all $a,b$" you know this is an extremely powerful statement to be exploited by picking the right $b$. It also allows us to approach by thinking about the factorization of $f(a)$.

Believe or not, these two approaches are all you need for this problem.

- For prime $p$ we know that $f(p)\mid p^p$ so $f(p)$ is power of primes. 
- For $f(x)\neq x$ (we know the identity function does not satisfy the functional equation), $f(p) \mid x^p - f(x)^{f(p)}$ but $x^p -f(x)^{p^c} \equiv x-f(x)$ mod $p$ so $p\mid f(x)-x$ which is impossible for large enough $p$. Thus $f(p) = 1$ for all large enough primes.
- Suppose $a$ odd and $p\mid f(a)$ for odd prime $p$. Let $q$ be large enough so that $f(q) = 1$ and is primitive mod $p$. Then $f(a) \mid q^a-1$, but since $q$ is primitive we know $p-1\mid a$ which is impossible since $p-1$ is even and $a$ is odd. Checking $f(a) \mid a^a - f(a)^{f(a)}$ shows that $f(a)$ is odd as well. Thus $f(a) = 1$.
- Suppose $p\mid f(a)$ for prime $p$, then $p\mid f(a)\mid p^a-1$ forces $p=2$, so $f$ maps to powers of 2. (Notice the order, we need the previous claim so that $f(p) = 1$ for all odd primes.
- It is all prime power ($v_2(a)$) chasing and construction of a tight example at the end.

Every step is so natural: divisibility leads to prime behavior, odd behavior then extend to the whole function. The only slightly less natural tool here would be the Dirichlet's theorem but I guess this is well known right? 

Anyway, one of the very few accessible Q3 to me.

Q4. Strangely I find this one causing me more trouble than Q3, due to the troublesome case by case argument. The first part is easy -- the only three distinct unit fractions the sums to 1 is $(2,3,6)$. The rest is all about arguing (1) what would reduce to this case and (2) why the rest do not.

It reminds me of the Aliquot sequence problem! To fully characterize the Aliquot sequence we just need to argue about (1) what would reduce to cycles (primes, amicible numbers or sociable numbers) and (2) what would runaway. Easy right? No. This is an open problem.

It is easy enough to argue $a_1$ must be even or else the prime unit fractions do not sum up to 1 and that keeps going creating infinite descent, but the rest is quite messy and troublesome, like showing that 3 must be a factor with power limits. Bleh.

Q5. Game with inequalities! It immediately becomes one of my favourites in recent years, although that does not tell anything in terms of difficulty. The "critical point" was deemed non-trivial on the AoPS post but I don't really think so.

Alice is maintaining the linear sum $\sum x_i \leq \lambda n$ while Bazza is maintaining the quadratic sum $\sum x_i^2 \leq n$. The way Alice defeats Bazza is by min-maxing: she can keep choosing zero and accumulate (the allowance of $\lambda n - \sum x_i$) until she can pull a big one to break the quadratic sum. The critical point is then $\sqrt{2}/2$ because the best Bazza can do is $\sqrt{2}$ so that the quadratic sum increases by the maximum amount aka 2.

After that it suffices to check the cases $\lambda$ above, below or equal to $\sqrt{2}/2$. And again I would say every step here logical and natural. The choice of $x_{2k} = \sqrt{2-x_{2k-1}^2}$ looks artificial but it's merely "the choice to stuff the quadratic sum". The beautiful ending to the problem is that the equal case is actually the combination of the winning strategies from both cases.

I love setup like this, but again is it too straightforward for a question 5?

Q6. Sometimes I am tired with "general case" combinatorics question. It is always nice to have a question where numbers are concrete and the answer is strongly relying on that number. Last year we had an "algo" problem using the number 2024 but such parameter can be so easily adjusted to any natural numbers. This 2025 is however different because it is a square number! This is probably the once in a lifetime chance for us to have a square number year when the next one is 91 years away. I am glad that they utilized the chance to put up a question like this.

And honestly I have zero idea on the question. The actual tiling is simple and beautiful. Considering how Q4 and 5 are easy I have again 3+ hours for Q6. In that case there will be a slim chance for me to come up with that optimal tiling, although I doubt if it helps in any way other than ending up with 1/7 or so. Erdos-Szekeres sounds way too advanced for IMO but almost every solution seem to use a version of that. It's a beautiful Q6, although I hope to see a real and slightly more elementary solution.

In overall IMO 2025 is special in many ways:
- An actual solvable Q3 and a Q4 that looks messier than Q3.
- Solvable questions can be done quickly allowing time to solve Q2,6
- 2 number theory problems in similar field (divisibility, power chasing)
- Ending with a "square" problem (kind of reminding me the 1988 Q6 which is also a "square" problem although 1988 isn't a square xD)

I really enjoy the problems this year, can't say anything further. I seemed to keep complaining Simon Marais problems but never did the same to IMO or Putnam. Perhaps this is the magic of collaborative efforts, hm?

Oh well, we will see again in 2026.

Update: I read news about OpenAI and google claimed that their ai made good progress with "gold medal" at 35 points. But is it that big of a progress though?

I am not sure if we can solve those question using commercial LLMs, but I would not doubt their ability to compare the score against historical performances:
Q3 is indeed much, much easier than historical mean. Comparing against 2005-2024 data, this is +2.3 in Z score and the highest in 20+ years. This is not only by students getting partial marks (for example, I imagine that knowing $f(p) = 1$ for large enough primes is easy and will get you 1/7 for sure), but also by astonishingly high 7/7 rate at 15%. 

Q5 is also fairly easy as expected with a Z-score of +1 in 2005-2024 data although it seemed to trend easier since 2018 or so:
In overall, Q1-Q5 are all easier than average. Q6 is harder than usual (0.143, Z-score of -0.85), and is among the hardest Q6 historically in terms of average score. But this is fine because Q6 is here to serve as the ultimate challenge anyway. 

The takeaway is, this set of problems is an outlier vs past difficulty, and has a significant gap between mid-hard problems and extremely hard problems. The gap of difficulty is precisely the right difficulty of questions where LLM progressed enough to be able to solve. The lack of such question renders OpenAI and google's "claim in progress" hollow.

Oh by the way the 1988 Q6 average score is 0.6 -- not exactly hard in this regard. Not the legendary level hard at least ;)

No comments:

Post a Comment