made their 1st forecast:
  • 8%: Less than or equal to 66
  • 10%: Between 67 and 70, inclusive
  • 20%: Between 71 and 74, inclusive
  • 29%: Between 75 and 78, inclusive
  • 33%: More than or equal to 79
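As a rough sanity check, the bins above imply a point estimate once you assume a representative value for each bin. The midpoints used below for the two open-ended bins (63 for "66 or less", 81 for "79 or more") are my own assumptions, not part of the forecast:

```python
# Implied expectation from the forecast bins, using bin midpoints.
# The midpoints for the open-ended bins (63 and 81) are assumptions.
bins = [
    (0.08, 63.0),   # <= 66 (assumed midpoint)
    (0.10, 68.5),   # 67-70
    (0.20, 72.5),   # 71-74
    (0.29, 76.5),   # 75-78
    (0.33, 81.0),   # >= 79 (assumed midpoint)
]

expected = sum(p * mid for p, mid in bins)
print(round(expected, 1))  # implied point estimate, around 75
```

Under these assumptions the distribution centers in the mid-70s, i.e. a few points above the models' current scores.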
Why do you think you're right?
  • OpenAI's o4-mini: 70
  • DeepSeek's R1: 68

Despite the tiny difference between the two models, the current crowd consensus suggests there is a 38% chance OpenAI will score more than 82 points by year's end, while DeepSeek has only a 20% chance of reaching at least 79. Superficially, this doesn't make much sense, but there are two plausible explanations:

  1. The bin structure creates a perfect setup for the anchoring effect: the top and bottom bins appear unlikely purely because they sit at the ends of the spectrum, irrespective of what their values actually communicate.
  2. OpenAI is expected to outperform DeepSeek with its 2025 releases.

This second explanation appears more rational, yet I'm unconvinced. It's not like OpenAI is years ahead of the pack. On the contrary, depending on the benchmark and the timing of the models' releases, Google, Anthropic, and even DeepSeek can routinely take the top spot on the various leaderboards.

I'm starting with a conservative forecast that assumes there will be one or two releases during 2025, each likely to raise the index by a few points. There are rumors that DeepSeek will soon release R2, which would make the picture clearer.

Why might you be wrong?

Without repeating my comment from the OpenAI question about benchmarks being moving targets, I'll sketch a hypothetical of what could happen.

In June 2025, DeepSeek's R2 is released. It immediately skyrockets to the top of Artificial Analysis's Intelligence Index leaderboard with a shocking score of 88 points. The news explodes in the mainstream media (and Nvidia's stock plummets). OpenAI decides to rush the release of a new model, which is quickly found to have an Intelligence Index of 90. Artificial Analysis decides the existing benchmark has become too easy and is no longer representative of the state of AI advancement. They update the benchmark structure, and all scores are reset to the vicinity of 50 points.

How unlikely is this scenario? Not very: something similar happened in late 2024. o1 had an Intelligence Score of 90, the benchmark was updated, and the model now sits at a meager 62.

ctsats
made a comment:

Thank you Nicolò.

I think the scenario you describe in your Why might you be wrong? section is indeed far from unimaginable, but I am not sure it is very likely. To be precise, the change in their benchmark came not in late 2024 but as recently as February 2025: from their snapshots at the Wayback Machine, we can see that up until Feb 10, 2025, they had something else in place called the Quality Index; the Artificial Analysis Intelligence Index arrived sometime between Feb 11 and Feb 17.

So, given that the last change in the benchmark was in mid-February, I am not sure how likely it is to change again in the next ~7 months. And I think that, should something like that happen, the question should be voided (there's not much point trying to forecast against moving goalposts...).
