wilzuv

Forecasting Activity
Forecasting Calendar
No forecasts in the past 3 months
 

Activity                            Past Week   Past Month   Past Year   This Season   All Time
Forecasts                           0           0            4           4             4
Comments                            0           0            3           3             3
Questions Forecasted                0           0            1           1             1
Upvotes on Comments By This User    0           0            0           0             0
New Badge
wilzuv
earned a new badge:

Active Forecaster

New Prediction
wilzuv
made their 3rd forecast:
Probability    Answer
15% (+14%)     Less than or equal to 70
40% (+37%)     Between 71 and 74, inclusive
30% (-5%)      Between 75 and 78, inclusive
10% (-31%)     Between 79 and 82, inclusive
5% (-15%)      More than or equal to 83
Why do you think you're right?

GPT-5 was such a small improvement. I still think they will release an upgrade this year due to competitive pressure, but it will likewise be only a small improvement. There is also a fair chance they announce only product upgrades, which would leave the benchmark score at the status quo of 70.

Why might you be wrong?

- They could have made some breakthrough that is yet to be leaked (I find it unlikely)

- They won’t necessarily publish anything new that would affect the index score.


New Prediction
Why do you think you're right?

The trend in recent months indicates upward pressure.

Why might you be wrong?
The historical data still fluctuates quite a bit.
New Prediction
wilzuv
made their 2nd forecast:
Probability    Answer
1% (0%)        Less than or equal to 70
3% (0%)        Between 71 and 74, inclusive
35% (0%)       Between 75 and 78, inclusive
41% (0%)       Between 79 and 82, inclusive
20% (0%)       More than or equal to 83
Confirmed previous forecast
New Badge
wilzuv
earned a new badge:

My First Question

Congratulations on making your first forecast!
New Badge
wilzuv
earned a new badge:

Active Forecaster

New Prediction
wilzuv
made their 1st forecast:
Probability    Answer
1%             Less than or equal to 70
3%             Between 71 and 74, inclusive
35%            Between 75 and 78, inclusive
41%            Between 79 and 82, inclusive
20%            More than or equal to 83
Why do you think you're right?

- Trend extrapolation
- They have at least one major release right around the corner, potentially two this year. 

- Benchmark saturation 
- Improved scaffolding of reasoning agents

Why might you be wrong?

- The release is super delayed for some reason

- No architectural improvements
- Safety concerns limit progress
