89th
Accuracy Rank

404_NOT_FOUND

Nicolò
About:

-0.892834

Relative Brier Score

167

Forecasts

241

Upvotes
Forecasting Activity
Forecasting Calendar
 

                                   Past Week   Past Month   Past Year   This Season   All Time
Forecasts                          0           20           270         236           800
Comments                           0           21           268         257           410
Questions Forecasted               0           20           47          36            66
Upvotes on Comments By This User   4           8            233         199           827
New Badge
404_NOT_FOUND
earned a new badge:

Power Forecaster - Nov 2025

Earned for making 20+ forecasts in a month.
New Badge
404_NOT_FOUND
earned a new badge:

Star Commenter - Nov 2025

Earned for making 5+ comments in a month (rationales not included).
New Prediction
Why do you think you're right?
  • The FDA's approval process for De Novo medical devices (the use of a novel technology wouldn't seem suited to a 510(k) application) typically takes 12-14 months
  • Only devices that are already in the approval pipeline have a chance of being approved before the end of March
  • Two such devices seem to exist. Their applications date back to earlier than May 2022 (Wysa) and earlier than January 2025 (PathChat DX). 
  • The typical FDA approval rate for De Novo applications is lower than 50%

Both Wysa and PathChat could reach the end of the pipeline by the end of March, yet the former has been under review for three years now, which suggests there may be some concerns there. Similar concerns would likely apply to PathChat as well, so I'm expecting both a) a longer approval process and b) a lower success rate.

This would seem to be confirmed by a recent DHAC meeting (as highlighted by @grainmummy here), where the FDA signalled the intention to create an ad-hoc regulatory framework for these new devices. That process would likely take months and result in a new set of requirements that manufacturers would be asked to meet, further extending the timeline.

The creation of a new risk-based framework specifically for GenAI technologies is likely to take more than the four months left on this question, so the likelihood of approval for any LLM-based device is very low.

I'm remaining higher than the crowd due to the existence of these two products, which have already been in the pipeline for long enough to plausibly reach a decision.
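
As a rough illustration of why two in-pipeline candidates keep my number above the crowd's, here is a minimal sketch of how independent per-device approval chances combine. The 5% per-device figures are purely illustrative assumptions, discounted well below the <50% De Novo base rate for the reasons above.

```python
# Probability that at least one of the two in-pipeline devices (Wysa, PathChat DX)
# is approved before the end of March. Per-device probabilities are illustrative
# assumptions, not data from the question.
p_wysa = 0.05
p_pathchat = 0.05

p_at_least_one = 1 - (1 - p_wysa) * (1 - p_pathchat)
print(f"P(at least one approval by end of March) ≈ {p_at_least_one:.3f}")   # ≈ 0.098
```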

Files
Why might you be wrong?
  • The DHAC is only an advisory body, so the FDA might proceed with approval without implementing any new framework
Files
grainmummy
made a comment:
As you said, a balanced perspective should be considered, where the risks outweigh the benefits but the benefits don't make it entirely impossible. General regulatory headwinds and past De Novo deadlines suggest a "No," but the presence of two strong candidates already in the pipeline and undergoing expedited review keeps the likelihood of a "Yes" low but not negligible. Thank you so much for tagging me.
Files
New Prediction
Why do you think you're right?

TL;DR: We have no official data for any of the largest models. Epoch AI is guesstimating the total training compute and often basing new estimates on older ones. 

Rationale

Leading AI companies are not disclosing any data regarding the total training compute necessary for their models. Therefore, we have no official data for any of the largest models in the list.

Grok's and GPT's total training compute figures are estimates. For Claude and Gemini, there is an estimate in the "Training compute notes" section, but it's not reported in the adjacent column that will be used for resolution.

What this means in practice is that we are currently forecasting the likelihood that Epoch AI's estimate for some model will surpass the required threshold.

So, how is Epoch AI estimating these values? There appears to be no single consistent methodology; they use multiple approaches depending on what data is available. Here are a few examples:

  • In most cases, it's a formula based on the model's parameter count (which is itself usually not publicly disclosed and therefore estimated)
  • GPT-5's training compute estimate is based on nothing solid; it's just a series of hypotheses about what the compute should be relative to GPT-4 (whose own compute notes read "this is a rough estimate based on public information, much less information than most other systems in the database")
  • Claude 3.7's total training compute is based on the claim that it took "a few tens of millions of dollars" to train. Epoch AI took the geometric mean of a $20-90M range and reverse-engineered the total compute from assumptions about the cost of training (a rough sketch of this kind of calculation follows this list).
  • Most importantly, the estimate for Grok 4 (our current benchmark model) rests on qualitative assumptions relative to Grok 3's training, which was itself based on an estimated training duration of "approximately 3 months".
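
To make the cost-based reverse-engineering concrete, here is a minimal sketch of that kind of calculation. The GPU price, throughput, and utilization figures are my own illustrative assumptions, not Epoch AI's actual parameters.

```python
import math

# Sketch of a cost-based training-compute estimate, in the spirit of the
# Claude 3.7 entry described above. All price/throughput figures below are
# illustrative assumptions, not Epoch AI's published methodology.
cost_low, cost_high = 20e6, 90e6             # "a few tens of millions of dollars"
cost_usd = math.sqrt(cost_low * cost_high)   # geometric mean ≈ $42.4M

gpu_hour_usd = 2.0                           # assumed rental price per GPU-hour
effective_flops = 0.4 * 1e15                 # assumed ~40% utilization of ~1 PFLOP/s peak
usd_per_flop = gpu_hour_usd / (effective_flops * 3600)

training_compute = cost_usd / usd_per_flop   # total FLOP implied by the assumed cost
print(f"Implied training compute ≈ {training_compute:.2e} FLOP")   # ≈ 3e25 FLOP
```

Small changes to any of these assumptions shift the result by large factors, which is the core of the concern about using such estimates for resolution.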

Overall, it looks like one big guesstimation game: it produces reasonable and plausible estimates, but it may not be well suited to serve as the resolution criterion for a forecasting question.

Files
Why might you be wrong?

No change

Files
New Prediction
Why do you think you're right?
Slightly adjusting toward the current crowd consensus.
Files
Why might you be wrong?

One key issue with this question is that it's challenging to update the prediction regularly in a meaningful way. New information is sporadic and usually not available through mainstream media sources; instead, one would need to read through the cryptic abstracts of newly published, highly domain-specific papers to judge whether the content is relevant to the question and how large the effect could be. It seems to me that advances are incremental, and it's unclear how much they contribute to the overall outcome.

Files
New Prediction
Why do you think you're right?

Our own current biology appears as mirror life to any mirror-life organism we might be able to create. The existential-risk assumption is that newly created mirror-life bacteria would pose a threat to our ecosystems while also being immune to our natural defenses. That seems somewhat contradictory to me, as it would make more sense to assume that the threat is at least reciprocal.

The idea seems to be that mirror-life can thrive and survive by eating/infecting non-mirror-life organisms, while our collective terrestrial biology would be incapable of affecting it.

The more I think about this, the less sense the alarmist case makes to me, and I'm starting to think that policymakers will soon have similar doubts.

Global searches for "mirror life" on Google Trends: are we already past peak hype? If so, the likelihood of any policy decision being made would be greatly reduced.


The only recent mention of mirror life in mainstream media is in a CBC review of the drama TV series Plur1bus, where it somehow occupies the first six paragraphs of the article.

Files
Why might you be wrong?

No change

Files
New Prediction
Why do you think you're right?

Our own current biology appears as mirror life to any mirror-life organism we might be able to create. The existential-risk assumption is that newly created mirror-life bacteria would pose a threat to our ecosystems while also being immune to our natural defenses. That seems somewhat contradictory to me, as it would make more sense to assume that the threat is at least reciprocal.

The idea seems to be that mirror-life can thrive and survive by eating/infecting non-mirror-life organisms, while our collective terrestrial biology would be incapable of affecting it. 

The more I think about this, the less sense the alarmist case makes to me, and I'm starting to think that policymakers will soon have similar doubts.

Global searches for "mirror life" on Google Trends: are we already past peak hype? If so, the likelihood of any policy decision being made would be greatly reduced.

The only recent mention of mirror life in mainstream media is in a CBC review of the drama TV series Plur1bus, where it somehow occupies the first six paragraphs of the article.

Files
Why might you be wrong?

No change

Files
New Prediction
Why do you think you're right?
Confirmed previous forecast
Files
Why might you be wrong?

No change

Files
New Prediction
404_NOT_FOUND
made their 4th forecast:
Probability    Answer
80% (+18%)     Less than $1 billion
15% (-8%)      More than or equal to $1 billion but less than $1.2 billion
4% (-6%)       More than or equal to $1.2 billion but less than $1.4 billion
1% (-3%)       More than or equal to $1.4 billion but less than $1.6 billion
0% (-1%)       More than or equal to $1.6 billion
Why do you think you're right?
$0.270 billion so far, with one month left in 2025. In the entirety of 2024, there was $0.7 billion in investment, implying that, for the outcome to fall outside the lowest bin, investment over the coming year would need to surpass that of the previous one. This is definitely possible, but at the moment the trend is moving markedly in the opposite direction.
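
A minimal sketch of the run-rate arithmetic behind this, assuming (my reading of the question, not something stated in it) that the total counts cumulative investment from the start of 2025 through the end of 2026:

```python
# Run-rate check: how much more investment is needed to leave the lowest bin,
# and how that compares with all of 2024. Figures come from the rationale above;
# the question-window assumption is mine.
invested_so_far = 0.270      # $bn through ~November 2025
lowest_bin_edge = 1.0        # $bn, upper edge of the "Less than $1 billion" bin
total_2024 = 0.7             # $bn across all of 2024

needed = lowest_bin_edge - invested_so_far
print(f"Still needed to leave the lowest bin: ${needed:.2f}bn")               # $0.73bn
print(f"More than all of 2024 (${total_2024:.1f}bn)? {needed > total_2024}")  # True
```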
Files
Why might you be wrong?

No change

Files
New Prediction
404_NOT_FOUND
made their 4th forecast:
Probability    Answer
43% (+8%)      Less than $350 million
33% (+5%)      More than or equal to $350 million but less than $450 million
17% (-6%)      More than or equal to $450 million but less than $550 million
5% (-5%)       More than or equal to $550 million but less than $650 million
2% (-2%)       More than or equal to $650 million
Why do you think you're right?
  • Avg 2024: $25.4M/month
  • Avg 2025: $14.6M/month

Applying a heavier weighting to the most recent data, and with just over 13 months left, the lowest bin remains the most likely one. 
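
A minimal sketch of the extrapolation, using a recency-weighted monthly run rate. The 70/30 weighting and the assumption that the relevant total accrues roughly from now over the remaining ~13 months are illustrative choices of mine, not part of the question.

```python
# Recency-weighted run-rate extrapolation. Monthly averages come from the
# bullets above; the weighting and accrual-window assumption are mine.
avg_2024 = 25.4    # $M/month
avg_2025 = 14.6    # $M/month
w_recent = 0.7     # weight on the more recent (2025) run rate

weighted_rate = w_recent * avg_2025 + (1 - w_recent) * avg_2024   # ≈ $17.8M/month
months_left = 13
projected = weighted_rate * months_left                           # ≈ $232M
print(f"Projected remaining investment ≈ ${projected:.0f}M (lowest bin: < $350M)")
```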

Files
Why might you be wrong?

No change

Files