Always have to appreciate someone capable of roundly admitting having been wrong. Respect!
Thanks! I'm often wrong!
Which models did you use?
For what?
Sorry, which LLMs did you use for the replication? E.g. Opus 4.5, GPT 5.3, etc.?
And did you use an agent (e.g. Claude Code / Codex) or build your own harness?
I think this is fantastic work, but the limitations may fade quickly given model upgrades and better harnesses?
In other words, the dream of all human knowledge may live on!
Haha. OK!
I now generally use Claude for delicate things and Gemini for donkey work. Occasionally, I will also use DeepSeek, which is still a good model. I have high hopes for DeepSeek V4.
But no! I think we, as humans, ultimately have to take responsibility for determining the truth as best we can. The machines can help us, but it needs to be our judgement in the end.
Building better institutions and cultural change is really what we need. It would be nice to be able to trust experts again...