From my X bookmarks:
From my X bookmarks: 认真读完了OpenAI 研究员 Noam Brown 今天的长帖,一个被行业严重低估的现实。 LLM 的真实能力天花板,远高于当前任何 benchmark 所显示的水平。 原因,是给它的test-time compute太少了。而随着模型迭代,这个问题会越来越突出。 GPT-5.5 发布初期,benchmark 只比 5.4 略有提升,很多人觉得没那么惊艳。
My take: this belongs in Agent Infrastructure. Key signal here is around Autoresearch, Philosophy, AgentInfrastructure. Worth saving because it can be reused as a building block for product, protocol, or agent-economy thinking—not because it is merely trending.