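Original · In-Depth · Long Reads
In-Depth Column
The original long-read column from the 潜龙 editorial team. We don't chase trending topics; we care about one question: how will this be looked back on three years from now?
This Week's Featured Deployment Guide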
Stop Testing AI on 'Vibes': The New Science of LLM Evaluation
Imagine buying a brand-new car that was safety-tested purely based on how "smooth" the ride felt to the factory inspector. You probably wouldn't feel safe...