潜龙 QianLong · 中文 AI 内容与工具平台

We are currently evaluating artificial intelligence the same way we evaluate high school students: by making them sit for standardized tests. Modern large language models (LLMs) are breaking records on benchmarks like MMLU, HumanEval, and MATH. Yet, when you actually use them, the experience often feels less like collaborating with a genius and more like interacting with a highly articulate vending machine. You insert a prompt, and it drops a fully formed, take-it-or-leave-it answer into the tray.

What is missing from this equation? A sense of purpose.

Currently, most chatbots are optimized for one-pass generation. But human interaction rarely works this way. Think about planning a group vacation. You don’t walk up to a travel agent, state a single sentence about your desires, and expect a flawless, unchangeable itinerary in return. Real planning requires iterative bargaining—a multi-round exchange where preferences are weighed, constraints are discovered, and compromises are made.

Computer science pioneer Terry Winograd once noted that all language use can be thought of as a way of activating procedures within the hearer. Every utterance is a deliberate action meant to alter the other person's understanding of the world. By this definition, a conversation is a collaborative game. But today’s LLMs aren't playing a game with us; they are just trying to predict the next word in a sequence.

This lack of purposeful, multi-turn dialogue isn't just a philosophical issue; it’s a practical bottleneck. Take software engineering, a field where AI is heavily deployed. Benchmarks measure how well an AI can generate a block of code in a single attempt. But to automate solving real-world GitHub issues, an AI cannot act alone. It needs to pause and ask human engineers for missing documentation, clarify ambiguous requirements, or request a hand when stuck—much like a junior developer in a pair-programming setup.

To understand why AI behaves this way, we have to look under the hood. In the 1970s, early dialogue systems like ELIZA, PARRY, or Roger Schank’s "restaurant script" were explicitly designed with conversational steps in mind. Modern LLMs, however, are trained on gigantic, chaotic oceans of internet text—news, books, and code. To make them act like chatbots, developers artificially wrap this text-predicting engine in chat templates (using tags like <system> or <INST>) and train them to be helpful and harmless via reinforcement learning (RLHF). They are fundamentally text-completers wearing a conversational mask.

Unlocking true human-AI collaboration requires moving beyond the vending machine model. The future lies in AI that embraces turn-taking and builds long-term memory. Imagine an assistant that doesn't just answer isolated questions, but actively reads your specific Twitter feeds or Slack channels, drafts emails, and learns your exact tone by studying the edits you make over time.

The ultimate test of artificial intelligence won't be how quickly it can spit out a perfect essay on the first try. It will be whether it knows how to ask the right questions to help you write a better one yourself.

Key Points

Standardized AI benchmarks (MMLU, MATH) measure static intelligence, not interactive problem-solving.
Real-world tasks, from travel planning to coding, require multi-round 'purposeful dialogue' rather than one-pass answers.
Modern LLMs are fundamentally next-token predictors adapted for chat via formatting tags and RLHF, unlike 1970s AI which was explicitly scripted for interaction.
The future of AI involves turn-taking, long-term memory, and continuous adaptation to user preferences.

Why It Matters

As AI becomes deeply integrated into daily workflows, understanding its inability to engage in goal-oriented dialogue explains current user frustrations and highlights the necessary shift from AI as a tool to AI as a collaborative partner.

Sources:

What's Missing From LLM Chatbots: A Sense of Purpose — The Gradient

潛

本文完

潜龙编辑部 · 2026/7/14

The Vending Machine Problem of Modern AI

Key Points

Why It Matters

更多专栏

The Rise of the ChatGPT Flyer: AI's Awkward Physical Era

The Accidental Legacy of Apple's Cancelled Car

Inside the AI Mind: Unlocking the Secrets of 'J-Space'