From Single LLM Call to Multi-Agent System: The Full Spectrum
When you first start building with large language models, the architecture looks simple: send a prompt, get a response. But as the tasks get harder — longer, more ambiguous, requiring external data, or needing to self-correct — that simple pattern breaks down. Each new challenge has spawned a new design pattern.
Today there are seven distinct patterns in common use, ranging from a single LLM call to orchestrated networks of specialized agents. They are not interchangeable. Each one suits a different class of problem, and choosing the wrong one either underserves the task or wastes compute and money.
Here is how they differ, when to use each one, and what goes wrong when you pick the wrong one.
1. Single LLM Call
Simplest
One prompt goes in, one response comes out. No memory, no tools, no loops. The model processes the input and returns a result in a single pass.
This covers most of what people initially do with LLMs: summarizing a document, translating a paragraph, drafting a short email, classifying a support ticket. If the task fits entirely within the model's context window and does not require external information, a single call is the right choice.
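A minimal sketch of the single-call pattern, using the ticket-classification case. The `LLM` type and `fake_llm` are hypothetical stand-ins so the example runs without a model; a real client would take their place.

```python
from typing import Callable

# Hypothetical model interface: takes a prompt, returns text.
LLM = Callable[[str], str]


def classify_ticket(ticket: str, llm: LLM) -> str:
    """One prompt in, one response out: no memory, tools, or loops."""
    prompt = (
        "Classify this support ticket as one of: billing, technical, other.\n"
        f"Ticket: {ticket}\n"
        "Answer with the label only."
    )
    return llm(prompt).strip().lower()


def fake_llm(prompt: str) -> str:
    # Keyword-based fake, used only so the sketch runs without a model.
    ticket = prompt.split("Ticket:", 1)[1].lower()
    if any(word in ticket for word in ("invoice", "charge")):
        return "billing"
    return "other"


print(classify_ticket("I was charged twice on my invoice", fake_llm))
```

Everything the model needs is in the prompt, which is what makes this pattern easy to test and cheap to run.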
The failure mode is forcing tasks that are too complex into this pattern. A single call cannot update a database, cannot look up current information, cannot check its own output for errors, and cannot handle tasks that are genuinely multi-step. If the output frequently needs correction or the task requires information not in the prompt, you have outgrown this pattern.
Use when: the task is self-contained, the answer is in the input, and a single response is sufficient.
2. Chain
Structured · Linear
A chain connects multiple LLM calls in a fixed sequence. The output of one step becomes the input of the next. Each step has a specific, narrow responsibility.
A practical example: step one extracts key facts from a customer complaint, step two determines the category and priority, step three drafts a reply using both the facts and the category. Each step is simpler and more accurate than asking one model to do all three at once.
The advantage is that each call can be optimized independently — different prompts, different models, different parameters. Complex tasks get decomposed into manageable pieces. The limitation is that the flow is fixed. If step two produces an unexpected result, the chain has no way to adapt — it just passes that bad output forward.
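The complaint-handling chain described above might be sketched as follows. The function names, prompt wording, and the canned `fake_llm` are assumptions made for illustration, not a prescribed implementation.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical model interface: prompt -> text


def extract_facts(complaint: str, llm: LLM) -> str:
    return llm(f"EXTRACT the key facts from this complaint:\n{complaint}")


def categorize(facts: str, llm: LLM) -> str:
    return llm(f"CATEGORIZE these facts as '<category>/<priority>':\n{facts}")


def draft_reply(facts: str, category: str, llm: LLM) -> str:
    return llm(f"DRAFT a reply.\nCategory: {category}\nFacts: {facts}")


def run_chain(complaint: str, llm: LLM) -> str:
    # Fixed order: each step's output feeds the next; nothing adapts at runtime.
    facts = extract_facts(complaint, llm)
    category = categorize(facts, llm)
    return draft_reply(facts, category, llm)


def fake_llm(prompt: str) -> str:
    # Canned responses keyed on the step verb, only to make the sketch runnable.
    if prompt.startswith("EXTRACT"):
        return "order arrived damaged; customer wants a refund"
    if prompt.startswith("CATEGORIZE"):
        return "shipping/high"
    return "We are sorry about the damaged order. [" + prompt.splitlines()[1] + "]"


print(run_chain("My order arrived broken and I want my money back", fake_llm))
```

Because each step takes its own `llm` argument, a different model or prompt could be swapped in per step. Note that `run_chain` has no branch for a malformed category: a bad intermediate result simply flows forward.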
Use when: the task naturally decomposes into sequential, well-defined steps with predictable outputs at each stage.
3. Workflow
Structured · Conditional
A workflow is a structured process that mixes LLM calls with non-LLM steps: API requests, database lookups, conditional logic, rule-based routing, and human review checkpoints. The flow is defined upfront by the developer, not decided by the AI at runtime.
In a workflow, the AI handles the parts that require language understanding — reading a document, extracting fields, drafting text. Everything else is handled deterministically: if the invoice amount exceeds a threshold, route it for approval; if the category is billing, assign it to the billing team. Rules are cheaper, faster, and more auditable than asking an LLM to make those decisions.
The key distinction from a chain: a workflow can branch, loop, call external systems, and include humans in the process. The key distinction from an agent: the structure is predetermined. The AI is embedded at specific steps, not in control of the flow.
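A small sketch of the invoice example: the model handles extraction, and plain rules handle routing. The threshold value, the pipe-separated output format, and all names here are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str], str]  # hypothetical model interface: prompt -> text

APPROVAL_THRESHOLD = 10_000.0  # assumed value, for illustration only


@dataclass
class Invoice:
    vendor: str
    amount: float
    category: str


def extract_invoice(text: str, llm: LLM) -> Invoice:
    # LLM step: turn free text into "vendor|amount|category".
    vendor, amount, category = llm(f"Extract vendor|amount|category:\n{text}").split("|")
    return Invoice(vendor.strip(), float(amount), category.strip().lower())


def route(invoice: Invoice) -> str:
    # Deterministic steps: plain rules, no model call, easy to audit.
    if invoice.amount > APPROVAL_THRESHOLD:
        return "human_approval"
    if invoice.category == "billing":
        return "billing_team"
    return "auto_process"


def run_workflow(text: str, llm: LLM) -> str:
    return route(extract_invoice(text, llm))


def fake_llm(prompt: str) -> str:
    # Canned extraction results so the sketch runs without a model.
    return "Acme|12500.00|billing" if "12,500" in prompt else "Acme|300.00|billing"


print(run_workflow("Acme invoice for $12,500", fake_llm))
```

The developer, not the model, decides that large invoices go to a human. The model only fills in the fields the rules operate on.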
Use when: the process has known decision points, involves external systems, or requires human approval at certain stages.
4. ReAct Agent
Autonomous · Tool Use · Loop
ReAct stands for Reasoning and Acting. The model interleaves two behaviors in a loop: it reasons about what to do next, then takes an action (calling a tool or API), then observes the result, then reasons about what to do next. This continues until it has enough information to produce a final answer.
Tools are the key enabler. A ReAct agent can search the web, run code, query a database, send an email, or call any API it has been given access to. The model decides which tool to use, with what inputs, and when to stop.
This makes ReAct agents genuinely capable on open-ended tasks: researching a question whose answer is not known in advance, pulling data from multiple sources and combining it, or completing a task that requires back-and-forth with external systems. The model is in the driver's seat.
The tradeoff is reliability. Because the model decides the path, it can take wrong turns, call tools unnecessarily, get stuck in loops, or make poor judgment calls. Production systems that use ReAct agents need guardrails: maximum step counts, tool call validation, and human review for consequential actions.
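The loop and two of the guardrails mentioned above (a step cap and tool-call validation) can be sketched like this. The `Decision` dictionary format and the scripted policy are assumptions; in a real system the policy would be a model call that reads the transcript.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces: the policy stands in for the model's reasoning step.
Decision = Dict[str, str]  # {"tool": ..., "input": ...} or {"final": ...}
Policy = Callable[[List[str]], Decision]
Tool = Callable[[str], str]


def react_loop(question: str, policy: Policy, tools: Dict[str, Tool],
               max_steps: int = 5) -> str:
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):              # guardrail: hard cap on steps
        decision = policy(transcript)       # reason
        if "final" in decision:
            return decision["final"]
        name = decision.get("tool", "")
        tool_input = decision.get("input", "")
        if name not in tools:               # guardrail: validate the tool call
            transcript.append(f"Observation: unknown tool {name!r}")
            continue
        observation = tools[name](tool_input)                # act
        transcript.append(f"Action: {name}({tool_input})")
        transcript.append(f"Observation: {observation}")     # observe
    return "Stopped: step limit reached without a final answer."


def scripted_policy(transcript: List[str]) -> Decision:
    # Fake "model": call the adder once, then answer with what it observed.
    observations = [t for t in transcript if t.startswith("Observation:")]
    if not observations:
        return {"tool": "add", "input": "2+3"}
    return {"final": observations[-1].split(": ", 1)[1]}


TOOLS: Dict[str, Tool] = {
    "add": lambda expr: str(sum(int(part) for part in expr.split("+"))),
}

print(react_loop("What is 2 + 3?", scripted_policy, TOOLS))
```

A policy that never returns a final answer exits through the step cap rather than looping forever. Human review for consequential actions would sit between the decision and the tool call.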
Use when: the task requires dynamic tool use and the exact steps cannot be predetermined.
5. Reflection Agent
Self-correcting · Quality Loop
A reflection agent reviews and improves its own output. The core loop is: generate a response, critique it against a set of criteria, revise based on the critique. This can run for multiple iterations until the output meets a quality bar.
The critique step is often a separate model call with a different prompt — one that specifically looks for errors, omissions, or weaknesses in the generated output. The revision step then addresses those issues. Some implementations use the same model for both; others use a smaller, faster model for critique and a larger one for generation.
Reflection works well for tasks where quality is hard to specify upfront but easy to recognize: writing that needs a specific tone, analysis that should cover certain angles, code that must meet certain standards. It also helps with factual accuracy — the critique step can explicitly check whether claims are supported by the source material.
The cost is latency and token usage: each loop adds time and calls. The loop also needs a stopping condition — a quality threshold, a maximum iteration count, or an explicit "done" signal from the critique.
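A sketch of the generate, critique, revise loop with two of the stopping conditions named above: an explicit "done" signal (the critique returns `None`) and a maximum iteration count. The three callables are hypothetical stand-ins for model calls.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces; each would normally wrap a model call.
Generate = Callable[[str], str]             # task -> first draft
Critique = Callable[[str], Optional[str]]   # draft -> feedback, or None when done
Revise = Callable[[str, str], str]          # (draft, feedback) -> new draft


def reflect(task: str, generate: Generate, critique: Critique, revise: Revise,
            max_iterations: int = 3) -> Tuple[str, int]:
    draft = generate(task)
    revisions = 0
    while revisions < max_iterations:       # stopping condition: iteration cap
        feedback = critique(draft)
        if feedback is None:                # stopping condition: critique is satisfied
            break
        draft = revise(draft, feedback)
        revisions += 1
    return draft, revisions


# Fakes so the sketch runs without a model.
def fake_generate(task: str) -> str:
    return "The meeting is on Friday"


def fake_critique(draft: str) -> Optional[str]:
    return None if "10:00" in draft else "Add the start time."


def fake_revise(draft: str, feedback: str) -> str:
    return draft + " at 10:00."


print(reflect("Announce the meeting", fake_generate, fake_critique, fake_revise))
```

Each pass through the loop costs one critique call and one revision call, which is where the extra latency and token usage come from.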
Use when: output quality matters more than speed, and the criteria for good output can be expressed as a critique prompt.
6. Deep Agent
Long-horizon · Planner · Researcher
A deep agent is designed for tasks that require sustained, multi-step investigation over many iterations, far more than a standard ReAct agent would run. It is characterized by an explicit planning phase, a long research loop, and a final synthesis step that integrates everything gathered.
The planning phase is what distinguishes it. Before taking any action, the agent decomposes the goal into sub-questions and builds a research plan. As it works through that plan and encounters new information, it updates the plan — new sub-questions open up, old ones get answered or dropped. The loop can run for dozens of steps across many sources.
Deep agents are the architecture behind tools like AI research assistants that produce long, sourced reports from a single prompt. Given "analyze the competitive landscape for electric vehicle charging in Europe," a deep agent will plan, search, read, synthesize, look for gaps, search again, and eventually produce a structured report — doing in minutes what would take a human analyst hours.
The cost is significant: high latency (minutes, not seconds), high token usage, and errors that compound over a long chain. These agents need robust state management to track what has been read, careful deduplication to avoid processing the same sources repeatedly, and clear stopping conditions to know when enough research is enough.
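The plan, research, and synthesis phases, along with source deduplication and stopping conditions, might be sketched as below. The `Finding` tuple shape, the callables, and the tiny in-memory `CORPUS` are hypothetical; real planners and researchers would be model and search calls.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical interfaces standing in for model and search calls.
Finding = Tuple[str, str, List[str]]  # (source_id, note, follow_up_questions)
Planner = Callable[[str], List[str]]
Researcher = Callable[[str], List[Finding]]
Synthesizer = Callable[[str, List[str]], str]


def deep_research(goal: str, plan: Planner, research: Researcher,
                  synthesize: Synthesizer, max_steps: int = 20) -> Dict[str, object]:
    queue = deque(plan(goal))               # planning phase: initial sub-questions
    asked: Set[str] = set(queue)
    seen_sources: Set[str] = set()          # state: which sources were already read
    notes: List[str] = []
    steps = 0
    while queue and steps < max_steps:      # stop when the plan is done or budget is spent
        question = queue.popleft()
        steps += 1
        for source_id, note, follow_ups in research(question):
            if source_id in seen_sources:   # deduplication
                continue
            seen_sources.add(source_id)
            notes.append(note)
            for follow_up in follow_ups:    # plan update: new sub-questions open up
                if follow_up not in asked:
                    asked.add(follow_up)
                    queue.append(follow_up)
    return {"report": synthesize(goal, notes),
            "sources": sorted(seen_sources), "steps": steps}


# Tiny fake corpus so the sketch runs without search or a model.
CORPUS: Dict[str, List[Finding]] = {
    "Q1": [("src-a", "note a", ["Q3"])],
    "Q2": [("src-a", "note a", []), ("src-b", "note b", [])],
    "Q3": [("src-c", "note c", ["Q1"])],
}

result = deep_research(
    "goal",
    plan=lambda goal: ["Q1", "Q2"],
    research=lambda question: CORPUS.get(question, []),
    synthesize=lambda goal, notes: f"{goal}: " + "; ".join(notes),
)
print(result)
```

The `asked` and `seen_sources` sets are the minimal form of the state management discussed above: without them, the cycle between `Q1` and `Q3` would keep the loop running until the step budget ran out.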
Use when: the task requires comprehensive research across many sources and the answer cannot be known until the investigation is complete.
7. Multi-Agent System
Parallel · Specialized · Scalable
[Diagram: Research, Analysis, and Writing agents]
A multi-agent system coordinates multiple independent agents — each with its own role, tools, and context — to complete a task that no single agent could handle efficiently alone. An orchestrator breaks the goal into sub-tasks, dispatches them to specialized agents, and aggregates their outputs into a final result.
The specialization is the point. A research agent knows how to search and read. An analysis agent knows how to structure and interpret data. A writing agent knows how to produce readable prose. Each one is optimized for its narrow job. When their sub-tasks are independent, they can also run in parallel, which reduces total wall-clock time on complex tasks.
Multi-agent systems handle scale and complexity that would overwhelm a single agent. Coding assistants that simultaneously write code, run tests, check security, and generate documentation are multi-agent systems. So are enterprise automation platforms that handle intake, routing, processing, and notification across different departments.
The engineering cost is real: inter-agent communication, shared state management, error handling when one agent fails, and ensuring that agents do not contradict each other. These systems are powerful but require careful design. Start with a single agent and move to multiple agents only when you have a concrete reason — parallelism, specialization, or scale that genuinely demands it.
Use when: the task is large enough to benefit from specialization or parallelism, and a single agent becomes a bottleneck.
Choosing the right pattern
The seven patterns form a spectrum from simple and predictable to powerful and complex. More capability almost always means more cost, more latency, and more engineering effort. The right choice is the simplest one that reliably handles the task.
| Pattern | Control | Tools | Self-corrects | Best for |
|---|---|---|---|---|
| Single Call | Full | No | No | Contained, predictable tasks |
| Chain | Full | No | No | Multi-step tasks with fixed stages |
| Workflow | Full | Yes | No | Business processes with rules and APIs |
| ReAct Agent | Model | Yes | Partial | Open-ended tasks needing dynamic tool use |
| Reflection Agent | Model | Optional | Yes | High-quality output that must meet a bar |
| Deep Agent | Model | Yes | Partial | Long-horizon research and investigation |
| Multi-Agent | Shared | Yes | Yes | Complex, parallelizable, large-scale tasks |
A useful heuristic: start at the top of this list and move down only when you hit a real limitation. If a chain works, don't build a ReAct agent. If a ReAct agent works, don't build a multi-agent system. The simpler pattern is cheaper to run, easier to debug, and more reliable in production.
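One way to make this heuristic concrete is to write down what the task actually requires and pick the simplest pattern that covers it. The requirement flags below are a hypothetical simplification of the "Use when" criteria in each section, not a complete decision procedure.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical requirement flags, loosely derived from the "Use when" lines.
    multi_step: bool = False         # decomposes into fixed sequential stages
    external_systems: bool = False   # APIs, databases, or human approval
    steps_unknown: bool = False      # the path cannot be predetermined
    long_research: bool = False      # many sources, long investigation
    needs_parallelism: bool = False  # a single agent becomes a bottleneck
    quality_critical: bool = False   # accuracy justifies extra cost


def choose_pattern(task: Task) -> str:
    # Check the most demanding requirement first; whatever remains is the
    # simplest pattern that still covers everything the task needs.
    if task.needs_parallelism:
        base = "Multi-Agent"
    elif task.long_research:
        base = "Deep Agent"
    elif task.steps_unknown:
        base = "ReAct Agent"
    elif task.external_systems:
        base = "Workflow"
    elif task.multi_step:
        base = "Chain"
    else:
        base = "Single Call"
    # Reflection is treated here as an add-on rather than a rung on the ladder.
    return base + (" + Reflection loop" if task.quality_critical else "")


print(choose_pattern(Task(multi_step=True, external_systems=True)))
```

In practice the flags come from observed limitations, not up-front guesses: a flag gets set only after the simpler pattern has failed on real inputs.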
The one exception is quality requirements. If accuracy matters enough to justify the cost, a reflection loop is often worth adding even when simpler patterns technically produce an output — because technically producing an output and reliably producing a good one are different things.
Understanding these patterns is the foundation of building AI systems that actually work in production. The capability of any individual model matters less than how you structure the system around it.