Agents — AI Field Guide

Most of us have used ChatGPT, Gemini, or Claude to answer a question, summarize a document, or write a piece of code. We type something in, get a response back, and move on. This is a language model doing what it does best: predicting useful text based on the input it has been given.

An AI agent is something different: it can think, act, and remember. Instead of just answering a question, it figures out what steps to take, uses tools to carry them out, and adjusts its approach based on what happens along the way.

Think of it this way: if we hired a new engineer and only let them talk, never touch a keyboard, open a browser, or read documentation, they would be pretty limited. That is essentially an LLM on its own. Once we give that engineer access to the codebase, a terminal, company docs, and the ability to ask clarifying questions, they become far more capable. At that point, they are acting more like an agent.

In this article, we’ll cover what AI agents are, how they differ from plain language models, what components make them work, and when they are worth using.

What is an AI agent in plain terms?

An AI agent is a software system that uses a language model as its core reasoning engine, combined with the ability to take actions in the real world. Those actions might include:

Searching the web
Querying a database
Calling an API
Reading or writing files
Sending an email
Running code

The key distinction is autonomy. A plain LLM responds to a single prompt. An agent receives a goal and then independently decides what steps to take, executes those steps, observes the results, and continues until the goal is met (or it determines the goal cannot be met).

The new hire analogy

Imagine we hire a new software engineer. On their first day, we would not expect them to know everything, but we would expect them to:

Read documentation to understand the codebase
Use tools like an IDE, terminal, and browser
Ask questions when something is unclear
Break down tasks into smaller steps
Check their work before saying they are done
Learn from mistakes and adjust their approach

An AI agent works the same way. It has a base of knowledge (the language model), access to tools, and an orchestration layer that manages the loop of thinking, acting, and observing.

LLM vs. agent: what’s the difference?

This is the most important distinction to internalize early. An LLM is the model itself: it receives context, predicts a useful response, and returns text. An agent is the system wrapped around that model: it can choose tools, read observations, keep track of progress, and stop once the goal is met.

Another way to say it: an LLM answers from the context it has. An agent can go get more context or take an action before answering.

Aspect	LLM (alone)	AI Agent
What it does	Generates text based on a prompt	Pursues a goal through multiple steps
Interaction	Single turn (or multi-turn chat)	Autonomous loop of thought and action
Tools	None - text in, text out	Can call functions, APIs, search, etc.
Memory	Limited to context window	Can persist information across steps
Decision-making	Responds to what you ask	Decides what to do next on its own
Error handling	Gives you an answer (right or wrong)	Can observe errors and retry with a new approach

A helpful mental model:

LLM = Brain
Agent = Brain + Hands + Memory

The brain (LLM) does the reasoning. The hands (tools) let it take action. The memory (state management) lets it keep track of what has happened and what still needs to be done.

A concrete example

LLM alone: We ask “What is the current price of GOOG stock?” The model might say “As of my last training data, it was around $140” which could be months out of date.

Agent: We ask the same question. The agent thinks “I need current stock data, I should use a finance API.” It calls a stock price tool, gets the live price, and returns an accurate answer. If the API call fails, it might try a different data source.

This loop (think, act, observe, repeat) is what makes an agent an agent.

Core components of an AI agent

Every agent system, regardless of framework, has three fundamental components:

1. the model (the brain)

This is the language model at the center of the agent. It handles:

Understanding the user’s goal
Reasoning about what steps to take
Deciding which tool to use (and with what parameters)
Interpreting the results of tool calls
Generating the final response

The model you choose matters. Harder tasks (multi-step reasoning, complex code generation, nuanced decision-making) benefit from frontier models like Gemini or Opus. Simpler tasks (classification, extraction, straightforward Q&A) can use lighter models like Gemini Flash to save cost and latency.

2. tools (the hands)

Tools are what let an agent interact with the world beyond text generation. Without tools, an agent is just a chatbot. With tools, it can:

Retrieve information: Search the web, query a database, read a file
Take actions: Send an email, create a ticket, deploy code
Compute: Run calculations, execute code, transform data

Tools are typically defined as functions with clear names, descriptions, and parameter schemas. The model decides when and how to call them. We will cover tools in depth in Lesson 3.

3. the orchestration layer (the control loop)

This is the glue that connects the model and tools into a functioning system. The orchestration layer manages:

The agent loop: Think -> Act -> Observe -> Repeat
State management: What has happened so far, what context the model needs
Error handling: What to do when a tool call fails
Termination conditions: When to stop looping and return a result
Guardrails: Safety checks, output validation, scope limits

The simplest orchestration pattern looks like this:

Receive user goal
Send goal + available tools to the model
Model returns either:
1. A final answer -> Return to user
2. A tool call -> Execute the tool, add result to context, go to step 2

This is often called a ReAct loop (Reasoning + Acting). More sophisticated patterns exist that we may explore in later lessons, but most agent systems follow the same core pattern: the model reasons about what to do next, tools execute actions, and the orchestration layer manages the flow between them.

User Goal
    |
    v
+-------------------+
| Orchestration     |
| Layer             |
|                   |
|  +-------------+  |
|  |   Model     |  | "I need to search for X"
|  |  (Brain)    |--+--->  Tool Call
|  +-------------+  |         |
|        ^          |         v
|        |          |  +-------------+
|        +----------+--+   Tools     |
|     Tool results  |  |  (Hands)   |
|                   |  +-------------+
+-------------------+
    |
    v
Final Response

A taxonomy of agent systems

Not all agents are created equal. It’s helpful to think about agent systems on a spectrum of autonomy and capability, from Level 0 through Level 4.

Level 0: basic reasoning (simple LLM)

What it is: A language model answering questions with no tools or memory.

Example: You ask Gemini “Explain the CAP theorem” and it gives you a clear explanation from its training data.

Capabilities:

Text generation and comprehension
Single-turn or multi-turn conversation
No external data access
No ability to take actions

When it works well: General knowledge questions, creative writing, brainstorming, summarization of provided text.

Level 1: connected problem-solver (tool-using agent)

What it is: A model that can call tools to retrieve information or perform simple actions. This is where we cross the line from “chatbot” to “agent.”

Example: A customer support bot that can look up order status by calling your order API, or a coding assistant that can search documentation.

Capabilities:

Everything in Level 0
Function calling (tools)
Retrieval-Augmented Generation (RAG) for grounding in real data
Simple single-step or few-step task completion

When it works well: Tasks that require current data, API integrations, straightforward workflows with a small number of steps.

Level 2: strategic agent (autonomous with context)

What it is: An agent that can plan multi-step approaches, maintain context across a longer session, and adapt its strategy based on intermediate results.

Example: A research agent that takes a question like “Compare the top 3 cloud providers on serverless pricing,” then searches for pricing pages, extracts data, builds a comparison table, and summarizes findings.

Capabilities:

Everything in Level 1
Multi-step planning and execution
Dynamic replanning when things change
Working memory across steps
Self-evaluation (“Is this result good enough?“)

When it works well: Research tasks, complex troubleshooting, multi-step workflows where the path depends on intermediate results.

Level 3: collaborative multi-agent system

What it is: Multiple specialized agents working together, each handling a different aspect of a larger task. One agent might coordinate the others.

Example: A software development system where one agent writes code, another writes tests, a third reviews the code, and an orchestrator agent manages the workflow.

Capabilities:

Everything in Level 2
Agent-to-agent communication
Specialized roles and delegation
Parallel execution of subtasks
Consensus or voting mechanisms for quality

When it works well: Complex projects that benefit from specialization, tasks requiring multiple perspectives or quality gates.

Level 4: self-evolving agent

What it is: An agent that can reflect on its own performance, learn from past runs, update its strategies, and improve over time without manual intervention.

Example: A deployment agent that tracks which rollback strategies worked best historically and adjusts its approach for future deployments.

Capabilities:

Everything in Level 3
Long-term memory and learning
Strategy optimization based on past outcomes
Self-modification of prompts or tool selection
Performance monitoring and self-correction

When it works well: Recurring tasks where patterns emerge over time, systems that benefit from continuous improvement.

Summary table

Level	Name	Key Feature	Example
0	Basic Reasoning	Text in, text out	Chatbot, Q&A
1	Connected Problem-Solver	Tool use	Order lookup bot
2	Strategic Agent	Multi-step planning	Research assistant
3	Collaborative Multi-Agent	Agent coordination	Dev team simulation
4	Self-Evolving	Learning from experience	Adaptive ops agent

Most production agent systems today operate at Level 1 or Level 2. Levels 3 and 4 are active areas of research and are becoming more practical, but they add significant complexity. Start simple and move up only when you have a clear reason to.

Interactive example

Click a level to see its architecture, autonomy, and capabilities.

Architecture

Level 0 — Basic Reasoning

Autonomy

10%

What it is

A language model answering questions with no tools or memory.

Key capabilities

Text generation and comprehension
Single or multi-turn conversation
No external data access
No ability to take actions

Best for

General Q&ACreative writingBrainstormingSummarization

When to use agents vs. when a simple prompt is enough

Agents are powerful, but they come with added complexity, cost, and latency. Not every problem needs one. Here’s a practical guide to help decide.

Use a simple prompt when:

The task can be completed in a single step
No external data or actions are needed
The answer exists within the model’s training data
Low latency is critical (agents add multiple round trips)
The cost of multiple model calls is not justified

For example, a simple prompt works well for summarizing a paragraph, converting JSON to a Python dataclass, writing a regex that matches email addresses, or explaining the difference between TCP and UDP.

Use an agent when:

The task requires multiple steps that depend on each other
External data or tools are needed (APIs, databases, search)
The task requires real-time or current information
The approach may need to change based on intermediate results
The task involves taking actions (not just generating text)

For example, an agent works well for finding the three most recent bugs in an issue tracker and drafting a summary for standup, checking a customer’s shipping status and sending an update email, researching competitors’ pricing, or reviewing a pull request and suggesting improvements after tests run.

The decision flowchart

Does the task require external data or actions?
  |
  +-- No --> Can the model answer from its training data?
  |            |
  |            +-- Yes --> Use a simple prompt
  |            +-- No  --> Consider RAG (retrieval) first, then an agent
  |
  +-- Yes --> Is it a single tool call?
               |
               +-- Yes --> A simple function-calling setup may suffice
               +-- No  --> Use an agent with orchestration

Cost and latency considerations

Every step in an agent loop involves a model call. A 5-step agent workflow means 5 or more calls to the model, plus tool execution time. This adds up:

Latency: Each model call takes 1-10 seconds depending on the model and prompt size. A 5-step agent might take 15-30 seconds.
Cost: Each model call costs tokens. Agent workflows can use 10-50x more tokens than a single prompt.
Reliability: More steps means more chances for errors or hallucinations.

The engineering principle is the same as anywhere else: use the simplest approach that gets the job done.

Key takeaways

An AI agent is a system that uses a language model to reason, tools to act, and an orchestration layer to manage the loop between thinking and doing.
LLM = brain. Agent = brain + hands + memory. The model provides reasoning. Tools provide action. The orchestration layer provides control flow.
Agents exist on a spectrum from simple tool-using assistants (Level 1) to self-evolving systems (Level 4). Start at the lowest level that solves your problem.
Not everything needs an agent. If a single prompt gets the job done, use a single prompt. Add agent capabilities only when the task genuinely requires tools, multi-step reasoning, or real-world actions.
The core loop is simple: Receive goal -> Think about what to do -> Use a tool -> Observe the result -> Repeat until done.

What is an AI agent in plain terms?

The new hire analogy

LLM vs. agent: what’s the difference?

A concrete example

Core components of an AI agent

1. the model (the brain)

2. tools (the hands)

3. the orchestration layer (the control loop)

A taxonomy of agent systems

Level 0: basic reasoning (simple LLM)

Level 1: connected problem-solver (tool-using agent)

Level 2: strategic agent (autonomous with context)

Level 3: collaborative multi-agent system

Level 4: self-evolving agent

Summary table

When to use agents vs. when a simple prompt is enough

Use a simple prompt when:

Use an agent when:

The decision flowchart

Cost and latency considerations

Key takeaways

Related terms