Thursday, May 14, 2026


# The Anatomy Of An AI Coding Agent, Part 8


## The Agent Loop: Observe, Plan, Act, Verify, Repeat


If there is one idea that separates an AI coding agent from a chatbot, it is the loop.


A chatbot answers. An autocomplete system predicts the next piece of code. An agent keeps going. It observes the current state, decides what matters, chooses an action, uses a tool, reads the result, updates its understanding, and either continues or stops.


That loop is why tools like Cursor, Claude Code, Codex CLI, and similar systems feel different from earlier coding assistants. The model still matters, but the behavior comes from repeated cycles of perception, decision, action, and feedback.


The agent loop is also where many failures happen. Agents get lost when they observe the wrong thing, plan too little, act too broadly, misread tool output, or keep going after the evidence says they should stop.


Understanding the loop makes agents easier to use, evaluate, and trust.


## The Simple Version


At a high level, the loop looks like this:


```text

Observe -> Orient -> Plan -> Act -> Verify -> Decide -> Report

```


In a real coding session, that might look like:


1. Read the user's request.

2. Inspect relevant files, errors, tests, or diffs.

3. Build a working theory of the problem.

4. Decide the next useful action.

5. Edit a file, run a command, search the repo, or ask a question.

6. Read the result.

7. Decide whether to continue, revise, or stop.

8. Summarize what happened.


This is not magic. It is the same practical loop engineers use every day. The difference is that the agent can run through many small cycles quickly.
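

To make the shape of the loop concrete, here is a minimal sketch of it as a driver function. This is an illustration, not any particular product's implementation: the `AgentPhases` interface, the phase names, and the step cap are assumptions made for the sketch.


```typescript
type Decision = "continue" | "revise" | "ask" | "stop";

// The phases are passed in as functions; real agents wire these to a model
// and to tools. The loop itself only contributes ordering, feedback, and an exit.
interface AgentPhases<State> {
  observe: (state: State) => string[];               // gather evidence
  orientAndPlan: (evidence: string[]) => string;     // choose the next useful step
  act: (step: string) => string;                     // run a tool, return its output
  verify: (step: string, output: string) => string;  // interpret the result
  decide: (interpretation: string) => Decision;
  report: (log: string[]) => string;
}

function runAgentLoop<State>(
  state: State,
  phases: AgentPhases<State>,
  maxSteps = 20
): string {
  const log: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const evidence = phases.observe(state);              // Observe
    const step = phases.orientAndPlan(evidence);         // Orient + Plan
    const output = phases.act(step);                     // Act
    const interpretation = phases.verify(step, output);  // Verify
    log.push(`${step} -> ${interpretation}`);
    const decision = phases.decide(interpretation);      // Decide
    if (decision === "ask" || decision === "stop") break;
    // "continue" and "revise" both loop again; the difference lives in how
    // orientAndPlan uses the accumulated evidence on the next pass.
  }
  return phases.report(log);                             // Report
}
```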


## Observe: What Is The Current State?


The loop starts with observation. The agent needs to understand what is being asked and what state the world is in.


Observation can include:


- The user's prompt.

- Open files and selected code.

- Repository search results.

- Diagnostics from the editor.

- Git diffs and current branch state.

- Terminal output.

- Test failures.

- Documentation or issue descriptions.

- Prior conversation context.


For example, if the user says:


```text

Fix the failing login test.

```


the agent should not immediately edit authentication code. It should first observe the actual failure. Which test is failing? What is the error? Did the failure start after a recent change? Is the test failing locally or only in CI? Is the visible file even related?


Bad observation leads to bad work. If the agent reads the wrong test, confuses two similarly named modules, or assumes the open file is relevant when it is not, every later step in the loop is built on shaky ground.


Good agents observe before they help.
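

As a rough sketch, observation can be treated as assembling a structured snapshot before any decision is made. The field names below are hypothetical; they simply mirror the list above.


```typescript
// Hypothetical shape for the evidence an agent collects before acting.
// Field names are illustrative, not a real tool's schema.
interface WorkspaceObservation {
  userRequest: string;            // the prompt, e.g. "Fix the failing login test."
  openFiles: string[];            // paths the user currently has open
  searchHits: string[];           // repository search results judged relevant
  diagnostics: string[];          // editor or compiler warnings and errors
  gitDiff: string;                // uncommitted changes on the current branch
  failingTests: { name: string; error: string }[]; // concrete failures, not guesses
  priorContext: string[];         // earlier turns in the conversation
}

// For a "fix the failing test" request, editing before this returns true
// means acting on a guess rather than on the actual failure.
function hasConcreteFailure(obs: WorkspaceObservation): boolean {
  return obs.failingTests.length > 0;
}
```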


## Orient: What Matters?


Observation gathers information. Orientation decides what matters.


This step is easy to miss because it often happens inside the model's reasoning. But it is one of the most important parts of agent behavior.


Suppose the agent sees a failing test, a recent diff, and three files with similar names. It has to decide which details are signal and which are noise. Is the failure caused by the current branch? Is a generated file stale? Is a test fixture wrong? Is the product behavior ambiguous?


Orientation is where the agent forms a working model:


- This looks like a frontend validation bug.

- The backend behavior appears unchanged.

- The failing test is probably a regression test for the intended behavior.

- The repository already has a helper for this permission check.

- The safest change is likely in the shared policy layer, not the UI.


This working model may be wrong. That is fine if the agent treats it as a hypothesis rather than a fact.


Good orientation is provisional. The agent should be willing to revise it as soon as new evidence arrives.
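

One way to keep orientation provisional, sketched here with illustrative names, is to hold the working model as an explicit hypothesis alongside the evidence for and against it.


```typescript
// A working model held as a revisable hypothesis rather than a fact.
// The names and the deliberately strict rule below are illustrative.
interface WorkingHypothesis {
  claim: string;                    // e.g. "the bug is in the shared policy layer, not the UI"
  supportingEvidence: string[];
  contradictingEvidence: string[];
}

function shouldRevise(h: WorkingHypothesis): boolean {
  // Provisional by construction: any contradicting evidence forces a revisit
  // before the agent commits to edits based on the claim.
  return h.contradictingEvidence.length > 0;
}
```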


## Plan: What Is The Next Useful Step?


Planning does not always mean writing a long checklist. In an agent loop, planning often means choosing the next useful step.


For a small task, the plan might be:


```text

Read the failing test, inspect the implementation, patch the bug, rerun the test.

```


For a larger task, the plan might be more explicit:


```text

First map the existing authorization flow.

Then identify the shared permission helper.

Then add the new rule.

Then add tests for admin, editor, and viewer roles.

Then run the focused test package.

```


Good planning controls blast radius. It keeps the agent from editing too much too soon.


The best agents plan at the level the task deserves. They do not stop to write a project plan for a typo. They also do not dive into a multi-file security change without explaining the approach.


Planning should answer three questions:


- What am I trying to learn or change next?

- Why is this the right next step?

- What would make me stop or revise?


That third question matters. A plan without stopping conditions can turn into wandering.
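

A plan that answers all three questions can be as small as a record with three fields. The shape below is an assumption made for illustration, not a prescribed format.


```typescript
// A plan scaled to the task, with an explicit exit criterion.
interface StepPlan {
  goal: string;            // what to learn or change next
  rationale: string;       // why this is the right next step
  stopOrReviseIf: string;  // evidence that should halt or redirect the work
}

// Hypothetical plan for the authorization task sketched above.
const plan: StepPlan = {
  goal: "Add the new rule in the shared permission helper",
  rationale: "Callers already route checks through the helper, so one change covers all roles",
  stopOrReviseIf: "The helper turns out not to be used by the affected endpoints",
};
```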


## Act: Use A Tool


The action step is where the agent touches the world.


Actions can include:


- Searching for code.

- Reading a file.

- Editing a file.

- Running a test.

- Running a formatter.

- Opening a browser.

- Calling an API.

- Asking the user a clarifying question.


This is where the agent becomes more than a model. It is no longer just generating text; it is operating inside a development environment.


But action should be scoped. A good agent does not rewrite five files when one helper change would do. It does not run a broad command when a focused test provides enough signal. It does not install a dependency when the standard library or existing project code is sufficient.


The action should match the plan. If the plan is to investigate, the agent should not edit. If the plan is to make the smallest safe change, the diff should be small. If the plan is to verify behavior, the tool result should provide evidence.


Many agent failures are action failures. The agent uses the wrong tool, edits before reading, runs an unsafe command, or changes unrelated code. Guardrails exist because action is where mistakes become real.
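

One way to keep actions matched to the plan is a pre-flight scope check before any tool call. The sketch below is illustrative: the phase names, tool names, and file budget are assumptions, not a real agent's guardrail API.


```typescript
type Phase = "investigate" | "edit" | "verify";

interface ProposedAction {
  tool: "search" | "read" | "edit" | "run" | "ask";
  targetFiles: string[];
}

function withinScope(phase: Phase, action: ProposedAction, maxEditedFiles = 2): boolean {
  if (phase === "investigate") {
    // Investigation should not modify anything.
    return action.tool === "search" || action.tool === "read" || action.tool === "ask";
  }
  if (phase === "edit") {
    // Keep the blast radius small: only edits, touching few files.
    return action.tool === "edit" && action.targetFiles.length <= maxEditedFiles;
  }
  // Verification should produce evidence, typically from a focused check.
  return action.tool === "run" || action.tool === "read";
}
```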


## Verify: What Happened?


After acting, the agent has to observe again.


This is the feedback part of the loop. The agent reads the command output, test result, diff, browser state, linter warning, or API response and asks: did that action do what I expected?


For example:


```text

The test still fails, but the error moved from a 500 response to a missing field assertion.

```


That is useful information. The first patch may have fixed one layer and exposed another. The agent should not treat the failure as generic bad news. It should interpret the change in evidence.


Verification can also reveal that the plan was wrong:


```text

The failing test is not using the code path I edited.

```


or:


```text

The formatter changed many unrelated files.

```


or:


```text

The browser behavior is correct, but the accessibility label is missing.

```


Good agents do not ignore these signals. They update their working model.


Verification is not just "did the command pass?" It is "what did the result teach me?"
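

Interpreting evidence rather than a bare pass or fail can be sketched as comparing the failure before and after the action. The labels and fields below are assumptions chosen to match the examples above.


```typescript
interface CheckResult {
  passed: boolean;
  errorSummary: string;
}

type Interpretation = "fixed" | "progress" | "no-change" | "regressed";

function interpret(before: CheckResult, after: CheckResult): Interpretation {
  if (after.passed) return "fixed";
  if (before.passed && !after.passed) return "regressed";
  if (after.errorSummary !== before.errorSummary) {
    // Still failing, but the error moved (e.g. from a 500 response to a
    // missing field assertion): the patch changed something real.
    return "progress";
  }
  return "no-change";
}
```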


## Decide: Continue, Revise, Ask, Or Stop


After verification, the agent needs to choose the next branch in the loop.


There are four common outcomes.


First, continue. The action worked, and the next step is obvious. For example, the implementation is fixed, and now the agent should add a regression test.


Second, revise. The action produced evidence that the hypothesis was wrong or incomplete. The agent should adjust the plan and try a different path.


Third, ask. The task requires information the agent cannot infer safely. For example, two product behaviors are plausible, or a command requires permission, or the change touches a security-sensitive area.


Fourth, stop. The work is complete, blocked, too risky, or outside the requested scope.


Stopping is underrated. A good agent should know when not to keep going. It should not keep editing just because there is another possible improvement nearby. It should not fix unrelated failures. It should not turn a bug fix into a refactor unless the user asked for it.


The loop is powerful because it repeats. It is safe only when the agent knows when to exit.
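

The branch point can be sketched as a small decision function over the evidence gathered so far. The inputs below, including the explicit step budget, are illustrative assumptions.


```typescript
type LoopDecision = "continue" | "revise" | "ask" | "stop";

interface LoopState {
  goalMet: boolean;                 // requested change is done and verified
  hypothesisContradicted: boolean;  // new evidence undercuts the current plan
  needsUserInput: boolean;          // ambiguity or permission the agent cannot resolve
  stepsUsed: number;
  stepBudget: number;               // a hard cap so the loop cannot wander forever
}

function nextDecision(s: LoopState): LoopDecision {
  if (s.goalMet) return "stop";                    // done: do not keep "improving" nearby code
  if (s.needsUserInput) return "ask";              // do not guess on risky ambiguity
  if (s.stepsUsed >= s.stepBudget) return "stop";  // blocked, or wandering
  if (s.hypothesisContradicted) return "revise";
  return "continue";
}
```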


## Report: Make The Loop Visible


The final step is reporting. The agent tells the human what happened.


A useful report includes:


- What changed.

- Why it changed.

- What evidence was gathered.

- What tests or checks ran.

- What remains uncertain.

- What the human should review.


For example:


```text

I changed the shared project filter so archived projects are excluded before the picker receives options. I added a regression test for archived projects and ran the focused picker test suite. I did not change the backend query because existing callers rely on receiving archived projects in admin views.

```


That kind of summary makes the loop inspectable. It gives the reviewer a map of the agent's decisions.


Weak reports say:


```text

Fixed it.

```


Strong reports explain the path from request to evidence.
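

As a sketch, a structured report makes those elements hard to skip. The field names are hypothetical; the point is that a report like "Fixed it." cannot fill them.


```typescript
// A report the human can review against the original request.
interface AgentReport {
  changes: string[];        // what changed
  rationale: string;        // why it changed
  evidence: string[];       // what was observed that supports the change
  checksRun: string[];      // commands or tests executed, with outcomes
  openQuestions: string[];  // what remains uncertain
  reviewFocus: string[];    // where human review matters most
}

function render(r: AgentReport): string {
  return [
    `Changed: ${r.changes.join("; ")}`,
    `Why: ${r.rationale}`,
    `Evidence: ${r.evidence.join("; ")}`,
    `Checks: ${r.checksRun.join("; ")}`,
    `Uncertain: ${r.openQuestions.join("; ") || "nothing flagged"}`,
    `Please review: ${r.reviewFocus.join("; ")}`,
  ].join("\n");
}
```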


## A Full Example


Imagine a user asks:


```text

Archived projects still appear in the active project picker. Please fix it.

```


A good agent loop might unfold like this.


Observe: Search for the active project picker, read the component, inspect how projects are loaded, and find existing tests.


Orient: Determine that the picker receives a list from a shared hook, and that "archived" is represented by `status: "ARCHIVED"` rather than a boolean.


Plan: Update the shared active-project selector rather than filtering in the component, then add a regression test beside existing picker tests.


Act: Patch the selector and add the test.


Verify: Run the focused test. It fails because one existing fixture uses lowercase `archived`.


Decide: Revise rather than force the test to pass. Inspect the project status type and confirm that the uppercase enum values are the production behavior and that the lowercase fixture is outdated.


Act again: Update the fixture to use the enum value.


Verify again: Rerun the focused test. It passes.


Report: Summarize the selector change, the regression test, the fixture correction, and the command that passed.


The value is not that the agent guessed the fix immediately. The value is that the loop let it find and correct its assumptions.
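

To ground the walkthrough, here is a hypothetical version of the final change. Everything in it is invented for illustration except the one detail the walkthrough gives: archived projects carry `status: "ARCHIVED"` rather than a boolean flag.


```typescript
// Hypothetical shared selector: exclude archived projects before the
// picker ever receives options. All names here are invented.
type ProjectStatus = "ACTIVE" | "ARCHIVED";

interface Project {
  id: string;
  name: string;
  status: ProjectStatus;
}

function selectActiveProjects(projects: Project[]): Project[] {
  // Filter on the status enum; there is no boolean "archived" flag to check.
  return projects.filter((p) => p.status !== "ARCHIVED");
}

// The regression test beside the existing picker tests would assert exactly
// this behavior, using an uppercase-status fixture that matches production.
const options = selectActiveProjects([
  { id: "1", name: "Alpha", status: "ACTIVE" },
  { id: "2", name: "Beta", status: "ARCHIVED" },
]);
console.log(options.map((p) => p.name)); // ["Alpha"]
```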


## Common Loop Failures


Agent failures usually map cleanly to one part of the loop.


Observation failure: The agent reads the wrong files, misses the failing test, or ignores the current diff.


Orientation failure: The agent sees the right facts but draws the wrong conclusion.


Planning failure: The agent jumps into edits without sequencing the work.


Action failure: The agent uses the wrong tool, edits too broadly, or runs a risky command.


Verification failure: The agent runs a check but misreads the output.


Decision failure: The agent keeps going when it should ask, stop, or report a blocker.


Reporting failure: The agent finishes without enough evidence for the human to review.


This failure map is useful because it makes agent behavior debuggable. Instead of saying "the AI got confused," you can ask where the loop broke.


## How To Prompt For A Better Loop


Users can improve agent behavior by making the desired loop explicit.


For investigation:


```text

Inspect first. Do not edit yet. Summarize the likely cause and the files involved.

```


For scoped implementation:


```text

Make the smallest safe change. Match existing patterns. Add or update focused tests.

```


For verification:


```text

Run the most relevant check and explain what the result proves.

```


For review:


```text

Review the diff for behavior outside the requested scope, missing tests, and security risks.

```


These prompts work because they tell the agent which phase of the loop it is in. A lot of frustration comes from phase confusion: the user wants observation, but the agent acts; the user wants action, but the agent keeps explaining.


## Conclusion


The agent loop is the heart of an AI coding agent.


Observe, orient, plan, act, verify, decide, report. Then repeat when needed.


Every other part of the anatomy supports this loop. The model reasons inside it. Context feeds it. Tools execute it. The workspace grounds it. Guardrails constrain it. Feedback improves it. The human interface makes it visible and steerable.


When the loop works, an agent feels like a capable collaborator. It gathers evidence, makes scoped changes, checks its work, and knows when to ask for help.


When the loop breaks, the agent guesses, wanders, edits too much, trusts weak evidence, or hides uncertainty.


Understanding the loop gives engineers a practical way to use agents well. It also gives teams a practical way to evaluate them: do not only ask whether the agent produced code. Ask whether it moved through the loop with discipline.

