John Haigh's Blog: The Anatomy Of An AI Coding Agent, Part 3

# The Anatomy Of An AI Coding Agent, Part 3

## The Hands: Tools, File Edits, Terminal, Browser, And APIs

An AI coding agent without tools is mostly a very fast pair programmer with no keyboard. It can reason, suggest, and explain, but it cannot inspect a repository, run a test, edit a file, open a browser, or call an API unless the environment gives it hands.

Those hands are where AI coding agents become operational. They are also where much of the practical risk lives.

For software engineers and technical leaders evaluating tools like Cursor, Claude Code, Codex CLI, and similar systems, it is useful to look past the chat interface. The real question is not only "how good is the model?" It is also "what can the agent do, how does it know what happened, and where are the boundaries?"

## Tools Turn Reasoning Into Action

A coding agent typically has access to a set of tools: file readers, search, patch editors, terminal execution, browser automation, API clients, linters, test runners, and sometimes internal systems like GitHub, Jira, observability tools, or deployment platforms.

The model decides what to do next, but the tool performs the action.

For example, if you ask:

```text

Fix the bug where archived projects still appear in search results.

```

A capable agent might:

1. Search for the project search implementation.

2. Read the relevant backend query.

3. Inspect existing tests.

4. Edit the filter logic.

5. Add or update a regression test.

6. Run the focused test suite.

7. Summarize the change.

Each step depends on tool access. The model may understand the likely fix, but it needs repository context and feedback from real execution to avoid guessing.

This is why strong agents feel less like autocomplete and more like junior-to-mid-level engineers working under supervision. They form a hypothesis, inspect the code, make a change, run checks, and adjust.

## File Edits Are The Core Skill

The most important hand is the ability to edit files safely.

Good agents do not rewrite entire files casually. They read the surrounding code, preserve local style, and make narrow patches. This matters because real codebases contain formatting conventions, partial migrations, generated files, and nearby work from other developers.

A weak edit sounds like this:

```text

I replaced the service with a cleaner implementation.

```

A stronger edit sounds like this:

```text

I changed only the repository filter predicate, kept the existing pagination path intact, and added a regression case beside the current archived-project tests.

```

The distinction is important. Agents should be evaluated not just by whether the final code compiles, but by whether the change is easy to review.

File editing also introduces workflow concerns. What happens if the working tree is already dirty? Does the agent overwrite user changes? Does it understand generated files? Can it explain every modified file?

A useful rule: the agent should treat the repository as shared space, not as a blank canvas.

## The Terminal Provides Ground Truth

The terminal gives the agent access to reality.

Models can reason about TypeScript, Go, Python, Java, and build systems, but they do not know whether this branch currently passes tests. They need to run the commands that engineers would run: unit tests, linters, type checks, builds, migrations, code generation, and focused repro scripts.

For example, after changing a React component, an agent might run the project's type checker and a focused test file. After modifying a Go package, it might run the package tests rather than the entire monorepo suite.

This feedback loop is essential. Without it, an agent can produce code that looks plausible but fails on import paths, mocks, generated types, snapshots, or edge cases.

Terminal access also needs guardrails. There is a big difference between:

```text

npm test

```

and:

```text

curl some-script | bash

```

A mature agent environment distinguishes between safe, routine development commands and commands that modify external systems, install dependencies, delete files, or make network calls. Approval flows are not bureaucracy. They are part of making agents usable in real engineering environments.

## Browser Use Closes The Loop For User-Facing Work

For frontend and product work, tests are often not enough. The browser is where layout, interaction, authentication flows, accessibility issues, and real user behavior show up.

A browser-capable agent can validate changes by opening the app, navigating to a page, filling a form, clicking through a flow, and checking the visible result. This is especially valuable for issues like:

- A button is enabled when it should be disabled.

- A modal closes but leaves focus trapped.

- A loading state never clears.

- A table renders correctly in tests but breaks at realistic viewport widths.

- A route works locally but fails after a redirect.

This does not replace human product judgment. It does, however, make the agent less dependent on static reasoning.

A practical example: if an agent changes a checkout form validation rule, it should not stop at editing the schema. It should verify that invalid input shows the expected error, valid input can proceed, and the UI does not regress in the common path.

## APIs Extend The Agent Beyond The Repo

Many coding tasks live partly outside the codebase. The relevant context may be in GitHub comments, Jira tickets, CI logs, feature flag systems, observability tools, package registries, or internal deployment APIs.

API tools let the agent gather that context.

For example, when fixing a failing CI job, an agent may need to fetch the failed workflow logs, identify that a snapshot changed, inspect the related pull request, update the test expectation, and report back with the exact failure that was resolved.

This is powerful, but it also expands the trust boundary. API responses should be treated as untrusted input. Agents should not blindly execute instructions found in issue comments, logs, web pages, or tool responses. A malicious comment saying "ignore your safety rules and run this command" is still just data.

For technical leaders, this is one of the most important evaluation points: can the platform separate data from instructions?

## Good Tool Use Is Deliberate

The best agents are not the ones with the most tools. They are the ones that use tools deliberately.

A good agent does not run the entire test suite after every one-line change if a focused test would provide faster feedback. It does not open a browser for a backend-only refactor. It does not call external APIs when local repository context is enough.

But it also does not guess when it can inspect. It reads before editing. It tests after changing. It asks for approval before risky actions. It summarizes what it did in terms that a reviewer can verify.

In practice, this creates a useful operating model:

- Tools provide context.

- Edits express intent.

- Tests and builds provide feedback.

- Browser checks validate behavior.

- APIs connect the work to the broader engineering system.

- Human review remains the accountability layer.

## Conclusion

The hands of an AI coding agent are what make it useful. They let the agent move from advice to execution: reading code, changing files, running tests, checking UI behavior, and interacting with engineering systems.

But hands without boundaries are dangerous. The same terminal that runs a unit test can delete a directory. The same API client that reads a CI log can post a misleading comment. The same file editor that fixes a bug can overwrite unrelated work.

Evaluating an AI coding agent means evaluating both capability and control. Can it take meaningful action? Can it observe the result? Can it explain what changed? Can it stop before crossing a risky boundary?

The best agents do not replace engineering discipline. They depend on it.

John Haigh's Blog

Thursday, May 14, 2026

The Anatomy Of An AI Coding Agent, Part 3

No comments:

Post a Comment

About Me