John Haigh's Blog: Designing Agentic Workflows: Core Considerations

When you build an agentic workflow, you are really designing a system where an LLM can plan, act, observe results, and iterate — not just answer a single prompt. The core aspects below are the ones that usually determine whether it works reliably in production.

---

1. Define the job boundary clearly

Start with what the agent is allowed to accomplish, and what it must never do.

- Scope: One well-defined outcome (e.g. “triage this alert and propose a fix”) beats “handle anything related to infra.”

- Success criteria: What does “done” look like? A merged PR? A Jira ticket? A human-approved plan?

- Escalation: When should the agent stop and ask a person instead of continuing?

Ambiguous goals are the main reason agent workflows feel impressive in demos but fail in real use.

---

2. Choose the right orchestration model

Not every task needs a fully autonomous agent.

| Pattern | Best for |

|---|---|

| Fixed pipeline | Predictable steps with known tools |

| Planner + executor | Multi-step tasks with branching |

| Multi-agent | Parallel research, review, or specialization |

| Human-in-the-loop | High-risk or irreversible actions |

A common mistake is making everything “fully agentic” when a deterministic workflow with one LLM step would be simpler and more reliable.

---

3. Tool design and permissions

Agents are only as good as the tools they can call.

- Least privilege: Give only the tools needed for the task.

- Safe defaults: Read-only first; require explicit approval for writes, deploys, deletes, or network calls.

- Structured outputs: Tools should return predictable JSON, not free-form text the agent must reinterpret.

- Idempotency: Assume the agent may retry; side effects should be safe to repeat.

---

4. State, memory, and context management

Agents fail when they lose track of what already happened.

- Working memory: Current task state, intermediate results, open questions.

- External memory: Docs, tickets, repo context, prior runs — retrieved on demand rather than stuffed into every prompt.

- Context budget: Summarize or drop stale history instead of sending the full transcript forever.

- Handoffs: If multiple agents are involved, define exactly what each one receives and returns.

---

5. Prompting, skills, and guardrails

Instructions should be layered, not one giant system prompt.

- System rules: Security, tone, non-negotiable constraints.

- Skills/playbooks: Reusable procedures for recurring tasks.

- Task prompt: The specific user request and current state.

- Examples: Few-shot examples for brittle formats or decision boundaries.

Also treat all external inputs — tool responses, web fetches, MCP output, user files — as untrusted. Validate before acting on them.

---

6. Reliability and failure handling

Agentic systems must assume things will go wrong.

- Retries with limits: Retry transient tool failures, not logical mistakes.

- Checkpoints: Save progress so a run can resume after interruption.

- Verification steps: Have the agent confirm outcomes (“did the test pass?”, “does the diff match the request?”).

- Fallbacks: Smaller model, simpler workflow, or human takeover.

A workflow that cannot recover gracefully from one bad tool call is not production-ready.

---

7. Observability and auditability

You need to answer: *What did the agent do, why, and with what result?*

- Trace each step: Prompt, tool call, tool result, model decision.

- Attribute AI actions: Especially for commits, PRs, and operational changes.

- Metrics: Success rate, retries, cost, latency, human intervention rate.

- Replay/debug: Ability to inspect a failed run without guessing.

Without this, debugging agent behavior is mostly speculation.

---

8. Evaluation before and after launch

Agent quality is behavioral, not just “the code compiles.”

- Golden tasks: A curated set of real scenarios with expected outcomes.

- Regression evals: Run after prompt, tool, or model changes.

- Failure taxonomy: Hallucinated tool use, wrong plan, unsafe action, incomplete task.

- Continuous monitoring: In production, sample live runs and review drift over time.

---

9. Cost, latency, and model selection

Agentic workflows multiply token and tool usage quickly.

- Use smaller/faster models for classification, routing, and summarization.

- Reserve stronger models for planning, synthesis, and ambiguous reasoning.

- Cache retrieval and repeated context where possible.

- Cap max steps, tool calls, and runtime per task.

---

10. Security and governance

This becomes critical once agents can modify systems.

- No hardcoded secrets; use scoped credentials.

- Approval gates for destructive or privileged operations.

- Sandboxing for command execution.

- Clear ownership: who is accountable when an agent opens a PR or changes config?

---

A practical mental model

```mermaid

flowchart LR

Goal[Clear goal] --> Plan[Plan / decompose]

Plan --> Act[Use tools]

Act --> Observe[Observe results]

Observe --> Verify[Verify progress]

Verify -->|Not done| Plan

Verify -->|Blocked| Human[Human escalation]

Verify -->|Done| Complete[Deliver outcome]

The hardest parts are usually not the LLM itself, but:

Clear termination conditions

Safe, well-scoped tools

Verification loops

Human checkpoints for risky actions

John Haigh's Blog

Wednesday, June 3, 2026

Designing Agentic Workflows: Core Considerations

No comments:

Post a Comment

About Me