John Haigh's Blog: June 2026

Tuesday, June 9, 2026

Building Agentic Workflows With Hermes Agent, Part 2

The Agent Loop As Workflow Engine

Most software workflows are easy to describe and hard to make reliable. A user asks for something, the system gathers context, decides what to do, calls tools, checks the result, and either continues or stops. That sounds simple until the task crosses a boundary: a codebase, a browser, an API, a document store, a queue, another agent, or a human approval step.

That is where agentic workflows become useful. Instead of treating an AI model as a one-shot text generator, frameworks like Hermes Agent give engineers a harness for running work over time. The core of that harness is the agent loop.

In Hermes Agent, you can think of the agent loop as the workflow engine. It is the repeating control structure that lets an agent observe the world, plan its next move, use tools, verify what happened, and decide whether to keep going or stop. Tool use, memory, skills, gateways, and subagents all matter, but the loop is what turns those pieces into a working system.

A useful mental model is:

```text

observe -> plan -> act with tools -> verify -> continue or stop

```

This loop is simple enough to reason about and powerful enough to support complex workflows.
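To make the shape of the loop concrete, here is a minimal sketch in Python. It is not Hermes Agent's actual API: `run_loop`, `LoopResult`, and the four callables are hypothetical names chosen for illustration, and the iteration cap is an assumed default.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    done: bool
    iterations: int
    trace: list[str] = field(default_factory=list)


def run_loop(
    observe: Callable[[], dict],
    plan: Callable[[dict], str],
    act: Callable[[str], dict],
    verify: Callable[[dict], bool],
    max_iterations: int = 5,
) -> LoopResult:
    """Run observe -> plan -> act -> verify until verified or out of budget."""
    trace: list[str] = []
    for i in range(1, max_iterations + 1):
        state = observe()            # what is true right now
        step = plan(state)           # the next move only, not the whole solution
        outcome = act(step)          # tool call
        trace.append(f"{i}: {step} -> {outcome}")
        if verify(outcome):          # stop on evidence of success
            return LoopResult(done=True, iterations=i, trace=trace)
    # Budget exhausted: hand back control instead of looping forever.
    return LoopResult(done=False, iterations=max_iterations, trace=trace)
```

Each phase is passed in as a plain function, so any one of them can be swapped for a model call, a deterministic rule, or a test double without touching the control flow.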

Observe: Start With State, Not Just Prompt Text

The first step in a reliable agentic workflow is observation. The agent needs to know what is true right now.

For a coding task, observation might include the user request, open files, repository structure, test output, linter diagnostics, previous attempts, and relevant project conventions. For an operations workflow, it might include logs, alerts, deployment state, runbook steps, and service health. For a business workflow, it might include ticket metadata, customer context, policy rules, and prior decisions.

The design mistake to avoid is treating the initial prompt as the whole world. In real workflows, the prompt is only the starting signal. The agent should gather enough state to make a grounded decision.

Implementation guidance:

- Define what state the agent is allowed to observe.

- Separate trusted system context from untrusted external content.

- Make observations explicit in the workflow trace.

- Prefer structured context over long, unbounded text blobs.

- Keep sensitive data out of the observation surface unless strictly required.

A good observation step narrows uncertainty. It should answer: what is being asked, what constraints apply, what resources are available, and what is already known?
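One way to keep observations structured and explicit is to record each one with its source and a trust flag. The sketch below is illustrative only; `Observation`, `ObservedState`, and the 2000-character cap are assumptions, not part of Hermes Agent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    source: str      # where the content came from, e.g. "test_output"
    content: str
    trusted: bool    # False for anything fetched from outside the system


@dataclass
class ObservedState:
    request: str
    observations: list[Observation] = field(default_factory=list)

    def add(self, source: str, content: str, trusted: bool, max_chars: int = 2000) -> None:
        # Bound each observation so one large blob cannot crowd out the rest.
        self.observations.append(Observation(source, content[:max_chars], trusted))

    def untrusted(self) -> list[Observation]:
        # Everything here should be validated before it influences an action.
        return [o for o in self.observations if not o.trusted]
```

Because every observation carries its source, the same list doubles as the observation part of the workflow trace.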

Plan: Choose The Next Move

Planning is where the agent turns state into intent. This does not always need to be a long plan. In production workflows, the most useful plan is often compact and operational: what will be done next, why, and how success will be checked.

For example, an agent handling a code change might plan to inspect the relevant module, identify existing patterns, make a small edit, run targeted tests, and report the result. An agent handling an incident might plan to confirm the alert, inspect recent deploys, compare current metrics to baseline, and escalate if the blast radius is unclear.

Planning should be scoped. The agent does not need to solve the entire problem in one decision. The point of the loop is that the plan can evolve as new observations arrive.

Implementation guidance:

- Make plans short-lived and revisable.

- Encode hard constraints outside the model where possible.

- Give the agent a clear stopping condition.

- Prefer small steps with verification over large speculative actions.

- Let specialized skills or subagents handle bounded parts of the work.

Technical leaders should pay close attention to this phase. Planning is where autonomy becomes governable: a stated plan can be reviewed, constrained, and audited before anything runs. If the workflow cannot explain what the agent is about to do, it will be hard to trust in production.
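A compact plan step can be represented as data and checked by deterministic code before anything runs. This is a sketch under assumptions: `PlanStep`, `validate_step`, and the action names in `ALLOWED_ACTIONS` are hypothetical.

```python
from dataclasses import dataclass

# Hard constraint enforced in code, not left to the model (hypothetical action names).
ALLOWED_ACTIONS = {"read_file", "run_tests", "edit_file"}


@dataclass(frozen=True)
class PlanStep:
    action: str          # what will be done next
    reason: str          # why
    success_check: str   # how success will be checked


def validate_step(step: PlanStep) -> list[str]:
    """Return a list of problems; an empty list means the step may proceed."""
    problems = []
    if step.action not in ALLOWED_ACTIONS:
        problems.append(f"action not allowed: {step.action}")
    if not step.reason.strip():
        problems.append("missing reason")
    if not step.success_check.strip():
        problems.append("missing success check")
    return problems
```

A step that fails validation never reaches a tool, which keeps the constraint independent of how the model was prompted.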

Act: Give The Agent Real Capabilities

A model without tools can suggest. A model with tools can act.

Hermes Agent's tool-use model is central to building practical workflows. Tools may include file readers, search APIs, code execution, issue trackers, deployment systems, browser gateways, memory stores, or calls to other agents. Multi-platform gateways make this especially important because the agent may need to operate across different environments while keeping one coherent workflow.

The key design principle is that tools should be narrow, typed, and observable. A tool named `run_any_command` gives the agent too much surface area. A tool named `get_build_status`, `search_repo`, or `create_draft_pr_description` is easier to secure, test, and reason about.

Implementation guidance:

- Prefer specific tools over general-purpose escape hatches.

- Validate tool inputs before execution.

- Return structured outputs the agent can inspect.

- Include clear error states, not just free-form failure text.

- Log tool calls with enough metadata for debugging and audit.

- Require approval for irreversible or high-risk actions.

Tool design is workflow design. If the tools are vague, the workflow will be vague. If the tools encode useful boundaries, the agent loop becomes much more reliable.
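As an illustration of a narrow, typed tool, the sketch below implements a `get_build_status` tool that validates its input and returns a structured result with explicit error states. The build ID format, the `ToolStatus` values, and the in-memory `_FAKE_BUILDS` backend are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ToolStatus(Enum):
    OK = "ok"
    INVALID_INPUT = "invalid_input"
    NOT_FOUND = "not_found"


@dataclass(frozen=True)
class BuildStatusResult:
    status: ToolStatus
    build_id: str
    state: Optional[str] = None   # e.g. "passed" or "failed" when status is OK
    message: str = ""


# Stand-in for a real CI backend (hypothetical data).
_FAKE_BUILDS = {"build-101": "passed", "build-102": "failed"}


def get_build_status(build_id: str) -> BuildStatusResult:
    # Validate before doing anything: reject input that is not a build ID.
    if not re.fullmatch(r"build-\d+", build_id):
        return BuildStatusResult(ToolStatus.INVALID_INPUT, build_id, message="expected build-<number>")
    state = _FAKE_BUILDS.get(build_id)
    if state is None:
        return BuildStatusResult(ToolStatus.NOT_FOUND, build_id, message="unknown build")
    return BuildStatusResult(ToolStatus.OK, build_id, state=state)
```

The agent can branch on `status` directly instead of parsing free-form failure text.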

Verify: Do Not Confuse Action With Progress

Verification is the difference between an agent that merely does things and an agent that completes work.

After every meaningful action, the agent should ask: did that work? The answer should come from evidence, not confidence. For code, that might mean tests, type checks, diffs, or runtime behavior. For infrastructure, it might mean health checks, metrics, deployment status, or log changes. For a document workflow, it might mean validating required fields, checking policy compliance, or confirming that generated content matches the request.

This step is also where agents should recover from mistakes. A failed tool call is not necessarily a failed workflow. The agent can inspect the error, adjust the plan, and try a safer next step. But retries need limits. Infinite loops are not resilience.

Implementation guidance:

- Define success criteria before acting.

- Use external signals when possible.

- Treat tool errors as observations for the next loop.

- Add retry budgets and escalation paths.

- Distinguish partial success from complete success.

- Preserve enough trace data to debug failed runs.

Reliable agents are evidence-seeking systems. They do not simply produce an answer; they check whether the answer survived contact with the environment.
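A retry budget with an escalation path might look like the following sketch. `act_with_verification` and `EscalationRequired` are hypothetical names, and the default budget of three attempts is an arbitrary assumption.

```python
from typing import Callable


class EscalationRequired(Exception):
    """Raised when the retry budget is spent without verified success."""


def act_with_verification(
    action: Callable[[int], dict],
    verify: Callable[[dict], bool],
    retry_budget: int = 3,
) -> dict:
    errors: list[str] = []
    for attempt in range(1, retry_budget + 1):
        try:
            result = action(attempt)
        except Exception as exc:
            # A tool error becomes an observation for the next attempt, not a crash.
            errors.append(f"attempt {attempt}: {exc}")
            continue
        if verify(result):   # success is decided by evidence, not by the absence of errors
            return result
        errors.append(f"attempt {attempt}: verification failed")
    # Budget spent: escalate with the full error history attached.
    raise EscalationRequired("; ".join(errors))
```

The exception message carries every failed attempt, so whoever receives the escalation does not have to reconstruct what was tried.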

Continue Or Stop: Control The Loop

The final step is deciding whether to continue.

An agent should continue when the goal is not yet satisfied, more information is needed, or verification reveals a fixable issue. It should stop when the objective is complete, when it reaches a defined limit, when it needs human input, or when continuing would be unsafe.

This sounds obvious, but stop conditions are one of the most important parts of agentic workflow design. Without them, agents drift. They over-search, over-edit, over-retry, or keep optimizing past the point of value.

Implementation guidance:

- Set maximum iterations for each workflow.

- Define completion criteria in terms of observable outcomes.

- Stop on ambiguous authority, missing permissions, or unsafe requests.

- Escalate when confidence depends on business judgment.

- Report what was done, what was verified, and what remains uncertain.

For technical leaders, this is where agentic systems become manageable. The organization does not need agents that "try their best" indefinitely. It needs agents that make bounded progress and know when to hand back control.
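The continue-or-stop decision can be written as a small pure function that returns a named reason. This is a sketch only; `Decision`, `LoopStatus`, and the ordering of the checks are assumptions about one reasonable policy.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    STOP_COMPLETE = "stop_complete"
    STOP_LIMIT = "stop_limit"
    STOP_NEEDS_HUMAN = "stop_needs_human"


@dataclass(frozen=True)
class LoopStatus:
    goal_verified: bool
    iteration: int
    max_iterations: int
    needs_permission: bool = False


def decide(status: LoopStatus) -> Decision:
    # Order matters: authority checks come before progress checks.
    if status.needs_permission:
        return Decision.STOP_NEEDS_HUMAN
    if status.goal_verified:
        return Decision.STOP_COMPLETE
    if status.iteration >= status.max_iterations:
        return Decision.STOP_LIMIT
    return Decision.CONTINUE
```

Returning a reason rather than a bare boolean makes the final report easier to write: the caller knows whether the run finished, ran out of budget, or is waiting on a person.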

Designing Reliable Workflows With Hermes Agent

The agent loop becomes more powerful when combined with the rest of the Hermes Agent model.

Memory helps the agent carry useful context across steps or sessions, but memory should be curated. Store durable facts, preferences, and prior outcomes. Do not dump every intermediate token into long-term context.

Skills let teams package repeatable procedures. A skill might encode how to triage a failing build, draft a release note, investigate an alert, or prepare a code review. Good skills turn tribal knowledge into executable workflow guidance.

Subagents let a workflow split into specialized tasks. One agent can inspect code, another can summarize test failures, and another can draft user-facing notes. The parent loop remains responsible for coordination and final verification.

Gateways allow the same agentic workflow to reach different platforms. That matters because real work rarely lives in one system. The loop gives continuity across those boundaries.

A practical starting point is to design workflows as state machines before making them autonomous. Write down the states: observing, planning, acting, verifying, waiting for approval, complete, failed. Then decide where the model is allowed to make decisions and where deterministic code should enforce rules.
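The states listed above translate directly into an enum plus a transition table. The table below is one plausible set of legal moves, stated as an assumption rather than a Hermes Agent definition.

```python
from enum import Enum, auto


class State(Enum):
    OBSERVING = auto()
    PLANNING = auto()
    ACTING = auto()
    VERIFYING = auto()
    WAITING_FOR_APPROVAL = auto()
    COMPLETE = auto()
    FAILED = auto()


# Deterministic code owns the legal transitions; the model only picks among them.
TRANSITIONS = {
    State.OBSERVING: {State.PLANNING, State.FAILED},
    State.PLANNING: {State.ACTING, State.WAITING_FOR_APPROVAL, State.FAILED},
    State.WAITING_FOR_APPROVAL: {State.ACTING, State.FAILED},
    State.ACTING: {State.VERIFYING, State.FAILED},
    State.VERIFYING: {State.OBSERVING, State.COMPLETE, State.FAILED},
    State.COMPLETE: set(),
    State.FAILED: set(),
}


def transition(current: State, requested: State) -> State:
    """Return the new state, or raise if the move is not in the table."""
    if requested not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {requested.name}")
    return requested
```

With this in place, a model that tries to jump from planning straight to complete is stopped by code rather than by prompt wording.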

The strongest Hermes Agent workflows will not be the ones that give the model unlimited freedom. They will be the ones that combine model flexibility with software engineering discipline: typed tools, explicit state, bounded loops, durable memory, reusable skills, and clear verification.

The agent loop is not just an implementation detail. It is the core abstraction that makes agentic work possible. Once you can observe, plan, act, verify, and decide whether to continue, you have the foundation for workflows that are more than chat and less brittle than hard-coded automation.

That is the promise of Hermes Agent: not magic autonomy, but practical, inspectable systems that can move through real work step by step.

Wednesday, June 3, 2026

Designing Agentic Workflows: Core Considerations

When you build an agentic workflow, you are designing a system where an LLM can plan, act, observe results, and iterate, rather than just answer a single prompt. The aspects below usually determine whether it works reliably in production.

---

1. Define the job boundary clearly

Start with what the agent is allowed to accomplish, and what it must never do.

- Scope: One well-defined outcome (e.g. “triage this alert and propose a fix”) beats “handle anything related to infra.”

- Success criteria: What does “done” look like? A merged PR? A Jira ticket? A human-approved plan?

- Escalation: When should the agent stop and ask a person instead of continuing?

Ambiguous goals are a common reason agent workflows feel impressive in demos but fail in real use.

---

2. Choose the right orchestration model

Not every task needs a fully autonomous agent.

| Pattern | Best for |
|---|---|
| Fixed pipeline | Predictable steps with known tools |
| Planner + executor | Multi-step tasks with branching |
| Multi-agent | Parallel research, review, or specialization |
| Human-in-the-loop | High-risk or irreversible actions |

A common mistake is making everything “fully agentic” when a deterministic workflow with one LLM step would be simpler and more reliable.

---

3. Tool design and permissions

Agents are only as good as the tools they can call.

- Least privilege: Give only the tools needed for the task.

- Safe defaults: Read-only first; require explicit approval for writes, deploys, deletes, or network calls.

- Structured outputs: Tools should return predictable JSON, not free-form text the agent must reinterpret.

- Idempotency: Assume the agent may retry; side effects should be safe to repeat.
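To illustrate the idempotency point, the sketch below uses a caller-supplied idempotency key so that a retried call returns the original result instead of repeating the side effect. `TicketStore` is a hypothetical in-memory stand-in for a real ticketing backend.

```python
class TicketStore:
    """In-memory stand-in for a ticketing backend (hypothetical)."""

    def __init__(self) -> None:
        self._by_key: dict[str, int] = {}
        self._titles: dict[int, str] = {}
        self._next_id = 1

    def create_ticket(self, idempotency_key: str, title: str) -> int:
        # A repeated key returns the original ticket instead of creating a duplicate.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        ticket_id = self._next_id
        self._next_id += 1
        self._by_key[idempotency_key] = ticket_id
        self._titles[ticket_id] = title
        return ticket_id

    def count(self) -> int:
        return len(self._titles)
```

A key derived from the run and step, such as `"run-42:step-3"`, lets the agent retry after a timeout without knowing whether the first call went through.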

---

4. State, memory, and context management

Agents fail when they lose track of what already happened.

- Working memory: Current task state, intermediate results, open questions.

- External memory: Docs, tickets, repo context, prior runs — retrieved on demand rather than stuffed into every prompt.

- Context budget: Summarize or drop stale history instead of sending the full transcript forever.

- Handoffs: If multiple agents are involved, define exactly what each one receives and returns.
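A context budget can be enforced with a small trimming function. The sketch below keeps the opening message and the most recent messages that fit, using character count as a crude stand-in for tokens; `trim_history` and its parameters are hypothetical, and a real system would more likely summarize than drop.

```python
def trim_history(messages: list[str], max_chars: int, keep_first: int = 1) -> list[str]:
    """Keep the first `keep_first` messages plus the most recent ones that fit."""
    head = messages[:keep_first]
    budget = max_chars - sum(len(m) for m in head)
    tail: list[str] = []
    # Walk backwards from the newest message until the budget runs out.
    for msg in reversed(messages[keep_first:]):
        if len(msg) > budget:
            break
        tail.append(msg)
        budget -= len(msg)
    dropped = len(messages) - len(head) - len(tail)
    # The marker itself is not counted against the budget.
    marker = [f"[{dropped} earlier messages omitted]"] if dropped else []
    return head + marker + list(reversed(tail))
```

The marker line tells the model that history was cut, which is safer than silently presenting a partial transcript as complete.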

---

5. Prompting, skills, and guardrails

Instructions should be layered, not one giant system prompt.

- System rules: Security, tone, non-negotiable constraints.

- Skills/playbooks: Reusable procedures for recurring tasks.

- Task prompt: The specific user request and current state.

- Examples: Few-shot examples for brittle formats or decision boundaries.

Also treat all external inputs — tool responses, web fetches, MCP output, user files — as untrusted. Validate before acting on them.

---

6. Reliability and failure handling

Agentic systems must assume things will go wrong.

- Retries with limits: Retry transient tool failures, not logical mistakes.

- Checkpoints: Save progress so a run can resume after interruption.

- Verification steps: Have the agent confirm outcomes (“did the test pass?”, “does the diff match the request?”).

- Fallbacks: Smaller model, simpler workflow, or human takeover.

A workflow that cannot recover gracefully from one bad tool call is not production-ready.
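Checkpointing can be as simple as recording completed steps to a file after each one. This sketch assumes step names are unique within a run; `run_with_checkpoints` and the JSON file layout are illustrative choices, not a prescribed format.

```python
import json
from pathlib import Path
from typing import Callable


def run_with_checkpoints(
    steps: list[str],
    checkpoint: Path,
    execute: Callable[[str], None],
) -> list[str]:
    """Execute steps in order, skipping any already recorded in the checkpoint file."""
    done: list[str] = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    for step in steps:
        if step in done:
            continue          # finished in an earlier run
        execute(step)         # may raise; earlier progress is already on disk
        done.append(step)
        checkpoint.write_text(json.dumps(done))
    return done
```

If `execute` raises partway through, rerunning the same call resumes at the failed step rather than starting over.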

---

7. Observability and auditability

You need to answer: *What did the agent do, why, and with what result?*

- Trace each step: Prompt, tool call, tool result, model decision.

- Attribute AI actions: Especially for commits, PRs, and operational changes.

- Metrics: Success rate, retries, cost, latency, human intervention rate.

- Replay/debug: Ability to inspect a failed run without guessing.

Without this, debugging agent behavior is mostly speculation.
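A step-level trace does not need heavy infrastructure to start with. The sketch below records events in memory and serializes them as JSON Lines; `TraceEvent`, `Trace`, and the `kind` values are assumptions for illustration.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class TraceEvent:
    run_id: str
    step: int
    kind: str        # "prompt", "tool_call", "tool_result", or "decision"
    payload: dict
    timestamp: float = field(default_factory=time.time)


class Trace:
    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.events: list[TraceEvent] = []

    def record(self, kind: str, payload: dict) -> None:
        # Steps are numbered in the order they were recorded.
        self.events.append(TraceEvent(self.run_id, len(self.events) + 1, kind, payload))

    def to_jsonl(self) -> str:
        # One JSON object per line, easy to grep and replay.
        return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in self.events)
```

Metrics such as retry count or tool-call count can then be computed from the recorded events instead of being tracked separately.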

---

8. Evaluation before and after launch

Agent quality is behavioral, not just “the code compiles.”

- Golden tasks: A curated set of real scenarios with expected outcomes.

- Regression evals: Run after prompt, tool, or model changes.

- Failure taxonomy: Hallucinated tool use, wrong plan, unsafe action, incomplete task.

- Continuous monitoring: In production, sample live runs and review drift over time.
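A golden-task harness can start as a list of prompts with checks and a function that runs them. In this sketch, `GoldenTask`, `run_evals`, and the idea of the agent as a plain `str -> str` callable are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenTask:
    name: str
    prompt: str
    check: Callable[[str], bool]   # judges the agent's final output


def run_evals(agent: Callable[[str], str], tasks: list[GoldenTask]) -> dict:
    """Run every task once and report which ones failed their check."""
    failures = [t.name for t in tasks if not t.check(agent(t.prompt))]
    return {
        "total": len(tasks),
        "passed": len(tasks) - len(failures),
        "failures": failures,
    }
```

Running the same task list after every prompt, tool, or model change turns it into a regression eval, and the failure names feed a failure taxonomy.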

---

9. Cost, latency, and model selection

Agentic workflows multiply token and tool usage quickly.

- Use smaller/faster models for classification, routing, and summarization.

- Reserve stronger models for planning, synthesis, and ambiguous reasoning.

- Cache retrieval and repeated context where possible.

- Cap max steps, tool calls, and runtime per task.
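Routing and caps can both be expressed in a few lines. The model names below are placeholders, not real model identifiers, and `pick_model` and `Budget` are hypothetical helpers.

```python
# Hypothetical task categories and model tiers; substitute what a deployment actually uses.
CHEAP_TASKS = {"classification", "routing", "summarization"}


def pick_model(task_type: str) -> str:
    return "small-fast-model" if task_type in CHEAP_TASKS else "large-reasoning-model"


class Budget:
    def __init__(self, max_steps: int, max_tool_calls: int) -> None:
        self.max_steps = max_steps
        self.max_tool_calls = max_tool_calls
        self.steps = 0
        self.tool_calls = 0

    def charge(self, steps: int = 0, tool_calls: int = 0) -> bool:
        """Record usage and return True while the run is still within its caps."""
        self.steps += steps
        self.tool_calls += tool_calls
        return self.steps <= self.max_steps and self.tool_calls <= self.max_tool_calls
```

The loop would call `charge` once per iteration and stop as soon as it returns `False`.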

---

10. Security and governance

This becomes critical once agents can modify systems.

- No hardcoded secrets; use scoped credentials.

- Approval gates for destructive or privileged operations.

- Sandboxing for command execution.

- Clear ownership: who is accountable when an agent opens a PR or changes config?
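An approval gate can wrap tool execution so that destructive operations never run without an explicit yes. The tool names in `DESTRUCTIVE` and the `approver` callback are assumptions; in practice the approver might be a chat prompt or a ticket workflow.

```python
from typing import Callable

DESTRUCTIVE = {"delete_branch", "deploy", "rotate_credentials"}  # hypothetical tool names


class ApprovalDenied(Exception):
    """Raised when a destructive tool call is not approved."""


def guarded_call(
    tool_name: str,
    run_tool: Callable[[], str],
    approver: Callable[[str], bool],
) -> str:
    # Non-destructive tools pass straight through; destructive ones need approval first.
    if tool_name in DESTRUCTIVE and not approver(tool_name):
        raise ApprovalDenied(f"{tool_name} requires human approval")
    return run_tool()
```

Because the gate lives outside the model, a prompt injection that convinces the agent to call `deploy` still stops at the approver.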

---

A practical mental model

```mermaid

flowchart LR

Goal[Clear goal] --> Plan[Plan / decompose]

Plan --> Act[Use tools]

Act --> Observe[Observe results]

Observe --> Verify[Verify progress]

Verify -->|Not done| Plan

Verify -->|Blocked| Human[Human escalation]

Verify -->|Done| Complete[Deliver outcome]

```

The hardest parts are usually not the LLM itself, but:

- Clear termination conditions

- Safe, well-scoped tools

- Verification loops

- Human checkpoints for risky actions