Building a Fully Agentic Coding Setup
I was late to fully agentic coding. I stuck with Cursor until the end of last year, then decided to give it a proper shot. I downloaded Claude Code, pointed it at real work, and used it for a week straight.
Opus 4.5 was an amazing model. My usage in Cursor had already proven that. But Claude Code’s interface kept getting in the way. Slash commands were unintuitive. Skills loaded silently with no indication of what was active. And scrolling was broken at the time, so I often couldn’t follow what was happening.
The model wasn’t the problem. The tooling around it was.
I just wanted a UI
Here’s a take that might be controversial in the terminal-first crowd: I genuinely don’t enjoy TUIs. I’ve tried. I respect people who live in them. But I find graphical interfaces faster to scan, easier to navigate, and more pleasant to look at for eight hours straight.
Anthropic shipped a desktop app that bundles Claude Code with a GUI. Sessions break when you switch between them, navigation falls apart if you click on a previous chat, and parallel agents don’t work reliably. They also have a web UI now, but it runs in a cloud workspace with no local config, no .env files, no credentials, and no way to test against real services. I wanted to see my conversation history laid out properly, click on things, and work in a UI designed for readability.
Finding OpenCode
OpenCode has a TUI, a web interface, and a desktop app. I use the desktop app. It’s clean, the conversation is fully visible, scrolling works, and it’s genuinely nice to work in for hours.
But the UI was just the entry point. What kept me was the configurability. Claude Code gives you a model and some tools. OpenCode gives you a framework for building your own workflow around a model. The project is open source on GitHub and has grown fast (100k+ stars), which says something about the demand for this kind of tool.
Custom prompts and separate agents
I use a Claude Max subscription with OpenCode, and to make that work you need to configure the prompts. Once I was in there, I kept going. In practice I use build and explore 90% of the time:
{
  "agent": {
    "build": {
      "description": "Default execution agent with full tool access",
      "mode": "primary",
      "prompt": "{file:./prompts/build.txt}"
    },
    "explore": {
      "description": "Fast read-only codebase scout",
      "mode": "subagent",
      "model": "anthropic/claude-haiku-4-5",
      "tools": { "write": false, "edit": false, "bash": false }
    },
    "debug": {
      "description": "Hypothesis-driven investigation",
      "mode": "primary",
      "temperature": 0.1,
      "prompt": "{file:./prompts/debug.txt}"
    }
  }
}
The build agent studies first (delegates to explore), builds in chunks, then verifies with external signals. The debug agent follows OBSERVE → HYPOTHESIZE → PREDICT → TEST → ANALYZE → repeat or fix.
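The debug prompt is mostly about enforcing that loop. A condensed sketch of what a prompts/debug.txt along these lines might contain (illustrative, not my exact file):

```
You are a debugging agent. Work in a strict loop:

1. OBSERVE     - gather the actual error, logs, and failing input.
2. HYPOTHESIZE - state one plausible cause, and only one.
3. PREDICT     - say what you expect to see if the hypothesis is true.
4. TEST        - run the smallest check that confirms or refutes it.
5. ANALYZE     - compare the prediction to the result.

If refuted, return to step 2 with a new hypothesis. If confirmed, fix it.
Never patch code before a hypothesis has survived a test.
```

The point of the low temperature in the config above is the same: debugging should be methodical, not creative.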
The explore agent is the one I’m most opinionated about. It runs on Haiku and can only read files. No writing, no editing, no bash. Haiku reads and outputs faster than Opus, which is exactly what you want for a scout that rips through files and reports back immediately:
You are a fast, read-only codebase scout. Your only job is
to locate code and report where it is. You cannot modify anything.
Locate, don't analyze. You are a search tool, not an advisor.
There’s also a permission system. After an agent pushed code to the wrong remote, I added rules: force pushes are denied, hard resets require approval, destructive file operations are blocked.
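In OpenCode, rules like these live in the permission section of the config. A sketch of the shape, assuming glob-style bash patterns with allow/ask/deny values (check the OpenCode docs for the exact syntax):

```json
{
  "permission": {
    "bash": {
      "git push --force*": "deny",
      "git push -f*": "deny",
      "git reset --hard*": "ask",
      "rm -rf*": "ask",
      "*": "allow"
    }
  }
}
```

More specific patterns win, so the catch-all at the bottom keeps everyday commands frictionless while the destructive ones get gated.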
Workspace-aware tooling and real CLI access
I wrote custom tools in TypeScript that detect which project I’m in and switch auth automatically:
const WORKSPACES = {
  Acme: {
    directory: "~/Documents/Acme",
    account: "dirk-acme",
    configDir: "~/.config/gh-acme",
  },
  Globex: {
    directory: "~/Documents/Globex",
    account: "dirk-globex",
    configDir: "~/.config/gh-globex",
  },
  Personal: {
    directory: "~/Documents/Personal",
    account: "Stoffberg",
    configDir: "~/.config/gh-personal",
  },
}
Open a project and the right GitHub, Jira, and Confluence auth is already active. Beyond auth, the agents get access to real CLI tools: Azure DevOps, AWS CLI, and Azure CLI. Full PR workflows in one prompt. Application Insights queries with KQL to find production errors. CloudWatch logs cross-referenced against stack traces. Remote debugging through an AI agent.
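The detection step itself is simple: match the current working directory against the workspace table, longest prefix first, and activate the matching gh config (the GitHub CLI honors the GH_CONFIG_DIR environment variable). A sketch of the idea, with resolveWorkspace and the absolute paths as illustrative stand-ins for my actual tool:

```typescript
type Workspace = { directory: string; account: string; configDir: string }

// Hypothetical expanded paths; the real table uses ~-prefixed entries
// expanded at load time.
const WORKSPACES: Record<string, Workspace> = {
  Acme: {
    directory: "/home/dirk/Documents/Acme",
    account: "dirk-acme",
    configDir: "/home/dirk/.config/gh-acme",
  },
  Personal: {
    directory: "/home/dirk/Documents/Personal",
    account: "Stoffberg",
    configDir: "/home/dirk/.config/gh-personal",
  },
}

function resolveWorkspace(cwd: string): Workspace | undefined {
  // Longest-prefix match so nested repo paths resolve to the right project
  return Object.values(WORKSPACES)
    .filter((w) => cwd === w.directory || cwd.startsWith(w.directory + "/"))
    .sort((a, b) => b.directory.length - a.directory.length)[0]
}
```

Once resolved, the tool sets GH_CONFIG_DIR to the workspace's configDir before shelling out, so every gh call runs as the right account without a manual `gh auth switch`.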
REST tools for API debugging
I built an MCP plugin that gives agents a proper HTTP client with response storage. Instead of cobbling together curl commands in bash and losing the response the moment it scrolls off, the agent gets dedicated tools:
- rest_configure → set base URL, auth token, headers
- rest_request → make a request, store the response
- rest_pick → extract specific fields from a stored response
- rest_filter → filter arrays with WHERE/SELECT syntax
- rest_compare → diff two stored responses
Hit an endpoint, store the response, make a change, hit it again, compare the two. Filter 500 items down to the ones matching a condition. Pick nested fields out of deeply structured JSON. Each request references previous responses, so the agent builds up a chain of evidence as it debugs. I point the debug agent at a broken endpoint with credentials and say “figure out why this returns the wrong data.” It tests variations, stores each response, compares them, and narrows it down.
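Under the hood the core of this is just a named response store plus path extraction. A minimal sketch of the idea behind rest_request's storage and rest_pick (illustrative, not the plugin's actual implementation):

```typescript
// Named response store: each request saves its body under a label
const responses = new Map<string, unknown>()

function storeResponse(name: string, body: unknown): void {
  responses.set(name, body)
}

// rest_pick-style extraction: walk a dot path into a stored response
function pick(name: string, path: string): unknown {
  let node: any = responses.get(name)
  for (const key of path.split(".")) {
    if (node == null) return undefined
    node = node[key]
  }
  return node
}

// Example: store a response, then pull one nested field out of it
storeResponse("before", { user: { id: 7, name: "dirk" } })
// pick("before", "user.id") → 7
```

Because responses persist under names, a later rest_compare can diff "before" against "after" without the agent ever re-printing a full body into the conversation.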
Skills that inject the right context at the right time
Markdown files that load on demand when a task matches a pattern (following the Agent Skills open standard, which Claude Code also supports):
- PR authoring: my exact format, Jira ticket linking, review templates, test plan structure
- Frontend design: typography, spacing, and avoiding the generic AI look (no gradient blobs, no “hero sections with a CTA”)
- TDD (inspired by Matt Pocock’s TDD skill): red-green-refactor loop, one test at a time
- API debugging: use the REST tools instead of curl
Deep domain context when the agent needs it, nothing when it doesn’t.
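A skill is just a directory with a SKILL.md whose frontmatter tells the agent when to load it. A condensed sketch of the API-debugging one, with the frontmatter fields following the Agent Skills standard and the body abbreviated:

```markdown
---
name: api-debugging
description: Use the REST tools (rest_request, rest_pick, rest_compare) instead of raw curl when debugging HTTP APIs.
---

When a task involves inspecting or comparing API responses:

- Configure the client once with rest_configure.
- Store every response under a name; never paste raw bodies into bash.
- Use rest_compare to diff before/after responses instead of eyeballing JSON.
```

The description is what the agent matches against, so it carries the trigger conditions; the body only costs context once the skill actually loads.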
Adaptive thinking via plugin
When Anthropic introduced adaptive thinking, OpenCode didn’t support it yet. Instead of setting a fixed thinking token budget, adaptive mode lets the model decide how much to think based on problem complexity. I wrote a plugin that intercepts the API request and swaps the config:
import type { Plugin } from "@opencode-ai/plugin"

export const adaptiveThinking: Plugin = async () => ({
  "chat.params": async (_input, output) => {
    // Mark adaptive requests with a sentinel budget the fetch wrapper can spot
    if (output.options?.thinking?.type === "adaptive") {
      output.options.thinking = {
        type: "enabled",
        budgetTokens: 1024,
      }
    }
  },
  loader: async () => ({
    fetch: async (url, init) => {
      if (!init?.body) return fetch(url, init)
      const body = JSON.parse(init.body as string)
      if (
        body?.thinking?.budget_tokens === 1024 &&
        body?.model?.includes("opus-4-6")
      ) {
        // Swap the sentinel for adaptive mode before the request goes out
        body.thinking = { type: "adaptive" }
        return fetch(url, { ...init, body: JSON.stringify(body) })
      }
      return fetch(url, init)
    },
  }),
})
OpenCode sends the request with a sentinel value (budgetTokens: 1024), and the plugin’s fetch wrapper catches it and replaces it with { type: "adaptive" } before it hits the Anthropic API. This is the kind of thing you can do when the tool is actually extensible.
I cancelled Cursor too
For a while I was running both Claude Max and Cursor Ultra, $200 each. $400/month on AI tooling, and I was actually hitting the Cursor usage limit. Every usage-based tool I tried had the same problem: the amount of coding I do just burns through whatever cap they set. Claude Max is flat rate. With OpenCode, running three agents per day doing real work, I cannot get past 35% usage per week. I’ve tried.
OpenCode already had what Cursor was selling me: multi-session support, LSP integration for type errors and diagnostics, and the agent framework I actually wanted to customize. I’m still on Claude Opus 4.6. It misses stuff constantly. I need to guide it, review everything it produces, and catch the things it glosses over. That’s just where all models are right now: useful enough to change how you work, not reliable enough to trust unsupervised.
That’s the setup. If you want to see what it actually does with all of this, read 13 Prompts, 366 Tool Calls.