
Why I switched from Claude Code to OpenCode

24 min read

I was slightly late to the whole “vibe coding” era. I stuck with just Cursor until the end of last year, then decided to give agentic coding a proper shot for a week. I started by downloading Claude Code and feeding it real tickets, trying to build a new project from scratch. Between the broken UI, the silent context drops, and the complete lack of cost control, I went looking for something better pretty quickly.

The harness, not the model

Don’t get me wrong, Opus 4.5 at the time was an amazing model (my usage in Cursor had already proven that). But the harness around it just didn’t click:

  • The terminal interface felt limiting
  • Slash commands were unintuitive
  • Skills were unclear about when they loaded
  • Scrolling was broken at the time
  • I couldn’t follow what was going on half the time

So I tried Anthropic’s new desktop app that bundled Claude Code with a proper GUI. The experience was worse. Parallel agents and chat sessions just didn’t work. I’d open one session, talk to it, switch to another, prompt again, then go back to the first and the whole thing would break. Clicking on a previous chat stopped working entirely. Navigation fell apart.

The model wasn’t the problem. The tooling around it was.

Death by a thousand paper cuts

The breaking point wasn’t one dramatic failure. It was the accumulation of small ones:

  • Commands hanging for no obvious reason
  • Context silently dropping mid-conversation so the agent forgets what you told it three messages ago
  • Tool calls failing and getting retried with the exact same parameters, like the agent learned nothing from the error
  • No way to tell whether your instructions are wrong or the tool is just being flaky

That uncertainty becomes a tax on every decision you make. I found myself spending more time working around Claude Code’s quirks than actually coding. That’s when I knew I needed to look elsewhere.

I just wanted a UI

Here’s a take that might be controversial in the terminal-first crowd: I genuinely don’t enjoy TUIs. I’ve tried. I respect people who live in them. But I find graphical interfaces faster to scan, easier to navigate, and more pleasant to look at for eight hours straight.

Claude Code is terminal-only. No web UI, no desktop app, take it or leave it. I wanted to see my conversation history laid out properly. I wanted to click on things. I wanted a UI designed for readability, not squeezed into a terminal grid.

Finding OpenCode

OpenCode offered what I was missing. I started with the TUI, which was already better than Claude Code’s. Then I moved to the web interface, which immediately solved the remaining UI problems: clean layout, full conversation visible, proper scrolling. Now I primarily use the desktop app, which is genuinely nice to work in.

But the UI was just the entry point. What kept me was the configurability. Claude Code gives you a model and some tools. OpenCode gives you a framework for building your own workflow around a model. The project is open source on GitHub and has grown fast (100k+ stars), which says something about the demand for this kind of tool.

Custom prompts: the accidental rabbit hole

Here’s how this actually started. I use a Claude Max subscription with OpenCode, and to make that work you need to configure the prompts in a specific way. So I opened the prompt config, did the bare minimum to get Max working, and thought “well, if I’m already in here, I might as well make these actually good.”

That’s how I ended up spending days on custom prompts. The Max subscription forced the door open, and then I walked through it.

Each agent gets its own system prompt loaded from a file. I spent real time researching what makes effective prompts for different tasks and encoded those patterns into each one.

The build agent follows a structured workflow:

  1. Study first by delegating reconnaissance to the explore agent
  2. Build in chunks (smallest meaningful piece, then verify)
  3. Verify with external signals (tests, typecheck, lint, not “looks right”)

The debug agent uses hypothesis-driven investigation:

OBSERVE → HYPOTHESIZE → PREDICT → TEST → ANALYZE → repeat or fix

Each iteration must eliminate at least one hypothesis. Before adding any instrumentation, it tries the obvious stuff first: actually read the error message, trace the code path mentally, check for typos and missing awaits, reproduce minimally.
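The elimination rule is the part that keeps the loop from spinning. A toy model of it (my illustration, not the actual prompt text) makes the termination guarantee obvious:

```typescript
// Toy model of the hypothesis-elimination rule: every pass must either
// confirm a root cause or cross a hypothesis off the list.
type Hypothesis = { name: string; test: () => boolean }

function investigate(hypotheses: Hypothesis[]): string | null {
  const remaining = [...hypotheses]
  while (remaining.length > 0) {
    const candidate = remaining.shift()! // OBSERVE/PREDICT happen before this
    if (candidate.test()) return candidate.name // TEST confirmed → go fix it
    // ANALYZE: hypothesis eliminated; the pool strictly shrinks, so the
    // loop runs at most hypotheses.length times before forcing a re-OBSERVE
  }
  return null // everything eliminated → back to OBSERVE with fresh data
}
```

Because the pool strictly shrinks, the agent can never burn a whole session re-testing the same guess.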

Default prompts try to be everything to everyone and end up mediocre at all of it. A focused prompt that encodes a specific methodology consistently outperforms a generic one, even with the same model.

Why I split into separate agents

The most impactful thing in my setup is having multiple agents with distinct roles. Here’s the actual config:

{
  "agent": {
    "build": {
      "description": "Default execution agent with full tool access",
      "mode": "primary",
      "prompt": "{file:./prompts/build.txt}"
    },
    "plan": {
      "description": "Architecture and planning agent",
      "mode": "primary",
      "temperature": 0.1,
      "prompt": "{file:./prompts/plan.txt}"
    },
    "debug": {
      "description": "Hypothesis-driven investigation",
      "mode": "primary",
      "temperature": 0.1,
      "prompt": "{file:./prompts/debug.txt}"
    },
    "explore": {
      "description": "Fast read-only codebase scout",
      "mode": "subagent",
      "model": "anthropic/claude-haiku-4-5",
      "tools": { "write": false, "edit": false, "bash": false }
    }
  }
}

The reason for this split is context pollution. When one agent does everything, planning context bleeds into implementation. You ask it to think through an architecture decision, and that reasoning stays in context while it’s writing code, subtly biasing every edit. The plan agent thinks through tradeoffs with low temperature (more deterministic, less creative wandering). The build agent executes with full tool access. They don’t contaminate each other.

The explore agent is the one I’m most opinionated about. It runs on Haiku and can only read files. No writing, no editing, no bash. Haiku is fast. It reads faster and outputs faster than Opus, which is exactly what you want for a task that’s pure comprehension. When I need to understand a codebase, I want a scout that rips through files and reports back immediately, not a reasoning powerhouse that pauses to think about every line it reads. And I definitely don’t want it “helpfully” modifying things it finds along the way.

Its prompt is deliberately constrained:

You are a fast, read-only codebase scout. Your only job is
to locate code and report where it is. You cannot modify anything.

Locate, don't analyze. You are a search tool, not an advisor.
Find the relevant code, report the locations, and stop.
Never offer opinions, suggestions, diagnoses, or recommendations.

The permission system exists because I learned the hard way

{
  "permission": {
    "bash": {
      "*": "allow",
      "git push --force*": "deny",
      "git push -f*": "deny",
      "git reset --hard*": "ask",
      "rm -rf /*": "deny",
      "rm -rf ~*": "deny"
    }
  }
}

This isn’t theoretical caution. I had an agent push code to the wrong remote because it picked up the wrong GitHub account context. When you work across multiple projects with separate GitHub accounts, that kind of mistake is one autonomous git push away at any time.

The rules are simple:

  • "*": "allow" lets the agent run most commands autonomously
  • Force pushes are denied outright (no recovering from pushing to the wrong branch on someone else’s repo)
  • Hard resets require explicit approval (destructive but sometimes needed)
  • rm -rf / and rm -rf ~ are blocked (because why not)
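The way I think about these rules is as prefix globs where the most specific match wins. A minimal sketch of that resolution logic, assuming longest-prefix-wins semantics (OpenCode's actual matcher may differ):

```typescript
// Sketch of prefix-glob permission resolution: the rule whose prefix
// matches the most characters of the command decides the verdict.
type Verdict = "allow" | "ask" | "deny"

const RULES: Record<string, Verdict> = {
  "*": "allow",
  "git push --force*": "deny",
  "git reset --hard*": "ask",
}

function verdict(cmd: string): Verdict {
  let best = { len: -1, v: "ask" as Verdict } // default to asking if nothing matches
  for (const [pattern, v] of Object.entries(RULES)) {
    const prefix = pattern.replace(/\*$/, "") // strip the trailing wildcard
    if (cmd.startsWith(prefix) && prefix.length > best.len) {
      best = { len: prefix.length, v }
    }
  }
  return best.v
}
```

Under these semantics `"*"` matches everything with the weakest specificity, so the destructive-command rules always override it.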

Claude Code doesn’t offer anything like this. You either trust it completely or you babysit every command.

Workspace-aware tooling

I wrote custom tools in TypeScript that detect which project I’m in based on the directory and automatically switch auth. I work across multiple projects with different GitHub accounts, Jira instances, and Confluence spaces. Before this, I’d regularly forget to switch accounts and either get permission errors or, worse, successfully push to the wrong place.

The workspace detection keys off the directory:

const WORKSPACES = {
  Acme: {
    directory: "~/Documents/Acme",
    account: "dirk-acme",
    configDir: "~/.config/gh-acme",
  },
  Globex: {
    directory: "~/Documents/Globex",
    account: "dirk-globex",
    configDir: "~/.config/gh-globex",
  },
  Personal: {
    directory: "~/Documents/Personal",
    account: "Stoffberg",
    configDir: "~/.config/gh-personal",
  },
}

The gh and git tools detect the workspace from the working directory and set GH_CONFIG_DIR automatically before running any command. Same for Jira and Confluence: each workspace has its own site URL, email, and API token. Open a project, the right auth is already active. No manual switching, no remembering which account goes with which repo.
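The detection itself is just a longest-prefix lookup against the workspace map. A minimal sketch of how the tool might resolve the workspace (`resolveWorkspace` is my name for it, not the actual tool's):

```typescript
// Sketch: resolve the active workspace from the working directory, then
// point GH_CONFIG_DIR at that workspace's gh config before shelling out.
import { homedir } from "node:os"

const WORKSPACES = {
  Acme: { directory: "~/Documents/Acme", configDir: "~/.config/gh-acme" },
  Personal: { directory: "~/Documents/Personal", configDir: "~/.config/gh-personal" },
}

const expand = (p: string) => p.replace(/^~/, homedir()) // ~ → absolute home path

function resolveWorkspace(cwd: string): string | undefined {
  const hit = Object.entries(WORKSPACES).find(([, w]) =>
    cwd.startsWith(expand(w.directory)),
  )
  return hit?.[0]
}

// Before running gh, the tool would then set something like:
//   process.env.GH_CONFIG_DIR = expand(WORKSPACES[name].configDir)
```

Once `GH_CONFIG_DIR` points at the right per-workspace config, `gh` picks up the matching account with no manual `gh auth switch`.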

Giving agents real CLI access

This is the part that changed how I work the most. I give my agents access to real CLI tools:

  • GitHub CLI (gh) for creating PRs, reviewing code, adding comments, merging
  • Jira for finding tickets, linking them to PRs, adding comments, transitioning issue status
  • Azure DevOps (az boards, az repos) for the same workflow on Azure projects
  • AWS CLI for debugging infrastructure issues, checking CloudWatch logs, inspecting resources
  • Azure CLI for querying Application Insights logs and diagnosing production issues

The agent can do a full PR workflow without me touching GitHub:

"Create a PR for this branch, link it to PROJ-1234,
 add a description based on the commits, and request
 review from the platform team."

It runs gh pr create, formats the description from the diff, links the Jira ticket, and sets reviewers. That whole flow used to take me five minutes of tab switching. Now it’s one prompt.

But the infrastructure access is where it gets really powerful. When something breaks in production, I can tell the debug agent to investigate and it will:

  1. Query Application Insights with KQL to find the error traces
  2. Pull CloudWatch logs from the relevant service
  3. Cross-reference timestamps across systems
  4. Form a hypothesis about what went wrong
  5. Suggest a fix with the actual stack traces as evidence

That’s remote debugging through an AI agent. It’s reading production logs, correlating data across services, and narrowing down root causes while I focus on understanding the business impact. This alone justified the entire setup.

REST tools for API debugging

I built an MCP plugin that gives agents a proper HTTP client with response storage. Instead of the agent cobbling together curl commands in bash (and losing the response the moment it scrolls off), it gets dedicated tools:

rest_configure  →  set base URL, auth token, headers
rest_request    →  make a request, store the response
rest_pick       →  extract specific fields from a stored response
rest_filter     →  filter arrays with WHERE/SELECT syntax
rest_compare    →  diff two stored responses

This means the agent can hit an endpoint, store the response as a reference, make a change, hit it again, and compare the two. It can filter a list of 500 items down to the ones matching a condition without me having to read through the raw JSON. It can pick nested fields out of a deeply structured response.

For API debugging specifically, this is a huge upgrade over curl in bash. The agent can build up a chain of requests, each one referencing previous responses, and keep the full context of what it’s tested. When I’m investigating a bug in an API, I point the debug agent at it with credentials and say “figure out why this endpoint returns the wrong data.” It methodically tests variations, stores each response, compares them, and narrows down the issue.
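To make the rest_compare idea concrete, here's a toy reconstruction (my sketch; the plugin's real output format isn't shown in this post) that diffs two stored JSON responses by flattened key path:

```typescript
// Toy rest_compare: flatten each response to dotted paths, then report the
// paths whose leaf values differ between the two responses.
type Json = null | boolean | number | string | Json[] | { [k: string]: Json }

function flatten(value: Json, prefix = "", out: Record<string, Json> = {}): Record<string, Json> {
  if (value !== null && typeof value === "object") {
    const entries = Array.isArray(value)
      ? value.map((v, i) => [String(i), v] as [string, Json])
      : Object.entries(value)
    for (const [k, v] of entries) flatten(v, prefix ? `${prefix}.${k}` : k, out)
  } else {
    out[prefix] = value // leaf: record the primitive under its dotted path
  }
  return out
}

function compareResponses(a: Json, b: Json): string[] {
  const fa = flatten(a)
  const fb = flatten(b)
  return Object.keys({ ...fa, ...fb }).filter((k) => fa[k] !== fb[k]) // changed, added, or removed
}
```

The before/after workflow falls out naturally: store a response, make the change, store the new one, and the diff is just the list of paths that moved.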

Skills that inject the right context at the right time

I have markdown files called “skills” (following the Agent Skills open standard, which Claude Code also supports) that get loaded when a task matches a pattern:

  • PR authoring injects my exact format, Jira ticket linking, review templates, and how I like test plans structured
  • Frontend design injects guidelines about typography, spacing, and avoiding the generic AI look (no gradient blobs, no “hero sections with a CTA”)
  • TDD (inspired by Matt Pocock’s TDD skill) injects the red-green-refactor loop: one test, one implementation, repeat
  • API debugging teaches the agent to use the REST tools (rest_configure, rest_request, rest_pick) instead of curl

The key insight is that these aren’t permanent instructions bloating every conversation. They load on demand, so the agent gets deep domain context exactly when it needs it and nothing when it doesn’t.
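The on-demand loading is simple to picture as a matcher over the task text. A toy sketch (the Agent Skills standard keys off frontmatter metadata; the names and patterns here are mine):

```typescript
// Toy skill matcher: a skill's context file is injected only when the
// task description matches its trigger pattern.
const SKILLS = [
  { name: "pr-authoring", match: /\b(pull request|PR)\b/, file: "skills/pr.md" },
  { name: "tdd", match: /\b(test[- ]driven|TDD)\b/i, file: "skills/tdd.md" },
]

function skillsFor(task: string): string[] {
  return SKILLS.filter((s) => s.match.test(task)).map((s) => s.file)
}
```

A task like "create a PR for this branch" pulls in only the PR skill; a plain refactor pulls in nothing, so the base context stays lean.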

Adaptive thinking via plugin

When Anthropic introduced adaptive thinking, OpenCode didn’t support it yet. Adaptive mode lets the model decide how much to think based on problem complexity instead of you setting a fixed token budget. I didn’t want to wait for official support, so I wrote a plugin that intercepts the API request and swaps the thinking config:

import type { Plugin } from "@opencode-ai/plugin"

export const adaptiveThinking: Plugin = async () => ({
  "chat.params": async (_input, output) => {
    if (output.options?.thinking?.type === "adaptive") {
      output.options.thinking = {
        type: "enabled",
        budgetTokens: 1024,
      }
    }
  },
  loader: async () => ({
    fetch: async (url, init) => {
      const body = JSON.parse(init.body as string)
      if (
        body?.thinking?.budget_tokens === 1024 &&
        body?.model?.includes("opus-4-6")
      ) {
        body.thinking = { type: "adaptive" }
        return fetch(url, { ...init, body: JSON.stringify(body) })
      }
      return fetch(url, init)
    },
  }),
})

The trick: OpenCode sends the request with a sentinel value (budgetTokens: 1024), the plugin’s fetch wrapper catches it and replaces it with { type: "adaptive" } before it hits the Anthropic API. The model then allocates thinking tokens dynamically. Simple file reads get minimal thinking, complex architecture decisions get deep reasoning. This is the kind of thing you can do when the tool is actually extensible.

I cancelled Cursor too

For a while I was running both Claude Max and Cursor Ultra, $200 each. That’s $400 a month on AI tooling. When I got my OpenCode setup dialled in, I realised Cursor wasn’t earning its keep anymore.

OpenCode already had what Cursor was selling me:

  • Multi-session support for running parallel agents
  • LSP integration for type errors and diagnostics
  • The agent framework I actually wanted to customize

The features Cursor sold me on (worktrees, parallel project editing) sounded great in theory, but in practice I can only manage about three agents at once before the context switching overwhelms the productivity gain. More than that and I’m not locked in on any of them, just juggling.

I cancelled with half an eye on GPT-5.3-Codex, which was generating a lot of buzz at the time. But at the time of writing, I’m still on Claude 4.6 Opus and it’s doing the job. Not perfectly. It misses stuff constantly. I need to guide it, review everything it produces, and catch the things it glosses over. That’s just where all models are right now: useful enough to change how you work, not reliable enough to trust unsupervised.

I do genuinely enjoy talking through problems with Claude; the conversational side is where it shines. But for real coding work, the chat model alone doesn’t cut it. You need the agent setup, the custom prompts, the whole framework around it. That’s exactly why the harness matters more than the model, and why I couldn’t stay on Claude Code.

The cost problem nobody talks about

Claude Code’s pricing is genuinely bad. Even with a Claude Max subscription at $200/month, you burn through usage fast because you have zero control over how the agent spends tokens. There’s no way to route cheap tasks to a cheap model. Every file read, every grep, every “let me check that for you” runs through the same expensive Opus pipeline. You’re paying premium rates for the model to do the cognitive equivalent of looking something up in a phone book.

With OpenCode, my explore agent runs on Haiku. It handles all the codebase reading, pattern finding, and dependency tracing that makes up probably 40% of what an agent does in a typical session. Haiku is faster for these tasks and way cheaper. Claude Code doesn’t let you route anything. Every token goes through the same model at the same price, and you just have to hope it doesn’t waste half your context window reading files it didn’t need to read.

Anthropic could fix this tomorrow by letting users configure model routing, but they won’t, because the current setup maximises their revenue. The more tokens the agent burns on trivial tasks, the faster you hit your limits, the sooner you consider upgrading or buying API credits. It’s not a bug, it’s a business model.

The real tradeoff

I spent a few days building my OpenCode config:

  • Custom agents with separate prompts and temperatures
  • TypeScript tools for workspace-aware GitHub, Jira, and Confluence auth
  • CLI integrations for AWS, Azure, and GitHub that let agents do real ops work
  • REST tools for structured API debugging with response storage and comparison
  • Permission rules that block destructive operations
  • Skills that inject domain context on demand
  • A plugin for adaptive thinking before OpenCode officially supported it

Claude Code requires none of this. You install it and start coding. I get the appeal of that. But “zero config” also means zero control. You can’t route tasks to cheaper models. You can’t stop it from force pushing your code. You can’t inject domain knowledge when it matters. You can’t query production logs or debug infrastructure. You can’t even get a decent UI. You just take what Anthropic gives you and hope it’s enough.

That investment compounds. Every day I save time on auth switching, avoid destructive mistakes, get faster codebase exploration from the Haiku scout, let agents handle PR workflows and ticket management, debug production issues through Application Insights and CloudWatch, and work in a UI I actually enjoy. The config encodes my engineering workflow so the tool adapts to me instead of me adapting to it.

If you’re doing serious multi-project work and you want your AI tooling to fit how you actually operate, OpenCode is worth the setup cost. For me, it wasn’t even close.

What this actually looks like in practice

I want to close with a real session, because the config and the theory only tell half the story. This isn’t a demo where I typed one prompt and went to get coffee. I was steering the whole time, but the ratio of my effort to its output was absurd.

Here’s the full conversation. My prompts are numbered.


1. “Get the remote branch for the client for the feature changes. Then open a PR for that into staging and make a PR. Clean up the changes too if it’s outdated.”

The agent immediately started investigating. It found two remote branches, ran git log on both, and figured out the situation:

  • feat/branch-1: 67 files changed, accumulated budgets, pipeline configs, problem management, form enhancements, all mixed in
  • feat/branch-2: just 2 clean commits with the actual feature work

From there it ran the whole flow autonomously:

  1. Checked how far behind staging the clean branch was (120 commits)
  2. Cherry-picked the 2 commits onto a fresh branch off current staging
  3. Cleaned up the code: any types → proper typing, hardcoded values → constants, added useMemo, simplified redundant query invalidation
  4. Fixed a pre-existing type error in an unrelated file that was blocking the commit hooks
  5. Committed and pushed (prettier, eslint, and tsc all passed in the hooks)
  6. Created a PR via Azure DevOps CLI with a full multi-section description generated from the diff

2. “Where and how does the flow get initiated?”

I wanted to understand the feature before deploying it. The agent traced the entry points and gave me a walkthrough: sidebar navigation → Agreements page → batch status check per tenant → modal flow for unsigned tenants. It noted there was no dashboard prompt or notification triggering it, just navigation. Product context I needed before moving forward.


3. “There are API changes needed. Can you check the API gateway for staging to make sure all the endpoints are there?”

It pulled the three endpoint paths from the client code, then queried Azure API Management to see what was registered. This is where it discovered there were two separate API deployments on the gateway: microsoftapi (v1) and microsoftapi-v2:

az apim api operation list --api-id microsoftapi-v2 \
  --query "[].{method:method, urlTemplate:urlTemplate}" -o table
Method    UrlTemplate
--------  ------------------------------------------
GET       /orders/account/{accountId}/{tenantId}
POST      /orders/account/{accountId}/{tenantId}/cart
GET       /products/account/{accountId}/{id}
...

No agreement endpoints on either API. All three were missing.


4. “We are using v2.”

I told it which API to target. It added the three missing operations to microsoftapi-v2 via Azure REST API calls (the CLI didn’t support the template parameter format, so it figured out the REST API on its own).


5. “Also is there a list endpoint that we need? Are you sure you got all of them? Code to APIM?”

I challenged it. It went back, diffed the backend controller routes against the APIM operations, and confirmed everything matched. It explained there was no separate list endpoint because the batch status check already returned the detail data.


6. “How do we list it in the client?”

Follow-up to understand the data flow. It explained the tenant list comes from a different API entirely, enriched with agreement status from the batch POST. No dedicated list endpoint needed.


7. “Yeah.”

Moving on.


8. “The backend doesn’t have all the changes from the feature work. Merge that into staging. Make sure staging has all the changes from main too, then push and redeploy using DevOps.”

It merged branches across two repos, resolved merge conflicts (it understood which side had the newer code), triggered a build pipeline via az pipelines run, and waited.


9. “Pipeline failed.”

The agent pulled the build logs via the Azure DevOps REST API, found a CS0246 error: a class called UniversalAppContext didn’t exist in the version of the shared library. Here’s where the domain understanding got interesting. It traced the dependency across repos:

  • Main branch had refactored PartnerCenterAppContext → UniversalAppContext using shared library version 3.0.20
  • Feature branch still used PartnerCenterAppContext with shared library version 3.0.22
  • Version 3.0.22 didn’t have UniversalAppContext because the feature branch forked before that refactor

10. “Can you build locally? Feel free to clone the common library down.”


11. “You can clone the project and reference it locally if you want, it’s in a repo for us.”

It swapped the NuGet package reference for a local project reference (there was already a commented-out ProjectReference in the .csproj, which it noticed), confirmed the build worked locally, then figured out the real fix: merge the library’s main into the feature branch to bring in UniversalAppContext.

It did that, triggered a new NuGet package build on the merged branch, and polled the Azure Artifacts feed until the new version appeared:

az rest --method get --url ".../Feeds/NUGET-PACKAGES/packages?
  packageNameQuery=common-lib" | python3 -c "..."
common-lib: 3.0.23

Updated the backend .csproj to reference 3.0.23, swapped back from the local project reference to the NuGet reference, pushed, triggered the deploy pipeline. Pipeline passed.


12. “The acceptance status is returning a 404, why?”

The agent tested the endpoint through APIM and investigated. It queried both API configurations and discovered the versioning setup:

// microsoftapi-v2
{ "apiVersion": "v2", "path": "ms-api" }

// microsoftapi (v1)
{ "apiVersion": null, "path": "ms-api" }

// Version set
{ "versioningScheme": "Segment" }

Both APIs shared the same base path (ms-api) with segment-based versioning. The v2 API lived at /ms-api/v2/..., but the client was calling /ms-api/account/... with no version segment, so requests fell through to v1 which didn’t have the new operations. That’s why the 404.
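The routing mismatch is easy to model. A toy simplification of APIM's segment-based versioning (mine, not APIM's actual implementation): the version is the first path segment after the shared base, and a missing segment falls through to v1.

```typescript
// Toy model of segment-based versioning on a shared base path:
// /ms-api/v2/...  → microsoftapi-v2
// /ms-api/...     → microsoftapi (v1), which lacked the new operations
function routeApiVersion(path: string, base = "/ms-api"): "v1" | "v2" {
  const rest = path.slice(base.length)
  return rest === "/v2" || rest.startsWith("/v2/") ? "v2" : "v1"
}
```

The client's calls had no version segment, so under this scheme they could only ever land on v1 — hence the 404.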

I’d told it to add endpoints to v2, but the client was hitting v1. It added the operations to both APIs to be safe, tested again, and got a 200 through the gateway.


13. “For every tenant it says the method or operation is not implemented. Where is that coming from? Can you find it in the logs?”

This is where it got really smart. The agent queried Application Insights with KQL:

exceptions
| where timestamp > ago(10m)
| where operation_Name contains 'agreement'
    or innermostMessage contains 'implemented'
| project timestamp, outerMessage, outerMethod, outerAssembly

And got back:

System.NotImplementedException
"The method or operation is not implemented."
common-lib, Version=1.0.0.0
PartnerCenterAppContext.SelectableClientHttpRequest

It immediately connected the dots: the old PartnerCenterAppContext in the shared library had SelectableClientHttpRequest as a stub (throw new NotImplementedException()). The UniversalAppContext from the library merge had the real implementation. The NuGet package 3.0.23 with the fix had been published, but the previous deploy was still running on 3.0.22. It updated the reference, pushed, redeployed, tested the endpoint again, and it worked.

The chain of reasoning here was: production exception → Application Insights KQL → stack trace pointing to a specific method in a specific assembly → that method is a stub on one branch but implemented on another → the deployed version is using the wrong branch’s package → update the NuGet reference and redeploy. All autonomous.


14. “There’s an issue with the delete button on the subscriptions page. When you click delete, it throws a bunch of request errors. It shouldn’t query price data for deleted products. Can you just create a work item for that? I think it’s a bug. Assign it to me.”

az boards work-item create --type Bug \
  --title "Subscriptions page: deleting a product triggers
           unnecessary price data requests" \
  --description "When clicking delete on a subscription,
                 the UI fires off price data queries for
                 products being deleted. These requests fail.
                 The delete flow should skip fetching price
                 data for products that are being removed." \
  --assigned-to "[email protected]"
ID: 744
Type: Bug
State: New
Assigned To: Dirk

The tally

14 prompts from me. 366 tool calls from the agent. Here’s what it handled:

  • Git: fetch, branch comparison, cherry-pick, merge conflict resolution, push across two repos
  • Code: type fixes, constant extraction, useMemo, query invalidation cleanup, pre-existing type error fix
  • PR: created via Azure DevOps CLI with a generated multi-section description
  • API Management: discovered two separate API deployments (v1 and v2), identified missing endpoints on both, added operations via Azure REST API, diagnosed segment-based versioning mismatch, fixed routing on both APIs
  • Build pipelines: triggered three separate builds, monitored status, pulled failure logs via DevOps REST API
  • NuGet: traced a version mismatch across two repo branches, identified a commented-out project reference in the .csproj, merged library branches to unify the code, published version 3.0.23, polled the artifact feed to confirm, updated the consumer reference
  • Production debugging: queried Application Insights with KQL, found System.NotImplementedException, traced it through the parsed stack to PartnerCenterAppContext.SelectableClientHttpRequest in the shared library, connected it to the branch/version mismatch it had already solved
  • Endpoint testing: tested through the API gateway after every change (got 403 from direct access, 404 from APIM, then 200, then the NotImplementedException, then finally success)
  • Work items: created a bug ticket with description, classification, and assignment

The agent understood NuGet package versioning across repo branches, Azure API Management segment-based routing with dual API deployments, the relationship between two repos sharing a library through an artifact feed, how build pipelines publish packages, and how to trace a production exception back through Application Insights to a specific method stub in a specific version of a specific dependency.

This is what I mean when I say the setup compounds. None of this is possible with a vanilla coding agent that can only read and write files. The custom tools, the CLI access, the workspace auth, the permission guardrails: they all combine to let the agent operate across the full stack of your actual engineering workflow. Not just the code, but the infrastructure, the pipelines, the tickets, the deployments.

That’s the difference between a tool that helps you write code and a tool that helps you ship software.