
Codi-E Blog

Building the Executor

350 lines of bash that ship code while you sleep.

18 min read

In the last post, I described the architecture for autonomous coding agents — how they pick up GitHub issues, write code, and create pull requests. That post was the blueprint. This one is the construction site.

We built the executor. It works. It found a GitHub issue, ran an AI model to implement it, pushed the code, and created a pull request in 20 seconds. The total cost was eight cents.

It also broke in three different ways before it worked. This post covers all of it: the script, the bugs, the fixes, and what we learned.

What Is an Executor?

Imagine you have a to-do list for your software project. Each item describes something that needs to change: "add a loading spinner to the login page" or "fix the typo in the settings screen." Normally, a developer reads the item, opens the code, makes the change, and sends it for review.


An executor is a script that does this automatically. It reads the to-do list (GitHub issues), picks the next item, hands it to an AI coding agent, and sends the result for human review (a pull request). It runs in a loop, 24 hours a day, checking for new work.

Our executor is a bash script — the same kind of script you might use to back up files or automate a build. It is not a framework or a platform. It is 350 lines of code that ties together tools you already have: GitHub for the to-do list, Claude Code for the AI brain, and Telegram for notifications.

   GitHub Issues               Executor Script              Pull Request
 ┌─────────────────┐       ┌────────────────────┐       ┌─────────────────┐
 │  #5: Add header │──────▶│  1. Find issue     │──────▶│  Changes ready  │
 │  label: agent   │       │  2. Claim it       │       │  for review     │
 │                 │       │  3. Clone repo     │       │                 │
 │  #12: Fix bug   │       │  4. Run Claude     │       │  "Closes #5"    │
 │  label: agent   │       │  5. Push + PR      │       │                 │
 └─────────────────┘       │  6. Clean up       │       └─────────────────┘
                           │  7. Repeat         │
                           └────────────────────┘
                                     │
                                     ▼
                               📱 Telegram
                             "PR created!"

The Loop

The executor runs a poll-claim-execute-report loop. Every 60 seconds, it asks GitHub: "are there any issues labeled agent-ready that nobody has claimed yet?" If yes, it picks the oldest one and gets to work.

Here is the main loop, stripped to its essentials:

while true; do
    # Check if someone asked us to stop
    if [[ "$SHUTDOWN_REQUESTED" == "true" ]]; then
        log "Shutdown complete"
        exit 0
    fi

    # Ask GitHub for the next unclaimed issue
    issue_json="$(find_next_issue)" || {
        sleep 60
        continue
    }

    # Process it: claim → clone → run Claude → PR
    process_issue "$issue_json"
done

That find_next_issue function uses the GitHub Search API to look across all seven of our repositories at once. It is like typing into GitHub's search bar, but from a script:

find_next_issue() {
    # Build a search across all repos
    # repo_query = "repo:owner/app-1 repo:owner/app-2 ..."
    local repo_query
    repo_query="$(get_repo_full_names)"

    # "is:issue" keeps pull requests out of the results --
    # the search endpoint matches both by default
    gh api --method GET search/issues \
        -f q="is:issue label:agent-ready state:open ${repo_query}" \
        -f sort="created" \
        -f order="asc" \
        -f per_page="1"
}

The --method GET is important (and the source of our first bug — more on that later).

Claiming an issue

Before the executor starts working, it claims the issue so no other executor picks it up at the same time. Claiming means three things:

  1. Post a comment: "Claimed by executor-1. Starting implementation."
  2. Remove the agent-ready label
  3. Add the agent-claimed label
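
The three claim steps map onto two `gh` calls. Here is a hypothetical sketch — the function name and comment wording are illustrative, not the exact script:

```shell
# Hypothetical sketch of the claim step; the real script differs in detail.
claim_issue() {
    local repo="$1" issue_number="$2" executor_id="$3"

    # 1. Post the claim comment
    gh issue comment "$issue_number" --repo "$repo" \
        --body "Claimed by ${executor_id}. Starting implementation."

    # 2 + 3. Swap the labels in one call
    gh issue edit "$issue_number" --repo "$repo" \
        --remove-label "agent-ready" \
        --add-label "agent-claimed"
}
```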

If something goes wrong later, the issue gets an agent-failed label and a comment explaining what happened. A human can then decide whether to retry or fix it manually.
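
A sketch of that failure path, with hypothetical names:

```shell
# Hypothetical sketch of the failure path; names are illustrative.
mark_failed() {
    local repo="$1" issue_number="$2" reason="$3"

    # Move the issue out of the claimed state so a human can triage it
    gh issue edit "$issue_number" --repo "$repo" \
        --remove-label "agent-claimed" \
        --add-label "agent-failed"
    gh issue comment "$issue_number" --repo "$repo" \
        --body "Implementation failed: ${reason}"
}
```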

Setting up the workspace

Each issue gets a fresh, isolated workspace. The executor clones the repository into /tmp/executor-workspaces/ and creates a feature branch:

setup_workspace() {
    local repo_full="$1" issue_number="$2"
    local repo_name="${repo_full##*/}"   # "owner/app" -> "app"
    local workspace="/tmp/executor-workspaces/${repo_name}-${issue_number}"

    # Fresh clone every time
    git clone "https://github.com/${repo_full}.git" "$workspace" --depth=50
    git -C "$workspace" checkout -b "feature/issue-${issue_number}"

    echo "$workspace"
}

The --depth=50 keeps the clone fast — we only need recent history, not the entire repository from 2019.

Graceful shutdown

If you press Ctrl+C or send a stop signal, you do not want the executor to die mid-task, leaving half-finished branches and confused labels. The script uses a signal handler — a way to say "when you get a stop signal, finish what you are doing first":

SHUTDOWN_REQUESTED=false

shutdown_handler() {
    log "Shutdown requested - will finish current issue and exit"
    SHUTDOWN_REQUESTED=true
}

trap shutdown_handler SIGTERM SIGINT

The loop checks this flag between issues. If it is set, the executor finishes the current issue, sends a Telegram message ("executor stopped"), and exits cleanly.

Keeping It Safe

An AI coding agent with unrestricted access is a liability. Ours can read and write code files. It can run git, xcodebuild, and swift. It cannot do anything else. No curl. No rm -rf. No sudo. No installing packages. No network access beyond what git needs.

This is enforced by Claude Code's permission system. The --permission-mode dontAsk flag means: if a tool is not on the allowlist, deny it silently. Do not ask the user (there is no user — it is running unattended). Do not try an alternative. Just skip it.

ALLOWED_TOOLS='Read,Write,Edit,Glob,Grep,Bash(git *),Bash(xcodebuild *),Bash(swift *),Bash(npm *),Bash(ls *),Bash(mkdir *),Bash(cat *),Bash(head *),Bash(tail *),Bash(wc *),Bash(diff *),Bash(which *)'

claude -p "$prompt" \
    --model sonnet \
    --max-budget-usd 2.00 \
    --permission-mode dontAsk \
    --allowedTools "$ALLOWED_TOOLS" \
    --output-format json

Let me unpack what each safety flag does:

  • --permission-mode dontAsk — Deny-by-default. Only tools on the allowlist can run.
  • --allowedTools — The explicit list. Bash(git *) means "any git command is fine." Bash(rm *) is not on the list, so it will be blocked.
  • --max-budget-usd 2.00 — Hard spending cap per issue. If Claude hits $2.00 in API costs, it stops. For simple issues, this is generous. For complex ones, we use agent-opus labels that bump this to $10.
  • --output-format json — Returns structured data with token counts, cost, and session ID. This feeds our cost tracking CSV.
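
That JSON result can be fed straight to jq on its way into the cost CSV. A sketch — the field names here are assumptions about the CLI's output shape, so check what your version actually emits:

```shell
# Extract cost and session id from a JSON result blob.
# NOTE: ".total_cost_usd" and ".session_id" are assumed field names.
parse_claude_result() {
    local json="$1"
    jq -r '[.total_cost_usd, .session_id] | @csv' <<< "$json"
}
```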

Why not --dangerously-skip-permissions? That flag exists and it is tempting. It allows everything. But "everything" includes sending HTTP requests, modifying system files, and running arbitrary shell commands. Using an explicit allowlist is more work to set up, but you only have to think about security once. With skip-permissions, you have to worry about it forever.

Model selection

By default, the executor uses Claude Sonnet — fast, capable, and cheap ($3 per million input tokens). For issues that need more reasoning power, we add an agent-opus label. The executor detects this and switches to Claude Opus with a higher budget:

local model="sonnet"
local max_budget="2.00"

if echo "$labels" | grep -q "agent-opus"; then
    model="opus"
    max_budget="10.00"
fi

Building the Prompt

The prompt is everything. It is the only instruction the AI gets. A vague prompt produces vague code. A good prompt produces code that follows your conventions, references the issue number in commits, and does not try to "improve" things you did not ask about.

Here is our prompt template (simplified):

You are an autonomous code agent working on GitHub issue
#${issue_number} in the ${repo} repository.

## Issue
**Title:** ${title}
**Description:** ${body}

## Instructions
1. Read the issue requirements carefully
2. Explore the codebase to understand existing patterns
3. Implement the changes described in the issue
4. Follow existing code conventions
5. Make focused, minimal changes
6. If the repo has a CLAUDE.md, read it first
7. Commit with conventional commits referencing the issue

## Rules
- Do NOT create files unless the issue requires it
- Do NOT refactor code beyond what the issue asks
- Do NOT add comments to unchanged code
- If you cannot complete the task, explain why

A few things to note:

  • "Read the CLAUDE.md" — Every repository has a CLAUDE.md file with project-specific instructions. The agent reads this first to learn about the tech stack, folder structure, and coding conventions. This is how we teach the agent about each app without putting everything in the prompt.
  • "Do NOT refactor" — Without this rule, the agent will helpfully restructure nearby code, add type annotations, and "improve" error handling. This produces larger diffs, harder reviews, and more CI failures. Ship the minimum. Improve later.
  • Conventional commits — Messages like feat(scope): description (#5). These feed into automated changelogs and release notes.

The issue body comes directly from GitHub. If the issue is well-written — clear task, acceptance criteria, links to relevant files — the agent produces better code. Garbage in, garbage out. This is true for humans too. The difference is a human will ask clarifying questions. The agent will just guess.
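
One way to fill the template is an unquoted heredoc, which expands the variables in place. A simplified sketch (the function name is hypothetical):

```shell
# Simplified sketch of prompt assembly via heredoc interpolation.
# An unquoted EOF delimiter lets bash expand ${...} inside the body.
build_prompt() {
    local issue_number="$1" repo="$2" title="$3" body="$4"
    cat <<EOF
You are an autonomous code agent working on GitHub issue
#${issue_number} in the ${repo} repository.

## Issue
**Title:** ${title}
**Description:** ${body}
EOF
}
```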

Three Bugs That Stopped Everything

The executor did not work on the first try. Or the second. Here are the three bugs we hit, in the order we found them, and what they teach about building automation.

Bug 1: The search that always failed

The executor polls GitHub every 60 seconds, looking for issues labeled agent-ready. On every single poll, it logged:

[WARN] GitHub search API call failed

We had a test issue sitting right there with the correct label. The executor never found it.

The root cause: the gh command-line tool sends a POST request by default when you pass -f (form data) parameters. But GitHub's search API only accepts GET requests. A POST to the search endpoint returns 404 — "not found."

# Broken (sends POST, gets 404)
gh api search/issues -f q="label:agent-ready ..."

# Fixed (explicitly sends GET)
gh api --method GET search/issues -f q="label:agent-ready ..."

This bug was invisible because we had 2>/dev/null suppressing the error. The 404 response was being thrown away. Lesson: never suppress errors while developing. Add 2>/dev/null after the code works, not before.

Bug 2: The function that lied about its return value

After fixing the search, the executor found the issue and started cloning the repository. Then it crashed immediately:

[INFO] Cloning my-org/count-e into /tmp/executor-workspaces/count-e-5
[INFO] Running Claude (model=sonnet, budget=$2.00)
[ERROR] Claude exited with code 1 after 0s

Zero seconds. Claude did not even start. The error message from bash was revealing:

cd: [2026-02-20 13:23:51] [INFO] Cloning my-org/count-e
/tmp/executor-workspaces/count-e-5: No such file or directory

Look carefully. Bash is trying to cd into a path that starts with a timestamp and a log message. The workspace path has a log line stuck in front of it.

Root cause: our log() function used tee, which writes to both a file AND standard output. When setup_workspace() called log(), the log message went to stdout. When the calling code captured the function's output with $(setup_workspace ...), it got both the log message and the actual return value mixed together.

# Broken: tee writes to stdout, polluting return values
log() {
    echo "[$timestamp] [$level] $msg" | tee -a "$LOG_FILE"
}

# Fixed: write to file and stderr separately
log() {
    echo "[$timestamp] [$level] $msg" >> "$LOG_FILE"
    echo "[$timestamp] [$level] $msg" >&2
}

In bash, functions "return" values by printing to stdout. If your logging also prints to stdout, the two get mixed. This is a classic bash pitfall. The fix is to send logs to stderr (>&2), which is reserved for diagnostic messages.
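
The pitfall is easy to reproduce in isolation:

```shell
# Minimal reproduction of the stdout-pollution pitfall.
bad_fn() {
    echo "[INFO] doing work"       # log to stdout -- pollutes the return value
    echo "/tmp/result"             # intended return value
}

good_fn() {
    echo "[INFO] doing work" >&2   # log to stderr -- capture stays clean
    echo "/tmp/result"
}

# "$(bad_fn)" captures BOTH lines; "$(good_fn)" captures only the path.
```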

Bug 3: The environment variable that followed us home

This one was subtle. Claude Code sets an environment variable called CLAUDECODE=1 when it is running. If you try to start another Claude Code instance from inside an existing one, it detects this variable and refuses to start — a sensible safety measure to prevent infinite nesting.

The problem: we were running the executor script from inside a Claude Code terminal session. Even though our script does unset CLAUDECODE at the top, the child claude process sometimes inherited the variable before the unset took effect.

# At the top of executor.sh
unset CLAUDE_CODE_ENTRYPOINT 2>/dev/null || true
unset CLAUDECODE 2>/dev/null || true

# In run_claude() - belt AND suspenders
output="$(cd "$workspace" && env -u CLAUDECODE -u CLAUDE_CODE_ENTRYPOINT \
    claude -p "$prompt" ...)"

The env -u command explicitly strips the variables from the child process's environment. It is redundant if the unset worked, but it costs nothing and guarantees the variable is gone.

The meta-lesson: All three bugs were invisible at the call site. The search returned 404 silently. The workspace path was corrupted silently. The environment variable was inherited silently. When building automation, make failure loud. Suppress noise later, not now.

The First Successful Run

After fixing all three bugs, we created a test issue on one of our repositories:

Title: docs: add comment header to README.md
Label: agent-ready

## Task
Add a comment at the very top of README.md with the text:
<!-- Managed by Invotek AS - https://dashecorp.com -->

## Acceptance Criteria
- README.md has the HTML comment as the first line
- No other changes to the file

A deliberately simple issue. We wanted to verify the pipeline, not test the AI's coding ability.

Here is the complete log from the successful run:

[13:24:39] Executor started: executor-1
[13:24:40] Processing: my-org/count-e#5 - docs: add comment header to README.md
[13:24:40] Claiming issue #5
[13:24:42] Cloning my-org/count-e into /tmp/executor-workspaces/count-e-5
[13:24:43] Running Claude (model=sonnet, budget=$2.00)
[13:24:56] Claude completed in 13s
[13:24:56] Claude made 1 commit(s)
[13:24:56] Pushing branch feature/issue-5
[13:24:59] PR created: https://github.com/my-org/count-e/pull/6
[13:25:00] Cleaned up workspace
[13:25:00] Issue processing complete

20 seconds from issue detection to pull request. 13 of those seconds were Claude reading the codebase, understanding the task, editing the file, and committing the change. The rest was git operations.

The resulting pull request:

Title: docs: add Invotek AS comment header to README.md
Body:
  Closes #5

  ## Changes
  - docs: add Invotek AS comment header to README.md (#5)

  ## Agent Details
  Model: sonnet | Tokens: 6 in / 421 out | Cost: $0.08
  Implemented by executor-1

The PR references the issue (Closes #5), includes a conventional commit message, and logs the model, token usage, and cost. All automated.
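
The push-and-PR step behind that log can be sketched with gh pr create. The function name and argument order are assumptions, not the exact script:

```shell
# Hypothetical sketch of the push + PR step.
create_pr() {
    local workspace="$1" repo="$2" issue_number="$3" branch="$4" title="$5"

    # Push the feature branch, then open a PR that auto-closes the issue
    git -C "$workspace" push -u origin "$branch"
    gh pr create --repo "$repo" \
        --head "$branch" \
        --title "$title" \
        --body "Closes #${issue_number}"
}
```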

Cost and Numbers

We track every executor run in a CSV file for analysis:

timestamp,repo,issue,model,input_tokens,output_tokens,cost_usd,duration_s,status
2026-02-20 13:25:00,my-org/count-e,5,sonnet,6,421,0.08,20,success
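
Appending a row after each run is a small helper. A sketch — the COST_CSV variable name and default path are assumptions; the column order matches the header above:

```shell
# Hypothetical cost-logging helper. COST_CSV is overridable for testing.
COST_CSV="${COST_CSV:-$HOME/.claude/executor-costs.csv}"

log_run() {
    # Write the header once, then append one row per run
    [[ -f "$COST_CSV" ]] || \
        echo "timestamp,repo,issue,model,input_tokens,output_tokens,cost_usd,duration_s,status" > "$COST_CSV"
    echo "$(date '+%Y-%m-%d %H:%M:%S'),$1,$2,$3,$4,$5,$6,$7,$8" >> "$COST_CSV"
}
```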

For this test issue:

  • Model: Claude Sonnet — fast, cheap, plenty capable for simple tasks
  • Input tokens: 6 (most of the codebase context was cached)
  • Output tokens: 421 (the edit + commit message)
  • Cost: $0.08 (eight cents)
  • Duration: 20 seconds total, 13s for Claude

This was a trivial issue. Realistic feature work costs more — our budget is $2 per issue for Sonnet, $10 for Opus. Based on early testing and industry data:

  • Simple fix (1 file, clear description): $0.05 – $0.50
  • Moderate feature (2-5 files): $1.00 – $5.00
  • Complex refactor (10+ files): $3.00 – $10.00

At 5 issues per day on Sonnet, that is roughly $50 – $150 per month in API costs. The compute cost (the Mac running the executor) is electricity — about $5/month.

What Comes Next

The executor works. It is Phase 1 — a single loop running on a MacBook. No VMs, no containers, no fleet management. Here is what is left:

Immediate improvements

  • Scoped GitHub PAT — Right now, the executor uses my personal authentication. We need a fine-grained Personal Access Token with only the permissions the executor needs: read/write on issues, pull requests, and repository contents. Nothing else. No admin. No settings. No secrets.
  • Failure recovery — If Claude fails on an issue, should we retry with a different model? With a higher budget? The current behavior is to label it agent-failed and notify via Telegram. Automatic retry logic would reduce manual intervention.
  • Better prompts — The prompt template is generic. Per-repository prompt templates could include specific testing commands, build steps, and architecture notes. This is essentially giving the agent a cheat sheet for each project.

Phase 2: Multiple executors

When we move to a dedicated Mac Mini M4 Pro, we can run 2–3 executors in parallel using Tart VMs (lightweight macOS virtual machines). Each executor gets its own isolated environment — its own filesystem, its own network restrictions, its own GitHub token. If one crashes or goes rogue, the others are unaffected.

Phase 3: Smart routing

Not all issues are equal. A typo fix should use Sonnet with a $0.50 budget and finish in 10 seconds. A complex feature should use Opus with a $10 budget and take 5 minutes. Phase 3 classifies issues before assigning them, picking the right model and budget for each task.
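
A first-cut classifier could be as crude as keying off issue length. A sketch — the thresholds are illustrative, not measured:

```shell
# Hypothetical routing sketch: pick model and budget from issue body size.
# Word-count thresholds are placeholders, not tuned values.
route_issue() {
    local body="$1"
    local words
    words=$(wc -w <<< "$body")

    if (( words < 50 )); then
        echo "sonnet 0.50"    # typo-fix territory
    elif (( words < 200 )); then
        echo "sonnet 2.00"    # standard feature work
    else
        echo "opus 10.00"     # long spec, needs more reasoning
    fi
}
```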

The executor is not magic. It is a bash loop that calls an API. The magic is in the API — a language model that can read code, understand what needs to change, and write a commit. Our job is to put guardrails around that magic so it works reliably, safely, and cheaply. Three bugs in, one pull request out. Not a bad ratio for a first attempt.

Build Your Own

The sections above walk through the design and the bugs. This section is the practical checklist — everything you need to go from zero to a working executor.

Prerequisites

You need four command-line tools installed:

# GitHub CLI - for searching issues and creating PRs
brew install gh
gh auth login

# jq - for parsing JSON in bash
brew install jq

# Claude Code - the AI coding agent
npm install -g @anthropic-ai/claude-code

# curl - for Telegram notifications (usually pre-installed)
curl --version

Claude Code requires either a Claude Pro/Max subscription or an Anthropic API key. If you set the ANTHROPIC_API_KEY environment variable, Claude Code bills against the API directly. Otherwise, it uses your subscription.

Configuration files

The executor reads a repos.json file to know which repositories to monitor. Create this in the same directory as the script:

{
  "repos": [
    {
      "name": "my-app",
      "url": "https://github.com/my-org/my-app.git",
      "description": "My iOS app"
    },
    {
      "name": "my-backend",
      "url": "https://github.com/my-org/my-backend.git",
      "description": "API server"
    }
  ]
}
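
The get_repo_full_names helper used by the issue search can be derived from this file with jq. A sketch, assuming the repos.json shape above (the optional file argument is for testing):

```shell
# Turn repos.json into 'repo:owner/name repo:owner/name ...' for the
# GitHub search query. Sketch only; the real helper may differ.
get_repo_full_names() {
    local config="${1:-repos.json}"
    jq -r '.repos[].url
           | sub("^https://github.com/"; "")
           | sub("\\.git$"; "")
           | "repo:" + .' "$config" | tr '\n' ' '
}
```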

For Telegram notifications (optional but recommended), create ~/.claude/telegram-secrets.json:

{
  "bot_token": "123456789:ABCdefGhIjKlMnOpQrStUvWxYz",
  "chat_id": "987654321"
}

To create a Telegram bot: message @BotFather on Telegram, use /newbot, and it will give you the token. To find your chat ID: message @userinfobot.
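
Reading that file and sending a message is two jq lookups and one curl call against the Bot API's sendMessage method. A sketch (the function name is an assumption about the real script):

```shell
# Notification helper; reads the secrets file described above.
send_telegram() {
    local secrets="$HOME/.claude/telegram-secrets.json"
    local token chat_id
    token="$(jq -r '.bot_token' "$secrets")"
    chat_id="$(jq -r '.chat_id' "$secrets")"

    curl -s -X POST \
        "https://api.telegram.org/bot${token}/sendMessage" \
        -d chat_id="$chat_id" \
        -d text="$1" > /dev/null
}
```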

For a scoped GitHub token (optional, recommended for security), create ~/.claude/executor-secrets.json:

{
  "github_pat": "github_pat_xxxxxxxxxxxx"
}

Create a fine-grained PAT with only: Issues (read/write), Pull requests (read/write), and Contents (read/write) on the specific repos the executor manages. No admin, no settings, no secrets.

Create the labels

Every repository the executor monitors needs four labels. Create them once:

#!/bin/bash
# setup-labels.sh - Run once per repo

REPOS=("my-org/my-app" "my-org/my-backend")

for repo in "${REPOS[@]}"; do
    echo "Creating labels on $repo..."
    gh label create "agent-ready"   --repo "$repo" --color "0E8A16" --description "Ready for autonomous agent" --force
    gh label create "agent-claimed" --repo "$repo" --color "FBCA04" --description "Claimed by agent executor"  --force
    gh label create "agent-failed"  --repo "$repo" --color "D93F0B" --description "Agent failed to implement"  --force
    gh label create "agent-opus"    --repo "$repo" --color "5319E7" --description "Use Opus model (higher budget)" --force
done

Running the executor

The executor runs in the foreground and logs to both stderr and ~/.claude/executor.log. Use tmux or screen to keep it running after you close your terminal:

# Start in a tmux session
tmux new -s executor
./executor.sh

# Detach: Ctrl+B, then D
# Reattach later: tmux attach -t executor

# Or run in the background
nohup ./executor.sh &

# Stop gracefully (finishes current issue, then exits)
kill $(pgrep -f executor.sh)

Testing it

Create a simple test issue on one of your repos:

gh issue create \
    --repo my-org/my-app \
    --title "docs: add comment header to README.md" \
    --body "Add <!-- Managed by My Org --> as the first line of README.md" \
    --label "agent-ready"

Then start the executor and watch the logs. Within 60 seconds (the poll interval), it should pick up the issue, clone, run Claude, push a branch, and create a PR. You will get a Telegram message when it is done.

Complete script: The code snippets in this post are excerpts. The full executor.sh is about 350 lines. The pieces above — the loop, the search, the workspace setup, the safety flags, the prompt, the PR creation — are the essential building blocks. Wire them together in the order shown in The Loop section: find_next_issue → claim_issue → setup_workspace → run_claude → push → PR → cleanup → repeat.