Pi-E had been running on Gemini 2.5 Flash for a week. Fast, cheap, conversational. Then the rate limit errors started. Not occasionally — constantly. Every other message returned 429 RESOURCE_EXHAUSTED. Conversations died mid-sentence.
The fix was not a bigger quota. It was not a different Gemini model. It was replacing the entire language model backend with a Claude subscription, authenticating via OAuth tokens transferred from another machine to a Raspberry Pi inside a Docker container. And it worked — after we learned that one wrong field name in a JSON file is enough to make authentication silently fail.
This is the story of that migration: the root cause analysis, the dead ends, and the eventual fix.
The Symptoms: Death by Thinking Token
Pi-E runs on OpenClaw, an open-source agent framework, inside a Docker container on a Raspberry Pi 4. Her default model was Gemini 2.5 Flash, accessed through the Gemini API on a pay-as-you-go plan.
The first sign of trouble was intermittent. A message would fail, the next would succeed. Then the failures became more frequent. Then they became the norm:
429 RESOURCE_EXHAUSTED
"Quota exceeded for quota metric 'Generate Content API requests per minute'
with base quota limit '1000000 tokens per minute'"
One million tokens per minute sounds generous. For a chatbot exchanging short messages, it should be plenty. But Gemini 2.5 Flash is a "thinking" model. It generates internal reasoning tokens that are invisible in the response but counted against the quota.
We confirmed this with controlled tests:
| Request | Visible tokens | Actual tokens (est.) | Result |
|---|---|---|---|
| Simple prompt, 0 tools | ~500 | ~2,000 | 200 OK |
| Simple prompt, 5 tools | ~3,000 | ~15,000 | 200 OK |
| System prompt + 50 tools | ~15,000 | ~150,000+ | 429 Error |
The thinking multiplier was roughly 5–10x. A request that looked like 15,000 tokens was actually consuming 100,000–150,000 thinking tokens internally. And OpenClaw, running in "full" tool mode, sends dozens of tool definitions with JSON schemas in every single request. Add the system prompt, the conversation history, and the thinking overhead, and a single message was burning through a significant chunk of the 1M TPM quota.
Two or three messages in quick succession — a normal conversation pace — would exhaust the entire minute's budget.
The hidden cost of thinking models. With thinking models like Gemini 2.5 Flash, the token count you see in the response is a fraction of the actual computation. Tool definitions, system prompts, and conversation history are all amplified by the thinking multiplier. A chatbot that works fine with a simple prompt can hit rate limits the moment you give it a real tool set.
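The quota arithmetic is worth making explicit. A back-of-envelope sketch (the 8x multiplier is our estimate from the tests above, not a published figure):

```python
# Back-of-envelope quota math for a thinking model.
# The thinking multiplier is an estimate from our own tests.
def effective_tokens(visible_tokens: int, thinking_multiplier: float = 8.0) -> int:
    """Estimate the actual quota consumption of one request."""
    return int(visible_tokens * thinking_multiplier)

def messages_per_minute(quota_tpm: int, visible_tokens: int,
                        thinking_multiplier: float = 8.0) -> int:
    """How many requests fit into one minute's token budget."""
    return quota_tpm // effective_tokens(visible_tokens, thinking_multiplier)

# System prompt + 50 tool definitions: ~15,000 visible tokens per request.
print(messages_per_minute(quota_tpm=1_000_000, visible_tokens=15_000))  # 8
```

Eight messages per minute sounds workable until you remember that every agent turn can involve several API round-trips, not one.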
Three Dead Ends
Before switching models entirely, we tried to fix the problem within the Gemini ecosystem. Each attempt failed for a different reason.
Dead End 1: Switch to a Non-Thinking Model
Gemini 2.0 Flash does not have thinking tokens. It would use the quota efficiently. Unfortunately:
404 NOT_FOUND
"This model models/gemini-2.0-flash is no longer
available to new users."
Google had deprecated it. The only remaining non-thinking option was Gemini 2.5 Flash Lite. We tried it. The responses were noticeably worse — vague, generic, missing context. The agent's owner said it bluntly: "Pi-E seems really stupid now compared to earlier." We reverted within the hour.
Dead End 2: Reduce the Tool Set
If tool definitions are the biggest payload, send fewer tools. OpenClaw's config supports a tools.deny list that blocks specific tools from executing. We added a long deny list.
It did not help. The deny list controls which tools the agent can execute, but OpenClaw still sends all tool definitions to the model in every request. The API payload — and therefore the token count — was unchanged.
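For reference, the deny list we tried was shaped like this (the `tools.deny` key is the one OpenClaw's config supports; the tool names here are illustrative placeholders):

```json
{
  "tools": {
    "deny": ["exec", "browser", "file_write", "canvas"]
  }
}
```

Again: this changes what the agent may run, not what gets sent to the model.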
OpenClaw also defines tool "profiles" in its source code: minimal, coding, messaging, and full. But the config validator only accepts full. Setting "profile": "messaging" returns a validation error:
tools.profile: Invalid input
The feature exists in the code but is not exposed in the configuration schema.
Dead End 3: Reset the Session
OpenClaw sends the full conversation history with every API request. A week of chatting had grown the session JSONL file to over a megabyte. Maybe the accumulated history was the problem.
We truncated the session files and created fresh session IDs. Token counts dropped. But the rate limits continued, because the tool definitions alone — without any conversation history — were enough to exceed the quota with the thinking multiplier.
The root cause was not the session size. It was the model's architecture.
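If you want to check whether history is your problem before truncating anything, a rough estimate is enough. This sketch uses the common ~4 characters-per-token heuristic, which is approximate for any real tokenizer:

```python
# Rough token estimate for a session JSONL file, using the
# ~4 characters-per-token heuristic (approximate by design).
def estimate_session_tokens(jsonl_text: str, chars_per_token: float = 4.0) -> int:
    """Sum the size of every non-empty JSONL line and convert to tokens."""
    total_chars = sum(len(line) for line in jsonl_text.splitlines() if line.strip())
    return int(total_chars / chars_per_token)

sample = '{"role": "user", "content": "hello"}\n{"role": "assistant", "content": "hi there"}'
print(estimate_session_tokens(sample))  # 20
```

A one-megabyte session file works out to roughly 250,000 tokens per request before the thinking multiplier even applies.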
The Subscription Idea
With Gemini's rate limits unsolvable at their source, we needed a different model. The question was how to pay for it.
An Anthropic API key would work — Claude has no hidden thinking token multiplier on its standard models, and the tool definition payload would be handled without quota issues. But API access means pay-per-token billing. For an agent that runs 24/7 and receives dozens of messages daily, costs add up.
There was already a Claude Max subscription ($200/month) running on another machine. That subscription includes generous usage of Claude through official clients, including the Claude Code CLI. The question was: could we use that subscription to power Pi-E?
The answer turned out to be yes. OpenClaw natively supports Anthropic as a model provider, and its authentication system understands OAuth tokens: the same tokens the Claude Code CLI uses when authenticated via a subscription.
Before:
Pi-E → Gemini API (API key) → 429 errors
After:
Pi-E → Anthropic API (OAuth token) → Claude Sonnet
                          ↑
          Token from Claude Max subscription
             (auto-refreshes via OAuth)
The challenge was getting the OAuth credentials from the machine where the subscription was authenticated to the Raspberry Pi running inside a Docker container.
The OAuth Puzzle
Claude Code CLI stores its OAuth credentials in the macOS Keychain. When you run claude auth login and sign in through a browser, the CLI saves an access token, a refresh token, and an expiry timestamp. On macOS, these live in a keychain entry called Claude Code-credentials.
You can extract them:
security find-generic-password \
-s "Claude Code-credentials" -w
This returns a JSON blob containing several credential sets. The relevant one for a Claude subscription is the claudeAiOauth entry:
{
  "claudeAiOauth": {
    "accessToken": "sk-ant-oat01-...",
    "refreshToken": "sk-ant-ort01-...",
    "expiresAt": 1772143667732,
    "scopes": ["user:inference", ...],
    "subscriptionType": "max",
    "rateLimitTier": "default_claude_max_20x"
  }
}
The first instinct was to install Claude Code CLI inside the Pi's Docker container and run claude auth login there. This failed immediately — the container has no TTY, no browser, and no way to complete the OAuth flow interactively. Even with docker exec -it, the CLI detected the non-interactive environment and refused to start the flow.
The second instinct was to capture the OAuth URL and send it to a phone. We installed the CLI in the container, started the auth flow in non-interactive mode, captured the OAuth redirect URL, and sent it via Telegram. The URL was 300+ characters long and got mangled by Telegram's link preview parser. The browser showed: Invalid OAuth Request - Missing client_id parameter.
The third approach worked: skip the CLI entirely and write the credentials directly.
OpenClaw stores authentication in a file called auth-profiles.json inside each agent's directory. We extracted the OAuth tokens from the Mac's keychain, constructed the JSON in the correct format, and transferred it to the Pi via SSH and base64 encoding (to avoid shell escaping issues with the long token strings).
# On the Mac: extract, encode, and transfer
FIXED_JSON='{"version":1,"profiles":{...}}'
ENCODED=$(echo "$FIXED_JSON" | base64)
# On the Pi: decode and write into the container
ssh pi "echo '$ENCODED' | base64 -d | \
docker exec -i container-name tee \
/path/to/auth-profiles.json > /dev/null"
Then we updated the OpenClaw config to use Anthropic as the model provider:
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-6"
      }
    }
  }
}
Restart the container. Check the logs. Wait for a test message.
[diagnostic] lane task error:
"No API key found for provider 'anthropic'.
Auth store: .../auth-profiles.json"
It did not work.
The Field Name That Broke Everything
The auth-profiles.json file existed. It was valid JSON. It had the right profile ID (anthropic:claude-cli), the right provider (anthropic), and the right token type (oauth). The error said "no API key found" anyway.
We dug into the source code. OpenClaw is compiled to bundled JavaScript, but the function names and comments survive minification. The relevant check was in the auth profile validation:
// How OpenClaw validates an OAuth profile
if (cred.type === "oauth")
  return Boolean(cred.access?.trim()
              || cred.refresh?.trim())
The validator checks for fields called access and refresh.
Our auth-profiles.json used fields called token and refreshToken.
This was the entire problem. Two field names.
The Claude Code CLI stores credentials in macOS Keychain with the names accessToken and refreshToken. OpenClaw's internal auth store uses access and refresh. There is no error message for "your OAuth profile has unrecognized fields." The validator simply treats the profile as empty because the expected fields are undefined, and the downstream code reports "no API key found" as if the entire file were missing.
| Source | Access token field | Refresh token field |
|---|---|---|
| Claude CLI (macOS Keychain) | accessToken | refreshToken |
| OpenClaw auth-profiles.json | access | refresh |
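The failure is easy to reproduce. Here is the same check in Python (a mirror of the bundled JavaScript above, not OpenClaw's actual code):

```python
# A Python rendering of OpenClaw's OAuth profile check.
# Mirrors the logic of the bundled JavaScript, not its API.
def has_oauth_credentials(cred: dict) -> bool:
    """True only if 'access' or 'refresh' holds a non-blank token."""
    if cred.get("type") != "oauth":
        return False
    access = (cred.get("access") or "").strip()
    refresh = (cred.get("refresh") or "").strip()
    return bool(access or refresh)

# Our first attempt: valid JSON, wrong field names.
broken = {"type": "oauth", "token": "sk-ant-oat01-x", "refreshToken": "sk-ant-ort01-x"}
fixed = {"type": "oauth", "access": "sk-ant-oat01-x", "refresh": "sk-ant-ort01-x"}

print(has_oauth_credentials(broken))  # False -> downstream: "No API key found"
print(has_oauth_credentials(fixed))   # True
```

The unknown fields are not rejected, they are simply never read, so the profile looks empty.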
The fix was a one-line change — well, a two-field rename:
// Before (broken)
{
  "type": "oauth",
  "provider": "anthropic",
  "token": "sk-ant-oat01-...",
  "refreshToken": "sk-ant-ort01-..."
}

// After (working)
{
  "type": "oauth",
  "provider": "anthropic",
  "access": "sk-ant-oat01-...",
  "refresh": "sk-ant-ort01-..."
}
Restart the container. Send a test message. Watch the logs:
[agent/embedded] embedded run start:
provider=anthropic
model=claude-sonnet-4-6
thinking=low
messageChannel=telegram
[agent/embedded] embedded run done:
durationMs=2312
aborted=false
2.3 seconds. No errors. Pi-E was talking through Claude.
The Result
Pi-E now runs on Claude Sonnet 4.6 via a Claude Max subscription. The before and after:
| | Before (Gemini) | After (Claude) |
|---|---|---|
| Model | Gemini 2.5 Flash | Claude Sonnet 4.6 |
| Auth | API key (GCP) | OAuth (subscription) |
| Rate limits | 1M TPM (hit constantly) | Subscription tier (not hit) |
| Cost | ~$1.50/month API | $0 incremental (part of existing subscription) |
| Response quality | Good (when it worked) | Better (consistent) |
| Response time | ~1s (when it worked) | ~2.5s |
The response time is slightly higher — Gemini Flash is fast when it is not rate-limited. But a response that takes 2.5 seconds and arrives every time beats a response that takes 1 second but fails half the time.
Token Refresh
OAuth access tokens expire. The one we transferred had about six hours of life remaining. But OpenClaw has built-in OAuth refresh logic — when it detects an expired token, it uses the refresh token to obtain a new access token automatically. Anthropic is registered as an OAuth provider in the refresh handler. No manual intervention needed.
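The refresh trigger is essentially an expiry comparison. A sketch of the idea (not OpenClaw's actual code; the five-minute skew window is our own choice, and the refresh request itself is OpenClaw internals we never needed to touch):

```python
import time

# Decide whether an access token needs refreshing. This mirrors the
# general pattern, not OpenClaw's implementation.
def needs_refresh(profile: dict, skew_seconds: int = 300) -> bool:
    """True if the token is expired or expires within the skew window."""
    expires_ms = profile.get("expires", 0)  # stored as epoch milliseconds
    return expires_ms / 1000 <= time.time() + skew_seconds

# An epoch-0 expiry is long past, so a refresh fires immediately.
print(needs_refresh({"expires": 0}))  # True
```

The skew window matters: refreshing slightly before expiry avoids a failed request racing the clock.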
Persistence
The Docker container on the Pi uses a bind mount: the container's /home/node/.openclaw maps to /home/claude/.openclaw on the host filesystem. The auth-profiles.json lives on the Pi's SD card, not inside the container's ephemeral filesystem. Container restarts, image rebuilds, and even full docker compose down && up cycles preserve it.
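A compose file along these lines produces that layout (service and image names are illustrative; the two paths are the ones from our setup):

```yaml
services:
  openclaw:
    image: openclaw:latest  # illustrative image name
    volumes:
      # host path (survives rebuilds) : container path
      - /home/claude/.openclaw:/home/node/.openclaw
```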
Do It Yourself
If you want to run an OpenClaw agent on a Claude subscription instead of an API key, here is the complete procedure. You need two machines: one where Claude Code CLI is already authenticated (your laptop), and one running OpenClaw (your server or Pi).
Step 1: Extract OAuth Credentials
On the machine where Claude Code CLI is logged in, extract the credentials from the keychain.
macOS:
security find-generic-password \
-s "Claude Code-credentials" -w \
| python3 -c "
import sys, json
creds = json.load(sys.stdin)
oauth = creds['claudeAiOauth']
print(json.dumps(oauth, indent=2))
"
Linux:
# Credentials are stored in:
# ~/.claude/.credentials
cat ~/.claude/.credentials | python3 -c "
import sys, json
creds = json.load(sys.stdin)
oauth = creds['claudeAiOauth']
print(json.dumps(oauth, indent=2))
"
You will get a JSON object with accessToken, refreshToken, expiresAt, subscriptionType, and rateLimitTier. Save these values.
Step 2: Build the auth-profiles.json
Create a JSON file using OpenClaw's field names. This is the critical step — the field names must be access and refresh, not what the Claude CLI uses.
{
  "version": 1,
  "profiles": {
    "anthropic:claude-cli": {
      "type": "oauth",
      "provider": "anthropic",
      "access": "<YOUR_accessToken_HERE>",
      "refresh": "<YOUR_refreshToken_HERE>",
      "expires": <YOUR_expiresAt_HERE>,
      "email": "your-email@example.com",
      "subscriptionType": "max",
      "rateLimitTier": "<YOUR_rateLimitTier_HERE>"
    }
  }
}
Field name mapping. The Claude CLI stores accessToken → use as access. The CLI stores refreshToken → use as refresh. The CLI stores expiresAt → use as expires. Getting these wrong will produce a "No API key found" error with no further explanation.
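The mapping is mechanical, so it is worth scripting rather than editing by hand. A sketch that renames the CLI fields into OpenClaw's shape (the profile ID and field names are the ones that worked for us; everything else is passed through):

```python
import json

# Convert the Claude CLI's claudeAiOauth blob into OpenClaw's
# auth-profiles.json shape. The critical renames: accessToken -> access,
# refreshToken -> refresh, expiresAt -> expires.
def to_auth_profiles(cli_oauth: dict, email: str) -> dict:
    return {
        "version": 1,
        "profiles": {
            "anthropic:claude-cli": {
                "type": "oauth",
                "provider": "anthropic",
                "access": cli_oauth["accessToken"],
                "refresh": cli_oauth["refreshToken"],
                "expires": cli_oauth["expiresAt"],
                "email": email,
                "subscriptionType": cli_oauth.get("subscriptionType", "max"),
                "rateLimitTier": cli_oauth.get("rateLimitTier", ""),
            }
        },
    }

cli = {"accessToken": "sk-ant-oat01-...", "refreshToken": "sk-ant-ort01-...",
       "expiresAt": 1772143667732, "subscriptionType": "max",
       "rateLimitTier": "default_claude_max_20x"}
print(json.dumps(to_auth_profiles(cli, "your-email@example.com"), indent=2))
```

Piping the Step 1 output through a script like this removes the chance of a hand-typed field name quietly breaking authentication.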
Step 3: Transfer to OpenClaw
Copy the file to OpenClaw's agent auth directory. The exact path depends on your setup:
# If OpenClaw runs directly on the host:
cp auth-profiles.json \
~/.openclaw/agents/main/agent/auth-profiles.json
# If OpenClaw runs in Docker with a bind mount:
cp auth-profiles.json \
/path/to/mounted/.openclaw/agents/main/agent/auth-profiles.json
# If transferring to a remote machine via SSH:
ENCODED=$(base64 < auth-profiles.json)
ssh your-server "echo '$ENCODED' | base64 -d > \
~/.openclaw/agents/main/agent/auth-profiles.json"
The base64 approach avoids shell escaping issues with the long token strings.
Step 4: Update OpenClaw Config
Edit openclaw.json to use Anthropic as the model provider:
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-6"
      }
    }
  }
}
Available models include anthropic/claude-sonnet-4-6, anthropic/claude-haiku-4-5, and anthropic/claude-opus-4-6. Choose based on your quality and speed requirements.
Step 5: Update Sessions
If you have existing sessions, they may have model overrides from the previous provider. Check sessions.json and clear any modelOverride or providerOverride fields, or simply delete the sessions to start fresh.
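A small script can strip the overrides in place. The sessions.json structure below is an assumption based on the field names above; adjust it to whatever your file actually contains:

```python
import json

# Drop per-session model overrides so every session falls back to the
# new default provider. The structure of sessions.json here is assumed.
def clear_overrides(sessions: dict) -> dict:
    for session in sessions.values():
        session.pop("modelOverride", None)
        session.pop("providerOverride", None)
    return sessions

sessions = {"telegram:123": {"modelOverride": "google/gemini-2.5-flash",
                             "lastActivity": 1772000000000}}
print(json.dumps(clear_overrides(sessions)))
```

Back up the file first; deleting sessions outright is the blunter but equally effective option.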
Step 6: Restart and Verify
# Restart OpenClaw (Docker example)
docker compose restart
# Check the logs for the model line
docker compose logs --tail 20
# Look for: agent model: anthropic/claude-sonnet-4-6
# Send a test message and check for errors
docker compose logs -f
# Look for: embedded run done ... isError=false
If you see "No API key found for provider anthropic," double-check the field names in auth-profiles.json. If you see the agent model line but responses fail, the token may have expired — extract fresh credentials and repeat from step 1.
What About Token Expiry?
Access tokens expire after a few hours. OpenClaw handles this automatically — it detects the expired token and uses the refresh token to get a new one. As long as the refresh token is valid and your subscription is active, this is hands-off.
If the refresh token itself expires (which can happen after extended periods or if the subscription lapses), you will need to re-authenticate the Claude CLI on your laptop and repeat the transfer.
Lessons for Agent Builders
Thinking Tokens Are a Hidden Quota Multiplier
If your agent uses a thinking model (Gemini 2.5 Flash, any model with chain-of-thought), the visible token count is misleading. The actual compute — and the actual quota consumption — can be 5–10x higher. This is especially dangerous when your agent sends large tool definitions, which get amplified by the thinking process.
Test your agent with its full tool set against the API quota before deploying. A model that works fine in a playground may hit limits instantly in production.
Subscriptions and API Keys Are Different Economies
API keys are pay-per-token: predictable cost per request, but open-ended total. Subscriptions are flat-rate: bounded total, but subject to usage limits. For an agent that runs constantly but does not generate huge volumes of tokens, a subscription is often cheaper and simpler.
The key insight: if you already pay for a subscription (for yourself, for development, for a team), your agent can share it. The marginal cost of running an agent on an existing subscription is zero. This changes the economics of always-on agents dramatically.
OAuth Credentials Are Portable — With Care
OAuth tokens from the Claude Code CLI can be extracted and transferred to another machine. The tokens themselves are bearer tokens — they work regardless of where they are used. But the field names, the JSON structure, and the profile ID must match exactly what the consuming application expects.
There is no standard format for storing OAuth credentials. Every application invents its own. Claude CLI uses accessToken/refreshToken. OpenClaw uses access/refresh. Both are reasonable. Neither is documented as a stable interface. If you are transferring credentials between systems, read the source code of both systems.
Silent Validation Failures Are the Worst Kind
When the auth-profiles.json had wrong field names, OpenClaw did not say "unrecognized field 'token' in profile." It said "no API key found for provider." The file existed, the JSON was valid, the profile was present — but the expected fields were undefined, so the profile was treated as empty.
This is a common pattern in loosely-typed systems: unknown fields are silently ignored, required fields default to undefined, and the error message describes a downstream symptom ("no key found") rather than the upstream cause ("your 'access' field is missing"). When debugging authentication in agent frameworks, always check the exact field names against the source code. Do not assume the field names match another system's conventions.
The total debugging time from first rate limit error to working Claude subscription was about three hours. Most of that was spent on dead ends — trying to fix Gemini rather than replacing it. The actual migration, once we understood the auth format, took twenty minutes.
Pi-E now runs on Claude Sonnet 4.6 at zero incremental cost. No rate limits. No thinking token surprises. The Raspberry Pi 4, drawing its usual 5 watts, now has a significantly more capable brain than it started with.
This post is part of a series on Pi-E, an AI agent running on a Raspberry Pi. Earlier posts cover why she cannot restart herself, her soul file and heartbeat system, and how she was deployed remotely.