OpenClaw Token Optimization — The Real Cost of AI Agent Defaults

I run two AI chatbots over Telegram — one for work, one for family. After a few days, I noticed the Anthropic API bill climbing fast. The cause? Not a bug. Not a hack. Just default config.

This post shares how I found the issue, traced the root cause in the source code, and cut the estimated cost from ~$720 to ~$90/month with three config changes.

Symptom: Token usage soaring

My setup:

2 OpenClaw instances (open-source messaging bot framework)
Running on Docker (WSL2, Gaming PC)
Channel: Telegram groups
Model: Claude Sonnet 4.6

After a few days, session files were unusually large; the biggest reached 805KB. For an LLM, 805KB of text is roughly 200,000 tokens, so a call that resends the whole session costs about $0.60 in input at Sonnet pricing ($3 per million input tokens). Even at the ~80,000-token average across messages, each message in a group chat cost about $0.24 in input alone.
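A quick back-of-envelope check of these numbers, as a sketch only: the 4-bytes-per-token ratio is a rough rule of thumb for English text rather than a real tokenizer, and the function names are made up for illustration. Under those assumptions an 805KB session comes out at a bit over 200K tokens, a 200K-token call costs about $0.60 in input at $3/MTok, and an 80K-token call costs about $0.24.

```typescript
// Coarse token estimate from a file size. ASSUMPTION: ~4 bytes per token,
// a common heuristic for English text, not an actual tokenizer.
function estimateTokensFromBytes(bytes: number, bytesPerToken = 4): number {
  return Math.round(bytes / bytesPerToken);
}

// Input cost of one API call in USD, given a price per million input tokens.
function inputCostPerCallUsd(inputTokens: number, pricePerMTok: number): number {
  return (inputTokens / 1_000_000) * pricePerMTok;
}

const biggestSessionTokens = estimateTokensFromBytes(805 * 1024);
console.log(`biggest session: ~${biggestSessionTokens} tokens`);
console.log(`200K-token call: $${inputCostPerCallUsd(200_000, 3).toFixed(2)}`);
console.log(`80K-token call:  $${inputCostPerCallUsd(80_000, 3).toFixed(2)}`);
```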

Root cause: Three overlapping cost holes

Hole 1: Default context window too large

In the source code, I found:

// src/agents/defaults.ts
export const DEFAULT_CONTEXT_TOKENS = 200_000;

200K tokens, which matches Claude's standard 200K context window. The framework uses this when you don't set agents.defaults.contextTokens.

For a messaging bot answering short messages in a Telegram group, 200K tokens is 5–10x overkill.
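To make the fallback concrete, here is a minimal sketch of how such a default could be resolved. It is not OpenClaw's actual code: the interface and the resolveContextTokens helper are hypothetical, and only the constant and the agents.defaults.contextTokens path come from the post.

```typescript
// HYPOTHETICAL config shape; only the agents.defaults.contextTokens path is from the post.
interface AgentConfigSketch {
  agents?: { defaults?: { contextTokens?: number } };
}

// Value quoted from src/agents/defaults.ts above.
const DEFAULT_CONTEXT_TOKENS = 200_000;

// HYPOTHETICAL helper: use the configured value, else fall back to the default.
function resolveContextTokens(cfg: AgentConfigSketch): number {
  return cfg.agents?.defaults?.contextTokens ?? DEFAULT_CONTEXT_TOKENS;
}

console.log(resolveContextTokens({})); // nothing configured, so the default applies
console.log(resolveContextTokens({ agents: { defaults: { contextTokens: 40_000 } } }));
```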

Hole 2: No conversation history limit

// src/agents/pi-embedded-runner/history.ts
export function limitHistoryTurns(
  messages: AgentMessage[],
  limit: number | undefined,
): AgentMessage[] {
  if (!limit || limit <= 0 || messages.length === 0) {
    return messages; // ← Returns ALL messages if limit = undefined
  }
  // ...
}

historyLimit must be set explicitly. When it is left undefined, the function returns its input unchanged, so the full conversation history goes out on every API call.

A Telegram group with 50 messages/day × 7 days = 350 messages. All of that was stuffed into context every time the bot replied.
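A rough way to see how this grows, assuming a flat average message size (the 100 tokens per message used below is an illustrative guess, not a measurement, and the helper name is made up):

```typescript
// History tokens resent on a single reply.
// ASSUMPTION: every message costs the same number of tokens.
function historyTokensPerReply(
  messagesSoFar: number,
  avgTokensPerMessage: number,
  historyLimit?: number,
): number {
  const kept =
    historyLimit && historyLimit > 0
      ? Math.min(messagesSoFar, historyLimit)
      : messagesSoFar; // no limit: the whole history is resent
  return kept * avgTokensPerMessage;
}

// After one week at 50 messages/day (350 messages in total):
console.log(historyTokensPerReply(350, 100));     // unlimited history
console.log(historyTokensPerReply(350, 100, 20)); // capped at 20 turns
```

Real sessions also carry system prompts and any tool output, so the actual per-reply total would be higher than this history-only figure.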

Hole 3: Compaction loop — burning tokens to "save" tokens

When the context fills up (>200K tokens), the framework compacts it by calling Claude to summarize the older part of the conversation:

Context full → Call Claude to summarize (costs 200K input + output tokens)
→ Keep chatting → Context full again → Compact again → ...

Compaction is necessary, but with an oversized context window, each compact is one huge API call just to shrink context. Like hiring a truck to move one box — it works, but it's wasteful.
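The loop can be approximated with a toy simulation. This is a sketch under strong assumptions, not a model of OpenClaw's real compaction logic: context grows by a fixed amount per message, each reply reads the whole current context, and each compaction reads the whole context once and shrinks it to a fixed summary size. All names and parameters are hypothetical.

```typescript
interface ContextCostSim {
  compactions: number;
  compactionInputTokens: number; // input tokens spent only on summarizing
  replyInputTokens: number;      // input tokens spent on regular replies
}

// Toy model of a chat that compacts whenever the context exceeds contextCap.
function simulateContextCost(
  messages: number,
  tokensPerMessage: number,
  contextCap: number,
  summaryTokens: number,
): ContextCostSim {
  let context = 0;
  const sim: ContextCostSim = { compactions: 0, compactionInputTokens: 0, replyInputTokens: 0 };
  for (let i = 0; i < messages; i++) {
    context += tokensPerMessage;
    if (context > contextCap) {
      sim.compactionInputTokens += context; // the summarizer reads the full context
      sim.compactions += 1;
      context = summaryTokens;              // only the summary is kept
    }
    sim.replyInputTokens += context;        // the reply reads whatever is left
  }
  return sim;
}

console.log("cap 200K:", simulateContextCost(1000, 1000, 200_000, 2_000));
console.log("cap 40K: ", simulateContextCost(1000, 1000, 40_000, 2_000));
```

In this toy model the smaller cap compacts more often, but each compaction call is much smaller and every regular reply reads less context, so the total input is lower.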

Fix: Three config changes, ~87% cost cut

Step 1: Cap context window

{
  "agents": {
    "defaults": {
      "contextTokens": 40000
    }
  }
}

200K → 40K. Enough for ~20 quality turns. The bot doesn't need a full week of history to answer today's question.

Step 2: Cap history turns

{
  "channels": {
    "telegram": {
      "historyLimit": 20,
      "dmHistoryLimit": 30
    }
  }
}

Group chat keeps the last 20 turns. DM keeps 30 (1‑on‑1 tends to need more continuity).
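As an illustration of what these two limits do, here is a simplified sketch. It is not OpenClaw's implementation: it treats each array element as one turn, assumes DMs read dmHistoryLimit and groups read historyLimit with no fallback between them, and uses hypothetical type and function names.

```typescript
type ChatKind = "group" | "dm";

// Mirrors the two keys from the config above; everything else is hypothetical.
interface TelegramHistoryConfig {
  historyLimit?: number;
  dmHistoryLimit?: number;
}

// Picks the limit for the chat kind, then keeps only the most recent entries.
// SIMPLIFICATION: one array element counts as one turn.
function trimHistory<T>(messages: T[], kind: ChatKind, cfg: TelegramHistoryConfig): T[] {
  const limit = kind === "dm" ? cfg.dmHistoryLimit : cfg.historyLimit;
  if (!limit || limit <= 0) return messages; // unset limit: keep everything
  return messages.slice(-limit);
}

const fiftyTurns = Array.from({ length: 50 }, (_, i) => i);
const telegramCfg: TelegramHistoryConfig = { historyLimit: 20, dmHistoryLimit: 30 };
console.log(trimHistory(fiftyTurns, "group", telegramCfg).length);
console.log(trimHistory(fiftyTurns, "dm", telegramCfg).length);
```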

Step 3: Pick the right model per instance

Instance	Before	After	Reason
Work	Sonnet ($3/MTok)	Sonnet (unchanged)	Needs quality for complex tasks
Family	Sonnet ($3/MTok)	Haiku ($0.25/MTok)	Casual chat, no heavy reasoning

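At these input prices Haiku is 12x cheaper than Sonnet, and for family chat the quality is still great. The $0.25/MTok figure matches Claude 3 Haiku; newer Haiku versions are priced higher, so check the current price list before assuming the same ratio.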

Results

Metric	Before	After
Input tokens/msg (avg)	~80,000	~15,000
Cost/month (est. 100 msg/day)	~$720	~$90
Savings	—	~87%
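The monthly estimate can be reproduced with simple arithmetic. This is a sketch: the function name is made up, it counts input tokens only, and the 60/40 split of messages between the work (Sonnet) and family (Haiku) instances is an assumption for illustration, since the real split is not stated here.

```typescript
// Monthly input cost in USD for a steady message rate.
function monthlyInputCostUsd(
  tokensPerMsg: number,
  msgsPerDay: number,
  pricePerMTok: number,
  days = 30,
): number {
  return (tokensPerMsg * msgsPerDay * days * pricePerMTok) / 1_000_000;
}

// Before: 100 msg/day, ~80K input tokens each, all on Sonnet at $3/MTok.
const costBefore = monthlyInputCostUsd(80_000, 100, 3);

// After: ~15K tokens each. ASSUMED split: 60 msg/day on Sonnet, 40 on Haiku.
const costAfter =
  monthlyInputCostUsd(15_000, 60, 3) + monthlyInputCostUsd(15_000, 40, 0.25);

console.log(`before: $${costBefore.toFixed(2)}, after: $${costAfter.toFixed(2)}`);
console.log(`savings: ${((1 - costAfter / costBefore) * 100).toFixed(0)}%`);
```

With a different traffic split the "after" figure moves accordingly; if everything stayed on Sonnet, the same 15K-token average would come to $135/month.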

Takeaways for anyone running LLM bots

1. Defaults are built for "don't break", not "cheap"

Framework authors set high defaults so the bot never runs out of context. But "enough context" ≠ "efficient use". In production, tune defaults to your use case.

2. History limit is the simplest config with the biggest impact

An active group can pile up hundreds of messages per day. Without a limit, you're paying for Claude to re-read all of it on every reply.

The right question: How many recent turns does the bot need to be useful? Usually 10–30, not 500.

3. Not every instance needs the same model

Family chat answering "what's for dinner?" doesn't need Sonnet. Work chat analyzing KPI reports does. Per-instance model choice is a cost lever many people skip.

4. Compaction is necessary but has hidden cost

Compaction is like garbage collection — needed but costly. The best way to reduce compaction cost: don't fill context too fast — use history limit and context cap.

5. Monitor before optimizing

I only spotted the issue when I looked at session file sizes (805KB!). Turning on payload logging from day one gives you data before the bill spikes:

OPENCLAW_ANTHROPIC_PAYLOAD_LOG=true
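Independently of that log, session file sizes alone make a cheap early warning. The sketch below flags files whose estimated token count passes a threshold. It assumes roughly 4 bytes per token (a heuristic, not a tokenizer), and the type and function names are hypothetical; in Node the byte sizes could come from fs.statSync(path).size.

```typescript
interface SessionFileInfo {
  name: string;
  bytes: number;
}

interface FlaggedSession {
  name: string;
  estTokens: number;
}

// Returns sessions whose estimated token count exceeds maxTokens, largest first.
// ASSUMPTION: ~4 bytes per token.
function flagLargeSessions(
  files: SessionFileInfo[],
  maxTokens: number,
  bytesPerToken = 4,
): FlaggedSession[] {
  return files
    .map((f) => ({ name: f.name, estTokens: Math.round(f.bytes / bytesPerToken) }))
    .filter((f) => f.estTokens > maxTokens)
    .sort((a, b) => b.estTokens - a.estTokens);
}

// Example with made-up file names and sizes.
console.log(
  flagLargeSessions(
    [
      { name: "work-group.json", bytes: 805 * 1024 },
      { name: "family-dm.json", bytes: 40 * 1024 },
    ],
    40_000,
  ),
);
```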

Checklist for deploying an LLM messaging bot

Set contextTokens appropriately (20K–50K for chat bots, don't use default)
Set historyLimit for group chats (10–30 turns)
Set dmHistoryLimit for DMs (20–50 turns)
Choose the right model per instance (no need for Opus/Sonnet for casual chat)
Enable token usage logging from day one
Review API bill after the first week
Consider requireMention: true for busy groups (bot only replies when mentioned)
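Part of this checklist can be turned into a small pre-deploy check. The sketch below covers only the three numeric settings, using the config paths shown earlier; the interface, the function name, and the 50K warning threshold are illustrative assumptions rather than anything OpenClaw ships.

```typescript
// Config paths as used in the snippets above; the interface itself is hypothetical.
interface BotConfigSketch {
  agents?: { defaults?: { contextTokens?: number } };
  channels?: { telegram?: { historyLimit?: number; dmHistoryLimit?: number } };
}

// Returns human-readable warnings for settings left at their costly defaults.
function checkBotConfig(cfg: BotConfigSketch): string[] {
  const warnings: string[] = [];
  const contextTokens = cfg.agents?.defaults?.contextTokens;
  const telegram = cfg.channels?.telegram;

  if (contextTokens === undefined) {
    warnings.push("contextTokens is not set, so the 200K default applies");
  } else if (contextTokens > 50_000) {
    warnings.push(`contextTokens=${contextTokens} is above the 20K-50K range suggested for chat bots`);
  }
  if (!telegram?.historyLimit) {
    warnings.push("historyLimit is not set, so group chats resend their full history");
  }
  if (!telegram?.dmHistoryLimit) {
    warnings.push("dmHistoryLimit is not set, so DMs resend their full history");
  }
  return warnings;
}

console.log(checkBotConfig({})); // an empty config triggers all three warnings
```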

Conclusion

This isn't a story about a bad framework — OpenClaw is a solid tool. The issue is the gap between default config and production use. Defaults serve "works out of the box"; production needs "works within budget".

Three config changes, a few minutes to apply, ~$600/month saved.

Sometimes the most expensive "optimization" isn't in the code — it's in the config file.
