The Real Cost of Default Configs: How My AI Bot Burned $700/month on Tokens
A practical guide to optimizing LLM token usage in messaging bots. How default context windows, missing history limits, and compaction loops silently drain your API budget.
2026-03-17 · 5 min read
I run two AI chatbots over Telegram — one for work, one for family. After a few days, I noticed the Anthropic API bill climbing fast. The cause? Not a bug. Not a hack. Just default config.
This post shares how I found the issue, traced the root cause in the source code, and cut cost from ~$700 to ~$90/month with three config changes.
Symptom: Token usage soaring
My setup:
- 2 OpenClaw instances (open-source messaging bot framework)
- Running on Docker (WSL2, Gaming PC)
- Channel: Telegram groups
- Model: Claude Sonnet 4.6
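After a few days, session files were unusually large; the biggest reached 805KB. At roughly 4 characters per token, 805KB of text is about 200,000 tokens per API call. At Sonnet pricing ($3 per million input tokens), a call that large costs about $0.60 in input alone, and even at the ~80,000-token average per message (see Results), each reply in a group chat cost about $0.24.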
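The arithmetic behind those figures can be sketched in a few lines. This is a rough estimate only: the 4-characters-per-token ratio is a common rule of thumb for English text rather than a real tokenizer count, and the function names are made up for illustration.

```typescript
// Rough input-cost estimate for a single API call.
// Assumption: ~4 characters per token (rule of thumb, not a tokenizer).
const CHARS_PER_TOKEN = 4;

function estimateTokens(sizeInBytes: number): number {
  return Math.round(sizeInBytes / CHARS_PER_TOKEN);
}

function inputCostUsd(tokens: number, usdPerMillionTokens: number): number {
  return (tokens / 1_000_000) * usdPerMillionTokens;
}

// 805KB session file, priced at the $3/MTok Sonnet input rate from the post.
const sessionTokens = estimateTokens(805_000);
const fullSessionCost = inputCostUsd(sessionTokens, 3);

// ~80K input tokens is the per-message average from the results table.
const averageMessageCost = inputCostUsd(80_000, 3);
```

Under these assumptions the 805KB file works out to roughly 200K tokens, or about $0.60 of input for one call that sends all of it. The $0.24 figure corresponds to an ~80K-token call, which is the average reported in the results table.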
Root cause: Three overlapping cost holes
Hole 1: Default context window too large
In the source code, I found:
// src/agents/defaults.ts
export const DEFAULT_CONTEXT_TOKENS = 200_000;
200K tokens is nearly the full Claude context window. The framework uses this value when you don't set agents.defaults.contextTokens.
For a messaging bot answering short messages in a Telegram group, 200K tokens is 5–10x overkill.
Hole 2: No conversation history limit
// src/agents/pi-embedded-runner/history.ts
export function limitHistoryTurns(
messages: AgentMessage[],
limit: number | undefined,
): AgentMessage[] {
if (!limit || limit <= 0 || messages.length === 0) {
return messages; // ← Returns ALL messages if limit = undefined
}
// ...
}
historyLimit must be set explicitly. The default is undefined, which means the full conversation history is sent on every API call.
A Telegram group with 50 messages/day × 7 days = 350 messages. All of that was stuffed into context every time the bot replied.
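To make the effect of a turn limit concrete, here is a minimal sketch of trimming history to the last N user turns. It is not OpenClaw's implementation (the body of limitHistoryTurns is elided in the excerpt); the ChatMessage type and keepLastTurns function are hypothetical, and the simulation assumes the bot replies to every message.

```typescript
// Hypothetical message shape; OpenClaw's AgentMessage type may differ.
interface ChatMessage {
  role: "user" | "assistant";
  text: string;
}

// Keep only the messages that belong to the last `limit` user turns.
// With no limit (undefined or <= 0) everything is returned, mirroring
// the early-return branch in the excerpt.
function keepLastTurns(messages: ChatMessage[], limit?: number): ChatMessage[] {
  if (!limit || limit <= 0 || messages.length === 0) {
    return messages;
  }
  let userTurns = 0;
  // Walk backwards and cut at the start of the Nth most recent user turn.
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") {
      userTurns++;
      if (userTurns === limit) {
        return messages.slice(i);
      }
    }
  }
  return messages; // fewer than `limit` user turns: nothing to trim
}

// One week of group chat: 350 user messages, each followed by a bot reply.
const week: ChatMessage[] = [];
for (let i = 0; i < 350; i++) {
  week.push({ role: "user", text: `msg ${i}` });
  week.push({ role: "assistant", text: `reply ${i}` });
}

const unlimited = keepLastTurns(week); // all 700 messages are sent
const limited = keepLastTurns(week, 20); // only the last 40 messages are sent
```

With no limit, the full week travels with every request; with a limit of 20, the payload stays the same size no matter how long the group has been active.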
Hole 3: Compaction loop — burning tokens to "save" tokens
When context is full (>200K), the framework compacts — it calls Claude to summarize older conversation:
Context full → Call Claude to summarize (costs 200K input + output tokens)
→ Keep chatting → Context full again → Compact again → ...
Compaction is necessary, but with an oversized context window, each compact is one huge API call just to shrink context. Like hiring a truck to move one box — it works, but it's wasteful.
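A toy simulation helps show why the window size matters more than the number of compactions. Everything here is an assumption for illustration: the context grows by a fixed 600 tokens per turn, a compaction pass reads the whole current context and leaves a 2,000-token summary, and each reply sends the entire context. Real frameworks, including OpenClaw, may behave differently.

```typescript
interface SimulationResult {
  totalInputTokens: number;
  compactions: number;
}

// Toy model: the context grows each turn; when the next turn would
// overflow the window, one compaction call reads the whole context
// and replaces it with a short summary.
function simulateContext(
  turns: number,
  windowTokens: number,
  tokensPerTurn: number,
  summaryTokens: number,
): SimulationResult {
  let context = 0;
  let totalInputTokens = 0;
  let compactions = 0;
  for (let turn = 0; turn < turns; turn++) {
    if (context + tokensPerTurn > windowTokens) {
      totalInputTokens += context; // the compaction call itself
      compactions++;
      context = summaryTokens;
    }
    context += tokensPerTurn;
    totalInputTokens += context; // the reply call sends the current context
  }
  return { totalInputTokens, compactions };
}

// 3,000 turns is roughly a month at 100 messages per day.
const defaultWindow = simulateContext(3000, 200_000, 600, 2_000);
const cappedWindow = simulateContext(3000, 40_000, 600, 2_000);
```

In this model the capped window actually compacts more often, yet its total input is far lower, because every reply and every compaction pass carries much less context.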
Fix: Three config changes, ~87% cost cut
Step 1: Cap context window
{
"agents": {
"defaults": {
"contextTokens": 40000
}
}
}
200K → 40K. Enough for ~20 quality turns. The bot doesn't need a full week of history to answer today's question.
Step 2: Cap history turns
{
"channels": {
"telegram": {
"historyLimit": 20,
"dmHistoryLimit": 30
}
}
}
Group chats keep the last 20 turns. DMs keep 30, since 1-on-1 conversations tend to need more continuity.
Step 3: Pick the right model per instance
| Instance | Before | After | Reason |
|---|---|---|---|
| Work | Sonnet ($3/MTok) | Sonnet (unchanged) | Needs quality for complex tasks |
| Family | Sonnet ($3/MTok) | Haiku ($0.25/MTok) | Casual chat, no heavy reasoning |
Haiku is 12x cheaper than Sonnet, and for family chat the quality is still great.
Results
| Metric | Before | After |
|---|---|---|
| Input tokens/msg (avg) | ~80,000 | ~15,000 |
| Cost/month (est. 100 msg/day) | ~$720 | ~$90 |
| Savings | — | ~87% |
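The "before" figure and the savings percentage can be reproduced from the table's own numbers. This sketch counts input tokens only and assumes a 30-day month; the function names are illustrative. The ~$90 "after" figure is taken from the table as-is rather than derived.

```typescript
// Monthly input cost, assuming a fixed message volume and average size.
function monthlyInputCostUsd(
  messagesPerDay: number,
  avgInputTokens: number,
  usdPerMillionTokens: number,
  daysPerMonth: number = 30, // assumption: 30-day month
): number {
  const tokensPerMonth = messagesPerDay * daysPerMonth * avgInputTokens;
  return (tokensPerMonth / 1_000_000) * usdPerMillionTokens;
}

function savingsFraction(before: number, after: number): number {
  return (before - after) / before;
}

// 100 msg/day at ~80K input tokens each, Sonnet at $3/MTok.
const costBefore = monthlyInputCostUsd(100, 80_000, 3);

// ~$90 is the reported monthly cost after the changes.
const savings = savingsFraction(costBefore, 90);
```

Here costBefore comes out at 720 and savings at 0.875, which lines up with the ~$720 and ~87% rows in the table.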
Takeaways for anyone running LLM bots
1. Defaults are built for "don't break", not "cheap"
Framework authors set high defaults so the bot never runs out of context. But "enough context" ≠ "efficient use". In production, tune defaults to your use case.
2. History limit is the simplest config with the biggest impact
An active group can pile up hundreds of messages per day. Without a limit, you're paying for Claude to re-read all of it on every reply.
The right question: How many recent turns does the bot need to be useful? Usually 10–30, not 500.
3. Not every instance needs the same model
Family chat answering "what's for dinner?" doesn't need Sonnet. Work chat analyzing KPI reports does. Per-instance model choice is a cost lever many people skip.
4. Compaction is necessary but has hidden cost
Compaction is like garbage collection — needed but costly. The best way to reduce compaction cost: don't fill context too fast — use history limit and context cap.
5. Monitor before optimizing
I only spotted the issue when I looked at session file sizes (805KB!). Turning on payload logging from day one gives you data before the bill spikes:
OPENCLAW_ANTHROPIC_PAYLOAD_LOG=true
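Once usage data is being recorded, even a tiny summary makes oversized calls stand out. The sketch below is independent of whatever format that flag writes, which isn't shown here. It assumes you can collect the usage object (input_tokens and output_tokens) that the Anthropic Messages API returns with each response; summarizeUsage and the alert threshold are hypothetical.

```typescript
// Subset of the `usage` object returned with each Messages API response.
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

interface UsageSummary {
  calls: number;
  totalInputTokens: number;
  totalOutputTokens: number;
  avgInputTokens: number;
  oversizedCalls: number; // calls whose input exceeded the alert threshold
}

function summarizeUsage(records: Usage[], alertAboveInputTokens: number): UsageSummary {
  let totalInputTokens = 0;
  let totalOutputTokens = 0;
  let oversizedCalls = 0;
  for (const record of records) {
    totalInputTokens += record.input_tokens;
    totalOutputTokens += record.output_tokens;
    if (record.input_tokens > alertAboveInputTokens) {
      oversizedCalls++;
    }
  }
  const calls = records.length;
  return {
    calls,
    totalInputTokens,
    totalOutputTokens,
    avgInputTokens: calls === 0 ? 0 : totalInputTokens / calls,
    oversizedCalls,
  };
}

// Example: three calls, one of them suspiciously large.
const summary = summarizeUsage(
  [
    { input_tokens: 10_000, output_tokens: 200 },
    { input_tokens: 90_000, output_tokens: 300 },
    { input_tokens: 20_000, output_tokens: 100 },
  ],
  50_000,
);
```

Running something like this daily and watching avgInputTokens would likely have surfaced the problem well before the session files reached 805KB.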
Checklist for deploying an LLM messaging bot
- Set contextTokens appropriately (20K–50K for chat bots, don't use default)
- Set historyLimit for group chats (10–30 turns)
- Set dmHistoryLimit for DMs (20–50 turns)
- Choose the right model per instance (no need for Opus/Sonnet for casual chat)
- Enable token usage logging from day one
- Review API bill after the first week
- Consider requireMention: true for busy groups (bot only replies when mentioned)
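Part of this checklist can be automated with a small pre-deploy check. The sketch below is hypothetical: the BotConfig shape only mirrors the JSON snippets shown earlier in this post, the thresholds come from the checklist, and auditConfig is not part of OpenClaw.

```typescript
// Config shape inferred from the JSON snippets in this post (assumption).
interface BotConfig {
  agents?: { defaults?: { contextTokens?: number } };
  channels?: {
    telegram?: { historyLimit?: number; dmHistoryLimit?: number };
  };
}

// Returns one warning per checklist item that the config fails.
function auditConfig(config: BotConfig): string[] {
  const warnings: string[] = [];
  const contextTokens = config.agents?.defaults?.contextTokens;
  const telegram = config.channels?.telegram;

  if (contextTokens === undefined) {
    warnings.push("contextTokens is not set, so the 200K default applies");
  } else if (contextTokens > 50_000) {
    warnings.push("contextTokens is above the 20K-50K range suggested for chat bots");
  }
  if (telegram?.historyLimit === undefined) {
    warnings.push("historyLimit is not set, so full group history is sent");
  }
  if (telegram?.dmHistoryLimit === undefined) {
    warnings.push("dmHistoryLimit is not set, so full DM history is sent");
  }
  return warnings;
}

const untunedWarnings = auditConfig({});
const tunedWarnings = auditConfig({
  agents: { defaults: { contextTokens: 40_000 } },
  channels: { telegram: { historyLimit: 20, dmHistoryLimit: 30 } },
});
```

An empty config trips all three checks, while the tuned values from the fix section pass cleanly. The check could run in CI or at container start, failing the deploy when the warning list is not empty.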
Conclusion
This isn't a story about a bad framework — OpenClaw is a solid tool. The issue is the gap between default config and production use. Defaults serve "works out of the box"; production needs "works within budget".
Three config changes, a few minutes to apply, ~$600/month saved.
Sometimes the most valuable optimization isn't in the code; it's in the config file.
