End-to-End Build — Complete Agent-Native Tool from Scratch

English title: End-to-End Build: procurement CLI from API to Agent

Introduction

This series started from one question: how does an ordinary CLI become a tool that an AI agent can discover, invoke, and reason about?

Over nine articles, we have worked through each layer of the answer. This final article pulls them together in a real case study: the procurement CLI, an internal purchasing management CLI I built in an enterprise environment.

Not every detail can be published, for confidentiality reasons. But the patterns, the numbers, and the lessons are all real.

1. Background: the procurement CLI and the problem

The procurement CLI sits in front of an internal purchasing management system. The backend is a REST API with 439 endpoints: purchase requests, purchase orders, vendors, approvals, budgets, contracts, payments.

The problem: the data ops team needed to automate reporting and reconciliation. They wrote Python scripts that called the REST API directly. The result after 6 months: 23 scripts, each carrying its own copy-pasted auth logic, pagination, and error handling. When the API changed its auth header format, the team spent 3 days fixing all 23 scripts.

Goals at the start of the procurement CLI build:

One unified interface for all 439 endpoints
Agent-ready: structured JSON output, meaningful exit codes, no interactive prompts
Nothing written by hand: codegen from the API spec
Maintainable: when the API changes, re-running the pipeline is all it takes
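
To make the agent-ready goal concrete, here is a minimal sketch of a single command that emits JSON, returns a meaningful exit code, and never prompts. The command name delete-vendor, the exit code constants, and the run helper are assumptions made for illustration; this is not the generated code from the project.

```python
import argparse
import json

# Hypothetical exit codes for this sketch; the series defines its own 0-5 scheme.
EXIT_OK = 0
EXIT_USAGE = 2


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="puro")
    sub = parser.add_subparsers(dest="command", required=True)

    delete = sub.add_parser("delete-vendor")  # hypothetical command name
    delete.add_argument("--id", required=True)
    delete.add_argument("--json", action="store_true", help="emit machine-readable JSON")
    delete.add_argument("--force", action="store_true", help="confirm the destructive action")
    return parser


def run(argv: list[str]) -> tuple[int, str]:
    """Return (exit_code, stdout_text) instead of exiting, so the logic is testable."""
    args = build_parser().parse_args(argv)
    if not args.force:
        # Never prompt: refuse, explain why, and let the caller retry with --force.
        return EXIT_USAGE, json.dumps({"error": "refusing to delete without --force"})
    # A real command would call the adapter here; this sketch only echoes the request.
    payload = {"deleted": True, "id": args.id}
    return EXIT_OK, json.dumps(payload) if args.json else f"deleted {args.id}"
```

An agent can branch on the exit code alone and only parse stdout when it needs the details.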

2. Timeline and tech decisions

Week 1 - Extract & Design
├── Extract OpenAPI spec → 439 methods metadata JSON
├── Define domain models: PurchaseRequest, PurchaseOrder, Vendor, etc.
├── Define AdapterError hierarchy (exit codes 0-5)
└── Design folder structure

Week 2 - Pipeline
├── Build Jinja2 templates (adapter, command, test)
├── Build generator.py
├── First run: 32 adapter files, 32 command files, 96 test cases
└── Fix template bugs (6 iterations)

Week 3 - Quality & Agent Features
├── Add --json flag to ALL commands
├── Add --force flag to ALL destructive commands
├── Add ensure_* idempotent variants (25 methods)
├── Generate SKILL.md (487 tokens)
└── Total tests: 395, all passing

Week 4 - Integration
├── Wire into data ops agent (subprocess.run)
├── Add MCP server for IDE usage
├── Performance testing: 439 commands, avg 180ms/call
└── Ship to data ops team
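
One way the AdapterError hierarchy from week 1 could map onto exit codes 0-5 is sketched here. The subclass names and the meaning assigned to each code are assumptions; the article only states that codes 0-5 exist.

```python
class AdapterError(Exception):
    """Base error; each subclass carries the exit code the CLI should return."""
    exit_code = 1  # assumed: generic failure


class ValidationError(AdapterError):
    exit_code = 2  # assumed: bad input from the caller


class AuthError(AdapterError):
    exit_code = 3  # assumed: missing or rejected credentials


class NotFoundError(AdapterError):
    exit_code = 4  # assumed: resource does not exist


class UpstreamError(AdapterError):
    exit_code = 5  # assumed: the backend API failed or timed out


def exit_code_for(func, *args, **kwargs) -> int:
    """Run func and translate the outcome into a process exit code."""
    try:
        func(*args, **kwargs)
    except AdapterError as err:
        return err.exit_code
    return 0
```

Keeping the code on the exception class means every generated command can share one try/except wrapper instead of repeating the mapping.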

3. Final architecture

puro-cli/
├── metadata/
│   └── api-metadata.json          # 439 methods, SSOT (article 4)
│
├── tools/
│   ├── extract_openapi.py         # API spec → metadata (article 4)
│   └── normalize_metadata.py      # Cleanup + validation
│
├── pipeline/
│   ├── generator.py               # Main pipeline (article 5)
│   ├── generate_skills.py         # → SKILL.md (article 8)
│   ├── generate_mcp.py            # → mcp_server.py (article 7)
│   └── templates/
│       ├── adapter.py.j2
│       ├── command.py.j2
│       └── test.py.j2
│
├── adapters/                      # GENERATED — DO NOT EDIT
│   ├── base.py                    # BaseAdapter, AdapterError
│   ├── purchase_request_adapter.py
│   ├── purchase_order_adapter.py
│   ├── vendor_adapter.py
│   └── ... (32 files total)
│
├── cli/                           # GENERATED — DO NOT EDIT
│   ├── __main__.py                # Entry point
│   ├── purchase_requests.py
│   ├── purchase_orders.py
│   └── ... (32 files total)
│
├── tests/                         # GENERATED + handwritten
│   ├── test_generator.py          # Handwritten: test generator logic
│   ├── test_contracts.py          # Handwritten: contract tests (article 6)
│   └── unit/                      # GENERATED: 395 test cases
│
├── SKILL.md                       # GENERATED: 487 tokens (article 8)
├── mcp_server.py                  # GENERATED: MCP server (article 7)
└── Makefile                       # codegen, test, all
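
The metadata-to-adapter step in this layout can be sketched in a few lines. To keep the example free of dependencies it uses string.Template from the standard library in place of Jinja2, and the metadata fields (name, http_method, path) and the VendorAdapter class name are assumptions, since the real api-metadata.json schema is not shown.

```python
from string import Template

# Stand-in for adapter.py.j2; the real pipeline uses Jinja2 templates.
ADAPTER_TEMPLATE = Template(
    "def ${name}(self, **params):\n"
    "    return self._request('${http_method}', '${path}', params=params)\n"
)

# Hypothetical metadata entries for illustration only.
METADATA = [
    {"name": "list_vendors", "http_method": "GET", "path": "/vendors"},
    {"name": "create_vendor", "http_method": "POST", "path": "/vendors"},
]


def render_adapter(methods: list[dict]) -> str:
    """Render one adapter class from metadata entries."""
    header = "# GENERATED - DO NOT EDIT\nclass VendorAdapter:\n"
    body = "\n".join(
        "    " + line  # indent each rendered line into the class body
        for method in methods
        for line in ADAPTER_TEMPLATE.substitute(method).splitlines()
    )
    return header + body + "\n"


if __name__ == "__main__":
    print(render_adapter(METADATA))
```

Because the output is a pure function of the metadata and the template, re-running the generator after an API change regenerates every adapter in one pass.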

4. Real-world numbers

Metric	Value
API endpoints	439
Generated adapter files	32
Generated CLI commands	439
Generated test cases	395
Handwritten code (total)	~800 lines
Generated code (total)	~16,400 lines
SKILL.md size	487 tokens
MCP server size	~55,000 tokens when loaded
Codegen time (make all)	2.8 seconds
Test run time	43 seconds
Time-to-first-working-CLI	3 days (vs. an estimated 3 weeks by hand)

The most important number: 800 handwritten lines versus 16,400 generated lines, a ratio of roughly 1:20. Each handwritten line produces about 20 lines of working code.

5. Lessons from practice

Lesson 1: Template bugs multiply 439 times

When a template has a bug, every generated file has the bug. In week 2 there was a bug in the pagination logic: the generator was missing params = {k: v for k, v in params.items() if v is not None}. All 32 adapter files sent None values to the API, causing 400 errors.

Fix the template, re-run the generator, and all 32 files are clean. It took 10 minutes. Fixing by hand would have taken 32 × 5 minutes = 160 minutes.

Lesson: Test the generator aggressively (article 6). Template quality = code quality × N.
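
The missing line from the pagination bug is small enough to isolate and test on its own. In this sketch, clean_params wraps the exact dict comprehension quoted above, while build_page_params and the page / page_size parameter names are hypothetical.

```python
def clean_params(params: dict) -> dict:
    """Drop keys whose value is None so they are not sent as query parameters."""
    return {k: v for k, v in params.items() if v is not None}


def build_page_params(page: int, page_size: int, **filters) -> dict:
    # page and page_size are assumed names; the real API may paginate differently.
    return clean_params({"page": page, "page_size": page_size, **filters})
```

Note that the filter tests for None specifically, so legitimate falsy values such as 0, False, or an empty string still reach the API.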

Lesson 2: The --json flag is not enough if output shapes are inconsistent

The generator initially produced JSON output with different shapes: list commands returned [...], get commands returned {...}, and create commands returned {"result": {...}}. The agent had to handle 3 different shapes.

Fix: standardize the output schema in the template:

List: always [{...}]
Get/Create/Update: always {"data": {...}, "action": "..."}
Delete: always {"deleted": true, "id": ...}

The agent now only has to handle 2 top-level shapes (array or object).
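
On the consuming side, the standardized schema reduces parsing to one type check. The following is a sketch of what an agent-side helper might look like; the function names envelope and parse_output are assumptions, not part of the generated CLI.

```python
import json


def envelope(action: str, record: dict) -> dict:
    """Wrap a single record the way Get/Create/Update output is described above."""
    return {"data": record, "action": action}


def parse_output(stdout: str) -> list[dict]:
    """Normalize either top-level shape into a list of records."""
    payload = json.loads(stdout)
    if isinstance(payload, list):
        return payload  # List commands: already an array of records
    if isinstance(payload, dict):
        # Get/Create/Update wrap the record in "data"; Delete output is already flat.
        return [payload.get("data", payload)]
    raise ValueError(f"unexpected top-level JSON type: {type(payload).__name__}")
```

With three inconsistent shapes, this helper would need per-command knowledge; with two, it needs none.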

Lesson 3: Skills doc > MCP for the data ops use case

The data ops team uses an agent to write and run Python scripts. They don't need IDE integration. They need the agent to know the CLI commands so it can put them into scripts.

Test results: with SKILL.md (487 tokens), the agent generated correct CLI calls in 94% of cases. With MCP (55k tokens), accuracy was similar, but the context window available to each conversation dropped from ~185k tokens to ~130k tokens, which limits how much data the agent can handle when working with large datasets.

Decision: data ops uses Level 2 (Skills); developer IDEs use Level 3 (MCP).

Lesson 4: Idempotency is mandatory for automation

Agents retry on timeout. If create_purchase_request is not idempotent, a single timeout can create 2 purchase requests. In finance, that is a serious problem.

ensure_purchase_request (the idempotent variant) checks for existence before creating. The agent calls ensure_* instead of create_* for every write operation. There have been no duplicates in 3 months of production.

6. Checklist: Apply this to your project

## Phase 1: Foundation (1-2 days)
- [ ] Identify the API spec source: OpenAPI, SDK, or HTML docs
- [ ] Run extractor → metadata/api-metadata.json
- [ ] Run normalizer, fix validation errors
- [ ] Define exit code schema (0-5 standard)
- [ ] Define output shape standard (array vs object)
 
## Phase 2: Adapter (1-2 days)
- [ ] Implement BaseAdapter with _request() and error mapping
- [ ] Define domain dataclasses (User, Order, etc.)
- [ ] Build adapter.py.j2 template
- [ ] Test the template with 1-2 representative methods
- [ ] Run generator, review output
 
## Phase 3: CLI (1 day)
- [ ] Build command.py.j2 template
- [ ] Verify --json flag in EVERY command
- [ ] Verify --force flag in ALL destructive commands
- [ ] Verify idempotent variants (ensure_*) for create operations
- [ ] Run generator, smoke test key commands
 
## Phase 4: Testing (1-2 days)
- [ ] Build test.py.j2 template
- [ ] Write generator unit tests (test_generator.py)
- [ ] Write contract tests (test_contracts.py)
  - [ ] Every method has adapter implementation
  - [ ] Every command has --json flag
  - [ ] Every delete has --force flag
- [ ] Write snapshot tests for a representative sample
- [ ] All tests passing: make all
 
## Phase 5: Agent Integration (1 day)
- [ ] Generate SKILL.md (generate_skills.py)
- [ ] Test agent with SKILL.md: can it invoke correct commands?
- [ ] Decision: MCP needed?
  - [ ] If yes: generate mcp_server.py, register in IDE settings
  - [ ] If no: Skills + CLI subprocess sufficient
- [ ] Document chosen level in Architecture Decision Record
 
## Phase 6: Ongoing
- [ ] API changes → re-run: make codegen
- [ ] Template bugs → fix template → re-run: make codegen
- [ ] New resources → update metadata → re-run: make codegen
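
The three contract checks listed under Phase 4 can be expressed as one function over plain data. This is a sketch under assumptions: CommandSpec and contract_violations are hypothetical names, and a real test_contracts.py would build these specs by introspecting the generated CLI rather than listing them by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandSpec:
    """Hypothetical description of one generated command."""
    name: str
    flags: frozenset
    destructive: bool = False


def contract_violations(method_names: list[str], commands: list[CommandSpec]) -> list[str]:
    """Return a human-readable list of contract breaches; empty means all checks pass."""
    by_name = {cmd.name: cmd for cmd in commands}
    problems = []
    for method in method_names:
        if method not in by_name:
            problems.append(f"{method}: no generated command")
    for cmd in commands:
        if "--json" not in cmd.flags:
            problems.append(f"{cmd.name}: missing --json")
        if cmd.destructive and "--force" not in cmd.flags:
            problems.append(f"{cmd.name}: missing --force")
    return problems
```

Returning every violation at once, rather than failing on the first, makes a template regression visible across all affected commands in a single test run.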

7. Retrospective: this series

The first nine articles covered the full path from "I have a REST API" to "an agent can use this tool"; this tenth one ties them together:

Article	Topic	Key takeaway
1	Interface evolution	The CLI is not legacy; it is the optimal layer for agents
2	7 principles	Structured output + exit codes + idempotency = agent-ready
3	Adapter pattern	Separate business logic from the delivery mechanism
4	Metadata extraction	A single source of truth for everything
5	Codegen pipeline	1:20 ratio: 1 template line = 20 lines of code
6	Testing strategy	Test the generator, not each generated file
7	MCP server	20 lines to wrap an adapter, 55k tokens of cost
8	Skills doc	487 tokens, ~99.1% reduction vs. 55k, similar accuracy
9	Decision framework	5 questions for choosing the right level
10	End-to-end	The pattern is applicable; the numbers are real

8. What's next

If you want to go deeper:

Agent SDK patterns: Anthropic's Agent SDK lets you build more complex agents with memory, multi-step reasoning, and tool orchestration. The adapter pattern from this series is fully compatible.

OpenTelemetry for the CLI: observability for agent tool calls. Trace every subprocess.run, log exit codes, measure latency. When an agent fails a task, you want to know which command it failed on.

Multi-agent tool sharing: when multiple agents use the same tool, a centralized MCP server versus distributed subprocesses is an interesting trade-off.
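
As a starting point for the observability idea above, before bringing in OpenTelemetry, a thin wrapper around subprocess.run can already capture the exit code and latency of each call. The run_traced name and the shape of the returned record are assumptions, and the example invokes the Python interpreter as a stand-in so the sketch runs without the CLI installed.

```python
import subprocess
import sys
import time


def run_traced(argv: list[str], timeout: float = 30.0) -> dict:
    """Run a command and return a trace record with exit code and latency."""
    started = time.perf_counter()
    # Raises subprocess.TimeoutExpired if the command exceeds the timeout.
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    elapsed_ms = (time.perf_counter() - started) * 1000
    return {
        "argv": argv,
        "exit_code": proc.returncode,
        "latency_ms": round(elapsed_ms, 1),
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


if __name__ == "__main__":
    # Stand-in command; in practice argv would be the CLI plus its arguments.
    record = run_traced([sys.executable, "-c", "print('[]')"])
    print(record["exit_code"], record["latency_ms"])
```

Each record could later be emitted as a span attribute set or a structured log line, whichever the team's tracing backend expects.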

Conclusion

The CLI did not die when AI agents arrived. It evolved, from interactive tools for humans to structured interfaces for agents. But the core is still the Unix philosophy: do one thing well, output to stdout, exit with a meaningful code.

Adapter pattern, metadata pipeline, codegen, progressive enhancement ladder: none of these are complicated patterns. They are software engineering fundamentals applied to a new context.

Agent-native is not about magic. It is about discipline: structured output, exit codes, idempotency, documentation. These are the things good engineers were doing before AI agents existed.

$ puro list-purchase-requests --status pending --json | jq 'length'
47
$ echo $?
0

Simple. Reliable. Agent-ready.