[T] Token Engineering Lab — Claude Code Engineering Guide

[01] The Problem

Why your plan runs out before the month does.

Every Claude Code turn reloads ~100K tokens of context. You pay not only for what it writes but for everything it rereads, turn after turn.

[01]

Cold start every turn

96% of your tokens are Cache Read: Claude rereading what it has already read. Invisible in the logs, devastating to your plan.

[02]

Opus where Sonnet suffices

67% of calls go to Opus, yet their average output matches what Sonnet produces. You're paying 5x for tasks that would cost 1x.

[03]

Silent compression

Every ~30 turns, Claude wipes history to fit the context window. Everything you loaded at init is gone, with no alert.

[04]

MCPs reconnecting every call

Each tool call reloads 10K+ tokens of MCP definitions. It is silent overhead that never shows up as a cost line.

[02] The 8 Patterns

Engineering, not tips.

Each pattern is an architectural decision with a benchmark measured in production. Everything you need to implement, with no prompting magic.

[01]

Two-layer cache

L1-Lite (a map, ~3K tokens, always loaded) + L1-Full (full context, loaded on demand). Stack: Redis + SQLite + vector store.

→ saves: ~90% of init roundtrips
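
A minimal sketch of the two-layer idea, assuming a plain dictionary stands in for the Redis/SQLite/vector store backends. The class name `TwoLayerContext`, the `load_full` callback, and the sample map are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List


class TwoLayerContext:
    """L1-Lite is always in the prompt; L1-Full entries load only when asked for."""

    def __init__(self, lite_map: str, load_full: Callable[[str], str]) -> None:
        self.lite_map = lite_map         # small index, always loaded
        self._load_full = load_full      # hypothetical backend fetch (Redis, SQLite, ...)
        self._full: Dict[str, str] = {}  # in-memory stand-in for the full-context store
        self.backend_fetches = 0

    def full(self, key: str) -> str:
        """Return the full context for `key`, hitting the backend at most once."""
        if key not in self._full:
            self._full[key] = self._load_full(key)
            self.backend_fetches += 1
        return self._full[key]

    def prompt_prefix(self, wanted: List[str]) -> str:
        """Build one turn's context: the Lite map plus only the requested sections."""
        return "\n\n".join([self.lite_map] + [self.full(k) for k in wanted])


docs = {"auth": "auth module details...", "billing": "billing module details..."}
ctx = TwoLayerContext("MAP: auth -> src/auth, billing -> src/billing", docs.__getitem__)
first = ctx.prompt_prefix(["auth"])
second = ctx.prompt_prefix(["auth"])  # served from the local layer, no second fetch
```

The design choice worth noting is that the map is cheap enough to send every turn, while anything the turn does not ask for (here, `billing`) never enters the prompt.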

[02]

Routing by complexity

YAML classifier (haiku/sonnet/opus) with no LLM call, <1ms. Self-corrects from feedback. Flips the Opus/Sonnet ratio from 67/33 to 20/80.

→ saves: 5x on 80% of tasks
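
A rule-based router of this kind can be sketched as follows. The rules are hypothetical and written as the dictionary a YAML file would parse into, so the example stays within the standard library; the feedback loop is left out. `blended_cost` uses the page's own 5x figure and treats every non-Opus call as 1x, which is a simplifying assumption.

```python
import re
from typing import Dict, List

# Hypothetical rules, shown as the dict a YAML rules file would parse into.
RULES: Dict[str, List[str]] = {
    "opus": [r"\barchitect", r"\bdesign\b", r"\brefactor\b.*\bacross\b"],
    "haiku": [r"\brename\b", r"\bformat\b", r"\btypo\b"],
}
DEFAULT_MODEL = "sonnet"

# Compile once so each classification is only a handful of regex searches.
COMPILED = {
    model: [re.compile(p, re.IGNORECASE) for p in patterns]
    for model, patterns in RULES.items()
}


def route(prompt: str) -> str:
    """Pick a model tier from keyword rules; fall back to the default tier."""
    for model in ("opus", "haiku"):
        if any(p.search(prompt) for p in COMPILED[model]):
            return model
    return DEFAULT_MODEL


def blended_cost(opus_share: float, opus_price: float = 5.0, other_price: float = 1.0) -> float:
    """Average relative cost per call for a given share of calls sent to Opus."""
    return opus_share * opus_price + (1.0 - opus_share) * other_price


before = blended_cost(0.67)
after = blended_cost(0.20)
```

Under those assumptions, moving from 67% to 20% Opus lowers the blended per-call cost from 3.68 to 1.80 units, roughly half.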

[03]

Call interception and reuse

PreToolUse hooks that track quota, cache results, and reuse identical responses within the TTL window.

→ saves: ~60% cache hit on dispatch
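
The reuse step can be sketched as a TTL cache keyed by the tool name and its input. This is an in-memory illustration, not the guide's implementation: `ToolCallCache` and its methods are hypothetical names, and wiring it into an actual hook is left out.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ToolCallCache:
    """Reuse the result of an identical tool call while it is younger than the TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(tool_name: str, tool_input: Dict[str, Any]) -> str:
        # sort_keys makes the key independent of argument order.
        payload = json.dumps({"tool": tool_name, "input": tool_input}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def put(self, tool_name: str, tool_input: Dict[str, Any], result: Any) -> None:
        self._store[self.key(tool_name, tool_input)] = (self.clock(), result)

    def get(self, tool_name: str, tool_input: Dict[str, Any]) -> Optional[Any]:
        """Return the cached result, or None when absent or expired."""
        entry = self._store.get(self.key(tool_name, tool_input))
        if entry is not None and self.clock() - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        return None


now = [0.0]  # fake clock so the example is deterministic
cache = ToolCallCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("Grep", {"pattern": "TODO", "path": "src"}, ["src/a.py:3"])
hit = cache.get("Grep", {"path": "src", "pattern": "TODO"})
now[0] = 61.0
expired = cache.get("Grep", {"pattern": "TODO", "path": "src"})
```

A short TTL matters here because results of tools such as Grep or Read go stale as soon as the files change.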

[04]

Benchmarks by task type

A table of real cost per operation (Bash, Read, Edit, Grep, Subagents). Shows where optimization pays off and where it can be skipped.

→ identifies the 3 biggest offenders in 7 days
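
A cost table like this can be built by aggregating logged calls per tool. The sketch below assumes each record carries a `tool` name and a `tokens` count; both field names and the sample numbers are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int, int, float]  # (tool, calls, total tokens, mean tokens per call)


def cost_by_tool(records: Iterable[Dict[str, object]]) -> List[Row]:
    """Aggregate token usage per tool, most expensive tool first."""
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for rec in records:
        tool = str(rec["tool"])
        totals[tool] += int(rec["tokens"])
        counts[tool] += 1
    rows = [(t, counts[t], totals[t], totals[t] / counts[t]) for t in totals]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Illustrative records only; not measurements.
sample = [
    {"tool": "Read", "tokens": 40_000},
    {"tool": "Read", "tokens": 20_000},
    {"tool": "Bash", "tokens": 5_000},
    {"tool": "Grep", "tokens": 1_000},
    {"tool": "Edit", "tokens": 3_000},
    {"tool": "Subagent", "tokens": 90_000},
]
table = cost_by_tool(sample)
top_three = [row[0] for row in table[:3]]
```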

[05]

Persistent sessions

Anti-compression hook: every ~30 turns, it reloads the Lite map before compression destroys the context. Spending ~3K tokens prevents losing ~50K.

→ eliminates context loss
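
The timing logic can be sketched as a turn counter that hands back the Lite map shortly before each expected compression. `LiteMapReloader` and the `lead` parameter are hypothetical, and how the returned text reaches the model's context (for example, through a hook's output) depends on your setup.

```python
from typing import List, Optional


class LiteMapReloader:
    """Return the Lite map `lead` turns before every `every_n_turns`-th turn."""

    def __init__(self, lite_map: str, every_n_turns: int = 30, lead: int = 2) -> None:
        if not 0 <= lead < every_n_turns:
            raise ValueError("lead must be in [0, every_n_turns)")
        self.lite_map = lite_map
        self.every = every_n_turns
        self.lead = lead
        self.turn = 0

    def on_turn(self) -> Optional[str]:
        """Call once per turn; returns text to re-inject, or None."""
        self.turn += 1
        if (self.turn + self.lead) % self.every == 0:
            return self.lite_map
        return None


reloader = LiteMapReloader("MAP: auth -> src/auth", every_n_turns=30, lead=2)
reload_turns: List[int] = []
for turn in range(1, 61):
    if reloader.on_turn() is not None:
        reload_turns.append(turn)
```

A fixed turn count is only a proxy for the real trigger; a version that tracks context size would be more precise but needs token counts the sketch does not have.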

[06]

Context Compilation

A light model (Haiku) reads 10–20 files and delivers a dense 5K-token briefing. Typical compression is 200:1. Redis cache with a 36h TTL.

→ init: 60s → 2–5s
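
One way to cache such a briefing is to key it on a fingerprint of the file contents, so an edit invalidates it before the 36h TTL does. This is a sketch under assumptions: a dictionary stands in for Redis, and `summarize` is a hypothetical placeholder for the call to the light model.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple

TTL_SECONDS = 36 * 3600  # the 36h TTL mentioned above


def fingerprint(files: Dict[str, str]) -> str:
    """Stable hash over paths and contents; changes whenever any file changes."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8") + b"\0")
        digest.update(files[path].encode("utf-8") + b"\0")
    return digest.hexdigest()


class BriefingCache:
    def __init__(self, summarize: Callable[[Dict[str, str]], str],
                 clock: Callable[[], float] = time.time) -> None:
        self.summarize = summarize  # hypothetical stand-in for the light-model call
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.model_calls = 0

    def briefing(self, files: Dict[str, str]) -> str:
        key = fingerprint(files)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < TTL_SECONDS:
            return entry[1]
        text = self.summarize(files)
        self.model_calls += 1
        self._entries[key] = (now, text)
        return text


now = [0.0]  # fake clock for a deterministic example
cache = BriefingCache(lambda fs: "briefing of %d files" % len(fs), clock=lambda: now[0])
files = {"a.py": "print(1)", "b.py": "print(2)"}
cache.briefing(files)
cache.briefing(files)
calls_after_repeat = cache.model_calls
files["a.py"] = "print(3)"
cache.briefing(files)
calls_after_edit = cache.model_calls
now[0] = TTL_SECONDS + 1
cache.briefing(files)
calls_after_expiry = cache.model_calls
```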

[07]

Audited delegation

Pipeline: Planner (Opus) → Generator (Sonnet/Haiku) → Auditor (Opus, sees only the diff). Cuts long-generation cost by 65–85%.

→ cost: 25–35% of an all-Opus run
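
To see where a figure in that range can come from, here is a small worked calculation. The 5x price ratio is the page's own number; the planner and auditor shares (token volume relative to the generated output) are illustrative assumptions, not measurements.

```python
def pipeline_cost_ratio(planner_share: float, auditor_share: float,
                        generator_price: float = 1.0, opus_price: float = 5.0) -> float:
    """Cost of Planner -> Generator -> Auditor relative to generating everything on Opus.

    Shares are token volumes relative to the generation volume, which is taken as 1.0.
    """
    baseline = 1.0 * opus_price
    pipeline = (planner_share * opus_price      # short plan written by Opus
                + 1.0 * generator_price         # full generation on the cheaper model
                + auditor_share * opus_price)   # Opus reviews only the diff
    return pipeline / baseline


# Assumed shares: plan and audit each cost 5% of the generation volume.
ratio = pipeline_cost_ratio(planner_share=0.05, auditor_share=0.05)
```

With those assumed shares the pipeline costs 30% of the all-Opus baseline. The result is sensitive to how much the auditor has to read, which is why limiting it to the diff matters.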

[08]

Hook Architecture

UserPromptSubmit + PreToolUse + PostToolUse. Classification on entry, a web search gate, a critical path guard. Discipline as infrastructure.

→ correct behavior without manual discipline
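
The gate and the guard can be sketched as one decision function that a PreToolUse hook script would call. Everything specific here is an assumption to check against the Claude Code hooks documentation and your own setup: the event field names (`tool_name`, `tool_input`, `file_path`), the protected paths, and the search budget.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple

PROTECTED_PREFIXES = ("migrations/", ".github/workflows/")  # hypothetical critical paths
MAX_WEB_SEARCHES = 3                                         # hypothetical per-session budget


@dataclass
class GateState:
    web_searches: int = 0


def pre_tool_use(event: Dict[str, Any], state: GateState) -> Tuple[bool, str]:
    """Decide whether a tool call may proceed; returns (allowed, reason)."""
    tool = event.get("tool_name", "")
    tool_input = event.get("tool_input", {})

    # Web search gate: allow a limited number of searches per session.
    if tool == "WebSearch":
        if state.web_searches >= MAX_WEB_SEARCHES:
            return False, "web search budget exhausted"
        state.web_searches += 1
        return True, "web search allowed"

    # Critical path guard: refuse edits to protected locations.
    if tool in ("Edit", "Write"):
        path = str(tool_input.get("file_path", ""))
        if path.startswith(PROTECTED_PREFIXES):
            return False, path + " is on the critical path"

    return True, "ok"


state = GateState()
searches = [pre_tool_use({"tool_name": "WebSearch", "tool_input": {}}, state)[0] for _ in range(4)]
guarded = pre_tool_use({"tool_name": "Edit", "tool_input": {"file_path": "migrations/001.sql"}}, state)
normal = pre_tool_use({"tool_name": "Edit", "tool_input": {"file_path": "src/app.py"}}, state)
```

Keeping the decision in a pure function makes the policy testable without running the agent; the hook script then only has to translate the result into whatever allow or block signal the host expects.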

[04] Who it's for

For those who run Claude Code as infrastructure.

[DEV]

Daily-session devs

You use Claude Code every day, pay for Pro/Max, and feel the plan run out before the week ends.

[FOUNDER]

Technical founders

You delegate technical execution to Claude and need every token to deliver value proportional to its cost.

[ENG]

AI engineers

You build multi-agent systems and need cache, routing, and auditing architecture that scales.

Not for you if

You use Claude only occasionally (< 1h/day)
You don't have a Pro/Max subscription
You want to learn basic prompting
You don't plan to set up infrastructure (Redis, hooks)

[05] What you get

24-page PDF. Ready to implement.

24-page PDF in PT and EN, technical design
8 architectural patterns with measured benchmarks
Full stack: Redis + SQLite + vector store
Ready-to-copy code snippets (Python, YAML, JSON)
4-phase implementation roadmap prioritized by ROI
Audit checklist to validate each strategy
Lifetime access, future updates included

~/.claude/projects/audit.py

# before optimization
tokens_per_session = 8_000_000  # approximate
cache_read_pct = 94.7
opus_share = 0.67

# after the guide
tokens_per_session = 2_000_000  # approximate, -75%
cache_read_pct = 62.0  # real reuse
opus_share = 0.18  # inverted
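
A figure like `cache_read_pct` can be derived from per-response usage counters. The sketch below assumes each record is a dictionary with the token fields that the Anthropic API reports in its `usage` object; where the records come from (session logs, a proxy, your own tracking) is left open, and the sample numbers are made up.

```python
from typing import Dict, Iterable

TOKEN_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def cache_read_pct(usages: Iterable[Dict[str, int]]) -> float:
    """Share of all tokens that were cache reads, as a percentage."""
    total = 0
    cache_read = 0
    for usage in usages:
        cache_read += usage.get("cache_read_input_tokens", 0)
        total += sum(usage.get(name, 0) for name in TOKEN_FIELDS)
    return round(100.0 * cache_read / total, 1) if total else 0.0


# Illustrative records only; not measurements.
sample = [
    {"input_tokens": 1_000, "output_tokens": 2_000,
     "cache_creation_input_tokens": 7_000, "cache_read_input_tokens": 90_000},
    {"input_tokens": 500, "output_tokens": 500,
     "cache_creation_input_tokens": 0, "cache_read_input_tokens": 9_000},
]
pct = cache_read_pct(sample)
```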

[07] Questions

The answers that matter.

Does it work if I don't use Claude Code?

The patterns apply to any LLM agent with intensive tool use (Cursor, Aider, custom agents). The examples use Claude Code because that's where we ran the benchmarks, but the architecture is portable.

Do I need to know Python to implement it?

The snippets are in Python, but the logic can be replicated in any language. Hooks, dispatch, cache: all of this is architecture, not a specific technology.

How long does it take to implement everything?

Phase 1 (diagnosis) takes 1 day. Phase 2 (quick wins, 30–50% reduction) takes 2–3 days. The full stack takes 2–3 weeks. ROI is incremental: you start saving on day two.

How do I receive it after payment?

Instant download of the PDFs (PT + EN) and the snippet pack. Future updates are announced by email.

Is it a course or just reading?

It's an engineering manual. Dense, technical, direct. Each chapter has the why, the how, and the benchmark. No fluff.

Is there a refund if it doesn't work?

7 days, no questions asked. If you implemented the Phase 2 strategies and didn't see a reduction of at least 30%, you get a full refund.

Stop paying for the same context twice.

Measured, not estimated.

Once. Yours forever.

Stop reading. Start measuring.
