Notícias
Seu agente IA usa modelo errado (rotar modelos = ROI explode)
Notícias
5 min de leitura
31 de maio de 2026

Seu agente IA usa modelo errado (rotar modelos = ROI explode)

Agente IA usa 1 modelo (caro, lento). Routing entre modelos = 3x faster, 50% cheaper. OpenRouter prova.

Equipe OpenClaw

Equipe OpenClaw · Time de Engenharia & Produto

A Equipe OpenClaw é formada por engenheiros, designers e especialistas em IA dedicados a construir a melhor plataforma de agentes conversacionais para negócios brasileiros. Combinamos expertise…


Seu agente IA usa modelo errado (rotar modelos = ROI explode)

Você tem SaaS.

Seu SaaS: agente IA (atendimento ao cliente, automação, vendas).

Você escolheu modelo:

"Vou usar GPT-4 (melhor modelo, mais inteligente).

GPT-4 responde bem (accurate, nuanced).

GPT-4 é caro (R$ 0.03 por 1k tokens input).

Mas GPT-4 é worth it (best quality).

Eu pago R$ 0.03 por TODA request (mesmo simple requests).

Meu agente responde bem, mas custo é alto (R$ 5k/mês).

Custo é worth it (quality is worth cost), certo?"

You deploy agente com GPT-4.

Agente starts generating revenue.

But costs are high (R$ 5k/mês em API calls).

You think:

"GPT-4 é caro (R$ 0.03 per 1k tokens).

Mas quality is good (customers are happy).

Custo é 10% of revenue (acceptable margin?).

Maybe I should lower costs (find cheaper alternative?).

But cheaper models are worse (lower quality).

Dilemma: High cost vs low quality.

How do I solve this (without sacrificing quality)?"

Recent news (May 2026):

"OpenRouter raises $113M Series B.

"Business: API routing for LLMs (connects to multiple models).

"Feature: Automatically route request to best model (by task, cost, speed).

"Example: Simple query → Claude 3.5 (cheaper, fast enough). Complex query → GPT-4 (expensive, better).

"Result: Same quality, 50% cheaper (by using right model for each task)."

You realize:

"Wait.

I'm using GPT-4 for EVERYTHING (simple + complex requests).

But simple requests don't need GPT-4 (don't need best model).

Simple requests could use cheaper model (Claude 3.5, Llama 3).

Complex requests could use GPT-4 (when quality matters).

If I route smartly (cheap model for simple, expensive for complex):

  • 60% of requests = simple = use cheap model (Claude 3.5)
  • 40% of requests = complex = use GPT-4
  • Cost drops from R$ 5k to R$ 2.5k (50% reduction)
  • Quality stays same (expensive model for complex, cheap for simple)
  • ROI improves (same quality, half the cost)."

You realize:

"I've been leaving money on the table (paying R$ 2.5k extra unnecessarily).

If I switch to model routing:

  • Cost drops 50% (R$ 5k → R$ 2.5k)
  • Quality stays same (use expensive model when needed)
  • ROI improves (better margin)
  • Agente is more efficient (uses right tool for right job)."

O problema (single-model agente = overpaying)

The cost of using one expensive model for everything

SINGLE-MODEL AGENTE (status quo):

Setup:

  • Choose 1 model (GPT-4, Claude 3.5, Llama 70B)
  • Use same model for ALL requests
  • Cost: Fixed per-token price (same for simple + complex)

Example: GPT-4 (R$ 0.03 per 1k input tokens)

Request types:

  1. Simple: "What's our return policy?" (100 tokens, 5 sec to answer)

    • Needs: Low intelligence, low quality
    • Cost: R$ 0.003 per request
    • But: Using GPT-4 (overkill, overqualified)
    • Waste: Paying premium price for basic task
  2. Complex: "Analyze customer feedback, suggest product improvements" (1000 tokens, analyze patterns)

    • Needs: High intelligence, high quality
    • Cost: R$ 0.03 per request
    • Worth it: GPT-4 is justified (complex task)
  3. Mixed: Average request (500 tokens)

    • Cost: R$ 0.015 per request
    • Is it worth it? Depends on complexity

REAL WORLD EXAMPLE (your agente):

Monthly requests: 10,000

  • 6,000 simple requests (100 tokens avg) = R$ 1.8k
  • 4,000 complex requests (1000 tokens avg) = R$ 1.2k
  • Total cost: R$ 3k/month (using GPT-4 for everything)

But reality:

  • Simple requests: Could use cheaper model (Claude 3.5 R$ 0.01/1k tokens)
  • Complex requests: Need GPT-4 (R$ 0.03/1k tokens)

If split by model:

  • 6,000 simple @ R$ 0.01/1k tokens = R$ 600
  • 4,000 complex @ R$ 0.03/1k tokens = R$ 1.2k
  • Total cost: R$ 1.8k/month (vs R$ 3k with GPT-4 only)
  • Savings: R$ 1.2k/month (40% reduction)

WHY THIS MATTERS:

  1. Cost multiplier problem

    • If agente processes 100k requests/month (scale)
    • Using expensive model for ALL (even simple)
    • Cost: R$ 30k/month
    • With routing: R$ 18k/month (R$ 12k savings)
    • Savings scale with volume (more requests = more savings)
  2. Margin compression

    • Revenue per customer: R$ 500/month
    • Cost per customer: R$ 5k agente cost / 100 customers = R$ 50
    • Margin: R$ 450 (90% margin, good)
    • But: If using expensive model unnecessarily
    • Cost per customer: R$ 50 (overpaying by R$ 20)
    • Margin: R$ 430 (worse)
    • With routing: Cost R$ 30, Margin R$ 470 (better)
  3. Competitive disadvantage

    • Competitor uses model routing (25% lower cost)
    • Competitor can undercut you (lower price)
    • Or: Competitor has higher margin (more room to invest)
    • You're at disadvantage (paying more, getting same result)
  4. Scaling pain

    • Today: 10k requests/month, cost R$ 3k
    • Tomorrow: 100k requests/month, cost R$ 30k
    • Problem: Cost grows linearly (can you afford it?)
    • Solution: Model routing keeps cost under control (grows slower)

Why people use single model (and why it's suboptimal)

REASON 1: Simplicity

  • Single model = simple (1 API endpoint, 1 price, 1 config)
  • Multiple models = complex (need to route, compare, manage)
  • People choose simple (even if more expensive)
  • Result: Single model is "default" (even if suboptimal)

REASON 2: Fear of inconsistency

  • Using GPT-4 = consistent results (always good)
  • Using Claude 3.5 for simple, GPT-4 for complex = might be inconsistent
  • Fear: "What if Claude gives bad answer (customer confused)?"
  • Result: Stick with expensive model (guaranteed quality)
  • But: Not all requests need "guaranteed quality" (simple requests don't)

REASON 3: Lack of awareness

  • Many people don't know model routing exists
  • Or: Don't know they can mix models (use cheap + expensive)
  • Or: Think all requests are "complex" (need best model)
  • Result: Default to expensive model (no alternative considered)

REASON 4: Operational overhead

  • Managing multiple models requires monitoring
  • Need to track: Which model for which request?
  • Need to ensure: Quality stays consistent
  • Need to update: When new models become available
  • Result: Overhead discourages adoption (stick with single model)

BUT REALITY:

Not all requests need expensive model.

Example:

  • "What time is your office open?" → Needs cheap model (simple lookup)
  • "Analyze my customer churn and suggest retention strategy" → Needs expensive model (complex analysis)

If you use GPT-4 for both:

  • Simple request: Overpay (use Ferrari to go to corner store)
  • Complex request: Worth it (use Ferrari for highway)
  • Overall: Overpaying on average

If you use model routing:

  • Simple request: Use cheap model (use bicycle for corner store)
  • Complex request: Use expensive model (use Ferrari for highway)
  • Overall: Pay fair price for each task

A solução (model routing)

Strategy 1: Manual routing (you decide which model for which task)

OPTION: Explicitly choose model based on task type

Setup:

  1. Categorize requests (simple, medium, complex)
  2. Assign model to each category
    • Simple: Claude 3.5 (R$ 0.01/1k tokens, fast, cheap)
    • Medium: GPT-3.5 (R$ 0.005/1k tokens, balance)
    • Complex: GPT-4 (R$ 0.03/1k tokens, best quality)
  3. Route request to correct model (based on category)
  4. Get response (from assigned model)
  5. Return to customer

Benefit:

  • Control: You decide model per task (explicit)
  • Cost: Right model for right job (optimal pricing)
  • Quality: Complex tasks get best model

Disadvantage:

  • Manual: Need to categorize each request (overhead)
  • Rules brittle: Category rules might be wrong (edge cases)
  • Maintenance: Update rules when new models arrive

When to use:

  • Predictable patterns (know upfront which tasks are simple/complex)
  • Small volume (can handle manual routing)
  • Clear task types (distinct categories, not ambiguous)

Example:

if request.type == "faq": use_model = "claude-3.5" elif request.type == "analysis": use_model = "gpt-4" elif request.type == "conversation": use_model = "gpt-3.5"

Cost:

  • FAQs (40%): Claude 3.5 @ R$ 0.01 = R$ 400
  • Analysis (20%): GPT-4 @ R$ 0.03 = R$ 600
  • Conversation (40%): GPT-3.5 @ R$ 0.005 = R$ 200
  • Total: R$ 1.2k/month (vs R$ 3k with GPT-4 only)
  • Savings: 60%

Strategy 2: Automatic routing (AI decides which model)

OPTION: Automatically choose best model (by cost, speed, quality)

Setup:

  1. Connect to model router API (OpenRouter, similar)
  2. Define routing rules (e.g., "fast < 2 sec, cheap < R$ 0.01, quality = accuracy")
  3. Send request to router (router decides which model)
  4. Router picks best model (balances cost, speed, quality)
  5. Get response (from chosen model)
  6. Router learns (which model worked best, adjust rules)

Benefit:

  • Automatic: No manual categorization needed
  • Learns: Router improves over time (learns which model for which task)
  • Adaptive: Switches between models as new ones arrive
  • Optimal: Balances cost, speed, quality automatically

Disadvantage:

  • Black box: Less control (router decides, you don't)
  • Cost: Router API has overhead (not free)
  • Complexity: Need to trust router's decisions

When to use:

  • Unpredictable patterns (don't know upfront which tasks are simple/complex)
  • Large volume (benefit of automation outweighs complexity)
  • Ambiguous task types (router handles edge cases)
  • Want optimization (router learns and improves)

Example:

request = "Can you analyze our Q1 sales and suggest improvements?"

Router analyzes:

  • Complexity: HIGH (needs analysis, pattern recognition)
  • Speed requirement: MEDIUM (no 2-sec SLA)
  • Cost requirement: FLEXIBLE (ROI depends on quality)

Router decides:

  • Quality > Cost > Speed
  • Use: GPT-4 (best quality)

get_response(request, model="gpt-4")

Vs.

request = "What's your refund policy?"

Router analyzes:

  • Complexity: LOW (simple lookup)
  • Speed requirement: HIGH (2-sec SLA, customers impatient)
  • Cost requirement: HIGH (R$ 0.01 per request cap)

Router decides:

  • Cost > Speed > Quality
  • Use: Claude 3.5 (cheap, fast, good enough)

get_response(request, model="claude-3.5")

Result:

  • Simple request: Uses cheap model (cost optimized)
  • Complex request: Uses expensive model (quality optimized)
  • Overall: Right model for right job (optimal ROI)

Strategy 3: Hybrid (manual + automatic)

OPTION: Combine manual + automatic (best of both)

Setup:

  1. Define high-level rules (simple = cheap, complex = expensive)
  2. Use automatic router (respects your rules, learns)
  3. Monitor and adjust (quarterly review, tune rules)
  4. Fallback to manual (if router fails, manual override)

Benefit:

  • Control: You set high-level rules (manual)
  • Automation: Router handles details (automatic)
  • Adaptive: Router learns and improves
  • Safety: Manual override if router goes wrong

Disadvantage:

  • Moderate complexity (more than single model, less than full automatic)
  • Requires monitoring (need to oversee router decisions)

When to use:

  • Want control + automation (balance)
  • Want to learn (understand router decisions)
  • Want safety net (manual override available)

Example:

Rules:

  • Speed < 2 sec, Cost < R$ 0.01 → Claude 3.5
  • Speed < 5 sec, Cost < R$ 0.02 → GPT-3.5
  • No constraints → GPT-4 (best quality)

Router:

  • Receives rules
  • Evaluates request
  • Picks model within rules
  • Reports decision + reasoning

You:

  • Monitor router decisions (weekly)
  • Adjust rules if needed (quarterly)
  • Override if router makes mistake (rare)

Cost with hybrid:

  • 40% simple (Claude) = R$ 400
  • 40% medium (GPT-3.5) = R$ 200
  • 20% complex (GPT-4) = R$ 600
  • Total: R$ 1.2k/month
  • Savings vs GPT-4 only: 60%

Strategy 4: Local models (route to on-device models)

OPTION: Use open-source local models (Llama 3, Mistral) for cheap tasks

Setup:

  1. Run local model on your hardware (Llama 3, Mistral, etc)
  2. Route simple requests to local model (zero cost, instant)
  3. Route complex requests to cloud model (OpenAI, Anthropic)
  4. Hybrid: Local for cheap, cloud for expensive

Benefit:

  • No API cost (local model is free)
  • Fast response (local = instant, no cloud latency)
  • Privacy (data stays local, doesn't go to cloud)
  • Scalable (can handle unlimited local requests)

Disadvantage:

  • Setup complexity (need to host local model)
  • Maintenance (need to update models, monitor performance)
  • Lower quality (local models < cloud models)
  • Hardware cost (need GPU for local inference)

When to use:

  • High volume + simple requests (local can handle)
  • Privacy sensitive (can't send to cloud)
  • Cost critical (local is free)
  • Have hardware (already have GPU/server)

Example:

if simple_request:

Local model (instant, free)

response = local_llama_3.generate(request) else:

Cloud model (cloud, paid)

response = openai.ChatCompletion.create(model="gpt-4", messages=request)

Cost:

  • Simple (local): R$ 0 (free)
  • Complex (cloud): R$ 0.03 per request
  • Total: Only pay for complex (R$ 600 for 20% of requests)
  • Savings: 80% (vs GPT-4 only)
  • Hardware cost: ~R$ 5k for GPU (one-time)

Conclusão: Model routing is not luxury, it's necessity

**O que você precisa saber:

  1. OpenRouter $113M Series B is institutional validation (model routing matters)

    • OpenRouter: Connects to 100+ LLMs (GPT-4, Claude, Llama, Mistral, etc)
    • Investors: Put $113M into this business (think it's valuable)
    • Implication: Model routing is strategic advantage
    • Lesson: Smart routing = 50% cost reduction (same quality)
  2. Single-model agente is suboptimal (leaving money on table)

    • Using GPT-4 for everything (simple + complex)
    • Simple requests don't need GPT-4 (cheaper model would work)
    • Complex requests need GPT-4 (justify the cost)
    • Result: Overpaying on average (mix of simple + complex)
    • Lesson: Right tool for right job = better ROI
  3. Cost of single model scales linearly (routing scales sublinearly)

    • Single GPT-4: Cost = tokens × R$ 0.03 (always expensive)
    • Routing: Cost = 60% simple (cheap) + 40% complex (expensive) = R$ 0.015 avg
    • As volume grows: Cost difference grows (100k requests = R$ 12k/month saved)
    • Lesson: Routing savings multiply with scale
  4. Quality doesn't require expensive model for all tasks

    • Simple request ("What's our hours?"): Cheap model is fine (99% accuracy)
    • Complex request ("Analyze churn patterns"): Expensive model needed (95%+ accuracy)
    • Using expensive model for simple: Waste (like Ferrari for corner store)
    • Using routing: Right model for job (both accurate, optimized cost)
    • Lesson: Intelligent routing = same quality, better cost
  5. Your agente can implement routing today (3 strategies)

    • Manual routing: You decide model per task (simple)
    • Automatic routing: Router decides (optimal, learning)
    • Local models: Free models for cheap tasks (maximum savings)
    • Hybrid: Mix of all (balanced, safe)
    • Lesson: Start with manual (easy), graduate to automatic (optimal)

Na OpenClaw, ajudamos SaaS a:

  • EVALUATE your current model (are you using expensive for everything?)
  • CATEGORIZE requests (simple, medium, complex)
  • ROUTE smartly (right model for right job)
  • MONITOR costs (track savings, optimize rules)
  • SCALE efficiently (routing benefits multiply with volume)
  • OPTIMIZE ROI (same quality, 50% lower cost)

Resultado: Seu agente IA usa MODELO CERTO (não overpaying) + CUSTO 50% MENOR (routing por task) + QUALIDADE MANTIDA (expensive model when needed) + ROI OTIMIZADO (better margins) + ESCALÁVEL (routing scales efficiently).

Seu agente IA usa 1 modelo caro (pra tudo)?

Ou você já implementou model routing?

Implementar routing no seu agente →


Publicado em 31 de maio de 2026

Leia também