Skip to main content
IDProviderContextInput $/1MOutput $/1MBest for
anthropic/claude-opus-4-8Anthropic200k$15$75Deep reasoning, complex code
anthropic/claude-sonnet-4-6Anthropic200k$3$15Sweet spot price/quality
anthropic/claude-haiku-4-5Anthropic200k$1$5Simple tasks, high volume
google/gemini-2.5-proGoogle1M$1.25$5Long context, multimodal
google/gemini-2.5-flashGoogle1M$0.15$0.60Cheapest in catalog
openai/gpt-5OpenAI400k$1.25$10General reasoning
openai/gpt-4.1OpenAI1M$2$8Long context
openai/gpt-4.1-miniOpenAI1M$0.40$1.60Cheap OpenAI
openai/o4-miniOpenAI200k$1.10$4.40Reasoning (CoT)
deepseek/deepseek-chatDeepSeek64k$0.27$1.10Open-weight, very cheap
deepseek/deepseek-reasonerDeepSeek64k$0.55$2.19Open-weight reasoning
moonshot/kimi-k2Moonshot256k$0.60$2.50Chinese model, strong at code
moonshot/moonshot-v1-128kMoonshot128k$1.66$1.66Symmetric cost
xai/grok-4xAI256k$3$15Access to X data
xai/grok-3xAI131k$3$15Previous gen
xai/grok-3-minixAI131k$0.30$0.50Cheap xAI

When to use each

For critical tasks with budget

Claude Opus 4.8 or GPT-5. Top of class in reasoning.

For production at scale

Claude Sonnet 4.6. Price/quality balance. You’d pick it blind if you didn’t know the rest.

For high volume / low cost

Gemini 2.5 Flash or DeepSeek Chat. Sub-dollar per 1M tokens.

For reasoning (chain of thought, step-by-step)

DeepSeek Reasoner or o4-mini. Specifically designed for structured reasoning.

For very long context

Gemini 2.5 Pro/Flash (1M tokens) or GPT-4.1 (1M). Process entire documents.

For code

Kimi K2 or Claude Sonnet 4.6. Strong code performance.

Natural failover

Because all models share the same endpoint and SDK, failover across providers is trivial:
def call_with_fallback(messages):
    for model in ["anthropic/claude-sonnet-4-6", "openai/gpt-5", "google/gemini-2.5-pro"]:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except Exception:
            continue
    raise RuntimeError("All providers failed")