Chat models - Geek Hub

ID	Provider	Context	Input $/1M	Output $/1M	Best for
`anthropic/claude-opus-4-8`	Anthropic	200k	$15	$75	Deep reasoning, complex code
`anthropic/claude-sonnet-4-6`	Anthropic	200k	$3	$15	Sweet spot price/quality
`anthropic/claude-haiku-4-5`	Anthropic	200k	$1	$5	Simple tasks, high volume
`google/gemini-2.5-pro`	Google	1M	$1.25	$5	Long context, multimodal
`google/gemini-2.5-flash`	Google	1M	$0.15	$0.60	Cheapest in catalog
`openai/gpt-5`	OpenAI	400k	$1.25	$10	General reasoning
`openai/gpt-4.1`	OpenAI	1M	$2	$8	Long context
`openai/gpt-4.1-mini`	OpenAI	1M	$0.40	$1.60	Cheap OpenAI
`openai/o4-mini`	OpenAI	200k	$1.10	$4.40	Reasoning (CoT)
`deepseek/deepseek-chat`	DeepSeek	64k	$0.27	$1.10	Open-weight, very cheap
`deepseek/deepseek-reasoner`	DeepSeek	64k	$0.55	$2.19	Open-weight reasoning
`moonshot/kimi-k2`	Moonshot	256k	$0.60	$2.50	Chinese model, strong at code
`moonshot/moonshot-v1-128k`	Moonshot	128k	$1.66	$1.66	Symmetric cost
`xai/grok-4`	xAI	256k	$3	$15	Access to X data
`xai/grok-3`	xAI	131k	$3	$15	Previous gen
`xai/grok-3-mini`	xAI	131k	$0.30	$0.50	Cheap xAI
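
To compare models on price, the per-1M rates in the table can be turned into a per-request estimate. The following is a minimal sketch, not part of any SDK: the `PRICES` dictionary and `estimate_cost` helper are hypothetical names, only a few rows from the table are copied in, and the token counts in the loop are an invented workload for illustration.

```python
# Prices in USD per 1M tokens, copied from the table: (input, output).
PRICES = {
    "anthropic/claude-opus-4-8": (15.00, 75.00),
    "anthropic/claude-sonnet-4-6": (3.00, 15.00),
    "google/gemini-2.5-flash": (0.15, 0.60),
    "deepseek/deepseek-chat": (0.27, 1.10),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 2,000 input tokens and 500 output tokens per request.
for model in PRICES:
    print(f"{model}: ${estimate_cost(model, 2_000, 500):.5f}")
```

Because output tokens are priced several times higher than input tokens for most models, workloads that generate long responses shift the comparison more than the input price alone suggests.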

When to use each

For critical tasks with budget

Claude Opus 4.8 or GPT-5. Top of class in reasoning.

For production at scale

Claude Sonnet 4.6. The best price/quality balance in the catalog, and a safe default if you have not benchmarked the alternatives.

For high volume / low cost

Gemini 2.5 Flash or DeepSeek Chat. Both cost under $0.30 per 1M input tokens; output runs $0.60 and $1.10 per 1M, respectively.

For reasoning (chain of thought, step-by-step)

DeepSeek Reasoner or o4-mini. Specifically designed for structured reasoning.

For very long context

Gemini 2.5 Pro/Flash or GPT-4.1, each with a 1M-token context window. Large enough to process entire documents in a single request.
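
Whether a document fits a given model can be checked roughly before sending it. The sketch below makes several assumptions: `CONTEXT_WINDOWS` and `models_that_fit` are hypothetical names, the "k" and "M" figures from the table are read as decimal thousands and millions, and the 4-characters-per-token ratio is a crude heuristic for English text rather than a real tokenizer.

```python
# Context windows in tokens, taken from the table (a subset, for illustration).
CONTEXT_WINDOWS = {
    "google/gemini-2.5-pro": 1_000_000,
    "google/gemini-2.5-flash": 1_000_000,
    "openai/gpt-4.1": 1_000_000,
    "anthropic/claude-sonnet-4-6": 200_000,
    "deepseek/deepseek-chat": 64_000,
}

# Assumption: roughly 4 characters per token. Real counts depend on the tokenizer.
CHARS_PER_TOKEN = 4


def models_that_fit(text: str, reserved_output_tokens: int = 4_000) -> list[str]:
    """Return the models whose context window can hold the text plus a reply."""
    estimated_tokens = len(text) // CHARS_PER_TOKEN + reserved_output_tokens
    return [
        model
        for model, window in CONTEXT_WINDOWS.items()
        if estimated_tokens <= window
    ]


# A hypothetical 1.2M-character document (about 300k tokens by this estimate).
print(models_that_fit("x" * 1_200_000))
```

Reserving room for the response matters because the context window covers input and output together; the 4,000-token default here is an arbitrary placeholder.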

For code

Kimi K2 or Claude Sonnet 4.6. Strong code performance.
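
The recommendations in this section can be captured as a small lookup table so that application code asks for a use case rather than hard-coding a model ID. This is only a sketch: the use-case keys and the `pick_models` helper are hypothetical names chosen for illustration, and the ordering within each list simply follows the order in which the models are mentioned above.

```python
# Use case -> candidate model IDs, in the order they are recommended above.
RECOMMENDED = {
    "critical": ["anthropic/claude-opus-4-8", "openai/gpt-5"],
    "production": ["anthropic/claude-sonnet-4-6"],
    "high_volume": ["google/gemini-2.5-flash", "deepseek/deepseek-chat"],
    "reasoning": ["deepseek/deepseek-reasoner", "openai/o4-mini"],
    "long_context": [
        "google/gemini-2.5-pro",
        "google/gemini-2.5-flash",
        "openai/gpt-4.1",
    ],
    "code": ["moonshot/kimi-k2", "anthropic/claude-sonnet-4-6"],
}


def pick_models(use_case: str) -> list[str]:
    """Return candidate model IDs for a use case, first choice first."""
    if use_case not in RECOMMENDED:
        known = ", ".join(sorted(RECOMMENDED))
        raise ValueError(f"Unknown use case {use_case!r}; expected one of: {known}")
    return list(RECOMMENDED[use_case])  # copy so callers cannot mutate the table


print(pick_models("code"))
```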

Natural failover

Because all models share the same endpoint and SDK, failover across providers is trivial:

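# Models to try, in order of preference.
FALLBACK_MODELS = ["anthropic/claude-sonnet-4-6", "openai/gpt-5", "google/gemini-2.5-pro"]

def call_with_fallback(messages):
    # `client` is assumed to be an already-configured SDK client.
    last_error = None
    for model in FALLBACK_MODELS:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except Exception as exc:  # broad on purpose: any failure moves on to the next model
            last_error = exc
    # Chain the last failure so the root cause is visible in the traceback.
    raise RuntimeError("All providers failed") from last_error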
