The /v1/audio/speech endpoint is compatible with the OpenAI SDK: it accepts the same request body and returns the same response (binary audio whose Content-Type depends on the requested format). It supports full and streaming responses and respects org-level Zero Data Retention (ZDR) flags.

Syntax

curl -X POST https://api.geekhub.mx/v1/audio/speech \
  -H "Authorization: Bearer ghub_sk_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -o speech.mp3 \
  -d '{
    "model": "openai/tts-1",
    "input": "Hello, this is a voice synthesis test.",
    "voice": "alloy",
    "response_format": "mp3",
    "speed": 1.0
  }'
The response body is the binary audio itself, not JSON. The Content-Type header indicates the format.
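The same call can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_speech_request` and `synthesize_to_file` are hypothetical, and the payload simply mirrors the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.geekhub.mx/v1/audio/speech"


def build_speech_request(api_key, text, voice="alloy", model="openai/tts-1",
                         response_format="mp3", speed=1.0):
    """Build (but do not send) the POST request for /v1/audio/speech."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key, text, path, **options):
    """Send the request and write the binary audio body to `path`.

    Returns the Content-Type header so the caller can check the format.
    """
    request = build_speech_request(api_key, text, **options)
    with urllib.request.urlopen(request) as response:
        content_type = response.headers.get("Content-Type")
        with open(path, "wb") as out:
            out.write(response.read())  # raw audio bytes, not JSON
    return content_type
```

Calling `synthesize_to_file("ghub_sk_live_YOUR_KEY", "Hello, this is a voice synthesis test.", "speech.mp3")` would correspond to the curl command above.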

Output formats

| response_format | Content-Type | When to use |
| --- | --- | --- |
| mp3 (default) | audio/mpeg | Compatible with all browsers and players |
| opus | audio/opus | Low-latency bidirectional streaming (WebRTC) |
| wav | audio/wav | Post-processing (ASR, mixing). Lossless |
| pcm | audio/L16 | No container, custom engine integration |
| aac | audio/aac | Native iOS / Apple devices |
| flac | audio/flac | Lossless, compressed. Archival |
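Because the response carries no JSON metadata, a client may want to map the Content-Type header back to a format. The sketch below encodes the table above as a lookup; the helper names are hypothetical, and using the format name as the file extension is an assumption made for illustration.

```python
# response_format -> Content-Type, copied from the output formats table.
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "wav": "audio/wav",
    "pcm": "audio/L16",
    "aac": "audio/aac",
    "flac": "audio/flac",
}


def format_from_content_type(content_type):
    """Return the response_format matching a Content-Type header, or None."""
    # Ignore any parameters after ';' and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    for fmt, expected in CONTENT_TYPES.items():
        if expected.lower() == media_type:
            return fmt
    return None


def output_filename(stem, content_type):
    """Pick a file name for the audio (assumption: extension = format name)."""
    fmt = format_from_content_type(content_type)
    return f"{stem}.{fmt}" if fmt else f"{stem}.bin"
```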

Real-time streaming

Pass stream: true to receive the audio in chunks as the model generates it, which lowers perceived latency:
curl -N -X POST https://api.geekhub.mx/v1/audio/speech \
  -H "Authorization: Bearer ghub_sk_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/tts-1",
    "input": "Text to play while it generates...",
    "voice": "nova",
    "response_format": "opus",
    "stream": true
  }' > stream.opus
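On the client side, a streamed response can be consumed incrementally instead of being read in one call. The following is a rough sketch of that loop: `copy_stream` is a hypothetical helper, and an in-memory buffer stands in for the HTTP response so the example runs without network access.

```python
import io


def copy_stream(source, sink, chunk_size=4096):
    """Copy `source` to `sink` chunk by chunk; return the bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        sink.write(chunk)  # a player could consume each chunk here instead
        total += len(chunk)
    return total


# Demo: an in-memory source standing in for the HTTP response body.
fake_response = io.BytesIO(b"\x00" * 10_000)
sink = io.BytesIO()
written = copy_stream(fake_response, sink, chunk_size=4096)
print(written)  # 10000
```

With a real request whose body includes "stream": true, the object returned by `urllib.request.urlopen` is file-like and could be passed as `source`, with an open file such as `stream.opus` as `sink`.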

Tone instructions (gpt-4o-mini-tts)

The openai/gpt-4o-mini-tts model accepts an instructions field with style prompting:
{
  "model": "openai/gpt-4o-mini-tts",
  "input": "Welcome to Geek Hub.",
  "voice": "verse",
  "response_format": "mp3",
  "instructions": "Speak with a warm, professional tone, paced and clear."
}

Available models

| Model ID | Voices | Price | ZDR |
| --- | --- | --- | --- |
| openai/tts-1 | alloy, echo, fable, onyx, nova, shimmer | $15 / 1M chars | |
| openai/tts-1-hd | alloy, echo, fable, onyx, nova, shimmer | $30 / 1M chars | |
| openai/gpt-4o-mini-tts | alloy, echo, fable, onyx, nova, shimmer, verse | $12 / 1M chars | |
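A client could check the model and voice combination locally before sending a request. The sketch below hard-codes the table above; `check_voice` is a hypothetical helper, and the gateway's own validation behavior is not described here.

```python
BASE_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")

# Model ID -> supported voices, copied from the available models table.
VOICES_BY_MODEL = {
    "openai/tts-1": BASE_VOICES,
    "openai/tts-1-hd": BASE_VOICES,
    "openai/gpt-4o-mini-tts": BASE_VOICES + ("verse",),
}


def check_voice(model, voice):
    """Raise ValueError if the model is unknown or does not list the voice."""
    if model not in VOICES_BY_MODEL:
        raise ValueError(f"unknown model: {model}")
    if voice not in VOICES_BY_MODEL[model]:
        allowed = ", ".join(VOICES_BY_MODEL[model])
        raise ValueError(f"{model} does not list voice {voice!r}; use one of: {allowed}")
```

For example, `check_voice("openai/gpt-4o-mini-tts", "verse")` passes, while `check_voice("openai/tts-1", "verse")` raises a `ValueError`.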

Pricing structure

TTS models are billed per character of synthesized text, not per token or per second of audio. This mirrors provider pricing and makes cost predictable: 1,000 characters ≈ 150 words, regardless of audio duration. Geek Hub applies its standard +5% markup.
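The per-character arithmetic can be sketched as follows. Two assumptions are made here and should be verified: that the listed prices are pre-markup amounts to which the 5% is added, and that characters are counted the way Python's `len` counts them. `estimate_cost` is a hypothetical helper.

```python
# USD per 1M characters, copied from the available models table.
PRICE_PER_MILLION_CHARS = {
    "openai/tts-1": 15.0,
    "openai/tts-1-hd": 30.0,
    "openai/gpt-4o-mini-tts": 12.0,
}
MARKUP = 0.05  # the +5% markup; assumed to apply on top of the listed price


def estimate_cost(text, model, include_markup=True):
    """Estimate the USD cost of synthesizing `text` with `model`."""
    base = len(text) * PRICE_PER_MILLION_CHARS[model] / 1_000_000
    return base * (1 + MARKUP) if include_markup else base
```

Under these assumptions, 1,000 characters on openai/tts-1 cost 1,000 × $15 / 1,000,000 = $0.015 before markup and $0.01575 after it.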

ZDR pre-flight

If your org requires ZDR for the TTS model’s group, or if the request includes zdr: true, the gateway verifies the model’s ZDR status before processing the request. If the model is not verified, the gateway returns HTTP 422 with alternative models. See Zero Data Retention.
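A client may want to surface the 422 case separately from other failures. The sketch below is illustrative only: `ZdrNotVerified` and `raise_for_zdr` are hypothetical names, the code assumes that a 422 from this endpoint means the pre-flight check failed, and since the shape of the error body is not documented here, it is kept as parsed JSON (or None) rather than read field by field.

```python
import json


class ZdrNotVerified(Exception):
    """Raised when the gateway answers HTTP 422 to a ZDR-constrained request."""

    def __init__(self, details):
        super().__init__("model failed the ZDR pre-flight check (HTTP 422)")
        self.details = details  # parsed error body, or None if not JSON


def raise_for_zdr(status, body):
    """Raise ZdrNotVerified for a 422 response; do nothing otherwise."""
    if status != 422:
        return
    try:
        details = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        details = None
    raise ZdrNotVerified(details)
```

With `urllib`, the status and body would come from a caught `urllib.error.HTTPError` (`err.code` and `err.read()`); the caller can then inspect `details` for the alternative models and retry with one of them.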