Token Station

Use any model in your agents and apps. One API key, one account, zero markup.


Free models

No per-token charge: run them through the same OpenAI-compatible base URL as everything else.

nvidia-nim/nemotron-3-ultra-550b-a55b

nvidia-nim/gpt-oss-120b

nvidia-nim/qwen3-coder-480b-a35b-instruct

glm/glm-4-flash

xiaomi/mimo-v2.5-tts

xiaomi/mimo-v2.5-tts-voiceclone

xiaomi/mimo-v2.5-tts-voicedesign
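The IDs listed above all follow a provider/model pattern. A minimal sketch of splitting one, assuming the text before the first slash is the provider slug (this is inferred from the IDs shown here, not from a documented contract, and split_model_id is a hypothetical helper name):

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split 'provider/model' into (provider, model).

    Assumption: the provider slug is everything before the first slash.
    """
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model


print(split_model_id("nvidia-nim/gpt-oss-120b"))
```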

Featured AI Models

A selection across modalities. Every model works through one OpenAI-compatible base URL.

Supported APIs

Chat Completions — OpenAI-compatible universal LLM API /v1/chat/completions

Works with every LLM provider. Send OpenAI-format requests — the gateway translates to each provider's native format when needed, and preserves raw OpenAI request bytes for native OpenAI traffic. Supports text, images, streaming, and tool use.

curl -X POST http://GATEWAY/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer gw-YOUR_KEY" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [
      {"role": "system", "content": "You are helpful."},
      {"role": "user", "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]}
    ],
    "max_completion_tokens": 1024,
    "stream": true
  }'

from openai import OpenAI

client = OpenAI(
    base_url="http://GATEWAY/v1",
    api_key="gw-YOUR_KEY"
)

response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]}
    ],
    max_completion_tokens=1024,
    stream=True
)

for chunk in response:
    # Some chunks (for example a final usage chunk) can arrive with no choices.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
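If you need the full reply as one string rather than printing it piece by piece, the streamed deltas can be joined. The sketch below is a hypothetical helper (collect_stream_text is not part of any SDK) and uses stand-in chunk objects so it runs without a gateway; it assumes chunks have the usual OpenAI streaming shape, choices[0].delta.content.

```python
from types import SimpleNamespace


def collect_stream_text(chunks) -> str:
    """Join the text deltas of an OpenAI-style chat completion stream."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        content = chunk.choices[0].delta.content
        if content:  # skip role-only or empty deltas
            parts.append(content)
    return "".join(parts)


def fake_chunk(content=None, empty=False):
    """Stand-in for a streamed chunk, used only for this demo."""
    if empty:
        return SimpleNamespace(choices=[])
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hel"), fake_chunk(None), fake_chunk("lo"), fake_chunk(empty=True)]
print(collect_stream_text(demo))  # Hello
```

With a real stream, pass the object returned by client.chat.completions.create(..., stream=True) in place of demo.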

Responses API — OpenAI-compatible universal LLM API /v1/responses

Stateless /v1/responses works with every LLM provider. The gateway uses the native Responses endpoint where available (OpenAI, Groq, xAI, Bailian, Codex) and translates to Anthropic Messages or OpenAI Chat Completions behind the scenes for the rest. Stateful usage — threading reasoning continuity across turns via encrypted_content or Anthropic thinking blocks — is only preserved on providers that natively support it (OpenAI, OpenAI Codex, Anthropic, Claude Code); see the model list for the Stateful badge.

curl -X POST http://GATEWAY/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer gw-YOUR_KEY" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "input": [
      {
        "role": "user",
        "content": [
          {"type": "input_text", "text": "Describe this image in detail."},
          {"type": "input_image", "image_url": "https://example.com/photo.jpg"}
        ]
      }
    ]
  }'

from openai import OpenAI

client = OpenAI(
    base_url="http://GATEWAY/v1",
    api_key="gw-YOUR_KEY"
)

response = client.responses.create(
    model="anthropic/claude-sonnet-4-6",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Describe this image in detail."},
                {"type": "input_image", "image_url": "https://example.com/photo.jpg"}
            ]
        }
    ]
)

print(response.output_text)
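The provider lists in the description above can be turned into a small lookup that tells you what to expect from a given model. This is only a sketch: the provider slugs are assumptions guessed from the provider names (only openai and anthropic appear as prefixes in the examples here), and the model list's Stateful badge remains the authoritative source.

```python
# Provider slugs below are assumed spellings, not confirmed identifiers.
NATIVE_RESPONSES = {"openai", "groq", "xai", "bailian", "codex"}
STATEFUL = {"openai", "codex", "anthropic", "claude-code"}


def describe_responses_support(model_id: str) -> dict:
    """Report how /v1/responses is expected to behave for a model ID."""
    provider = model_id.split("/", 1)[0]
    return {
        "provider": provider,
        # False means the gateway translates to another wire format.
        "native_responses": provider in NATIVE_RESPONSES,
        # False means reasoning continuity across turns is not preserved.
        "stateful": provider in STATEFUL,
    }


print(describe_responses_support("anthropic/claude-sonnet-4-6"))
```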

Anthropic Messages API — Anthropic-compatible universal LLM API /v1/messages

Works with every LLM provider. Native Anthropic and Claude Code requests pass through byte-for-byte (preserving signed thinking blocks); everything else is translated into the provider's native format and streamed back as Anthropic SSE. Use this when your client speaks the Anthropic contract.

curl -X POST http://GATEWAY/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer gw-YOUR_KEY" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "system": "You are helpful.",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Explain what this gateway does."}
    ],
    "stream": true
  }'

import httpx

resp = httpx.post(
    "http://GATEWAY/v1/messages",
    headers={
        "Authorization": "Bearer gw-YOUR_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "anthropic/claude-sonnet-4-6",
        "system": "You are helpful.",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": "Explain what this gateway does."}
        ]
    },
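    timeout=60.0,  # httpx defaults to 5 seconds, often too short for LLM replies
)

resp.raise_for_status()  # fail loudly on HTTP errors instead of printing an error body
print(resp.json())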
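When "stream": true is set, the reply arrives as Anthropic-style server-sent events rather than one JSON body. A minimal sketch of pulling the text out of such a stream, assuming the standard Anthropic event shape (content_block_delta events carrying a text_delta); iter_text_deltas is a hypothetical helper and the sample lines are hand-written for illustration, not captured gateway output.

```python
import json


def iter_text_deltas(lines):
    """Yield text fragments from Anthropic-style SSE lines."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # ignore "event:" lines and blank separators
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        event = json.loads(payload)
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")


sample = [
    "event: message_start",
    'data: {"type": "message_start", "message": {"role": "assistant"}}',
    "",
    "event: ping",
    'data: {"type": "ping"}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]

print("".join(iter_text_deltas(sample)))  # Hello
```

Against a live endpoint, the same function can consume resp.iter_lines() from an httpx.stream("POST", ...) context.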

Speech to Text /v1/audio/transcriptions

Transcribe audio files using OpenAI speech-to-text models (the example uses gpt-4o-transcribe). Supports mp3, mp4, mpeg, mpga, m4a, wav, and webm.

curl -X POST http://GATEWAY/v1/audio/transcriptions \
  -H "Authorization: Bearer gw-YOUR_KEY" \
  -F file=@audio.mp3 \
  -F model=openai/gpt-4o-transcribe

from openai import OpenAI

client = OpenAI(
    base_url="http://GATEWAY/v1",
    api_key="gw-YOUR_KEY"
)

with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="openai/gpt-4o-transcribe",
        file=f
    )

print(transcript.text)
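It can be worth checking a file's extension against the supported formats before uploading. The helper below is a hypothetical sketch: it only looks at the file name, not the actual audio encoding, and the gateway may apply its own checks.

```python
from pathlib import Path

# Formats listed for /v1/audio/transcriptions above.
SUPPORTED_AUDIO = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_AUDIO


for name in ("audio.mp3", "talk.WAV", "notes.flac"):
    print(name, is_supported_audio(name))
```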

Text to Speech /v1/audio/speech

Convert text to natural-sounding speech using OpenAI TTS models. Voices: alloy, echo, fable, onyx, nova, shimmer.

curl -X POST http://GATEWAY/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer gw-YOUR_KEY" \
  -d '{
    "model": "openai/tts-1",
    "input": "Hello, world! Welcome to Token Station.",
    "voice": "alloy"
  }' --output speech.mp3

from openai import OpenAI

client = OpenAI(
    base_url="http://GATEWAY/v1",
    api_key="gw-YOUR_KEY"
)

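# Stream the audio straight to disk. Calling stream_to_file on a plain
# (non-streaming) response is deprecated in recent openai SDK versions.
with client.audio.speech.with_streaming_response.create(
    model="openai/tts-1",
    input="Hello, world! Welcome to Token Station.",
    voice="alloy"
) as response:
    response.stream_to_file("speech.mp3")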

Image Generation /v1/images/generations

Generate images from text prompts. Supports OpenAI DALL-E / GPT Image, Google Imagen, and xAI Grok.

curl -X POST http://GATEWAY/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer gw-YOUR_KEY" \
  -d '{
    "model": "openai/gpt-image-1.5",
    "prompt": "A sunset over mountains, oil painting style",
    "n": 1,
    "size": "1024x1024"
  }'

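import base64

from openai import OpenAI

client = OpenAI(
    base_url="http://GATEWAY/v1",
    api_key="gw-YOUR_KEY"
)

response = client.images.generate(
    model="openai/gpt-image-1.5",
    prompt="A sunset over mountains, oil painting style",
    n=1,
    size="1024x1024"
)

image = response.data[0]
if image.url:
    print(image.url)
elif image.b64_json:
    # OpenAI's GPT Image models return base64 data rather than a URL,
    # so decode and save it when no URL is present.
    with open("image.png", "wb") as f:
        f.write(base64.b64decode(image.b64_json))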

Video Generation /v1/video/generations

Generate videos from text or images. Supports Gemini Veo, Kling, Bailian Wan, BytePlus Seedance, xAI Grok, and OpenAI Sora. The gateway handles async polling internally.

# Text-to-video
curl -X POST http://GATEWAY/v1/video/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer gw-YOUR_KEY" \
  -d '{
    "model": "gemini/veo-3.1-generate-preview",
    "prompt": "A timelapse of a flower blooming in a garden",
    "aspect_ratio": "16:9"
  }'

# Image-to-video
curl -X POST http://GATEWAY/v1/video/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer gw-YOUR_KEY" \
  -d '{
    "model": "kling/kling-v3",
    "prompt": "The character turns and walks away",
    "image": "https://example.com/photo.jpg",
    "duration": 5,
    "aspect_ratio": "16:9"
  }'

import openai

client = openai.OpenAI(
    base_url="http://GATEWAY/v1",
    api_key="gw-YOUR_KEY"
)

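# Text-to-video (custom endpoint, so use the client's generic post helper).
# The path is resolved relative to base_url, which already ends in /v1.
response = client.post(
    "/video/generations",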
    body={
        "model": "gemini/veo-3.1-generate-preview",
        "prompt": "A timelapse of a flower blooming in a garden",
        "aspect_ratio": "16:9"
    },
    cast_to=object
)

print(response)
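The Python example covers text-to-video only. A small payload builder can serve both request shapes shown in the curl examples. This is a sketch: video_payload is a hypothetical helper, and treating image, duration, and aspect_ratio as optional is an assumption based on those two examples rather than a documented schema.

```python
def video_payload(model, prompt, *, image=None, duration=None, aspect_ratio=None):
    """Build a request body for /v1/video/generations.

    Fields left as None are omitted, so the same helper covers
    text-to-video and image-to-video.
    """
    payload = {"model": model, "prompt": prompt}
    optional = {"image": image, "duration": duration, "aspect_ratio": aspect_ratio}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


image_to_video = video_payload(
    "kling/kling-v3",
    "The character turns and walks away",
    image="https://example.com/photo.jpg",
    duration=5,
    aspect_ratio="16:9",
)
print(image_to_video)
```

Pass the result as the body argument of the client.post call shown above.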

Claude Code via Token Station

Claude Code speaks Anthropic-style APIs. Point it at Token Station's Anthropic-compatible /v1/messages surface and run OpenAI models through it — the gateway translates each turn. Configure it either with a ~/.claude/settings.json file or with environment variables.

mkdir -p ~/.claude
cat > ~/.claude/settings.json <<'EOF'
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://models.bytefuture.ai",
    "ANTHROPIC_AUTH_TOKEN": "YOUR TOKEN AT TOKEN STATION",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "openai/gpt-5.5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "openai/gpt-5.4-mini",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "openai/gpt-5.4-nano",
    "CLAUDE_CODE_SUBAGENT_MODEL": "openai/gpt-5.4-mini"
  }
}
EOF

claude -p "Respond with exactly the word: pong"

export ANTHROPIC_BASE_URL="https://models.bytefuture.ai"
export ANTHROPIC_AUTH_TOKEN="YOUR TOKEN AT TOKEN STATION"

export ANTHROPIC_DEFAULT_OPUS_MODEL="openai/gpt-5.5"
export ANTHROPIC_DEFAULT_SONNET_MODEL="openai/gpt-5.4-mini"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="openai/gpt-5.4-nano"
export CLAUDE_CODE_SUBAGENT_MODEL="openai/gpt-5.4-mini"

claude -p "Respond with exactly the word: pong"
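Note that the heredoc above overwrites any existing ~/.claude/settings.json. If you already have settings there, merging the env block may be safer. A minimal sketch, where merge_claude_env is a hypothetical helper and the "theme" and EXISTING_VAR entries are made-up stand-ins for whatever your file already holds:

```python
import json

TOKEN_STATION_ENV = {
    "ANTHROPIC_BASE_URL": "https://models.bytefuture.ai",
    "ANTHROPIC_AUTH_TOKEN": "YOUR TOKEN AT TOKEN STATION",
}


def merge_claude_env(existing: dict, new_env: dict) -> dict:
    """Return a copy of the settings with new_env merged into its "env" block.

    Other top-level keys and unrelated env entries are preserved;
    keys present in both take the value from new_env.
    """
    merged = dict(existing)
    merged["env"] = {**existing.get("env", {}), **new_env}
    return merged


# Hypothetical existing settings, for illustration only.
current = {"theme": "dark", "env": {"EXISTING_VAR": "1"}}
print(json.dumps(merge_claude_env(current, TOKEN_STATION_ENV), indent=2))
```

In practice you would load the file with json.load, merge, and write it back with json.dump.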

Codex via Token Station

Run OpenAI models in Codex through Token Station. Unlike Claude Code, Codex can't be pointed at a custom endpoint with environment variables alone — its built-in OpenAI provider ignores OPENAI_BASE_URL and always dials api.openai.com — so you define a one-time custom provider in ~/.codex/config.toml (Codex requires wire_api = "responses"). Your Token Station token still lives in an environment variable, referenced by env_key.

mkdir -p ~/.codex
cat > ~/.codex/config.toml <<'EOF'
model = "openai/gpt-5.5"
model_provider = "token_station"

[model_providers.token_station]
name = "Token Station"
base_url = "https://models.bytefuture.ai/v1"
env_key = "TOKEN_STATION_API_KEY"
wire_api = "responses"
EOF

export TOKEN_STATION_API_KEY="YOUR TOKEN AT TOKEN STATION"

codex exec "Respond with exactly the word: pong"
