Overview

UnifyRoute provides OpenAI-compatible API endpoints that work with any OpenAI SDK or tool without modification.

Base URL: http://localhost:6565/api/v1

Authentication: Include your API token in the Authorization header:

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  http://localhost:6565/api/v1/chat/completions
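The Authorization header can also be attached programmatically. A minimal Python sketch using only the standard library (the helper name authed_request and the choice of the /models path are illustrative, not part of the API):

```python
import json
import urllib.request

BASE_URL = "http://localhost:6565/api/v1"

def authed_request(path, payload=None, token="YOUR_API_TOKEN"):
    """Build a urllib Request carrying the Bearer token.

    GET when payload is None, otherwise POST with a JSON body.
    """
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST" if payload is not None else "GET",
    )

# Send with: urllib.request.urlopen(authed_request("/models"))
```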

Chat Completions

Create a chat completion for a given list of messages.

Endpoint: POST /api/v1/chat/completions

Request

curl -X POST http://localhost:6565/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'
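The same body can be assembled in Python before sending it with any HTTP client. A sketch mirroring the curl example above (the role check covers only the roles shown in this guide):

```python
def chat_payload(model, messages, **options):
    """Build a chat-completions request body; options map to the request parameters."""
    for msg in messages:
        assert msg.get("role") in ("system", "user", "assistant"), msg
        assert "content" in msg, msg
    return {"model": model, "messages": messages, **options}

payload = chat_payload(
    "gpt-3.5-turbo",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.7,
    max_tokens=100,
)
```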

Request Parameters

Parameter          Type     Default  Description
model              string   -        Model identifier (required)
messages           array    -        Array of message objects (required)
temperature        number   1.0      Sampling temperature (0-2)
top_p              number   1.0      Nucleus sampling parameter
top_k              number   -        Top-k sampling parameter
max_tokens         number   -        Maximum tokens in the response
frequency_penalty  number   0        Frequency penalty (-2.0 to 2.0)
presence_penalty   number   0        Presence penalty (-2.0 to 2.0)
stream             boolean  false    Stream response tokens
user               string   -        Unique user identifier

Response

{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1677649420,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 30,
    "completion_tokens": 10,
    "total_tokens": 40
  }
}
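Reading the reply out of this structure is a plain dictionary walk. A sketch using the sample response above:

```python
import json

sample = json.loads("""
{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1677649420,
  "model": "gpt-3.5-turbo",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "The capital of France is Paris."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 30, "completion_tokens": 10, "total_tokens": 40}
}
""")

# First choice holds the assistant message; usage reports token accounting.
answer = sample["choices"][0]["message"]["content"]
total = sample["usage"]["total_tokens"]
print(answer)  # The capital of France is Paris.
```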

Text Completions

Generate a text completion for a given prompt.

Endpoint: POST /api/v1/completions

Request

curl -X POST http://localhost:6565/api/v1/completions \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-davinci-003",
    "prompt": "The future of AI is",
    "temperature": 0.7,
    "max_tokens": 50
  }'

Request Parameters

Parameter          Type     Default  Description
model              string   -        Model identifier (required)
prompt             string   -        Text prompt (required)
temperature        number   1.0      Sampling temperature
max_tokens         number   100      Maximum completion length
top_p              number   1.0      Nucleus sampling parameter
frequency_penalty  number   0        Frequency penalty
presence_penalty   number   0        Presence penalty
stream             boolean  false    Stream response

List Models

Get available models from configured providers.

Endpoint: GET /api/v1/models

Request

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  http://localhost:6565/api/v1/models

Response

{
  "object": "list",
  "data": [
    {
      "id": "gpt-3.5-turbo",
      "object": "model",
      "created": 1688660000,
      "owned_by": "openai",
      "provider": "openai"
    },
    {
      "id": "gpt-4",
      "object": "model",
      "created": 1687882411,
      "owned_by": "openai",
      "provider": "openai"
    },
    {
      "id": "claude-2",
      "object": "model",
      "created": 1693052800,
      "owned_by": "anthropic",
      "provider": "anthropic"
    }
  ]
}
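Because each entry carries a provider field, the list is easy to group client-side. A sketch over the sample response above (trimmed to the fields it uses):

```python
models = {
    "object": "list",
    "data": [
        {"id": "gpt-3.5-turbo", "owned_by": "openai", "provider": "openai"},
        {"id": "gpt-4", "owned_by": "openai", "provider": "openai"},
        {"id": "claude-2", "owned_by": "anthropic", "provider": "anthropic"},
    ],
}

# Group model ids under their provider.
by_provider = {}
for entry in models["data"]:
    by_provider.setdefault(entry["provider"], []).append(entry["id"])

print(by_provider)  # {'openai': ['gpt-3.5-turbo', 'gpt-4'], 'anthropic': ['claude-2']}
```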

Error Handling

Errors are returned in standard OpenAI format:

{
  "error": {
    "message": "Insufficient quota",
    "type": "insufficient_quota",
    "param": null,
    "code": "quota_exceeded"
  }
}
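A client can lift this envelope into a typed exception so callers branch on the code rather than parse messages. A minimal sketch (the class name is illustrative):

```python
class UnifyRouteError(Exception):
    """Wraps the standard error envelope returned by the API."""

    def __init__(self, body):
        err = body["error"]
        self.type = err["type"]
        self.code = err["code"]
        self.param = err["param"]
        super().__init__(err["message"])

# The sample error body from above:
body = {"error": {"message": "Insufficient quota", "type": "insufficient_quota",
                  "param": None, "code": "quota_exceeded"}}
exc = UnifyRouteError(body)
```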

Error Codes

Code                   Status  Description
invalid_request_error  400     Invalid request parameters
authentication_error   401     Invalid or missing API token
permission_error       403     Token lacks required permissions
not_found_error        404     Resource not found
rate_limit_error       429     Rate limit exceeded
server_error           500     UnifyRoute server error
provider_error         502     Error from the upstream LLM provider
quota_exceeded         429     Provider quota exceeded
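The status codes suggest a simple retry policy: 429 and 502 are transient, while the 4xx request errors are not. A sketch with exponential backoff (the 3-attempt limit and 30-second cap are arbitrary choices, not part of the API):

```python
RETRYABLE_STATUSES = {429, 502}  # rate/quota limits and upstream provider errors

def retry_delay(status, attempt, max_attempts=3):
    """Return seconds to wait before retry number `attempt`, or None to give up."""
    if status not in RETRYABLE_STATUSES or attempt >= max_attempts:
        return None
    return min(2 ** attempt, 30)  # 1s, 2s, 4s, ... capped at 30s
```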

Streaming Responses

For streaming responses, set stream: true in the request:

curl -X POST http://localhost:6565/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }' \
  --no-buffer

Responses come as Server-Sent Events (SSE):

data: {"choices":[{"delta":{"content":"Hello"}...}]}

data: {"choices":[{"delta":{"content":" there"}...}]}

data: [DONE]
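Each data: line (except the final [DONE] sentinel) holds a JSON chunk whose delta carries a content fragment; concatenating the fragments rebuilds the full reply. A sketch of that loop:

```python
import json

def collect_stream(lines):
    """Concatenate delta fragments from SSE 'data:' lines until [DONE]."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:  # role-only or empty deltas carry no text
            parts.append(delta["content"])
    return "".join(parts)
```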

Rate Limiting

UnifyRoute enforces rate limits based on your API token configuration.

Rate Limit Headers:

X-RateLimit-Limit-Requests: 100
X-RateLimit-Limit-Tokens: 10000
X-RateLimit-Remaining-Requests: 99
X-RateLimit-Remaining-Tokens: 9950
X-RateLimit-Reset: 1234567890
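A client can read these headers to pace itself. A sketch that assumes X-RateLimit-Reset is a Unix timestamp (the sample value above looks like one, but confirm against your deployment):

```python
def rate_limit_state(headers, now):
    """Return (requests remaining, seconds until the window resets)."""
    remaining = int(headers["X-RateLimit-Remaining-Requests"])
    reset_in = max(0, int(headers["X-RateLimit-Reset"]) - int(now))
    return remaining, reset_in
```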

Provider Routing

Request routing is handled automatically based on your configuration. You can optionally specify provider preferences:

1
2
3
4
5
6
{
  "model": "gpt-3.5-turbo",
  "messages": [...],
  "provider": "openai",
  "tags": ["production", "high-priority"]
}

Webhook Notifications

Configure webhooks for events like quota changes or provider failures:

POST /api/v1/webhooks - Register a webhook
PUT /api/v1/webhooks/{id} - Update a webhook
DELETE /api/v1/webhooks/{id} - Delete a webhook

Webhook events include:

  • provider.quota_exceeded
  • provider.offline
  • provider.online
  • token.created
  • token.revoked
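A webhook receiver typically dispatches on the event name. A sketch using the names above (the delivery payload shape, including the type field, is an assumption; only the event names come from this list):

```python
def handle_event(event):
    """Route a webhook delivery by its event name.

    NOTE: reading the name from event["type"] is assumed, not documented here.
    """
    kind = event["type"]
    if kind == "provider.quota_exceeded":
        return "pause-provider"
    if kind in ("provider.offline", "provider.online"):
        return "update-provider-status"
    if kind in ("token.created", "token.revoked"):
        return "refresh-token-cache"
    return "ignore"
```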