
Rate Limits

Tier                RPM (requests/min)   TPM (tokens/min)   Concurrency
Free (new signup)   20                   50,000             2
Starter             60                   200,000            5
Pro                 300                  1,000,000          20
Enterprise          Custom               Custom             Custom
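
The RPM and concurrency limits above can also be enforced on the client side, so that requests are delayed locally instead of being rejected by the API. The following is only a sketch: `TIER_LIMITS` and `ClientSideLimiter` are hypothetical names, not part of any SDK, and the sliding 60-second window is an assumption about how the per-minute limit is counted. TPM is not tracked here.

```python
import threading
import time
from collections import deque

# Limits copied from the tier table; the dictionary keys are illustrative names.
TIER_LIMITS = {
    "free": {"rpm": 20, "tpm": 50_000, "concurrency": 2},
    "starter": {"rpm": 60, "tpm": 200_000, "concurrency": 5},
    "pro": {"rpm": 300, "tpm": 1_000_000, "concurrency": 20},
}


class ClientSideLimiter:
    """Hypothetical helper: caps in-flight requests and requests per window."""

    def __init__(self, rpm, concurrency, window=60.0, clock=time.monotonic):
        self._rpm = rpm
        self._window = window
        self._clock = clock
        self._sent = deque()  # start times of requests inside the window
        self._lock = threading.Lock()
        self._slots = threading.BoundedSemaphore(concurrency)

    def _reserve(self):
        """Record a request if it fits in the window, else return seconds to wait."""
        with self._lock:
            now = self._clock()
            # Forget requests that have aged out of the window.
            while self._sent and now - self._sent[0] >= self._window:
                self._sent.popleft()
            if len(self._sent) < self._rpm:
                self._sent.append(now)
                return 0.0
            return self._window - (now - self._sent[0])

    def __enter__(self):
        self._slots.acquire()  # blocks while `concurrency` requests are in flight
        while (delay := self._reserve()) > 0:
            time.sleep(delay)
        return self

    def __exit__(self, *exc_info):
        self._slots.release()
        return False
```

A caller would create one limiter per API key, for example `limiter = ClientSideLimiter(**{k: TIER_LIMITS["starter"][k] for k in ("rpm", "concurrency")})`, and wrap each API call in `with limiter:`.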

Every response includes rate limit information in its headers:

x-ratelimit-limit-requests: 60
x-ratelimit-remaining-requests: 58
x-ratelimit-reset-requests: 45s
x-ratelimit-limit-tokens: 200000
x-ratelimit-remaining-tokens: 189000
x-ratelimit-reset-tokens: 30s
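
These headers can be turned into numbers that a client acts on. The sketch below assumes the headers arrive as a mapping with lowercase keys and that reset values are durations such as `45s`; support for compound or millisecond forms like `1m30s` or `500ms` is an assumption, since only the `45s` and `30s` forms appear above. The function names are hypothetical.

```python
import re

_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_reset(value):
    """Convert a reset value such as "45s" into seconds."""
    value = value.strip()
    parts = _DURATION_PART.findall(value)
    # Reject anything that is not made up entirely of number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unrecognised duration: {value!r}")
    return sum(float(n) * _UNIT_SECONDS[u] for n, u in parts)


def rate_limit_status(headers):
    """Summarise the six rate limit headers from one response."""
    status = {}
    for kind in ("requests", "tokens"):
        status[kind] = {
            "limit": int(headers[f"x-ratelimit-limit-{kind}"]),
            "remaining": int(headers[f"x-ratelimit-remaining-{kind}"]),
            "reset_seconds": parse_reset(headers[f"x-ratelimit-reset-{kind}"]),
        }
    return status


def should_pause(status, threshold=0.1):
    """Seconds to wait if any quota is below `threshold` of its limit, else 0."""
    waits = [
        s["reset_seconds"]
        for s in status.values()
        if s["remaining"] < s["limit"] * threshold
    ]
    return max(waits, default=0.0)
```

With the example headers above, both quotas are well above 10% of their limits, so `should_pause` returns `0.0`. If you use the OpenAI Python SDK, the headers should be reachable through its raw-response interface (`client.chat.completions.with_raw_response.create(...)`); check the SDK version you have installed.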

When you hit a rate limit, the API returns 429 Too Many Requests. Handle it by retrying with exponential backoff:

import random
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key="tsn_live_xxx",
    base_url="https://api.tokensupernova.com/v1",
)


def chat_with_retry(messages, model="deepseek-chat", max_retries=5):
    """Call the chat API, retrying with exponential backoff on 429 responses."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
            )
        except RateLimitError:
            # Out of attempts: surface the error to the caller.
            if attempt == max_retries - 1:
                raise
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter.
            delay = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
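
To stay within the limits:

  • Batch requests: combine requests where possible instead of sending many single requests at high frequency.
  • Cache responses: cache the results for repeated prompts.
  • Monitor response headers: watch the remaining quota and slow down before you reach the limit.
  • Upgrade your plan: a higher tier is recommended for production workloads.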
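
As an illustration of the caching advice in the list above, the following sketch keeps responses in memory, keyed by a hash of the model name and messages. `PromptCache` and its `fetch` argument are hypothetical names; a production setup would probably add expiry and a shared store, and caching is only appropriate when an identical prompt should produce the same answer.

```python
import hashlib
import json


class PromptCache:
    """Hypothetical in-memory cache keyed by model and messages."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable(messages, model) -> response
        self._store = {}
        self.hits = 0

    @staticmethod
    def _key(messages, model):
        # Serialise deterministically so equal prompts map to the same key.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, messages, model="deepseek-chat"):
        key = self._key(messages, model)
        if key in self._store:
            self.hits += 1  # served locally, no API request spent
            return self._store[key]
        response = self._fetch(messages, model)
        self._store[key] = response
        return response
```

It could wrap the retry helper shown earlier, for example `cache = PromptCache(lambda messages, model: chat_with_retry(messages, model=model))`, so that repeated prompts do not count against RPM or TPM.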