Rate Limits

Rate limit tiers

| Tier | RPM (requests/minute) | TPM (tokens/minute) | Concurrency |
|---|---|---|---|
| Free (new sign-ups) | 20 | 50,000 | 2 |
| Starter | 60 | 200,000 | 5 |
| Pro | 300 | 1,000,000 | 20 |
| Enterprise | Custom | Custom | Custom |
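
To make the numbers in the table easier to reason about, here is a minimal sketch that turns each tier's RPM and TPM into a pacing interval and an average per-request token budget. The `TierLimits` class, the `TIERS` keys (`free`, `starter`, `pro`), and both helper functions are hypothetical names chosen for illustration and are not part of the API. The sketch also assumes evenly paced traffic, since the page does not say how the limit windows are measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    rpm: int          # requests per minute
    tpm: int          # tokens per minute
    concurrency: int  # simultaneous in-flight requests


# Values copied from the tier table; the enterprise tier is omitted
# because its limits are custom. The dictionary keys are illustrative.
TIERS = {
    "free": TierLimits(rpm=20, tpm=50_000, concurrency=2),
    "starter": TierLimits(rpm=60, tpm=200_000, concurrency=5),
    "pro": TierLimits(rpm=300, tpm=1_000_000, concurrency=20),
}


def min_request_interval(tier: str) -> float:
    """Seconds between requests that keep an evenly paced client at the RPM limit."""
    return 60.0 / TIERS[tier].rpm


def max_avg_tokens_per_request(tier: str) -> float:
    """Average tokens per request at which a client sending at full RPM reaches the TPM limit."""
    limits = TIERS[tier]
    return limits.tpm / limits.rpm
```

For example, on the free tier this works out to one request every 3 seconds and an average of 2,500 tokens per request before the TPM limit becomes the binding constraint.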
Rate limit response headers

Every response includes rate limit information:

```
x-ratelimit-limit-requests: 60
x-ratelimit-remaining-requests: 58
x-ratelimit-reset-requests: 45s
x-ratelimit-limit-tokens: 200000
x-ratelimit-remaining-tokens: 189000
x-ratelimit-reset-tokens: 30s
```

Handling rate limits

When a rate limit is triggered, the API returns 429 Too Many Requests. Implement exponential backoff:
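```python
import random
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key="tsn_live_xxx",
    base_url="https://api.tokensupernova.com/v1",
)


def chat_with_retry(messages, model="deepseek-chat", max_retries=5):
    """Call the chat completions endpoint, retrying on 429 with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
            )
        except RateLimitError:
            # Out of retries: re-raise so the caller sees the error.
            if attempt == max_retries - 1:
                raise
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter.
            delay = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
```

- Batch requests: combine requests where possible instead of sending many single requests at high frequency
- Cache responses: cache results for repeated prompts
- Monitor response headers: slow down before you hit a limit
- Upgrade your plan: recommended for production workloads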
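
As an illustration of the header-monitoring tip, the sketch below reads the `x-ratelimit-remaining-*` and `x-ratelimit-reset-*` headers and suggests how long to pause when headroom is low. The function names and the threshold defaults are assumptions made for this example, not documented values. The parser also assumes reset values always use the plain `<number>s` form shown in the sample headers; other formats would need extra handling.

```python
import re


def parse_reset_seconds(value: str) -> float:
    """Convert a reset value such as '45s' into seconds.

    Only the '<number>s' form from the sample headers is handled;
    anything else raises ValueError rather than guessing.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized reset value: {value!r}")
    return float(match.group(1))


def suggested_pause(headers, min_remaining_requests=2, min_remaining_tokens=10_000):
    """Return seconds to wait before the next request, or 0.0 if there is headroom.

    `headers` is any mapping with lowercase header names. The thresholds
    are illustrative defaults, not values taken from the API.
    """
    pause = 0.0
    checks = (
        ("requests", min_remaining_requests),
        ("tokens", min_remaining_tokens),
    )
    for kind, threshold in checks:
        remaining = headers.get(f"x-ratelimit-remaining-{kind}")
        reset = headers.get(f"x-ratelimit-reset-{kind}")
        if remaining is None or reset is None:
            continue  # header missing: nothing to act on
        if int(remaining) < threshold:
            # Wait for whichever exhausted budget resets last.
            pause = max(pause, parse_reset_seconds(reset))
    return pause
```

With the `openai` Python SDK (v1 or later), the raw headers should be reachable through `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers` and a `.parse()` method that returns the usual completion object. Passing those headers to `suggested_pause` and sleeping for the returned duration is one way to slow down before a 429 occurs.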