Rate Limits
Rate Limit Tiers
| Tier | RPM (Requests/Min) | TPM (Tokens/Min) | Concurrency |
|---|---|---|---|
| Free (new signup) | 20 | 50,000 | 2 |
| Starter | 60 | 200,000 | 5 |
| Pro | 300 | 1,000,000 | 20 |
| Enterprise | Custom | Custom | Custom |
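RPM and TPM both apply, so the request rate you can actually sustain is the lower of your RPM and your TPM divided by the average tokens per request. The sketch below works through that arithmetic using the tier numbers from the table. It assumes each request's full token count (prompt plus completion) counts against TPM. `effective_rpm` and `TIERS` are hypothetical names used only for illustration.

```python
# Per-minute limits copied from the tier table (Enterprise is custom, so omitted).
TIERS = {
    "free": {"rpm": 20, "tpm": 50_000},
    "starter": {"rpm": 60, "tpm": 200_000},
    "pro": {"rpm": 300, "tpm": 1_000_000},
}


def effective_rpm(tier: str, avg_tokens_per_request: int) -> int:
    """Return the requests per minute a tier can sustain.

    Assumption: every request spends avg_tokens_per_request tokens against TPM
    (prompt + completion). Whichever limit is tighter wins.
    """
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    limits = TIERS[tier]
    token_bound = limits["tpm"] // avg_tokens_per_request  # requests TPM allows
    return min(limits["rpm"], token_bound)


print(effective_rpm("starter", 5_000))  # 40: TPM-bound (200,000 / 5,000)
print(effective_rpm("starter", 1_000))  # 60: RPM-bound (TPM would allow 200)
```

With large prompts, TPM usually becomes the binding limit before RPM. Upgrading a tier for more RPM therefore helps only if your token budget also leaves room for the extra requests.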
Rate Limit Headers
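Every response includes rate limit headers. For both requests and tokens, they report your limit, the remaining budget in the current window, and the time until that budget resets: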
```
x-ratelimit-limit-requests: 60
x-ratelimit-remaining-requests: 58
x-ratelimit-reset-requests: 45s
x-ratelimit-limit-tokens: 200000
x-ratelimit-remaining-tokens: 189000
x-ratelimit-reset-tokens: 30s
```

Handling Rate Limits
When you exceed a rate limit, the API returns `429 Too Many Requests`. Retry with exponential backoff: double the wait after each failed attempt and add random jitter so that concurrent clients do not retry in lockstep.
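```python
import random
import time

from openai import OpenAI, RateLimitError

# Note: the openai SDK also retries 429s on its own (client max_retries defaults to 2).
client = OpenAI(
    api_key="tsn_live_xxx",  # replace with your API key
    base_url="https://api.tokensupernova.com/v1",
)


def chat_with_retry(messages, model="deepseek-chat", max_retries=5):
    """Send a chat completion, retrying with exponential backoff on HTTP 429."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
            )
        except RateLimitError:
            # Out of attempts: surface the error to the caller.
            if attempt == max_retries - 1:
                raise
            # Wait about 1s, 2s, 4s, ... plus up to 1s of random jitter.
            delay = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
```

To stay within your limits:

- Batch requests when possible instead of sending many single requests in rapid succession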
- Cache responses for repeated prompts
- Monitor the `x-ratelimit-remaining-*` headers and slow down before you reach a limit
- Upgrade your tier for production workloads
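To monitor the headers programmatically, you can parse them and pause when either remaining budget runs low. The following is a minimal sketch, not a documented client feature. It assumes reset values are durations such as `45s`, `250ms`, or `1m30s`, although only the plain-seconds form appears in the header example. `parse_duration`, `seconds_to_wait`, and the `min_remaining_*` thresholds are hypothetical names. With the official `openai` Python SDK, `client.chat.completions.with_raw_response.create(...)` returns an object whose `.headers` you can pass in and whose `.parse()` yields the usual completion.

```python
import re
from typing import Mapping

# One "<number><unit>" component, e.g. "1m", "30s", "250ms" (assumed format).
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(value: str) -> float:
    """Convert a reset value such as '45s' or '1m30s' into seconds."""
    parts = _DURATION_PART.findall(value)
    # Reject anything that is not made up entirely of recognized components.
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unrecognized duration: {value!r}")
    return sum(float(n) * _UNIT_SECONDS[u] for n, u in parts)


def seconds_to_wait(headers: Mapping[str, str],
                    min_remaining_requests: int = 1,
                    min_remaining_tokens: int = 1_000) -> float:
    """Return how long to pause before the next request (0.0 if budget remains).

    The thresholds are hypothetical defaults; tune them to your workload.
    """
    wait = 0.0
    if int(headers["x-ratelimit-remaining-requests"]) < min_remaining_requests:
        wait = max(wait, parse_duration(headers["x-ratelimit-reset-requests"]))
    if int(headers["x-ratelimit-remaining-tokens"]) < min_remaining_tokens:
        wait = max(wait, parse_duration(headers["x-ratelimit-reset-tokens"]))
    return wait


# The example header values shown earlier on this page.
example = {
    "x-ratelimit-limit-requests": "60",
    "x-ratelimit-remaining-requests": "58",
    "x-ratelimit-reset-requests": "45s",
    "x-ratelimit-limit-tokens": "200000",
    "x-ratelimit-remaining-tokens": "189000",
    "x-ratelimit-reset-tokens": "30s",
}
low = {**example, "x-ratelimit-remaining-requests": "0"}

print(seconds_to_wait(example))  # 0.0: budget remains
print(seconds_to_wait(low))      # 45.0: wait for the request window to reset
```

Pausing proactively like this reduces how often the backoff loop has to handle a 429 at all. Keep the retry logic anyway, because other clients sharing your key can consume budget between your checks.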