Best Practices

Set the behavior and tone with system messages:

messages = [
    {"role": "system", "content": "You are an expert Python developer. Answer with code examples."},
    {"role": "user", "content": "How do I sort a dictionary by value?"},
]

Chinese models respond well to detailed, structured prompts:

❌ Bad: "Write code"
✅ Good: "Write a Python function that takes a list of integers and returns the top 3 most frequent values"

Limit response length to reduce cost and latency:

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize: ..."}],
    max_tokens=200,  # Short summary
)
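
A low max_tokens can cut a reply off mid-sentence. The sketch below checks for that, assuming the response follows the OpenAI-compatible shape where a choice's finish_reason is "length" when the token limit was hit. The helper name was_truncated and the stand-in response object are illustrative only.

```python
from types import SimpleNamespace


def was_truncated(response) -> bool:
    """Return True if the first choice stopped because it reached max_tokens."""
    return response.choices[0].finish_reason == "length"


# Stand-in objects shaped like chat completion responses (illustration only).
cut_off = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
complete = SimpleNamespace(choices=[SimpleNamespace(finish_reason="stop")])

if was_truncated(cut_off):
    print("Reply hit the token limit; consider raising max_tokens.")
```

With a real client, pass the object returned by client.chat.completions.create(...) instead of the stand-ins.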

For chat interfaces, always stream to show tokens as they arrive:

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=messages,
    stream=True,
)

Always log usage to monitor costs:

response = client.chat.completions.create(...)
usage = response.usage  # fields: prompt_tokens, completion_tokens, total_tokens
print(f"Tokens used: {usage.total_tokens} "
      f"(prompt: {usage.prompt_tokens}, completion: {usage.completion_tokens})")
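
Token counts turn into a cost only once they are multiplied by a price. The sketch below shows the arithmetic, assuming separate per-million-token prices for input and output. The function name and the prices in the example are placeholders, not real rates; take actual prices from your provider's pricing page.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Estimate the cost of one request from its token usage.

    Prices are per one million tokens, in whatever currency the provider bills.
    """
    return (
        prompt_tokens * input_price_per_m
        + completion_tokens * output_price_per_m
    ) / 1_000_000


# Placeholder prices for illustration only (not real rates).
cost = estimate_cost(
    prompt_tokens=1200,
    completion_tokens=300,
    input_price_per_m=2.0,
    output_price_per_m=8.0,
)
print(f"Estimated cost: {cost:.6f}")
```

With a live response, pass response.usage.prompt_tokens and response.usage.completion_tokens as the first two arguments.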
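
| Task            | Best Model        | Why                    |
|-----------------|-------------------|------------------------|
| Chat / Support  | deepseek-chat     | Best price-performance |
| Math / Logic    | deepseek-reasoner | Specialized reasoning  |
| Translation     | qwen-max          | Multilingual optimized |
| Chinese content | glm-4             | Chinese-native model   |
| Budget tasks    | qwen-plus         | Cheap, still capable   |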
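
The table above can be turned into a small lookup so the model is chosen in one place. This is a sketch: the task keys and the pick_model helper are made up for illustration, and the model names are copied from the table.

```python
# Task keys are hypothetical labels; model names come from the table above.
MODEL_BY_TASK = {
    "chat": "deepseek-chat",
    "math": "deepseek-reasoner",
    "translation": "qwen-max",
    "chinese": "glm-4",
    "budget": "qwen-plus",
}


def pick_model(task: str, default: str = "deepseek-chat") -> str:
    """Return the model for a task label, falling back to a default."""
    return MODEL_BY_TASK.get(task.strip().lower(), default)


print(pick_model("Translation"))
print(pick_model("unknown task"))
```

The result can be passed as the model argument of client.chat.completions.create.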

Always handle API errors gracefully:

import time

from openai import (
    APIError, RateLimitError, APIConnectionError, AuthenticationError
)

try:
    response = client.chat.completions.create(...)
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Too many requests, backing off")
    time.sleep(10)
except APIConnectionError:
    print("Network issue, retrying")
except APIError as e:
    # Catch-all for any other API error; must come after the specific cases
    print(f"API error: {e}")
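
The handlers above report rate-limit and network errors but do not repeat the request. One common way to retry is exponential backoff with jitter, sketched below. The helper call_with_backoff and its parameters are hypothetical, and it is written against a generic callable so it runs without the API; the FlakyError class and flaky function only simulate failures.

```python
import random
import time


def call_with_backoff(fn, retry_on, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on the given exception types with exponential backoff.

    The wait doubles after each failed attempt, plus random jitter so that
    many clients do not retry at the same moment. The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


class FlakyError(Exception):
    """Stand-in for a transient error such as a rate limit (illustration only)."""


calls = {"count": 0}


def flaky():
    """Fail twice, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise FlakyError("temporary failure")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = call_with_backoff(flaky, retry_on=(FlakyError,), sleep=delays.append)
print(result, len(delays))
```

With the real client, the call would look like call_with_backoff(lambda: client.chat.completions.create(...), retry_on=(RateLimitError, APIConnectionError)). AuthenticationError is left out on purpose, since retrying with a bad key cannot succeed.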

Keep API keys secure:

  • Use environment variables; never hardcode keys
  • Rotate keys periodically
  • Use separate keys for development and production
  • Never expose keys in client-side code or public repos
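
Reading the key from an environment variable can be sketched as follows. The variable name DEEPSEEK_API_KEY and the helper load_api_key are assumptions for illustration; use whatever name your deployment defines.

```python
import os


def load_api_key(var_name: str = "DEEPSEEK_API_KEY") -> str:
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return key
```

The returned value can be passed as api_key when constructing the client. Using a different variable name per environment, or a different value per deployment, keeps development and production keys separate.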