Best Practices
Prompt Engineering
Use System Messages
Set the behavior and tone with system messages:

    messages = [
        {"role": "system", "content": "You are an expert Python developer. Answer with code examples."},
        {"role": "user", "content": "How do I sort a dictionary by value?"},
    ]

Be Specific
Chinese models respond well to detailed, structured prompts:

❌ Bad: "Write code"
✅ Good: "Write a Python function that takes a list of integers and returns the top 3 most frequent values"

Performance Optimization
Set max_tokens
Limit response length to reduce cost and latency:

    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": "Summarize: ..."}],
        max_tokens=200,  # Short summary
    )

Use Streaming for UX
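For chat interfaces, stream responses so users see tokens as they arrive:

    stream = client.chat.completions.create(
        model="deepseek-chat",
        messages=messages,
        stream=True,
    )
    # Print each piece of text as soon as it arrives
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

Cost Management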
Track Token Usage
Log token usage on every request to monitor costs:

    response = client.chat.completions.create(...)
    # Available usage fields: prompt_tokens, completion_tokens, total_tokens
    print(f"Cost: {response.usage.total_tokens} tokens")

Choose the Right Model
| Task | Best Model | Why |
|---|---|---|
| Chat / Support | deepseek-chat | Best price-performance |
| Math / Logic | deepseek-reasoner | Specialized reasoning |
| Translation | qwen-max | Multilingual optimized |
| Chinese content | glm-4 | Chinese-native model |
| Budget tasks | qwen-plus | Cheap, still capable |
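
The table above can be encoded as a simple lookup so that the model choice lives in one place. This is a minimal sketch: the task keys (`chat`, `math`, and so on) and the `pick_model` helper are hypothetical names chosen for illustration, and falling back to `deepseek-chat` for unknown tasks is an assumption. Only the model names come from the table.

```python
# Task-to-model lookup mirroring the table above.
# The task keys and the helper name are illustrative, not part of any API.
MODEL_BY_TASK = {
    "chat": "deepseek-chat",
    "math": "deepseek-reasoner",
    "translation": "qwen-max",
    "chinese": "glm-4",
    "budget": "qwen-plus",
}

DEFAULT_MODEL = "deepseek-chat"  # assumed fallback for unlisted tasks


def pick_model(task: str) -> str:
    """Return the model for a task, falling back to the default."""
    return MODEL_BY_TASK.get(task.strip().lower(), DEFAULT_MODEL)
```

The result can be passed as the `model` argument of `client.chat.completions.create`, which keeps model names out of individual call sites.
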
Error Handling
Always handle API errors gracefully:

    import time

    from openai import (
        APIError,
        RateLimitError,
        APIConnectionError,
        AuthenticationError,
    )

    try:
        response = client.chat.completions.create(...)
    except AuthenticationError:
        print("Invalid API key")
    except RateLimitError:
        print("Too many requests, backing off")
        time.sleep(10)
    except APIConnectionError:
        print("Network issue, retrying")
    except APIError as e:
        # Catch-all for any other error raised by the API client
        print(f"API error: {e}")

Security
- Use environment variables, never hardcode keys
- Rotate keys periodically
- Use separate keys for development and production
- Never expose keys in client-side code or public repos
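
The first point in the list above might look like the following sketch, which reads the key from the environment and fails early when it is missing. The variable name `LLM_API_KEY` and the `load_api_key` helper are assumptions made for illustration; use whatever name your deployment defines.

```python
import os


def load_api_key(env_var: str = "LLM_API_KEY") -> str:
    """Read the API key from the environment and fail early if it is missing."""
    # env_var is a hypothetical name; nothing here is specific to one provider
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before starting")
    return key
```

Passing a different `env_var` per environment is one way to keep development and production keys separate. The returned value goes to the client constructor's `api_key` argument.
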