Fix the OpenAI API rate limit (429) error
A 429 from the OpenAI API means you have sent requests faster than your account is allowed, or run out of quota. Here is what the error actually means and how to fix it with backoff, batching, and the right limits.
Published by Mark van Eijk on July 1, 2026
Updated on July 1, 2026 · 2 minute read
- About the 429 error
- Why do I see this error
- Fix a speed-related 429
- Reduce how much you send
- Fix a quota-related 429
- Raise your rate limit
About the 429 error
429 Too Many Requests is the OpenAI API telling you it's refusing a request because you've gone over a limit. It's the AI-API equivalent of any other HTTP rate-limit response — the request was well-formed, the server just won't process it right now.
There are two very different causes hiding behind the same status code, and fixing the wrong one wastes time.
Why do I see this error
Read the JSON body first — it names the real cause:
- Rate limit exceeded: you sent requests faster than your account's requests-per-minute (RPM) or tokens-per-minute (TPM) limit allows. This is a speed problem.
- insufficient_quota: your account is out of credit or has hit its spending limit. This is a billing problem, and slowing down won't help.
A 429 that appears even though you're barely sending any requests is almost always the second kind.
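As a rough sketch of that check, the helper below reads a 429 response body and reports which kind of problem it describes. It assumes the body is JSON with an "error" object carrying "code" and "type" fields; the function name classify_429 and the "rate_limit_exceeded" identifier are assumptions for illustration, so compare them against the bodies you actually receive.

```python
import json


def classify_429(body_text):
    """Return 'quota', 'rate_limit', or 'unknown' for a 429 response body."""
    try:
        payload = json.loads(body_text)
    except (TypeError, ValueError):
        # Not JSON at all (for example an HTML error page from a proxy).
        return "unknown"

    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return "unknown"

    # Assumed shape: the cause is named in "code" and/or "type".
    labels = (error.get("code"), error.get("type"))
    if "insufficient_quota" in labels:
        return "quota"        # billing problem: retrying will not help
    if "rate_limit_exceeded" in labels:
        return "rate_limit"   # speed problem: back off and retry
    return "unknown"
```

Treating "unknown" like a rate limit (back off, retry a few times, then give up) is a reasonable default, since it fails safely in both cases.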
Fix a speed-related 429
The correct response to a rate limit is to back off and retry, not to hammer the endpoint. Use exponential backoff with jitter: wait a moment, and double the wait on each retry, with a little randomness so multiple clients don't all retry at the same instant.
```python
import random
import time

from openai import OpenAI, RateLimitError

client = OpenAI()


def chat_with_retry(messages, retries=5):
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o-mini", messages=messages
            )
        except RateLimitError:
            # Out of attempts: let the caller see the error.
            if attempt == retries - 1:
                raise
            # Exponential backoff (1s, 2s, 4s, ...) plus up to 1s of jitter.
            wait = (2 ** attempt) + random.random()
            time.sleep(wait)
```
The official SDKs already retry with backoff internally, so if you're using one, raise its retry count (the max_retries option in the Python SDK) rather than writing your own loop. If you call the API directly over HTTP, honour the Retry-After header when it's present: it tells you exactly how long to wait. The same pattern applies in PHP; see "Use the OpenAI API in Laravel".
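For direct HTTP calls, one way to combine the two rules is to prefer Retry-After and fall back to exponential backoff with jitter when the header is missing or unreadable. This is a minimal standard-library sketch; the helper name retry_delay and the 60-second cap are assumptions for illustration. HTTP allows the header to be either a number of seconds or a date, so both forms are handled.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(headers, attempt, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    value = headers.get("Retry-After") or headers.get("retry-after")
    if value is not None:
        # Form 1: a plain number of seconds, e.g. "7".
        try:
            return min(cap, max(0.0, float(value)))
        except ValueError:
            pass
        # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        try:
            when = parsedate_to_datetime(value)
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            seconds = (when - datetime.now(timezone.utc)).total_seconds()
            return min(cap, max(0.0, seconds))
        except (TypeError, ValueError):
            pass
    # No usable header: exponential backoff plus up to 1s of jitter.
    return min(cap, (2 ** attempt) + random.random())
```

Calling time.sleep(retry_delay(response_headers, attempt)) between attempts then replaces the fixed formula in a hand-written loop. The cap keeps one bad header value from stalling a worker for minutes.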
Reduce how much you send
Backoff treats the symptom. To stop hitting the limit at all, lower your request and token rate:
- Batch multiple items into one request instead of one call each.
- Cache repeated or identical prompts so you don't pay to ask twice.
- Cap max_tokens so a runaway response can't burn your TPM budget.
- Pick a smaller model: it has higher rate limits and costs less per token.
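The first two items can be sketched without touching the network. In the example below, send is a hypothetical callable standing in for whatever function actually calls the API, and cached_call and batch_prompt are illustrative names, not part of any SDK. The cache lives in memory only, so it assumes a single process and prompts where an identical question should get an identical answer.

```python
import hashlib
import json

_cache = {}


def cached_call(send, model, messages):
    """Call send(model, messages) once per distinct request and reuse the result."""
    raw = json.dumps([model, messages], sort_keys=True).encode("utf-8")
    key = hashlib.sha256(raw).hexdigest()
    if key not in _cache:
        _cache[key] = send(model, messages)
    return _cache[key]


def batch_prompt(instruction, items):
    """Fold several small inputs into one numbered prompt for a single request."""
    numbered = "\n".join(f"{i}. {item}" for i, item in enumerate(items, start=1))
    return f"{instruction}\nAnswer each item on its own numbered line.\n\n{numbered}"
```

Batching trades requests for tokens: one call with twenty items uses one request from the RPM budget but still counts every token against TPM, so it helps most when the request limit is the one being hit.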
Fix a quota-related 429
If the body says insufficient_quota, no amount of backoff will help. Open your OpenAI dashboard and:
- Check your remaining credit and add a payment method if needed.
- Review your monthly spending limit — you may have hit a cap you set yourself.
- Confirm you're using the right API key for the right project.
Raise your rate limit
Rate limits scale with your usage tier, which increases automatically as your account spends more over time. You can't set the numbers by hand on the lower tiers, so the only durable fix is to let the tier mature while keeping usage efficient with the batching and caching above. If a fixed, predictable limit matters more to you than scaling, running a model on your own server removes the per-minute cap entirely.
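While the tier matures, a client-side throttle is one way to stay under a known requests-per-minute figure instead of discovering it through 429s. The class below is a sketch under stated assumptions: RequestThrottle is a hypothetical name, it only counts requests (not tokens), and it is meant for a single thread in a single process. The clock and sleep parameters exist so the behaviour can be tested without real waiting.

```python
import time
from collections import deque


class RequestThrottle:
    """Sliding-window limiter: at most max_per_minute calls in any 60 seconds."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        if max_per_minute < 1:
            raise ValueError("max_per_minute must be at least 1")
        self.max_per_minute = max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of calls inside the current window

    def wait_for_slot(self):
        """Block until another request may be sent, then record it."""
        while True:
            now = self.clock()
            # Forget calls that left the 60-second window.
            while self.sent and now - self.sent[0] >= 60:
                self.sent.popleft()
            if len(self.sent) < self.max_per_minute:
                break
            # Window is full: sleep until the oldest call expires.
            self.sleep(60 - (now - self.sent[0]))
        self.sent.append(now)
```

Calling wait_for_slot() immediately before each API request keeps the send rate at or below the configured figure. Setting that figure slightly under the account's real limit leaves headroom for other processes sharing the same key.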
Frequently asked questions
- What does a 429 error from OpenAI mean?
- HTTP 429 is "Too Many Requests". OpenAI returns it when you exceed your rate limit, measured in requests per minute (RPM) or tokens per minute (TPM), or when you have run out of quota or credit. The response body names which limit you hit, so always read it before assuming it is just speed.
- How do I fix a 429 caused by too many requests per minute?
- Slow down and retry with exponential backoff: wait, then double the wait on each subsequent failure, with a little random jitter so many clients do not retry in lockstep. Most official SDKs do this automatically, but if you call the API yourself you have to implement it. Honour the Retry-After header when it is present.
- Why do I get a 429 even though I am barely making any requests?
- That is usually a quota or billing problem, not a speed problem. A 429 with an "insufficient_quota" error means your account is out of credit or has hit its spending limit, not that you are sending too fast. Check your usage and billing in the OpenAI dashboard, and add credit or raise the limit.
- How do I raise my OpenAI rate limit?
- Rate limits scale with your usage tier, which rises automatically as your account spends more over time. You cannot set them by hand on the lower tiers, so the levers are: add a payment method, let your tier mature, and in the meantime batch requests and reduce token usage to stay under the cap.