Rate Limiting

Master this essential documentation concept

Quick Definition

A control mechanism that restricts how many API requests a user or application can make within a given time period, typically documented in API reference guides.

How Rate Limiting Works

sequenceDiagram
  participant Client as API Client
  participant Gateway as API Gateway
  participant Counter as Rate Limit Counter
  participant API as Backend API
  Client->>Gateway: POST /api/data (Request #45)
  Gateway->>Counter: Check request count (user_id: 123)
  Counter-->>Gateway: 45/60 requests used
  Gateway->>API: Forward request
  API-->>Client: 200 OK + X-RateLimit-Remaining: 15
  Client->>Gateway: POST /api/data (Request #61)
  Gateway->>Counter: Check request count (user_id: 123)
  Counter-->>Gateway: 60/60 requests used (limit reached)
  Gateway-->>Client: 429 Too Many Requests
  Note over Client,Gateway: Retry-After: 30 header included

Understanding Rate Limiting

Rate limiting caps how many requests each client (identified by user account, API key, or IP address) can make within a defined window, such as 60 requests per minute. Requests under the quota pass through normally; once the quota is exhausted, the server rejects further requests with HTTP 429 Too Many Requests and typically includes headers such as Retry-After that tell the client when it may try again. Well-documented APIs state each limit, the window type, and the headers clients should read to adapt their request rate.
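The gateway's check in the diagram above can be sketched as a minimal fixed-window counter. This is an illustrative sketch only; the function and store names (`check_request`, `_counters`) are invented for the example:

```python
import time

# Minimal fixed-window rate limiter sketch (illustrative, not production code).
# Assumes a 60-requests-per-60-seconds limit, as in the diagram above.
LIMIT = 60
WINDOW_SECONDS = 60

_counters = {}  # user_id -> (window_start, request_count)

def check_request(user_id, now=None):
    """Return (allowed, headers) for one incoming request."""
    now = time.time() if now is None else now
    window_start, count = _counters.get(user_id, (now, 0))
    if now - window_start >= WINDOW_SECONDS:
        window_start, count = now, 0  # window expired: reset the counter
    if count >= LIMIT:
        retry_after = int(window_start + WINDOW_SECONDS - now) + 1
        return False, {"Retry-After": str(retry_after)}
    _counters[user_id] = (window_start, count + 1)
    return True, {"X-RateLimit-Remaining": str(LIMIT - count - 1)}
```

A real gateway would keep the counters in shared storage (e.g., Redis) rather than process memory, but the accept/reject logic is the same.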

Key Features

  • Request quotas enforced per user, API key, or IP address
  • Defined time windows (fixed, sliding, or token bucket)
  • 429 Too Many Requests responses with Retry-After guidance when quotas are exceeded
  • Response headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) that let clients self-regulate

Benefits for Documentation Teams

  • Fewer 429-related support tickets when limits are documented per endpoint
  • Consistent limit values across reference pages, changelogs, and runbooks
  • Reusable retry and backoff examples across API guides
  • Faster, safer reviews when limits change

Keeping Rate Limiting Knowledge Out of Video Silos

When your engineering team implements rate limiting for a new API integration, the details often get explained once — during a sprint review, an onboarding call, or a recorded architecture walkthrough. Someone screen-shares the API reference, walks through the request thresholds, and explains what happens when a client exceeds the allowed call volume. It feels thorough in the moment.

The problem surfaces two months later when a developer on a different team hits a 429 error and has no idea where to look. They know someone explained rate limiting in a meeting recording, but scrubbing through 45 minutes of video to find a two-minute explanation is rarely how anyone wants to spend their afternoon. Critical details — like per-endpoint limits, retry strategies, or backoff intervals — stay buried in recordings that are difficult to search and easy to overlook.

Converting those recordings into structured documentation changes how your team accesses this information. Instead of rewatching a full demo, someone can search directly for "rate limiting" and land on a clear explanation with the specific thresholds and handling logic your team actually uses — pulled from the original discussion, not rewritten from scratch.

If your team regularly captures API decisions and integration guidance through recorded meetings or training sessions, see how you can turn those recordings into searchable, reusable documentation.

Real-World Documentation Use Cases

Documenting Tiered Rate Limits for a SaaS API with Free and Paid Plans

Problem

Developers integrating a SaaS API (e.g., a weather or payments API) hit 429 errors without understanding why their tier's limits differ from the examples in the docs, leading to frustrated support tickets and failed integrations.

Solution

Rate limiting documentation explicitly maps each subscription tier (Free, Pro, Enterprise) to specific request quotas, window durations, and HTTP response headers, so developers can design their retry logic before writing a single line of code.

Implementation

  • Create a limits reference table listing each plan tier with its requests-per-minute, requests-per-day, and burst allowance values side by side.
  • Document every rate-limit-related HTTP header (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After) with example values and explanations of how to parse them.
  • Add code snippets in Python, Node.js, and cURL showing how to read the Retry-After header and implement exponential backoff when a 429 is received.
  • Include a troubleshooting section mapping common error scenarios (e.g., batch jobs exhausting daily quota at midnight) to their root causes and recommended architectural fixes.
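As a sketch of the first step, the limits reference table can be kept as structured data so reference pages and tests stay in sync. The tier names come from the scenario above; the quota values are invented for illustration:

```python
# Hypothetical limits reference table for the Free/Pro/Enterprise tiers named
# above. All quota values here are invented for illustration.
PLAN_LIMITS = {
    "free":       {"requests_per_minute": 60,   "requests_per_day": 1_000,     "burst": 10},
    "pro":        {"requests_per_minute": 600,  "requests_per_day": 50_000,    "burst": 100},
    "enterprise": {"requests_per_minute": 6000, "requests_per_day": 1_000_000, "burst": 1000},
}

def limits_for(plan: str) -> dict:
    """Look up a tier's quotas; raises KeyError for unknown plans."""
    return PLAN_LIMITS[plan.lower()]
```

Generating the docs table from a single source like this prevents the per-tier values on different pages from drifting apart.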

Expected Outcome

Support tickets related to 429 errors drop by 40%, and developers successfully implement compliant retry logic before their first production deployment.

Writing Rate Limit Runbooks for Internal Microservices Teams

Problem

An engineering team running dozens of microservices has no shared documentation on rate limits between services, causing cascading failures when Service A hammers Service B during peak load, with no runbook for on-call engineers to follow.

Solution

A centralized internal runbook documents per-service rate limit thresholds, the circuit breaker behavior triggered at 80% capacity, and step-by-step remediation procedures engineers can follow during a 3 AM incident.

Implementation

  • Audit all inter-service HTTP calls and document the rate limit configuration (requests/second, burst size) for each service pair in a shared Confluence or Notion page.
  • Define alert thresholds (e.g., >70% of rate limit consumed triggers a warning; >95% triggers a PagerDuty alert) and document what each alert means and who owns the response.
  • Write a decision-tree runbook: "If you receive a 429 from the Inventory Service, check the X-RateLimit-Reset header, wait, then retry. If retries exceed 3, escalate to the Inventory team."
  • Schedule quarterly reviews to update limits as services scale, and link the runbook from the service's README and monitoring dashboard.
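The decision-tree step above (retry up to 3 times, then escalate) can be sketched in code. This is an illustrative sketch, not a real client: `request_fn` is a stand-in for the HTTP call, and treating X-RateLimit-Reset as seconds-until-reset is an assumption for the example:

```python
import time

def call_with_runbook_retries(request_fn, max_retries=3):
    """Follow the runbook: on 429, wait until X-RateLimit-Reset, retry up to
    max_retries times, then signal that a human should escalate.

    request_fn() is assumed to return (status_code, headers), standing in
    for the real HTTP client call.
    """
    for attempt in range(max_retries + 1):
        status, headers = request_fn()
        if status != 429:
            return status  # success or a non-rate-limit error: stop here
        if attempt == max_retries:
            break
        # Interpreting X-RateLimit-Reset as seconds-until-reset is an
        # assumption; check your service's actual header format.
        wait = float(headers.get("X-RateLimit-Reset", "1"))
        time.sleep(wait)
    raise RuntimeError("Retries exhausted: escalate to the owning team")
```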

Expected Outcome

Mean time to resolution (MTTR) for rate-limit-induced incidents decreases from 45 minutes to under 10 minutes, with on-call engineers resolving issues without escalation.

Documenting Rate Limits for a Public Developer Portal (e.g., Twitter/X or Stripe-style API)

Problem

A public API provider launches a new endpoint but fails to document its rate limits separately from global limits. Third-party developers build applications that work in testing but break in production because the new endpoint has a stricter per-endpoint limit.

Solution

The API reference documents rate limits at both the global account level and the per-endpoint level, with a dedicated 'Rate Limits' page that explains the hierarchy, scoping rules, and how limits interact when multiple endpoints are called concurrently.

Implementation

  • Add a "Rate Limits" section to every endpoint's reference page showing its specific limit (e.g., "GET /v1/charges: 100 requests/minute per API key") alongside the global account limit.
  • Create a dedicated Rate Limits overview page explaining the limit hierarchy: global account limit > per-endpoint limit > per-IP limit, with a diagram showing which limit applies first.
  • Provide a sandbox environment with artificially low rate limits (e.g., 5 requests/minute) so developers can test their 429-handling logic without waiting for real quota exhaustion.
  • Publish a changelog entry whenever any rate limit changes, including the old value, new value, effective date, and migration guidance for affected use cases.

Expected Outcome

Developer onboarding time decreases by 25%, and the API provider sees a 60% reduction in forum posts about unexpected 429 errors after launching new endpoints.

Creating Rate Limit Documentation for CI/CD Pipelines Consuming Third-Party APIs

Problem

A DevOps team's CI/CD pipeline makes hundreds of calls to a third-party API (e.g., GitHub API, Jira API) during parallel builds, hitting rate limits and causing random build failures that are difficult to reproduce and diagnose.

Solution

Internal documentation captures the third-party API's rate limit rules, explains how parallel CI jobs multiply request volume, and prescribes architectural patterns (request queuing, caching, token pooling) to stay within limits.

Implementation

  • Document the third-party API's rate limit (e.g., "GitHub API: 5,000 requests/hour per token") alongside a calculation showing how many CI jobs can run concurrently before the limit is breached.
  • Write a configuration guide showing how to implement a shared Redis-based rate limit counter across CI workers so all parallel jobs share a single token's quota awareness.
  • Document a caching strategy for idempotent API calls (e.g., cache GitHub branch protection rules for 5 minutes) with example Nginx or Varnish cache configuration.
  • Add a monitoring guide showing how to emit a metric when the remaining rate limit drops below 20%, and configure a Grafana alert to notify the DevOps team before builds start failing.
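The concurrency calculation in the first step is simple arithmetic worth spelling out. Only the 5,000 requests/hour figure comes from the scenario above; the per-job call counts below are invented for illustration:

```python
# Back-of-envelope check: how many parallel CI workers fit under a shared
# token's hourly quota. The per-job numbers are invented for illustration.
def max_concurrent_jobs(quota_per_hour: int,
                        calls_per_job: int,
                        jobs_per_worker_per_hour: int) -> int:
    """Largest number of workers whose combined calls stay under the quota."""
    calls_per_worker_per_hour = calls_per_job * jobs_per_worker_per_hour
    return quota_per_hour // calls_per_worker_per_hour

# e.g. 120 API calls per build, 6 builds per worker per hour:
# 5000 // (120 * 6) = 6 workers before the shared token is exhausted
```

Publishing this calculation in the runbook lets the team reason about adding workers before builds start failing.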

Expected Outcome

CI/CD pipeline failures due to rate limiting drop to zero, and build times become predictable with p99 build duration variance reduced by 30%.

Best Practices

Document Rate Limits at Both the Global and Per-Endpoint Level

Many APIs enforce limits at multiple scopes simultaneously — an account-wide limit and a stricter per-endpoint limit. Documenting only the global limit leaves developers blindsided when a specific endpoint throttles them at a lower threshold. Each endpoint's reference page should explicitly state its own rate limit alongside the global account limit.

✓ Do: Add a 'Rate Limits' subsection to every endpoint's API reference page listing the specific requests-per-window value, and link to a global rate limits overview page that explains how multiple limits interact.
✗ Don't: Document rate limits only on a single overview page and assume developers will cross-reference it when reading individual endpoint docs; most won't, and they'll hit 429s in production.

Include All Rate-Limit HTTP Response Headers with Parsed Examples

Headers like X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After are the primary signals developers use to implement compliant retry logic. Without clear documentation of each header's format, unit (seconds vs. Unix timestamp), and meaning, developers either ignore them or misparse them. Concrete examples with real values eliminate ambiguity.

✓ Do: Document each header with its name, a real example value (e.g., 'X-RateLimit-Reset: 1693526400'), the unit and format (Unix epoch UTC), and a code snippet showing how to calculate the wait time from it.
✗ Don't: List header names without example values or explanations of their format; a header named 'X-RateLimit-Reset' with no context leaves developers guessing whether the value is seconds-until-reset or a Unix timestamp.
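A minimal sketch of the "calculate the wait time" step, assuming X-RateLimit-Reset carries a Unix epoch timestamp in UTC seconds, as in the example value above:

```python
import time

def seconds_until_reset(headers, now=None):
    """Compute how long to wait from an X-RateLimit-Reset header whose value
    is assumed to be a Unix epoch timestamp in UTC seconds."""
    now = time.time() if now is None else now
    reset_at = int(headers["X-RateLimit-Reset"])
    return max(0.0, reset_at - now)  # never negative: a past reset means go now
```

Documenting the unit explicitly matters precisely because this code would be wrong for an API whose header carries seconds-until-reset instead.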

Provide Working Code Examples for Exponential Backoff and Retry Logic

Telling developers to 'implement retry logic' without showing them how leads to naive implementations that retry immediately in a tight loop, making rate limit problems worse. Providing language-specific code examples for exponential backoff with jitter gives developers a correct, copy-paste starting point and reduces the risk of retry storms.

✓ Do: Include retry code examples in at least two popular languages (e.g., Python and JavaScript) that demonstrate reading the Retry-After header, waiting the specified duration, applying exponential backoff with random jitter on subsequent retries, and setting a maximum retry count.
✗ Don't: Provide only a conceptual description of backoff strategies without runnable code; abstract explanations lead to inconsistent and often incorrect implementations that can amplify load on the API.
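A compact sketch of that guidance: honor Retry-After for the first wait, then apply exponential backoff with full jitter and a maximum retry count. The parameter defaults are illustrative, not prescriptive:

```python
import random

def backoff_delays(retry_after, max_retries=5, base=1.0, cap=60.0):
    """Yield wait times for successive retries after a 429.

    The first delay honors the server's Retry-After value; later delays use
    exponential backoff with full jitter (uniform over [0, min(cap, base*2^n)]),
    which spreads retries out and avoids synchronized retry storms.
    """
    yield float(retry_after)
    for attempt in range(1, max_retries):
        yield random.uniform(0.0, min(cap, base * 2 ** attempt))
```

A caller would `time.sleep(delay)` on each yielded value between attempts and give up once the generator is exhausted.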

Clearly Distinguish Between Rate Limit Window Types (Fixed, Sliding, Token Bucket)

A fixed 60-second window resets all at once, while a sliding window continuously rolls, and a token bucket allows short bursts above the average rate. These behave very differently under load, and developers who assume the wrong model will design systems that fail unpredictably. Documenting the algorithm type prevents mismatched mental models.

✓ Do: Name the rate limiting algorithm used (e.g., 'We use a sliding window counter') and explain its practical implication with an example: 'If you make 60 requests at 12:00:00, you cannot make another request until 12:01:00 under a fixed window, but under a sliding window, each request becomes available again 60 seconds after it was made.'
✗ Don't: Use vague language like '60 requests per minute' without specifying the window type; a developer building a batch job will design very different scheduling logic depending on whether the window is fixed or sliding.
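The sliding-window behavior described in the example (each request becomes available again 60 seconds after it was made) can be sketched with a timestamp log. This is a minimal illustration, not a production limiter:

```python
from collections import deque

class SlidingWindowLimiter:
    """Sliding-window log limiter matching the example above: each used slot
    frees up exactly window_seconds after the request that consumed it."""

    def __init__(self, limit=60, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()  # times of requests still inside the window

    def allow(self, now):
        # Drop requests older than the window; they no longer count.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True
```

Under a fixed window the whole quota would reset at once at the window boundary; here, capacity returns gradually as individual requests age out.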

Maintain a Rate Limit Changelog and Notify Developers Before Changes Take Effect

Silently changing rate limits breaks production applications and destroys developer trust. A versioned changelog entry for every rate limit change — including the old value, new value, effective date, and affected endpoints — gives developers time to adapt their applications before the change takes effect. This is especially critical for public APIs with external consumers.

✓ Do: Publish rate limit changes in your API changelog at least 30 days before they take effect, include the specific old and new values (e.g., 'Reduced from 1,000 req/min to 500 req/min for the /v2/search endpoint'), and send an email notification to registered API users with a link to the changelog entry.
✗ Don't: Update rate limits silently in a configuration change without a corresponding documentation update; even a small reduction in limits can cause cascading failures for high-volume API consumers who have no warning to adjust their integration.

How Docsie Helps with Rate Limiting

Build Better Documentation with Docsie

Join thousands of teams creating outstanding documentation

Start Free Trial