What is concurrency?
Concurrency refers to the number of API requests you can have in progress (or running) simultaneously. If your plan supports 10 concurrent requests, you can process up to 10 requests at the same time. You’ll get a rate limit error if you send an 11th request while 10 are already processing.
Think of concurrency like a team of workers in an office. Each worker represents a “concurrent request slot.” If you have 10 workers, you can assign them 10 tasks (requests) simultaneously. If you try to assign an 11th task while all workers are occupied, you’ll need to wait until one worker finishes. In cloro, each “task” is an API request to an AI model, and each “worker” is a concurrent request slot available based on your subscription.
Rate limits vs. concurrency limits
Free trial accounts are limited to 1 concurrent job. Upgrade for multi-threaded workloads. See the pricing table for concurrency limits per plan.
cloro uses two different types of limits depending on the endpoint type:
| Limit type | Endpoints affected | How it works |
|---|---|---|
| Rate limits | All endpoints (/v1/*) | 1,000 requests per second per endpoint |
| Concurrency limits | Monitor endpoints (/v1/monitor/*) | Based on your subscription plan (simultaneous requests) |
Monitoring concurrency with headers
Each response includes HTTP headers to help you manage your API usage:
| Header | Description |
|---|---|
| X-Concurrent-Limit | Total concurrent requests allowed by your plan |
| X-Concurrent-Current | Number of requests currently processing |
| X-Concurrent-Remaining | Available concurrent slots when request was received |
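As a minimal sketch of how a client might read these three headers, the snippet below parses them from a plain header mapping. The `ConcurrencyStatus` and `read_concurrency` names are invented for this example, and the header values are illustrative rather than captured from a real response.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ConcurrencyStatus:
    limit: int      # X-Concurrent-Limit
    current: int    # X-Concurrent-Current
    remaining: int  # X-Concurrent-Remaining


def read_concurrency(headers: Mapping[str, str]) -> ConcurrencyStatus:
    """Parse the three concurrency headers from a response's header mapping."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return ConcurrencyStatus(
        limit=int(lowered["x-concurrent-limit"]),
        current=int(lowered["x-concurrent-current"]),
        remaining=int(lowered["x-concurrent-remaining"]),
    )


# Illustrative values only, not taken from a real cloro response.
status = read_concurrency({
    "X-Concurrent-Limit": "10",
    "X-Concurrent-Current": "7",
    "X-Concurrent-Remaining": "3",
})
can_send_more = status.remaining > 0
```

Most HTTP clients expose response headers as a mapping, so the same function should work on whatever object your client returns, provided it behaves like a dictionary of strings.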
Monitoring rate limits with headers
All endpoints include rate limit headers in each response:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests per second allowed (1,000) |
| X-RateLimit-Remaining | Remaining requests available in this second |
Each /v1/* path has its own independent 1,000 RPS bucket. For example, if you make a request to /v1/monitor/chatgpt:
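Because each path has its own bucket, a client may want to track the remaining budget per path. The following is a rough sketch under stated assumptions: `RateLimitTracker` is a hypothetical helper, `/v1/example` is a placeholder path, the header values are made up, and the one-second pause assumes that waiting a full second is enough for the per-second counter to reset.

```python
from typing import Dict, Mapping


class RateLimitTracker:
    """Remembers the last X-RateLimit-Remaining value seen for each path."""

    def __init__(self) -> None:
        self._remaining: Dict[str, int] = {}

    def record(self, path: str, headers: Mapping[str, str]) -> None:
        """Store the remaining budget reported in a response for `path`."""
        lowered = {name.lower(): value for name, value in headers.items()}
        self._remaining[path] = int(lowered["x-ratelimit-remaining"])

    def suggested_pause(self, path: str) -> float:
        """Seconds to wait before sending the next request to `path`."""
        # Paths not seen yet are assumed to still have budget available.
        if self._remaining.get(path, 1) > 0:
            return 0.0
        # Assumption: a full second is enough for the counter to reset.
        return 1.0


tracker = RateLimitTracker()
# Made-up header values for two different paths.
tracker.record("/v1/monitor/chatgpt",
               {"X-RateLimit-Limit": "1000", "X-RateLimit-Remaining": "0"})
tracker.record("/v1/example",
               {"X-RateLimit-Limit": "1000", "X-RateLimit-Remaining": "412"})

pause_chatgpt = tracker.suggested_pause("/v1/monitor/chatgpt")
pause_example = tracker.suggested_pause("/v1/example")
```

Here an exhausted bucket on one path leads to a pause only for that path, while requests to the other path can continue.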
Need higher concurrency? Self-serve plans can be upgraded in the dashboard and the new limit applies immediately. Enterprise customers should email support@cloro.dev to upgrade their existing plan.
Using headers for optimization
Monitor these headers to optimize your request patterns:
Implementation patterns
Most programming languages require explicit concurrency handling. Two common approaches:
Pattern 1: Async with webhooks
For large-scale processing, submit tasks and handle results via webhooks. You don’t need to send requests in batches; cloro handles concurrency automatically. Send API requests for all your tasks concurrently (one request per task):
Pattern 2: Concurrent workers
For real-time processing where you want immediate results, run multiple workers that make direct API calls:
Quick reference
| Use case | Pattern | When to use |
|---|---|---|
| Large batches | Async + webhooks | Large batches, don’t need immediate results |
| Real-time results | Concurrent workers | Need immediate responses, smaller batches |
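The concurrent-workers pattern can be sketched in Python with an `asyncio.Semaphore` sized to your plan's concurrency limit. This is a minimal illustration, not an official client: `run_with_limit` is a name invented for this example, and `fake_send` is a stand-in for a real API call that only simulates latency and records how many calls overlap.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_with_limit(
    tasks: List[str],
    send: Callable[[str], Awaitable[str]],
    max_concurrent: int,
) -> List[str]:
    """Run `send` for every task, never exceeding `max_concurrent` in flight."""
    slots = asyncio.Semaphore(max_concurrent)

    async def worker(task: str) -> str:
        async with slots:  # wait for a free slot before making the call
            return await send(task)

    # gather returns results in the same order as the input tasks.
    return await asyncio.gather(*(worker(task) for task in tasks))


# Stand-in for a real API call; tracks how many calls run at once.
in_flight = 0
peak = 0


async def fake_send(prompt: str) -> str:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulate network latency
    in_flight -= 1
    return f"result for {prompt}"


prompts = [f"prompt {i}" for i in range(25)]
results = asyncio.run(run_with_limit(prompts, fake_send, max_concurrent=10))
```

Setting `max_concurrent` to the value your plan allows keeps the client from sending a request that would exceed the limit, so excess tasks wait locally instead of receiving a 429.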
Common questions
Why am I getting 429 rate limit errors?
A 429 error means you've hit one of cloro's limits. This can happen for two reasons:
Concurrency limit exceeded (monitor endpoints only)
You're making more simultaneous requests than your plan's concurrent request limit allows. Solution:
- Check your current usage with the response headers X-Concurrent-Limit, X-Concurrent-Current, and X-Concurrent-Remaining
- Implement request queuing in your application
- Use exponential backoff when retrying
- Upgrade your plan for higher limits
Rate limit exceeded (all endpoints)
You're sending more than 1,000 requests per second to a single endpoint. Solution:
- Monitor the X-RateLimit-Remaining header
- Spread requests over time (the counter resets every second)
- Use the async queue for non-time-sensitive requests
- Implement retry logic with exponential backoff
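One way to implement the retry advice above is capped exponential backoff with jitter. The sketch below is illustrative only: `send_with_backoff`, the `(status, body)` return shape, and the default delays are assumptions made for this example, and the `responses` iterator stands in for a real request function.

```python
import random
import time
from typing import Callable, Iterator, Tuple


def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 30.0) -> Iterator[float]:
    """Yield capped exponential delays: base, 2*base, 4*base, ..."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))


def send_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it stops returning 429 or the retries run out."""
    status, body = send()
    for delay in backoff_delays(max_retries):
        if status != 429:
            break
        # Full jitter: sleep a random amount up to the exponential delay so
        # that many clients retrying at once do not all retry together.
        sleep(random.uniform(0, delay))
        status, body = send()
    return status, body


# Stand-in for a real request: returns 429 twice, then succeeds.
responses = iter([(429, ""), (429, ""), (200, "ok")])
waits = []  # records the sleeps instead of actually waiting
result = send_with_backoff(lambda: next(responses), sleep=waits.append)
```

Passing `sleep` as a parameter keeps the function easy to test; in real use the default `time.sleep` applies. If the last retry still returns 429, the function returns that 429 so the caller can decide what to do next.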
How do I check my concurrency limit?
Your concurrency limit is shown in the X-Concurrent-Limit response header of every API call.
Can I increase my concurrency limit?
Yes. Self-serve plans can be upgraded directly in the dashboard; the new limit applies immediately, with no support ticket required. If you need concurrency above the highest self-serve tier, email support@cloro.dev for an enterprise quote.
Does higher concurrency delay my logs or dashboards?
No. Dashboard log ingestion runs independently from request processing. If logs look delayed during heavy load, the cause is usually batching on the dashboard side, not concurrency; entries normally surface within a minute.
Can I burst above my concurrency limit?
No. The limit is hard: the (N+1)th simultaneous request gets a 429 immediately rather than queueing. Use the async API if you want cloro to handle queueing for you instead of managing burst capacity yourself.
What’s the best way to handle large batches of requests?
For large batches, choose the right pattern based on your needs:
For non-time-sensitive batches (recommended):
- Use Pattern 1: Async with webhooks
- Send all API requests concurrently; cloro handles queuing automatically
- Receive results via webhook when complete
- No need to manage concurrency yourself
For batches that need immediate results:
- Use Pattern 2: Concurrent workers
- Respect your plan’s concurrency limit
- Monitor the X-Concurrent-Remaining header
- Implement exponential backoff for 429 errors