# Concurrency

> Learn how to process multiple requests efficiently with the cloro API using async patterns, webhooks, and concurrent workers for optimal throughput.

## What is concurrency?

Concurrency refers to the number of API requests you can have in progress (or running) simultaneously. If your plan supports 10 concurrent requests, you can process up to 10 requests at the same time. You'll get a [rate limit error](#rate-limits-vs-concurrency-limits) if you send an 11th request while 10 are already processing.

Think of concurrency like a team of workers in an office. Each worker represents a "concurrent request slot." If you have 10 workers, you can assign them 10 tasks (requests) simultaneously. If you try to assign an 11th task while all workers are occupied, you'll need to wait until one worker finishes.

In cloro, each "task" is an API request to an AI model, and each "worker" is a concurrent request slot available based on your subscription.
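The worker analogy can be sketched as a small in-process limiter. This is only an illustration of the idea: `createLimiter`, `demo`, and the simulated request are hypothetical names, not part of the cloro API.

```javascript
// Hypothetical helper: lets at most `maxSlots` tasks run at the same time.
function createLimiter(maxSlots) {
  let active = 0; // slots currently held
  const waiting = []; // resolvers for tasks waiting on a slot

  function acquire() {
    if (active < maxSlots) {
      active++;
      return Promise.resolve();
    }
    // All workers are busy: wait until a slot is handed over
    return new Promise((resolve) => waiting.push(resolve));
  }

  function release() {
    const next = waiting.shift();
    if (next) {
      next(); // hand the slot directly to the next waiting task
    } else {
      active--;
    }
  }

  return async function run(task) {
    await acquire();
    try {
      return await task();
    } finally {
      release();
    }
  };
}

// Simulate 11 tasks against 10 slots and report the peak number in flight.
async function demo() {
  const run = createLimiter(10);
  let inFlight = 0;
  let peak = 0;

  const fakeRequest = async () => {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await new Promise((resolve) => setTimeout(resolve, 10));
    inFlight--;
  };

  await Promise.all(Array.from({ length: 11 }, () => run(fakeRequest)));
  return peak; // 10: the 11th task waited for a free slot
}
```

A client-side limiter like this waits for a free slot, whereas the API itself rejects the extra request with a rate limit error.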

## Rate limits vs. concurrency limits

Free trial accounts are limited to **1 concurrent job**. Upgrade to run requests in parallel. See [the pricing table](https://cloro.dev/#pricing) for concurrency limits per plan.

cloro uses two different types of limits depending on the endpoint type:

| Limit type             | Endpoints affected                  | How it works                                            |
| ---------------------- | ----------------------------------- | ------------------------------------------------------- |
| **Rate limits**        | All endpoints (`/v1/*`)             | 1,000 requests per second per endpoint                  |
| **Concurrency limits** | Monitor endpoints (`/v1/monitor/*`) | Based on your subscription plan (simultaneous requests) |

**Rate limits** restrict how many requests you can make per second, while **concurrency limits** restrict how many requests can be processing simultaneously. Monitor endpoints are subject to both rate limits (1,000/sec per endpoint) and concurrency limits (subscription-based).
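Because monitor endpoints are bound by both limits, the lower of the two caps your throughput. A rough back-of-the-envelope estimate, assuming steady traffic and a known average response time, looks like this. The 5-second latency is a made-up figure for illustration, not a measured cloro value, and `estimateMaxRps` is a hypothetical helper.

```javascript
// Rough steady-state estimate: throughput is about (concurrent slots / average latency),
// and can never exceed the per-second rate limit.
function estimateMaxRps(concurrencyLimit, avgLatencySeconds, rateLimitRps = 1000) {
  const concurrencyBound = concurrencyLimit / avgLatencySeconds;
  return Math.min(concurrencyBound, rateLimitRps);
}

console.log(estimateMaxRps(10, 5)); // 2
```

Under these assumptions, 10 slots with 5-second responses allow about 2 requests per second, so the concurrency limit is reached long before the 1,000 RPS rate limit.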

## Monitoring concurrency with headers

Each response includes HTTP headers to help you manage your API usage:

| Header                   | Description                                          |
| ------------------------ | ---------------------------------------------------- |
| `X-Concurrent-Limit`     | Total concurrent requests allowed by your plan       |
| `X-Concurrent-Current`   | Number of requests currently processing              |
| `X-Concurrent-Remaining` | Available concurrent slots when request was received |

For example, if your plan supports 20 concurrent requests and you send 3 requests simultaneously:

```
X-Concurrent-Limit: 20
X-Concurrent-Current: 3
X-Concurrent-Remaining: 17
```

This means 17 slots were available when the request was received.

## Monitoring rate limits with headers

All endpoints include rate limit headers in each response:

| Header                  | Description                                 |
| ----------------------- | ------------------------------------------- |
| `X-RateLimit-Limit`     | Maximum requests per second allowed (1,000) |
| `X-RateLimit-Remaining` | Remaining requests available in this second |

The limit is **per endpoint per API key** — each `/v1/*` path has its own independent 1,000 RPS bucket. For example, if you make a request to `/v1/monitor/chatgpt`:

```
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
```

This means you can make 999 more requests to that endpoint in the current second before hitting the rate limit. The counter resets every second.
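One way to stay under the per-second limit on the client side is to count requests in one-second windows. The sketch below assumes a simple fixed-window counter is a close enough approximation of the server's reset behavior; the exact window boundaries on the server are not specified here, and `createSecondWindow` is a hypothetical helper.

```javascript
// Hypothetical helper: tracks how many requests were started in the current second.
// `now` is injectable so the window logic can be tested without waiting.
function createSecondWindow(maxPerSecond = 1000, now = Date.now) {
  let windowStart = Math.floor(now() / 1000);
  let used = 0;

  return {
    // Returns true and records the request if there is budget left in the
    // current second; returns false if the caller should wait.
    tryAcquire() {
      const currentSecond = Math.floor(now() / 1000);
      if (currentSecond !== windowStart) {
        // A new second has started: reset the counter
        windowStart = currentSecond;
        used = 0;
      }
      if (used >= maxPerSecond) return false;
      used++;
      return true;
    },
  };
}
```

Since each path has its own bucket, you would keep one window per endpoint rather than a single shared counter.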

<Info>
  **Need higher concurrency?** Self-serve plans can be upgraded in the [dashboard](https://dashboard.cloro.dev) and the new limit applies immediately. Enterprise customers should email [support@cloro.dev](mailto:support@cloro.dev) to upgrade their existing plan.
</Info>

## Using headers for optimization

Monitor these headers to optimize your request patterns:

```javascript theme={null}
function checkConcurrencyUsage(response) {
  // Header values are strings; parse them as base-10 integers
  const limit = parseInt(response.headers["x-concurrent-limit"], 10);
  const current = parseInt(response.headers["x-concurrent-current"], 10);
  const remaining = parseInt(response.headers["x-concurrent-remaining"], 10);

  console.log(`Concurrency: ${current}/${limit} (${remaining} available)`);

  // If the header is missing, fall back to one request at a time
  if (Number.isNaN(remaining)) return 1;

  // Adjust your batch size based on remaining slots
  return Math.min(remaining, 5); // Don't exceed 5 requests per batch
}
```

On any endpoint, you can watch the rate limit headers and pause briefly when the remaining budget runs low:

```javascript theme={null}
async function checkRateLimit(response) {
  const limit = parseInt(response.headers["x-ratelimit-limit"]);
  const remaining = parseInt(response.headers["x-ratelimit-remaining"]);

  console.log(`Rate limit: ${remaining}/${limit} requests remaining`);

  // If you're running low on requests, wait before continuing
  if (remaining < 50) {
    const waitTime = 1000; // Wait 1 second for the counter to reset
    console.log(`Rate limit nearly exceeded. Waiting ${waitTime}ms...`);
    await new Promise((resolve) => setTimeout(resolve, waitTime));
  }

  return remaining;
}
```

## Implementation patterns

Most programming languages require explicit concurrency handling. Two common approaches:

### Pattern 1: Async with webhooks

For large-scale processing, submit tasks and handle results via webhooks. You don't need to send requests in batches; cloro handles concurrency automatically. Send API requests for all your tasks concurrently (one request per task):

<CodeGroup>
  ```javascript Node.js (axios) theme={null}
  import axios from "axios";
  import express from "express";

  const API_KEY = process.env.API_KEY;
  const TASK_API = "https://api.cloro.dev/v1/async/task";

  async function submitTasks(tasks, webhookUrl) {
    // Send API requests concurrently (one request per task)
    await Promise.all(
      tasks.map((task) =>
        axios.post(
          TASK_API,
          {
            taskType: "CHATGPT",
            webhook: { url: webhookUrl },
            payload: task,
          },
          {
            headers: { Authorization: `Bearer ${API_KEY}` },
          }
        )
      )
    );
  }

  // Placeholder: replace with your own persistence logic
  function saveResult(taskId, response) {}

  // Webhook handler (Express.js)
  const app = express();
  app.use(express.json()); // Parse JSON webhook bodies

  app.post("/webhook-handler", (req, res) => {
    const { task, response } = req.body;
    console.log(`Task ${task.id} completed: ${response.text.slice(0, 100)}...`);

    // Process your result here
    saveResult(task.id, response);

    // Always respond quickly
    res.status(200).send();
  });

  app.listen(3000); // Example port; use whatever your deployment expects

  // Usage
  const tasks = [
    { prompt: "Analyze market trends", country: "US" },
    { prompt: "Research competitors", country: "US" },
    // ... hundreds more
  ];

  submitTasks(tasks, "https://your-app.com/webhook-handler").catch(console.error);
  ```

  ```python Python (requests) theme={null}
  import asyncio
  import requests

  API_KEY = "YOUR_API_KEY"
  TASK_API = "https://api.cloro.dev/v1/async/task"

  def submit_task(task, webhook_url):
      # Blocking HTTP call; submit_tasks runs it in a worker thread
      return requests.post(
          TASK_API,
          json={
              "taskType": "CHATGPT",
              "webhook": {"url": webhook_url},
              "payload": task
          },
          headers={"Authorization": f"Bearer {API_KEY}"},
          timeout=30
      )

  async def submit_tasks(tasks, webhook_url):
      # Send API requests concurrently (one request per task).
      # asyncio.to_thread requires Python 3.9 or newer.
      responses = await asyncio.gather(
          *[asyncio.to_thread(submit_task, task, webhook_url) for task in tasks]
      )
      print(f"{len(responses)} tasks submitted")
      return responses

  # Usage
  tasks = [
      {"prompt": "Analyze market trends", "country": "US"},
      {"prompt": "Research competitors", "country": "US"},
      # ... hundreds more
  ]

  asyncio.run(submit_tasks(tasks, "https://your-app.com/webhook-handler"))

  # Webhook handler example (Flask)
  """
  from flask import Flask, request, jsonify

  app = Flask(__name__)

  @app.route('/webhook-handler', methods=['POST'])
  def handle_webhook():
      data = request.json
      task = data.get('task', {})
      response = data.get('response', {})

      print(f"Task {task.get('id')} completed: {response.get('text', '')[:100]}...")

      # Process your result here
      save_result(task.get('id'), response)

      return '', 200
  """
  ```

  ```bash cURL theme={null}
  #!/bin/bash

  API_KEY="YOUR_API_KEY"
  TASK_API="https://api.cloro.dev/v1/async/task"
  WEBHOOK_URL="https://your-app.com/webhook-handler"

  # Submit tasks individually (cloro handles concurrency automatically)
  submit_task() {
    local prompt="$1"
    local country="$2"

    curl -s -X POST "$TASK_API" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d "{
        \"taskType\": \"CHATGPT\",
        \"webhook\": {\"url\": \"$WEBHOOK_URL\"},
        \"payload\": {
          \"prompt\": \"$prompt\",
          \"country\": \"$country\"
        }
      }" &
  }

  # Example usage - send API requests concurrently
  tasks=(
    "Analyze market trends|US"
    "Research competitors|US"
    "Extract pricing data|US"
    "Summarize industry reports|US"
    "Identify key trends|US"
    # ... add more tasks as needed
  )

  echo "Submitting ${#tasks[@]} tasks..."

  # Submit all tasks in background (no need to limit concurrency)
  for task_data in "${tasks[@]}"; do
    IFS='|' read -r prompt country <<< "$task_data"
    submit_task "$prompt" "$country"
  done

  # Wait for all submissions to complete
  wait

  echo "All tasks submitted successfully"
  ```
</CodeGroup>

### Pattern 2: Concurrent workers

For real-time processing where you want immediate results, run multiple workers that make direct API calls:

<CodeGroup>
  ```javascript Node.js (axios) theme={null}
  import axios from "axios";

  const API_KEY = process.env.API_KEY;
  const API_URL = "https://api.cloro.dev/v1/monitor/chatgpt";

  async function makeRequest(id, prompt) {
    const start = Date.now();

    try {
      const response = await axios.post(API_URL, {
        prompt: prompt,
        country: "US",
      }, {
        headers: {
          Authorization: `Bearer ${API_KEY}`
        }
      });

      const latency = Date.now() - start;
      console.log(`Request #${id}: Success (${latency}ms)`);

      // Monitor concurrency usage
      const limit = parseInt(response.headers["x-concurrent-limit"]);
      const current = parseInt(response.headers["x-concurrent-current"]);
      const remaining = parseInt(response.headers["x-concurrent-remaining"]);

      return {
        success: true,
        latency,
        data: response.data,
        usage: { limit, current, remaining }
      };

    } catch (error) {
      const latency = Date.now() - start;
      console.log(`Request #${id}: Failed (${latency}ms)`);

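      if (error.response?.status === 429) {
        const retryAfter = error.response.headers["retry-after"] || "unknown";
        console.log(`Rate limited - retry after ${retryAfter} seconds`);
        // Tag the error so the summary below can count rate-limited requests
        return { success: false, latency, error: "Rate limited (HTTP 429)" };
      }

      return { success: false, latency, error: error.message };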
    }
  }

  async function runConcurrentRequests(prompts, concurrency = 10) {
    console.log(`Starting ${prompts.length} requests with ${concurrency} concurrent workers\n`);

    const startTime = Date.now();
    const results = [];
    let requestId = 0;

    // Worker function
    async function worker() {
      while (requestId < prompts.length) {
        const id = ++requestId;
        const result = await makeRequest(id, prompts[requestId - 1]);
        results.push(result);
      }
    }

    // Run concurrent workers
    await Promise.all(
      Array(concurrency).fill(0).map(() => worker())
    );

    const duration = Date.now() - startTime;
    const successful = results.filter(r => r.success).length;
    const rateLimited = results.filter(r => r.error?.includes('Rate limited')).length;

    console.log("\n" + "=".repeat(40));
    console.log(`Total: ${prompts.length}`);
    console.log(`Success: ${successful} (${((successful/prompts.length)*100).toFixed(1)}%)`);
    console.log(`Rate limited: ${rateLimited}`);
    console.log(`Duration: ${(duration/1000).toFixed(1)}s`);
    console.log(`RPS: ${(prompts.length/duration*1000).toFixed(1)}`);
    console.log("=".repeat(40));

    return results;
  }

  // Usage
  const prompts = [
    "What is AI and how does it work?",
    "Explain machine learning basics",
    "What are neural networks?",
    "How does deep learning work?",
    "What is natural language processing?",
    // ... add more prompts as needed
  ];

  runConcurrentRequests(prompts, 5) // Start with conservative concurrency
    .then(results => console.log(`Completed processing`))
    .catch(console.error);
  ```

  ```python Python (aiohttp + asyncio) theme={null}
  import asyncio
  import aiohttp
  import time
  from typing import List, Dict, Any

  class ConcurrentWorker:
      def __init__(self, api_key: str, api_url: str):
          self.api_key = api_key
          self.api_url = api_url

      async def make_request(self, session, id: int, prompt: str):
          start_time = time.time()

          try:
              async with session.post(
                  self.api_url,
                  json={
                      "prompt": prompt,
                      "country": "US"
                  },
                  headers={
                      "Authorization": f"Bearer {self.api_key}"
                  }
              ) as response:
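                  # aiohttp does not raise on non-2xx responses, so check the status
                  if response.status >= 400:
                      await response.read()
                      latency = (time.time() - start_time) * 1000
                      if response.status == 429:
                          error = "Rate limited (HTTP 429)"
                      else:
                          error = f"HTTP {response.status}"
                      print(f"Request #{id}: Failed ({latency:.0f}ms) - {error}")
                      return {"success": False, "latency": latency, "error": error}

                  data = await response.json()
                  latency = (time.time() - start_time) * 1000

                  # Monitor concurrency usage
                  limit = int(response.headers.get("x-concurrent-limit", 0))
                  current = int(response.headers.get("x-concurrent-current", 0))
                  remaining = int(response.headers.get("x-concurrent-remaining", 0))

                  print(f"Request #{id}: Success ({latency:.0f}ms)")

                  return {
                      "success": True,
                      "latency": latency,
                      "data": data,
                      "usage": {"limit": limit, "current": current, "remaining": remaining}
                  }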

          except Exception as e:
              latency = (time.time() - start_time) * 1000
              print(f"Request #{id}: Failed ({latency:.0f}ms)")

              return {
                  "success": False,
                  "latency": latency,
                  "error": str(e)
              }

      async def run_concurrent_requests(self, prompts: List[str], concurrency: int = 10):
          print(f"Starting {len(prompts)} requests with {concurrency} concurrent workers\n")

          start_time = time.time()
          results = []
          request_id = 0

          async with aiohttp.ClientSession() as session:
              async def worker():
                  nonlocal request_id
                  while request_id < len(prompts):
                      # Claim the next prompt before awaiting so workers never share an index
                      index = request_id
                      request_id += 1
                      result = await self.make_request(session, index + 1, prompts[index])
                      results.append(result)

              # Run concurrent workers
              await asyncio.gather(*[worker() for _ in range(concurrency)])

          duration = time.time() - start_time
          successful = sum(1 for r in results if r["success"])
          rate_limited = sum(1 for r in results if "rate limited" in r.get("error", "").lower())

          print("\n" + "=" * 40)
          print(f"Total: {len(prompts)}")
          print(f"Success: {successful} ({(successful/len(prompts)*100):.1f}%)")
          print(f"Rate limited: {rate_limited}")
          print(f"Duration: {duration:.1f}s")
          print(f"RPS: {len(prompts)/duration:.1f}")
          print("=" * 40)

          return results

  # Usage
  async def main():
      worker = ConcurrentWorker("YOUR_API_KEY", "https://api.cloro.dev/v1/monitor/chatgpt")

      prompts = [
          "What is AI and how does it work?",
          "Explain machine learning basics",
          "What are neural networks?",
          # ... add more prompts as needed
      ]

      results = await worker.run_concurrent_requests(prompts, concurrency=5)
      print(f"Completed processing")

  if __name__ == "__main__":
      asyncio.run(main())
  ```

  ```bash cURL theme={null}
  #!/bin/bash

  API_KEY="YOUR_API_KEY"
  API_URL="https://api.cloro.dev/v1/monitor/chatgpt"
  TOTAL_REQUESTS=10
  CONCURRENCY=5
  OUT_DIR=$(mktemp -d) # Response bodies are written here

  # Function to make a single request
  make_request() {
    local id=$1
    local prompt=$2
    local result http_code time_total limit current remaining latency

    # %header{...} requires curl 7.84.0 or newer
    result=$(curl -s -o "$OUT_DIR/response_$id.json" \
      -w "%{http_code} %{time_total} %header{x-concurrent-limit} %header{x-concurrent-current} %header{x-concurrent-remaining}" \
      -X POST "$API_URL" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d "{\"prompt\":\"$prompt\",\"country\":\"US\"}")

    # Parse the space-separated write-out fields
    read -r http_code time_total limit current remaining <<< "$result"
    latency=$(awk -v t="$time_total" 'BEGIN { printf "%d", t * 1000 }')

    if [[ "$http_code" == "200" ]]; then
      echo "Request #$id: Success (${latency}ms) - Usage: $current/$limit ($remaining available)"
    else
      echo "Request #$id: Failed (${latency}ms) - HTTP $http_code"
    fi
  }

  # Make the function and settings visible to the subshells started by xargs
  export -f make_request
  export API_KEY API_URL OUT_DIR

  # Run up to $CONCURRENCY requests at a time; xargs starts the next
  # request as soon as a running one finishes
  seq 1 "$TOTAL_REQUESTS" | xargs -P "$CONCURRENCY" -I{} \
    bash -c 'make_request "$1" "Query $1: What are the benefits of concurrent processing?"' _ {}

  echo -e "\n$(printf '=%.0s' {1..40})"
  echo "Load test completed. Response bodies are in $OUT_DIR"
  ```
</CodeGroup>

## Quick reference

| Use case          | Pattern            | When to use                                 |
| ----------------- | ------------------ | ------------------------------------------- |
| Large batches     | Async + webhooks   | Results can arrive later via webhook        |
| Real-time results | Concurrent workers | Need immediate responses, smaller batches   |

## Common questions

### Why am I getting 429 rate limit errors?

A 429 error means you've hit one of two limits:

**Concurrency limit exceeded** (monitor endpoints only)

You're making too many simultaneous requests beyond your plan's concurrent request limit.

Solution:

* Check your current usage with response headers: `X-Concurrent-Limit`, `X-Concurrent-Current`, `X-Concurrent-Remaining`
* Implement request queuing in your application
* Use exponential backoff when retrying
* [Upgrade your plan](https://cloro.dev/#pricing) for higher limits

**Rate limit exceeded** (all endpoints)

You've exceeded 1,000 requests per second to a single endpoint.

Solution:

* Monitor `X-RateLimit-Remaining` header
* Spread requests over time (the counter resets every second)
* Use the [async queue](/api-reference/endpoint/monitor-chatgpt#async-requests) for non-time-sensitive requests
* Implement retry logic with exponential backoff

See the [error handling guide](/guides/error-handling#rate-limiting-429) for detailed error responses and implementation patterns.
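The retry advice above can be sketched as a small wrapper. This is a minimal illustration, not cloro's recommended implementation: it assumes axios-style errors (`error.response.status`, `error.response.headers`), it only uses a `Retry-After` value if the response happens to carry one, and the delay values and helper names (`backoffDelayMs`, `withRetry`) are arbitrary choices for the example.

```javascript
// Delay before retry number `attempt` (0-based): 500ms, 1s, 2s, ... capped at 30s.
// A numeric Retry-After value (in seconds), when present, takes precedence.
function backoffDelayMs(attempt, retryAfterSeconds, baseMs = 500, maxMs = 30000) {
  const retryAfter = Number(retryAfterSeconds);
  if (Number.isFinite(retryAfter) && retryAfter > 0) {
    return retryAfter * 1000;
  }
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `sendRequest` is any function returning a promise that rejects with an
// axios-style error. Only 429 responses are retried; other errors are rethrown.
async function withRetry(sendRequest, maxRetries = 5, sleep = defaultSleep) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await sendRequest();
    } catch (error) {
      const status = error.response?.status;
      if (status !== 429 || attempt >= maxRetries) throw error;
      const retryAfter = error.response.headers?.["retry-after"];
      await sleep(backoffDelayMs(attempt, retryAfter));
    }
  }
}
```

A call might look like `withRetry(() => axios.post(url, body, config))`. Adding a small random jitter to each delay is a common refinement when many clients retry at once.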

### How do I check my concurrency limit?

Your concurrency limit is shown in the response headers of every API call:

```
X-Concurrent-Limit: 20        # Your total limit
X-Concurrent-Current: 5       # Currently in use
X-Concurrent-Remaining: 15    # Available slots
```

You can also use the [async status endpoint](/api-reference/endpoint/get-async-status) to see concurrency stats of your account.

### Can I increase my concurrency limit?

Yes. Self-serve plans can be upgraded directly in the [dashboard](https://dashboard.cloro.dev) — the new limit applies immediately, no support ticket required. If you need concurrency above the highest self-serve tier, email [support@cloro.dev](mailto:support@cloro.dev) for an enterprise quote.

### Does higher concurrency delay my logs or dashboards?

No. Dashboard log ingestion runs independently from request processing. If logs look delayed during heavy load, the cause is usually batching on the dashboard side, not concurrency — entries normally surface within a minute.

### Can I burst above my concurrency limit?

No. The limit is hard — the (N+1)th simultaneous request gets a `429` immediately rather than queueing. Use the [async API](/guides/making-requests/async) if you want cloro to handle queueing for you instead of managing burst capacity yourself.

### What's the best way to handle large batches of requests?

For large batches, choose the right pattern based on your needs:

For non-time-sensitive batches (recommended):

* Use [Pattern 1: Async with webhooks](#pattern-1-async-with-webhooks)
* Send all API requests concurrently. cloro handles queuing automatically
* Receive results via webhook when complete
* No need to manage concurrency yourself

For real-time results:

* Use [Pattern 2: Concurrent workers](#pattern-2-concurrent-workers)
* Respect your plan's concurrency limit
* Monitor `X-Concurrent-Remaining` header
* Implement exponential backoff for 429 errors
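For the real-time path, sizing each batch from the `X-Concurrent-Remaining` header might look like the sketch below. It is deliberately conservative and makes a few assumptions: `sendRequest(item)` is a hypothetical function that resolves to an axios-style response exposing `headers`, and the batch cap of 5 is an arbitrary example value.

```javascript
// Processes `items` in batches whose size follows the remaining-slot header.
async function processAdaptively(items, sendRequest, maxBatch = 5) {
  const results = [];
  let batchSize = 1; // start with one request to learn the current headroom
  let index = 0;

  while (index < items.length) {
    const batch = items.slice(index, index + batchSize);
    const responses = await Promise.all(batch.map((item) => sendRequest(item)));
    results.push(...responses);
    index += batch.length;

    // Use the smallest remaining-slot value reported by this batch
    const remaining = Math.min(
      ...responses.map((r) => Number.parseInt(r.headers["x-concurrent-remaining"], 10))
    );

    // Fall back to one request at a time if the header is missing or zero
    batchSize = Number.isNaN(remaining)
      ? 1
      : Math.max(1, Math.min(remaining, maxBatch));
  }

  return results;
}
```

If any request in a batch rejects, `Promise.all` rejects as well, so in practice you would wrap `sendRequest` with your own retry handling for 429 responses.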
