Tip of the Day: Build an AI Task Queue for Reliable Automation

TL;DR: AI agents fail more than raw API error rates suggest: rate limits, context timeouts, and transient LLM errors compound. A lightweight task queue with retry logic, dead-letter handling, and rate-limit awareness transforms an unreliable agent into one you can trust with production workloads. Here's how to build one in under 50 lines of code.

Why Your AI Agent Keeps Dropping Tasks

You've probably noticed it: your AI agent works fine in testing but starts dropping tasks as soon as you walk away. A failed query here, a timeout there. Nothing catastrophic, but the accumulated failures mean you can't trust it with anything important.

The root cause isn't the LLM. It's the lack of a task queue.

Standard async queues (Bull, Celery, Sidekiq) assume predictable workloads: a database query either succeeds or fails fast, and retries are simple. (If you're still building your first agent, start with the ReAct pattern guide for local agents; queues matter once your agent leaves prototyping.) AI operations are different:

  • LLM APIs have burst rate limits that reset on unpredictable schedules
  • A single query can take 2–60+ seconds depending on model load
  • Transient errors (429, 503, timeout) are common at peak hours
  • Context windows mean retries aren't idempotent: you can't just replay the same request

A dedicated AI task queue addresses all of these. Here's the practical pattern.

The Three-Queue Pattern

The simplest production-ready setup uses three queues with different characteristics.

Queue       | Purpose                                             | Retry Policy                              | Concurrency
Fast        | Simple queries, embeddings, classifications         | 3 retries, 1s delay                       | 5 concurrent
Heavy       | Multi-step reasoning, code generation, long context | 2 retries, 10s delay, exponential backoff | 2 concurrent
Dead Letter | Failed-after-retry tasks for human review           | No retries; manual inspection             | N/A

Separating fast and heavy work prevents a complex code generation task from blocking hundreds of quick embedding calls. The dead letter queue ensures no task is silently lost: even failures are captured for review.
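The table's policies can be written down as plain data so that workers read their settings from one place. The following is a minimal sketch, not a prescribed layout: `QueuePolicy`, `POLICIES`, `HEAVY_KINDS`, and `route` are hypothetical names, and the routing rule (anything not known to be heavy goes to the fast queue) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuePolicy:
    """Retry and concurrency settings for one queue (values taken from the table)."""
    max_retries: int
    base_delay_s: float
    exponential: bool
    concurrency: int

    def delay_for(self, attempt: int) -> float:
        """Seconds to wait before retry number `attempt` (1-based)."""
        if self.exponential:
            return self.base_delay_s * 2 ** (attempt - 1)
        return self.base_delay_s


POLICIES = {
    "fast": QueuePolicy(max_retries=3, base_delay_s=1.0, exponential=False, concurrency=5),
    "heavy": QueuePolicy(max_retries=2, base_delay_s=10.0, exponential=True, concurrency=2),
}

# Hypothetical task kinds; adjust to whatever labels your agent uses.
HEAVY_KINDS = {"reasoning", "codegen", "long_context"}


def route(kind: str) -> str:
    """Pick a queue name for a task kind; unknown kinds default to the fast queue."""
    return "heavy" if kind in HEAVY_KINDS else "fast"
```

With this in place, a worker for the heavy queue would wait `POLICIES["heavy"].delay_for(1)` seconds (10) before its first retry and twice that before its second.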

Building a Minimal AI Task Queue

You don't need a heavy framework. A Node.js queue using better-queue or Python's asyncio.Queue with tenacity for retries takes about 40 lines. The key components are:

    import asyncio
    from tenacity import retry, stop_after_attempt, wait_exponential

    class AIQueue:
        def __init__(self, max_concurrent=3):
            self.fast = asyncio.Queue()
            self.heavy = asyncio.Queue()
            self.dead_letter = []
            self.sem = asyncio.Semaphore(max_concurrent)

        # reraise=True surfaces the original exception once retries are
        # exhausted, instead of tenacity's RetryError wrapper.
        @retry(stop=stop_after_attempt(3),
               wait=wait_exponential(multiplier=2, min=2, max=30),
               reraise=True)
        async def call_llm(self, prompt, model="fast"):
            # Your LLM API call here
            pass

        async def worker(self):
            while True:
                task = await self.fast.get()
                async with self.sem:
                    try:
                        result = await self.call_llm(task.prompt)
                        await task.callback(result)
                    except Exception as e:
                        # Retries exhausted (or the callback failed): keep the task for review
                        self.dead_letter.append({"task": task, "error": str(e)})
                    finally:
                        # Lets callers use `await self.fast.join()` to wait for completion
                        self.fast.task_done()

The semaphore limits concurrent LLM calls, the tenacity retry decorator handles transient errors with exponential backoff, and the dead letter list captures everything that fails permanently.
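To see the pieces work together end to end, here is a self-contained sketch that runs with the standard library alone. It swaps tenacity for a small hand-rolled retry loop so it has no dependencies, and it replaces the real API call with `fake_llm`, a hypothetical stub that fails on purpose. `Task`, `TransientError`, `call_with_retry`, and `run_demo` are likewise illustrative names, not part of any library.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str


class TransientError(Exception):
    """Stand-in for a 429, 503, or timeout from the provider."""


async def fake_llm(prompt: str, calls: dict) -> str:
    # Hypothetical stub: "flaky" prompts fail once, "broken" prompts always fail.
    calls[prompt] = calls.get(prompt, 0) + 1
    if prompt.startswith("broken") or (prompt.startswith("flaky") and calls[prompt] == 1):
        raise TransientError(f"simulated failure for {prompt!r}")
    return prompt.upper()


async def call_with_retry(prompt, calls, attempts=3, base_delay=0.01):
    """Retry with exponential backoff; re-raise after the final attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return await fake_llm(prompt, calls)
        except TransientError:
            if attempt == attempts:
                raise
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


async def worker(queue, sem, calls, results, dead_letter):
    while True:
        task = await queue.get()
        try:
            async with sem:  # cap the number of in-flight LLM calls
                results[task.prompt] = await call_with_retry(task.prompt, calls)
        except Exception as exc:
            dead_letter.append({"task": task, "error": str(exc)})
        finally:
            queue.task_done()


async def run_demo(prompts, n_workers=2, max_concurrent=2):
    queue, sem = asyncio.Queue(), asyncio.Semaphore(max_concurrent)
    calls, results, dead_letter = {}, {}, []
    workers = [asyncio.create_task(worker(queue, sem, calls, results, dead_letter))
               for _ in range(n_workers)]
    for prompt in prompts:
        queue.put_nowait(Task(prompt))
    await queue.join()   # returns once every task succeeded or was dead-lettered
    for w in workers:
        w.cancel()       # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results, dead_letter, calls


if __name__ == "__main__":
    results, dead, calls = asyncio.run(run_demo(["ok", "flaky one", "broken one"]))
    print(results, [d["error"] for d in dead], calls)
```

In this run the flaky prompt succeeds on its second attempt, while the broken one uses all three attempts and ends up in the dead letter list rather than disappearing.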

Rate Limit Awareness

One of the most common causes of AI task failure is hitting API rate limits. Many providers return a 429 status, often with a Retry-After header telling you how long to wait. Your queue must respect this.

    async def rate_limited_call(self, prompt):
        while True:
            # raw_api_call is your provider-specific request method
            resp = await self.raw_api_call(prompt)
            if resp.status == 429:
                # Assumes Retry-After is given in seconds; falls back to 5s if absent
                wait = int(resp.headers.get("Retry-After", 5))
                await asyncio.sleep(wait)
                continue
            return await resp.json()

This pattern ensures your queue slows down automatically when the API is under load, rather than burning retries and making the problem worse.
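One detail worth handling: the HTTP specification allows Retry-After to be either a number of seconds or an HTTP date, and a server could ask for a very long wait. The helper below is a hedged sketch of a more defensive parser. The function name `retry_after_seconds` and the `default` and `cap` values are assumptions chosen for illustration, not provider recommendations.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], default: float = 5.0, cap: float = 60.0,
                        now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header value into a bounded sleep duration in seconds."""
    if value is None:
        return default
    value = value.strip()
    try:
        seconds = float(value)  # delta-seconds form, e.g. "30"
    except ValueError:
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            return default  # unparseable header: fall back to the default wait
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        seconds = (when - now).total_seconds()
    # Never sleep a negative amount, and never longer than the cap
    return max(0.0, min(seconds, cap))
```

In the loop above, `wait = retry_after_seconds(resp.headers.get("Retry-After"))` would replace the `int(...)` conversion.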

When to Level Up

The minimal approach above handles most personal and small-team use cases. You should consider a full queue system (Bull, Celery, or a managed service) when:

  • You need persistent queues that survive process restarts
  • You have multiple workers across different machines
  • Your task throughput exceeds 100 calls/minute
  • You need scheduled/delayed task execution

Bull, which is backed by Redis, handles all of these and adds built-in rate limiting and job scheduling; companion dashboards such as bull-board cover monitoring. Pair it with prompt caching to slash LLM costs while your queue keeps everything running reliably.

Frequently Asked Questions

Why not just use a simple retry loop?

A retry loop handles transient errors but doesn't manage concurrency, respect rate limits, or separate fast from heavy tasks. A queue adds structure that prevents cascading failures.

What should go in the dead letter queue?

Any task that fails after exhausting all retries. The key is to store the full context (the original prompt, all retry attempts, error messages, and timestamps) so a human can inspect and re-queue if needed.
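As a rough illustration of what "full context" could look like, the sketch below stores the prompt, the queue it came from, and one entry per failed attempt, and round-trips through JSON so records can be written to a file or database. `DeadLetterRecord` and `Attempt` are hypothetical names, and the field set is an assumption you would adapt to your own tasks.

```python
import json
import time
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Attempt:
    number: int
    error: str
    at: float  # Unix timestamp of the failed attempt


@dataclass
class DeadLetterRecord:
    prompt: str
    queue: str
    created_at: float = field(default_factory=time.time)
    attempts: List[Attempt] = field(default_factory=list)

    def record_failure(self, error: Exception) -> None:
        """Append one failed attempt with its error text and timestamp."""
        self.attempts.append(Attempt(len(self.attempts) + 1, repr(error), time.time()))

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "DeadLetterRecord":
        data = json.loads(raw)
        data["attempts"] = [Attempt(**a) for a in data["attempts"]]
        return cls(**data)
```

Because the original prompt is kept verbatim, re-queueing after review is just a matter of building a new task from `record.prompt`.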

Does every agent need a task queue?

No. If your agent makes fewer than 10 API calls per session and you don't care about a few dropped tasks, a simple retry wrapper is sufficient. Add a queue when you need reliability guarantees.

How do I handle idempotency in retries?

This is the hardest part. For read-only operations, retries are safe. For writes, include an idempotency key in your request and check it on the receiving end. Some providers accept an idempotency header, but support varies, so check your provider's documentation before relying on it.
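A minimal sketch of the idea, under stated assumptions: the key is derived deterministically from a task ID and its payload, and the receiving end remembers which keys it has already processed. `idempotency_key` and `SideEffectGuard` are hypothetical names, and the in-memory dictionary stands in for whatever durable store the receiver actually uses.

```python
import hashlib
import json


def idempotency_key(task_id: str, payload: dict) -> str:
    """Derive a stable key: the same task ID and payload always give the same hash."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{task_id}:{canonical}".encode("utf-8")).hexdigest()


class SideEffectGuard:
    """In-memory stand-in for the receiver's dedupe store (an assumption)."""

    def __init__(self):
        self._done = {}

    def run_once(self, key, action):
        """Run `action` only the first time `key` is seen; replay the stored result after."""
        if key in self._done:
            return self._done[key]
        result = action()
        self._done[key] = result
        return result
```

A retried write then calls `guard.run_once(key, do_write)` with the same key as the first attempt, so the side effect happens at most once even if the request is replayed.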

Should failed tasks be re-queued automatically?

For transient failures (rate limits, timeouts), yes, with exponential backoff. For logic errors (bad prompts, exceeding token limits), no: these need human intervention to fix the root cause.
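One way to encode that rule is a small classifier the worker consults before deciding between a retry and the dead letter queue. This is only a sketch: `LLMHTTPError` is a hypothetical exception type, and `TRANSIENT_STATUS` contains just the two status codes named in this post, so extend it to match what your provider actually returns.

```python
import asyncio

# Status codes this post treats as transient; extend for your provider.
TRANSIENT_STATUS = {429, 503}


class LLMHTTPError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(f"HTTP {status} {message}".strip())
        self.status = status


def should_requeue(error: Exception) -> bool:
    """Return True for failures worth retrying, False for ones that need a human."""
    if isinstance(error, (asyncio.TimeoutError, TimeoutError, ConnectionError)):
        return True
    if isinstance(error, LLMHTTPError):
        return error.status in TRANSIENT_STATUS
    return False  # unknown errors are treated as logic errors
```

Defaulting unknown errors to "do not re-queue" is a deliberately conservative choice: a bad prompt retried forever wastes quota, while a dead-lettered task can always be re-queued by hand.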
