Sending a million personalized emails sounds like a problem you throw money at - more servers, bigger instances, higher API rate limits. But brute force doesn't scale economically when you're running a multi-tenant platform where hundreds of organizations need to send campaigns simultaneously.
We needed a smarter approach. So we re-engineered the entire email sending pipeline from the ground up, replacing Mautic's default one-by-one sending with a two-stage queue architecture that dispatches 1 million personalized emails in 25 minutes. That's roughly 40,000 emails per minute - and the architecture is designed to scale further.
Here's how we built it.
Why Default Mautic Email Sending Doesn't Scale
Out of the box, Mautic processes email campaigns contact by contact. For each recipient, it:
- Loads the contact's data from the database.
- Resolves all personalization tokens (name, company, branch-specific content).
- Renders the full email body with personalized content.
- Makes an API call to the email service provider to send that single email.
For a campaign of 1,000 contacts, this is fine. For 100,000 contacts, it's slow. For 1,000,000 contacts, it's impossible - a million sequential API calls would take days.
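The "days" figure is simple arithmetic. A quick sketch, assuming an illustrative 100 ms per API round trip (a hypothetical latency, not a measured Mautic figure):

```python
# Back-of-the-envelope cost of strictly sequential sending, assuming an
# illustrative 100 ms per API round trip (not a measured figure).
PER_CALL_SECONDS = 0.1

def sequential_send_hours(contacts: int, per_call_s: float = PER_CALL_SECONDS) -> float:
    """Hours needed to make one API call per contact, one after another."""
    return contacts * per_call_s / 3600

small = sequential_send_hours(1_000)     # ~0.03 h: a small campaign is fine
huge = sequential_send_hours(1_000_000)  # ~27.8 h: more than a full day at best
```

With any realistic retry, throttling, or rendering overhead on top of the raw call latency, a million sequential sends easily stretches into days.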
The bottleneck isn't just the API calls. It's the CPU-bound personalization happening on our infrastructure for every single contact. All that rendering work - token replacement, dynamic content blocks, conditional sections - runs on our servers before the email even reaches the sending provider. At scale, this approach doesn't just hit a wall; it creates one.
The Key Insight: Let Mailgun Do the Personalization
The breakthrough came from rethinking where personalization happens. Instead of rendering a complete personalized email for each contact on our infrastructure, we send Mailgun a template with placeholder tokens plus a list of up to 1,000 recipients, each with their own token values.
Mailgun handles the per-recipient personalization at send time. One API call sends 1,000 individually personalized emails.
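Mailgun's batch sending works by accepting a `recipient-variables` JSON map alongside the template, with `%recipient.<key>%` placeholders resolved per address at send time. A minimal sketch of assembling one such request (field names like `branch` are illustrative; the function and sender address are our own, not Mautic code):

```python
import json

MAX_BATCH = 1000  # Mailgun's per-request recipient limit for batch sending

def build_batch_payload(sender, subject, html_template, recipients):
    """Build the form fields for one Mailgun batch-send request.

    `recipients` is a list of dicts like {"email": ..., "name": ...};
    every key other than "email" becomes a per-recipient substitution,
    available in the template as %recipient.<key>%.
    """
    if len(recipients) > MAX_BATCH:
        raise ValueError(f"Mailgun batch sending caps at {MAX_BATCH} recipients per call")
    variables = {
        r["email"]: {k: v for k, v in r.items() if k != "email"}
        for r in recipients
    }
    return {
        "from": sender,
        "to": list(variables),          # one 'to' entry per recipient
        "subject": subject,
        "html": html_template,          # contains %recipient.*% placeholders
        "recipient-variables": json.dumps(variables),
    }

payload = build_batch_payload(
    "news@example.com",
    "Hello %recipient.name%",
    "<p>Hi %recipient.name%, greetings from %recipient.branch%.</p>",
    [
        {"email": "a@example.com", "name": "Ada", "branch": "Berlin"},
        {"email": "b@example.com", "name": "Bob", "branch": "Oslo"},
    ],
)
# POST this as form data to https://api.mailgun.net/v3/<domain>/messages
```

One request like this replaces up to 1,000 individual sends; Mailgun stitches each recipient's values into the template on its side.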
The math changes dramatically:
| Approach | API Calls for 1M Emails |
|---|---|
| Default (one-by-one) | 1,000,000 |
| Batch API (1,000 per call) | 1,000 |
That's a 1,000x reduction in API calls. But the API calls are only part of the story - we also needed to prepare those batches efficiently. Fetching contact data, resolving tokens, and building payloads for a million contacts still requires serious engineering.
Two-Stage Queue Architecture: The Engine
The core innovation is splitting the email sending process into two distinct stages, each with its own SQS queue and worker pool. This separation is what makes the system fast, resilient, and scalable.
Stage 1: Segment Scanning (Seconds)
When a campaign launches, Stage 1 runs immediately:
- The system fetches the full list of contact IDs matching the campaign's segment criteria.
- It splits these IDs into lightweight batches - just arrays of contact IDs, no full data.
- These minimal batches are pushed to the first SQS queue.
The critical property of Stage 1 is speed. Because it only deals with contact IDs (not full contact records), it completes in seconds - even for segments containing millions of contacts. The campaign operator sees immediate feedback: "Your campaign is sending." No spinning wheels, no minutes-long wait for the system to "prepare."
This speed matters for user experience, but it also matters architecturally. By completing Stage 1 quickly, we free up the originating process and let the heavy work happen asynchronously in Stage 2.
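The Stage 1 steps above can be sketched as a simple chunking pass. This is an illustrative shape for the queue messages, not the actual schema; the batch size of 1,000 is assumed to mirror the Mailgun batch limit:

```python
import json

BATCH_SIZE = 1000  # assumed to match the Mailgun per-call recipient limit

def make_stage1_messages(campaign_id, contact_ids, batch_size=BATCH_SIZE):
    """Split a segment's contact IDs into lightweight SQS message bodies.

    Only IDs travel through the first queue; full contact records are
    loaded later, in Stage 2, by whichever worker picks the batch up.
    """
    return [
        json.dumps({
            "campaign_id": campaign_id,
            "batch_no": i // batch_size,
            "contact_ids": contact_ids[i:i + batch_size],
        })
        for i in range(0, len(contact_ids), batch_size)
    ]

messages = make_stage1_messages("camp-42", list(range(2500)))
# 2,500 IDs -> 3 messages: two full batches of 1,000 and one of 500
```

Because each message is just an ID list of a few kilobytes, enqueueing even a million-contact segment is a matter of seconds, not minutes.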
Stage 2: Payload Assembly (Parallel)
Stage 2 is where the real work happens, distributed across many parallel workers:
- A worker pulls a batch of contact IDs from the first queue.
- It loads the full contact data for those IDs from the database - names, email addresses, personalization token values, branch-specific content.
- It builds the Mailgun payload: the email template with token placeholders plus the recipient list with each contact's token values.
- The assembled payload is pushed to a second queue for dispatch to the Mailgun API.
Each worker operates independently, processing its batch without coordination with other workers. This independence is what enables massive parallelism.
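A single Stage 2 iteration can be sketched like this, with in-memory queues and a dict standing in for SQS and the contact database (the message shape and field names are assumptions for illustration):

```python
import json
from collections import deque

# In-memory stand-ins for the two SQS queues and the contact database.
id_queue, payload_queue = deque(), deque()
CONTACTS = {i: {"email": f"user{i}@example.com", "name": f"User {i}"} for i in range(3)}

def process_one_batch():
    """One Stage 2 iteration: a batch of IDs in, an assembled payload out."""
    msg = json.loads(id_queue.popleft())
    rows = [CONTACTS[cid] for cid in msg["contact_ids"]]   # one bulk DB load in reality
    payload_queue.append(json.dumps({
        "campaign_id": msg["campaign_id"],
        "to": [r["email"] for r in rows],
        "recipient-variables": {r["email"]: {"name": r["name"]} for r in rows},
    }))

id_queue.append(json.dumps({"campaign_id": "camp-42", "contact_ids": [0, 1, 2]}))
process_one_batch()
# payload_queue now holds one ready-to-dispatch batch covering 3 recipients
```

Nothing in the loop touches shared state, which is exactly why dozens of these workers can run side by side.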
Why Two Stages Instead of One?
You might wonder: why not just create the full Mailgun payloads directly in Stage 1? Three reasons:
Speed of feedback. Stage 1 finishes in seconds because it only shuffles IDs. If Stage 1 also had to load contact data and build payloads, the campaign operator would wait minutes before seeing any progress.
Resilience. If a Stage 2 worker fails (node dies, spot instance reclaimed, out-of-memory error), only the batch it was processing needs to be retried. The contact IDs are already safely in the queue. Nothing is lost, and the retry is automatic via SQS visibility timeout.
Parallelism. Stage 2 scales horizontally. Need more throughput? Add more workers. Each worker is stateless and pulls independently from the queue. During our stress test, we ran approximately 52 workers in parallel - and the architecture supports adding more.
52 Parallel Workers: The Throughput Machine
During our 1-million email stress test, the system ran with approximately 52 parallel Symfony Messenger consumers processing batches simultaneously.
Each consumer processes roughly one batch per minute - about 1,000 emails dispatched per minute per worker. With 52 workers pulling in parallel, the measured aggregate throughput was approximately 40,000 emails per minute - somewhat below the theoretical 52,000, since workers also spend time polling queues, loading contact data, and ramping up.
In normal operation, each tenant instance runs 7 default consumers. During peak campaign periods, the system scales workers up based on queue depth. The scaling is linear and predictable: double the workers, double the throughput. There's no shared state that becomes a bottleneck (with one notable exception we'll get to).
The workers run on Kubernetes pods scheduled across our worker node group. Because they consume from SQS queues and process independently, they're ideal for running on spot instances - if AWS reclaims a node, the workers on that node shut down gracefully, their in-progress messages return to the queue, and other workers pick them up.
The Stress Test: How We Validated the Numbers
We didn't estimate these numbers - we measured them. The stress test setup:
- Synthetic contacts generated via CLI. We created a realistic dataset with all the fields our personalization engine uses - names, branch affiliations, contact details, custom properties.
- A single campaign segment containing all contacts. This forced the system to process the full million in one campaign.
- All user tokens included in payloads. Every personalization field was populated and resolved, exercising the complete rendering pipeline.
- Full infrastructure under load. Kubernetes autoscaling, SQS queues, S3 payload offloading, Mailgun API - everything running at peak.
The result: 1,000,000 emails dispatched in 25 minutes. Stage 1 completed in seconds. Stage 2 workers ran for the remaining time, processing batches in parallel until the queues were drained.
The Deduplication Challenge: One of the Hardest Problems
Here's a problem that doesn't appear in architecture diagrams but caused significant engineering effort: email deduplication.
In many organizations, the same physical person exists as multiple contacts - they might be registered under different branches, departments, or subsidiaries. When a campaign targets all these contacts, the same email address appears multiple times.
Sending duplicate emails to the same address is bad for several reasons:
- Recipient experience. Nobody wants three identical emails.
- Deliverability impact. Email providers penalize senders who send duplicates.
- Reputation damage. Your sender score drops, affecting all future campaigns.
The engineering challenge: how do you deduplicate across 52 parallel workers, each processing batches independently, without creating a bottleneck?
A naive approach - a central deduplication lock - would serialize all workers through a single point of contention, destroying the parallelism we worked so hard to build. A database-based approach would add latency to every batch.
We implemented a real-time deduplication mechanism that ensures only one physical email is sent per unique address within a campaign, even when the same address appears across multiple batches processed by different workers simultaneously. Building this without a central lock and without introducing sequential processing was one of the most nuanced engineering challenges in the entire project.
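The article does not disclose the exact mechanism, but the general shape of such a gate is "claim before send": each worker atomically claims a normalized address for the campaign, and only the first claim wins. A hedged in-memory sketch (an assumption, not the actual implementation; in production the claim set would live in a shared store with an atomic set-if-absent primitive, such as Redis `SET NX`, and the in-process lock below merely stands in for that per-operation atomicity, not a campaign-wide lock):

```python
import threading

class CampaignDeduper:
    """Illustrative claim-before-send gate. The first worker to claim a
    normalized (campaign, address) pair wins; later claims for the same
    address - even from other batches or workers - are skipped."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()  # stand-in for a shared store's atomicity

    def claim(self, campaign_id: str, email: str) -> bool:
        key = (campaign_id, email.strip().lower())  # normalize the address
        with self._lock:
            if key in self._seen:
                return False
            self._seen.add(key)
            return True

dedupe = CampaignDeduper()
sends = [e for e in ["A@x.com", "b@x.com", "a@X.com"] if dedupe.claim("camp-42", e)]
# "a@X.com" is dropped: after normalization it duplicates "A@x.com"
```

The key property is that the claim is a single atomic operation per address, so workers never serialize against each other beyond that one constant-time check.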
Queue Priority: Keeping the Pipeline Flowing
When a large campaign launches, the system generates thousands of queue messages. Stage 1 floods the first queue with contact ID batches. Stage 2 workers start pulling from that queue and producing assembled payloads for the second queue. Meanwhile, the dispatch workers are sending payloads to Mailgun.
Without priority management, a subtle problem emerges: later-stage messages (fully assembled, ready-to-send payloads) can get stuck behind earlier-stage messages (contact ID batches still waiting for assembly). The pipeline backs up at the wrong end.
We implemented queue priority rules ensuring that assembled email batches are always dispatched first. This keeps the pipeline flowing smoothly: dispatch workers always have work when payloads are ready, rather than waiting for more batches to be assembled.
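The priority rule reduces to "always drain the dispatch queue before taking on new assembly work." A minimal sketch with in-memory queues (the queue names are illustrative):

```python
from collections import deque

# Stand-ins for the two queues: ID batches awaiting assembly,
# and fully assembled payloads awaiting dispatch to Mailgun.
assembly_queue = deque(["ids-1", "ids-2"])
dispatch_queue = deque(["payload-1"])

def next_message():
    """Prefer ready-to-send payloads over pending assembly work, so
    finished batches are never stuck behind unprocessed ID batches."""
    if dispatch_queue:
        return "dispatch", dispatch_queue.popleft()
    if assembly_queue:
        return "assemble", assembly_queue.popleft()
    return None, None

order = []
while True:
    stage, msg = next_message()
    if msg is None:
        break
    order.append(msg)
# order == ["payload-1", "ids-1", "ids-2"]: the assembled payload goes first
```

With real SQS this becomes a polling order across queues (or dedicated dispatch workers), but the invariant is the same: work closest to the recipient's inbox ships first.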
This matters most when multiple campaigns launch simultaneously - a common scenario on a multi-tenant platform where different organizations trigger campaigns at overlapping times.
Campaign Scheduling and Continuation
Large campaigns that span millions of contacts may take tens of minutes to fully process. During that time, a lot can happen: a worker pod restarts, a spot instance is reclaimed, a deployment rolls through the cluster.
We built campaign continuation logic that tracks the progress of each campaign through the pipeline. If processing is interrupted - for any reason - the system picks up where it left off. Operators never need to manually restart a campaign or worry about partially-sent campaigns leaving some contacts unsent.
This continuation logic also handles campaign scheduling: campaigns can be queued for future send times, and the system automatically transitions them through the pipeline when the scheduled time arrives.
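Conceptually, continuation only needs a durable record of how far each campaign got. A sketch under an assumed checkpoint schema (the real tracking layout is not published):

```python
def resume_campaign(progress, all_batches):
    """Return the batches still to enqueue for an interrupted campaign.

    `progress` records the highest batch number confirmed enqueued
    (illustrative schema, not the actual tracking structure).
    """
    done = progress.get("last_enqueued_batch", -1)
    return [b for b in all_batches if b["batch_no"] > done]

batches = [{"batch_no": n} for n in range(5)]
remaining = resume_campaign({"last_enqueued_batch": 2}, batches)
# batches 3 and 4 remain after an interruption between batches 2 and 3
```

Because batch numbers are deterministic for a given segment, replaying from the checkpoint never re-enqueues work that already reached the queue.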
Performance Results
| Metric | Result |
|---|---|
| Emails sent | 1,000,000 |
| Time | 25 minutes |
| Throughput | ~40,000 emails/minute |
| Parallel workers | ~52 |
| Batch size | 1,000 per Mailgun API call |
| Stage 1 duration | Seconds |
| Default consumers per instance | 7 |
| Consumer throughput | ~1 batch/minute per worker |
What This Architecture Enables
The two-stage pipeline with batch API integration isn't just about sending one million emails fast. It's about building a foundation that scales predictably:
- Horizontal scaling. Add workers for more throughput. The relationship is linear.
- Multi-tenant isolation. Each tenant's campaigns flow through the same pipeline independently. One tenant's large campaign doesn't block another's.
- Cost efficiency. Workers run on spot instances (50-65% savings). The batch API reduces API call volume by 1,000x. SQS is pay-per-message.
- Resilience. Any component can fail and the system recovers automatically. Messages retry. Workers are stateless. The pipeline continues.
The platform is designed to support 300-500 tenants sending campaigns simultaneously. The architecture patterns - two-stage queuing, batch APIs, parallel stateless workers - apply to any high-volume processing system, not just email.
Key Takeaways
- Batch APIs change the economics. Going from 1 API call per email to 1 API call per 1,000 emails is a 1,000x improvement in API efficiency.
- Split your pipeline stages. Fast initial scanning (seconds) plus parallel heavy processing gives you both user responsiveness and throughput.
- Parallelism must be truly parallel. If your workers share a lock, a database table, or any coordination point, you've created a bottleneck. Design for independence.
- Deduplication at scale is genuinely hard. It's easy to solve for sequential processing; it's hard to solve across dozens of parallel workers without killing performance.
- Queue priority prevents pipeline stalls. Ensure later-stage ready-to-send messages aren't blocked by earlier-stage preparation messages.
- Test with realistic load. Our stress test with synthetic contacts and the full personalization pipeline gave us measured numbers - not estimates.
Ready to Scale Your Email Infrastructure?
If you're sending high-volume email campaigns and hitting throughput limits with your current platform, we can help. We've engineered a pipeline that sends 40,000 personalized emails per minute - and the architecture scales from there.
[Book a free consultation](https://www.droptica.com/contact/) to discuss your email performance and infrastructure goals.
Written by
Mautomic Team
The Mautomic team brings together experienced marketing automation specialists, developers, and consultants dedicated to helping businesses succeed with Mautic.