Our multi-tenant Mautic platform was slow. Not broken - slow. Deployments dragged. The UI felt sluggish. Builds took longer than they should. Every operation had a vague, frustrating delay that we couldn't immediately pin on any single component.
After weeks of profiling, the culprit turned out to be the most boring part of our infrastructure: the shared filesystem. Specifically, Amazon EFS - the managed NFS service that's the default choice for shared storage on AWS Kubernetes clusters.
Migrating from EFS to Amazon FSx took our read/write throughput from approximately 30 KB/s to 15 MB/s. That's roughly a 500x improvement. This post explains what happened, why EFS broke at our scale, and how to decide which storage option is right for your Kubernetes workloads.
Why We Needed Shared Storage in the First Place
Our platform runs multiple Mautic instances on Kubernetes, each serving a different tenant. Every instance runs several pods - Apache web servers, PHP-FPM workers - that need access to the same set of files.
Two categories of files need to be shared across pods:
Media files. Every Mautic instance stores uploaded images, email assets, landing page resources, and other media. When a marketing user uploads an image through the Mautic UI (which hits one pod), that image needs to be visible immediately to all other pods serving that instance.
Cache files. Mautic generates various cache artifacts - compiled templates, configuration caches, metadata files. These must be consistent across all pods. If pod A generates a cache file that pod B can't see, you get inconsistent behavior and hard-to-debug errors.
In Kubernetes, this pattern requires a ReadWriteMany (RWX) volume - a persistent volume that multiple pods can mount simultaneously for both reads and writes. On AWS, the two primary options for RWX volumes are EFS and FSx.
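As a sketch, an RWX claim on EKS backed by the EFS CSI driver typically looks like the fragment below. The storage class name, filesystem ID, and sizes are hypothetical placeholders, not values from our cluster:

```yaml
# Illustrative StorageClass using the EFS CSI driver (dynamic provisioning via access points)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: shared-efs                        # placeholder name
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0      # placeholder filesystem ID
---
# A claim that multiple pods can mount read-write at the same time
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mautic-shared
spec:
  accessModes:
    - ReadWriteMany                       # the RWX mode discussed above
  storageClassName: shared-efs
  resources:
    requests:
      storage: 100Gi
```

Each Apache and PHP-FPM pod for a tenant then mounts the same claim, so an upload landing on one pod is visible to the others.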
We started with EFS. It's the path of least resistance.
The EFS Experience: It Just Works - Until It Doesn't
Amazon Elastic File System is the go-to choice for shared storage on AWS. It's fully managed, scales automatically, and integrates cleanly with EKS through the EFS CSI driver. Setup is straightforward: create a filesystem, configure a storage class, and mount it into your pods. Done.
At small scale - a few tenant instances, a modest number of files - EFS performed acceptably. Response times were fine, deployments completed in reasonable timeframes, and the platform felt snappy.
Then we grew. More tenants. More media uploads. More cache files. The filesystem accumulated roughly 500,000 files across all tenant instances.
And everything slowed down.
The Breaking Point: 30 KB/s
When we finally instrumented the filesystem to measure actual throughput, the numbers were startling. EFS was delivering approximately 30 KB/s for our read and write operations. Not 30 MB/s. Thirty kilobytes per second.
To put that in perspective: a floppy disk from 1995 had a transfer rate of around 60 KB/s. Our cloud-native, fully managed, elastic filesystem was performing at half the speed of a floppy drive.
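The scale of the gap is easy to verify with back-of-the-envelope arithmetic (decimal units; the throughput figures in this post are approximate measurements, not guarantees):

```python
efs_throughput = 30 * 1000           # ~30 KB/s, in bytes per second
fsx_throughput = 15 * 1000 * 1000    # ~15 MB/s, in bytes per second

speedup = fsx_throughput / efs_throughput
print(speedup)  # 500.0

# Time to move a modest 100 MB of cache and media files at each rate:
payload = 100 * 1000 * 1000
print(payload / efs_throughput / 60)  # roughly 55 minutes on EFS
print(payload / fsx_throughput)       # under 7 seconds on FSx
```

Minutes versus seconds for the same copy is exactly the difference we felt in deployments.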
The symptoms were everywhere:
- Deployments were painfully slow. Copying files, clearing caches, running build steps - every operation that touched the filesystem became a bottleneck.
- Application responsiveness suffered. Mautic's PHP code reads and writes files constantly - configuration, templates, media. Every file operation was throttled by EFS.
- Builds took too long. CI/CD pipelines that involved file operations on the shared volume were the slowest step by a wide margin.
The platform wasn't broken. Everything technically worked. It was just unbearably, inexplicably slow.
The Technical Explanation: Why EFS Struggles with Many Small Files
EFS is built on the NFSv4 protocol (Network File System version 4). NFS is a mature, well-understood protocol for sharing files across a network, but it has characteristics that make it a poor fit for workloads with many small files.
Protocol overhead per operation. Every file operation - open, read, write, close, stat - involves a network round trip with NFS protocol overhead. For large sequential reads (reading a 1GB video file), this overhead is negligible. For hundreds of thousands of small files (PHP source files, cached metadata, configuration snippets), the per-operation overhead dominates.
Metadata-heavy workloads. When your application does a lot of stat() calls, directory listings, or file existence checks - which PHP applications do constantly - each one becomes a network request through the NFS layer. With 500,000 files, even listing a directory's contents becomes a significant operation.
Not an IOPS problem in the traditional sense. EFS is designed to scale throughput with data size: the more data you store, the more throughput you get. Our aggregate data volume was enough to qualify for a reasonable throughput tier, but throughput was never the constraint. With hundreds of thousands of tiny files, per-operation latency was the bottleneck, not bandwidth.
In short: EFS is optimized for workloads that read and write large files sequentially. A PHP application with half a million small files is the opposite of that workload. For our use case, EFS was practically unusable.
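A toy model makes the point concrete. Assume every file access costs a fixed round-trip latency on top of the raw transfer time. The latency and bandwidth numbers below are illustrative assumptions, not measured EFS figures:

```python
def transfer_time(n_files, avg_size_bytes, per_op_latency_s, bandwidth_bps, ops_per_file=3):
    """Rough cost of reading n_files: per-operation overhead (open/stat/close etc.)
    plus the raw bytes over the wire."""
    overhead = n_files * ops_per_file * per_op_latency_s
    streaming = n_files * avg_size_bytes / bandwidth_bps
    return overhead + streaming

# One 1 GB file: overhead is negligible, bandwidth dominates.
big = transfer_time(1, 1_000_000_000, 0.003, 100_000_000)
# 500,000 files of 4 KB (2 GB total): round-trip overhead dominates.
small = transfer_time(500_000, 4_000, 0.003, 100_000_000)

print(round(big, 1))    # 10.0 seconds, essentially all streaming
print(round(small, 1))  # 4520.0 seconds, essentially all round trips
```

Same order of total data, a hundredfold difference in wall-clock time: that is the shape of our EFS problem.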
The Migration to FSx
Amazon FSx for Lustre operates at a fundamentally different level than EFS (FSx for OpenZFS is an alternative, depending on your needs). Where EFS funnels every file operation through the NFS protocol, the Lustre client talks to the file servers directly, with far less per-operation overhead.
As one of our engineers put it: FSx is "lower level, without the additional protocol overhead" that makes EFS slow for this type of workload.
After migrating our shared volumes from EFS to FSx, our measured throughput jumped to approximately 15 MB/s for the same workload. Same files. Same access patterns. Same Kubernetes cluster. The only change was the underlying storage service.
The impact was immediate and dramatic.
What Changed After the Migration
Deployments became fast. Operations that previously took minutes completed in seconds. Cache clears, file copies, build steps - everything that touched the shared filesystem was suddenly responsive.
Application performance improved across the board. Mautic's UI became snappy. Page load times dropped. API responses tightened. The vague "everything is slow" feeling disappeared overnight.
Build times shrank. CI/CD pipelines that previously bottlenecked on filesystem operations now completed in a fraction of the time.
The team's description of the change was colorful but accurate: the platform went from sluggish to screaming fast. A single infrastructure change - invisible to end users - transformed the entire platform experience.
The Cost Trade-Off
FSx is more expensive than EFS per gigabyte. That's the trade-off, and it's worth acknowledging directly.
But here's how we think about it: the performance difference between EFS and FSx for our workload isn't incremental - it's transformational. We're not talking about a 20% improvement that might or might not justify the cost. We're talking about 500x.
At that magnitude, the cost calculation shifts:
- Developer time saved. Faster deployments and builds mean less waiting, more shipping.
- Operational efficiency. Fewer support tickets about "slow platform" performance.
- User experience. Tenants on the platform experience a responsive application, which reduces churn and increases satisfaction.
- Total cost of ownership. Slow infrastructure doesn't just cost money in hosting - it costs money in developer productivity, customer satisfaction, and operational overhead.
The incremental cost of FSx over EFS is a fraction of what we'd lose to the slowdowns.
When to Choose EFS vs. FSx: A Decision Framework
Not every workload needs FSx. Here's how we'd guide the decision:
Choose EFS when:
- Your files are large and accessed sequentially. Video processing, large dataset analysis, log storage - workloads where you read big files from start to finish.
- Your file count is low. Hundreds or a few thousand files, not hundreds of thousands.
- Cost is the primary constraint. EFS is cheaper, and for workloads it handles well, the performance is adequate.
- You need simplicity. EFS requires almost zero configuration. For non-performance-critical workloads, it's the easiest path.
Choose FSx when:
- You have many small files. CMS platforms, marketing automation tools, PHP applications, WordPress, Drupal - any application that generates and reads thousands of small files.
- Your workload is metadata-heavy. Lots of file existence checks, directory listings, stat operations.
- Performance is critical. If filesystem speed directly affects user experience or deployment velocity.
- You're running Kubernetes workloads that share storage across pods. The RWX pattern with many small files is exactly where FSx shines and EFS struggles.
Our recommendation:
If your Kubernetes workload involves a PHP application, a CMS, or any software that generates many small files - benchmark FSx early. Don't wait until performance degrades. Run a side-by-side test with your actual file patterns before committing to a storage solution.
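A side-by-side test doesn't need special tooling. The sketch below (a minimal benchmark, not a production tool) writes, stats, and reads a batch of small files under a given directory; point it at your RWX mount and compare against local disk:

```python
import os
import tempfile
import time

def small_file_benchmark(root, n_files=2000, size=4096):
    """Write, stat, and read n_files small files under root; return ops/sec."""
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(n_files):
        path = os.path.join(root, f"f{i:05d}")
        with open(path, "wb") as f:
            f.write(payload)
        os.stat(path)                   # metadata op, like PHP's file checks
        with open(path, "rb") as f:
            f.read()
    elapsed = time.perf_counter() - start
    return (n_files * 3) / elapsed      # write + stat + read per file

# Local disk as a baseline; rerun with root set to your shared volume's mount path.
with tempfile.TemporaryDirectory() as d:
    print(f"{small_file_benchmark(d):,.0f} ops/sec")
```

On NFS-backed storage expect the ops/sec figure to collapse relative to local disk; the size of that gap is your early warning.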
Lessons Learned
Don't assume managed services perform equally. EFS and FSx are both "managed shared filesystems on AWS," but they have fundamentally different performance characteristics. The service category is the same; the behavior under load is not.
File count matters more than data size for NFS-based storage. We had a modest amount of total data (not terabytes), but the sheer number of files - 500,000 - was what killed performance. If you're evaluating storage, benchmark with your expected file count, not just your expected data volume.
Storage performance problems are insidious. They don't show up as errors. They don't trigger alerts (unless you're specifically monitoring filesystem latency). They make everything slightly slower - deployments, UI, builds, developer experience - without an obvious smoking gun. When "everything is slow but nothing is broken," check your filesystem.
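To get an alert before users feel the slowdown, a periodic probe that times a single small write on the shared mount is enough to catch this class of problem. A sketch (the mount path is a placeholder; exporting the result as a metric is left to your monitoring stack):

```python
import os
import tempfile
import time
import uuid

def probe_fs_latency(mount_path):
    """Time one small write + fsync + unlink on the mounted volume, in seconds."""
    path = os.path.join(mount_path, f".probe-{uuid.uuid4().hex}")
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(b"ping")
        f.flush()
        os.fsync(f.fileno())
    os.unlink(path)
    return time.perf_counter() - start

# Swap in your RWX mount path and alert when the value drifts upward.
latency = probe_fs_latency(tempfile.gettempdir())
print(f"{latency * 1000:.2f} ms")
```

Run on a schedule, this turns "everything feels slow" into a graph with a visible inflection point.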
Benchmark with realistic workloads before production. We would have caught this issue months earlier if we'd run EFS benchmarks with our actual file patterns instead of trusting the default recommendation. Fifteen minutes of benchmarking could have saved weeks of troubleshooting.
The Bottom Line
A storage migration gave us a 500x performance improvement. Not a code optimization, not a caching layer, not a CDN - a simple infrastructure swap from one AWS service to another.
EFS is a good service. It's just not the right service for every workload. If you're running Kubernetes with shared volumes and experiencing unexplained slowness, the filesystem is worth investigating.
[Reach out to our team](https://www.droptica.com/contact/) if you'd like help evaluating storage options for your Kubernetes infrastructure. We've been through the benchmarks, the migration, and the "why is everything slow" debugging - and we can help you skip straight to the fast part.
Written by
Mautomic Team
The Mautomic team brings together experienced marketing automation specialists, developers, and consultants dedicated to helping businesses succeed with Mautic.