The Silent Killer: How gp3 EBS Volumes Cost Us $3,100/Month Without Anyone Noticing
"Why are our EBS costs suddenly so high?" That was the email that landed in my inbox, kicking off a hunt for an insidious, often overlooked cloud expense.
The Mystery of the Inflated EBS Bill
As a platform engineer, I've seen my share of unexpected cloud bills. Usually, it's a runaway EC2 instance, an unoptimized S3 bucket, or a forgotten database. But this time, the culprit was something far more subtle: AWS EBS volumes.
Our latest monthly bill showed an alarming spike in storage costs. A quick glance revealed the bulk of the increase was attributed to Elastic Block Store (EBS). We'd been diligent about our EC2 rightsizing, implemented S3 lifecycle policies, and optimized our databases. EBS, however, usually just sat there, predictably consuming its allocated share.
Digging deeper, we uncovered the surprising truth: a significant portion of our EBS volumes were provisioned as gp3 (General Purpose SSD), even though their actual workloads only required the more cost-effective st1 (Throughput Optimized HDD). This seemingly minor detail was inflating our storage costs by a staggering $3,100 every single month.
It turned out that new instances, especially those spun up by our development teams for quick tests or staging environments, often defaulted to gp3. It's faster, yes, but for many of our internal applications, like log processing, backup archives, or batch data loading, the raw IOPS of an SSD were completely unnecessary. We were paying for a sports car when a sturdy pickup truck would have done the job just fine, and cheaper.
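To put rough numbers on that pickup-truck analogy, the gap per volume is easy to estimate from list prices. The figures below are approximate us-east-1 list prices at the time of writing (roughly $0.08/GB-month for gp3 versus $0.045/GB-month for st1), not our negotiated rates, and the 2 TB volume is a hypothetical example:

```python
# Back-of-the-envelope cost comparison for a hypothetical 2 TB volume,
# using approximate us-east-1 list prices (assumptions, not our actual rates).
GP3_PER_GB_MONTH = 0.08   # gp3 storage, baseline IOPS/throughput included
ST1_PER_GB_MONTH = 0.045  # st1 storage
SIZE_GB = 2000            # e.g. a log-processing or backup volume

gp3_cost = SIZE_GB * GP3_PER_GB_MONTH  # $160/month
st1_cost = SIZE_GB * ST1_PER_GB_MONTH  # $90/month

print(f"gp3: ${gp3_cost:.0f}/mo  st1: ${st1_cost:.0f}/mo  "
      f"savings: ${gp3_cost - st1_cost:.0f}/mo per volume")
```

Multiply a delta like that across dozens of oversized volumes and a four-figure monthly overage stops being surprising.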


The Manual Migration Headache
Our first instinct was to launch a manual audit. "Let's find all the gp3 volumes, check their purpose, and switch them to st1 where appropriate," I declared. What followed was a week of escalating frustration.
The scale of the problem quickly became apparent. We had hundreds of EBS volumes spread across multiple AWS accounts and regions. Many were attached to instances with vague naming conventions like "dev-server-temp" or "ml-experiment-worker." Tracing ownership and understanding the actual workload profile for each was a monumental task.
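For a sense of what that first pass looked like, here is a simplified sketch of the kind of inventory script we ran. The real version also assumed a role into each member account and exported the results to a spreadsheet; the boto3 calls and filters are standard, everything else is illustrative:

```python
import boto3

# Rough sketch of the first-pass audit: list every in-use gp3 volume in
# every region of the current account. Cross-account role assumption is
# omitted here for brevity.
ec2 = boto3.client("ec2", region_name="us-east-1")
regions = [r["RegionName"] for r in ec2.describe_regions()["Regions"]]

for region in regions:
    client = boto3.client("ec2", region_name=region)
    paginator = client.get_paginator("describe_volumes")
    pages = paginator.paginate(
        Filters=[{"Name": "volume-type", "Values": ["gp3"]},
                 {"Name": "status", "Values": ["in-use"]}]
    )
    for page in pages:
        for vol in page["Volumes"]:
            name = next((t["Value"] for t in vol.get("Tags", [])
                         if t["Key"] == "Name"), "<unnamed>")
            print(region, vol["VolumeId"], f'{vol["Size"]}GiB', name)
```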
When we did identify potential candidates for st1 migration, we hit another roadblock: developer apprehension. "What if it's really critical for this application?" "I just need it to be fast, better safe than sorry!" The fear of introducing performance bottlenecks, even on non-critical systems, was real. Without concrete performance data to back up our recommendations, it was tough to convince anyone to make a change.
We managed to reconfigure a handful of volumes, saving a small fraction of the monthly overage. But for every one we fixed, it felt like new gp3 volumes were being provisioned elsewhere. It was like trying to empty a bathtub with a teaspoon while the tap was still running full blast. The manual approach was not only inefficient but also unsustainable, leading to significant operational overhead and minimal impact on the rising costs.
The Data-Driven Revelation: Unmasking Actual Utilization
The turning point came when we shifted our focus from simply identifying gp3 volumes to understanding their actual performance utilization. It wasn't enough to know a volume was gp3; we needed to know if it was truly using gp3 performance capabilities.
We began correlating provisioned storage types with real-time I/O patterns: IOPS (Input/Output Operations Per Second), throughput, and queue depth. Our hypothesis was simple: if a gp3 volume consistently showed low IOPS and throughput, well within st1's typical range, then it was a prime candidate for migration.
What we discovered was an eye-opener. Hundreds of gp3 volumes, provisioned for thousands of IOPS, were averaging less than 100 IOPS. Their throughput was minimal, and many sat idle for hours at a time. These volumes were performing exactly like throughput-optimized HDDs, but at the premium price of SSDs.
It was clear: the problem wasn't just poor provisioning choices; it was a systemic issue rooted in default settings and a lack of real-time performance visibility. The "Aha!" moment solidified: we needed an automated, intelligent system that could continuously monitor, analyze, and recommend (or even execute) these optimizations based on actual usage patterns, not just provisioning assumptions.


EazyOps: Intelligent EBS Optimization
This is where EazyOps stepped in, offering precisely the kind of intelligent automation we desperately needed. EazyOps isn't just a monitoring tool; it's an autonomous optimization engine designed to rightsize cloud resources based on real-world performance metrics.
Here's how EazyOps tackled our gp3 problem:
- Continuous Performance Analysis: EazyOps continuously monitors EBS volume metrics like IOPS, throughput, and burst credit utilization. It doesn't just look at peak usage, but understands sustained patterns over time.
- Workload Baselining: By analyzing historical data, EazyOps automatically establishes performance baselines for different workloads. It can distinguish between a gp3 volume that occasionally spikes to high IOPS (and genuinely needs gp3) and one that consistently operates at st1 levels.
- Intelligent Identification: The platform precisely identifies gp3 volumes that are consistently underutilized and whose actual I/O profile perfectly aligns with the characteristics of st1 HDDs. st1 is ideal for large, sequential I/O (like log processing, data warehousing, or media streaming), where its throughput capabilities shine without the higher cost of SSDs.
- Automated Migration & Approval: EazyOps doesn't just recommend; it can automate the entire migration process. After presenting clear, data-backed recommendations, it can, with approval, initiate the change from gp3 to st1, including snapshotting for safety and ensuring minimal (or zero) downtime during the process (a simplified sketch of the underlying API calls follows this list).
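For readers curious about the mechanics, here is a minimal sketch of the AWS calls such a migration boils down to: snapshot first, then an in-place type change via Elastic Volumes. This is an illustration of the mechanism, not EazyOps' actual code; note that st1 volumes must be at least 125 GiB and can't serve as boot volumes.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def migrate_to_st1(volume_id: str) -> None:
    """Snapshot a volume, then convert it to st1 in place (illustrative only)."""
    # Safety snapshot before touching the volume.
    snap = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"pre-st1-migration backup of {volume_id}",
    )
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

    # Elastic Volumes lets the type change while the volume stays attached.
    ec2.modify_volume(VolumeId=volume_id, VolumeType="st1")

    # Track progress; the optimizing phase can take a while on large volumes.
    state = ec2.describe_volumes_modifications(VolumeIds=[volume_id])
    print(state["VolumesModifications"][0]["ModificationState"])
```

The API calls themselves are the easy part; the value for us was everything wrapped around them: the usage evidence, the approval step, and the scheduling.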
With EazyOps, we moved beyond reactive firefighting to proactive, data-driven optimization. It was no longer about hunting down individual misconfigured volumes, but letting an intelligent system manage our entire EBS fleet, ensuring every volume was precisely matched to its actual workload, not just its initial provisioning.
Quantifiable Impact: 65% Cost Reduction
The results of implementing EazyOps for EBS optimization were immediate and significant. Within just a few weeks, the platform had identified hundreds of misaligned gp3 volumes and, after our review and approval, seamlessly migrated them to st1.
Cost Savings
- Overall EBS Cost Reduction: A remarkable 65%.
- Monthly Savings: $3,100 directly recovered.
- Annualized Savings: Over $37,000 in storage costs.
Operational Efficiency
- Reduced Manual Overhead: Engineers redirected hours from tedious audits to more strategic tasks.
- Eliminated "Fear of Change": Data-backed recommendations built trust, making approvals faster and easier.
- Proactive Management: New gp3 volumes are now automatically flagged when their actual usage doesn't justify their provisioned performance profile, preventing future waste.
Performance & Reliability
- No Performance Degradation: Workloads continued to perform optimally, as migrations only occurred when actual usage matched st1 capabilities.
- Optimized Resource Utilization: Each volume now serves its purpose with the most cost-effective storage type.
- Improved Cloud Hygiene: A cleaner, more efficient cloud environment.
The $3,100/month overspend wasn't just a number; it represented resources that could now be invested in innovation, not idle infrastructure. This transformation proved that intelligent automation is not just about cost-cutting, but about enabling a more efficient and agile engineering organization.
Key Takeaways from Our EBS Journey
- Default Settings Are Cost Traps: Don't assume cloud provider defaults are optimized for your wallet. Always review and customize.
- Actual Utilization Trumps Provisioned Capacity: What you provision might be vastly different from what you actually use. Focus on real-time metrics for true optimization.
- Automation is Essential for Scale: Manual cloud cost optimization is a losing battle in dynamic environments. Intelligent automation is the only way to stay ahead.
- The Right Storage for the Right Workload: SSDs are great, but HDDs like st1 still have a crucial, cost-effective role for throughput-intensive, non-latency-sensitive workloads. Don't pay for what you don't need.
- Empower with Data, Not Just Rules: Providing engineers with clear data on actual usage versus cost empowers them to make better decisions, fostering a culture of cost awareness without sacrificing performance.
The Future of Autonomous Cloud Optimization
Our experience with EBS volumes is just one example of the vast potential for autonomous cloud optimization. As cloud environments grow in complexity, the need for intelligent systems like EazyOps becomes paramount. We're not just looking at EBS; similar principles apply to EC2 instance types, S3 storage tiers, and even database configurations.
At EazyOps, we envision a future where cloud resources are dynamically matched to demand, preventing waste before it even occurs. This intelligent, continuous optimization frees engineering teams to focus on building innovative products, rather than constantly battling rising cloud bills. It’s about more than just saving money; it’s about making cloud infrastructure genuinely efficient, resilient, and responsive to business needs.
The goal isn't just to spend less, but to spend smarter, unlocking the full potential of your cloud investment for genuine innovation.
About Shujat
Shujat is a Senior Backend Engineer at EazyOps, working at the intersection of performance engineering, cloud cost optimization, and AI infrastructure. He writes to share practical strategies for building efficient, intelligent systems.