GCP Persistent Disk Snapshot Retention Gaps: A $9,000 Lesson in Cloud Hygiene

"Why are we spending so much on persistent disk snapshots?"

The question echoed in the weekly FinOps meeting. Our GCP bill had ballooned unexpectedly, and the culprit, as it turned out, wasn't compute or network egress, but something far more mundane: persistent disk snapshots. We were drowning in them.

We'd been diligently creating daily snapshots for compliance reasons, a best practice our security team had drilled into us. The problem? We'd focused so much on *creation* that we completely overlooked *retention*. Snapshots, intended as safety nets, had become a costly burden.

The Snapshot Mountain

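Over three months, we'd amassed more than 15,000 snapshots. GCP persistent disk snapshots are incremental, storing only the blocks changed since the previous snapshot of the same disk, but incremental did not mean small: many of ours covered multi-terabyte production databases, where each disk's first snapshot is a full copy and busy databases change a lot of blocks between snapshots. The cost? A staggering $9,000 in just 90 days, for data that was, in most cases, redundant and obsolete.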
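
Why does the bill snowball instead of growing steadily? A toy model helps, assuming one snapshot per disk per day and a roughly equal storage cost for every retained snapshot (a simplification, since incremental snapshots vary in size). With no retention limit, the number of stored snapshots rises every day, so cumulative spend grows roughly with the square of elapsed time; a 30-day cap flattens it to linear growth. The function names below are hypothetical.

```python
from typing import Optional


def retained(day: int, per_day: int, retention_days: Optional[int]) -> int:
    """Snapshots still stored at the end of `day` (1-indexed) of daily creation."""
    # With no retention limit, every snapshot ever created is still stored.
    kept_days = day if retention_days is None else min(day, retention_days)
    return kept_days * per_day


def snapshot_days(horizon: int, per_day: int, retention_days: Optional[int]) -> int:
    """Sum of stored snapshots over each day: a rough proxy for cumulative storage spend."""
    return sum(retained(d, per_day, retention_days) for d in range(1, horizon + 1))


for horizon in (90, 365):
    unbounded = snapshot_days(horizon, per_day=1, retention_days=None)
    capped = snapshot_days(horizon, per_day=1, retention_days=30)
    print(horizon, unbounded, capped)
# 90 4095 2265
# 365 66795 10515
```

Over 90 days the cap roughly halves the snapshot-days; over a year the unbounded case is more than six times larger, which is the "small leaks sink ships" effect in miniature.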

Manual Cleanup: A Sisyphean Task

Our initial attempt at a solution was painful: manual deletion. We tasked a junior engineer with sifting through thousands of snapshots, trying to identify which ones were safe to remove. It was slow, error-prone, and incredibly tedious. We managed to reclaim some storage, but it was clear this wasn’t a sustainable approach.
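
Part of that manual triage can be made mechanical. A minimal sketch, assuming you have already exported each snapshot's name, source disk, and creation time (the Compute Engine snapshot resource exposes the source disk in its `sourceDisk` field) along with the set of disks that still exist: flag snapshots whose source disk is gone and that are older than a cutoff. The `SnapshotRecord` type, the `triage` function, and the 30-day default are hypothetical choices, and the output is a review list, not a deletion list, because an orphaned snapshot may be the only remaining copy of a deleted disk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SnapshotRecord:
    name: str
    source_disk: str   # hypothetical identifier of the disk the snapshot came from
    created: datetime  # timezone-aware creation time


def triage(snapshots: Iterable[SnapshotRecord], live_disks: Set[str],
           now: datetime, min_age_days: int = 30) -> List[str]:
    """Return names of snapshots whose source disk no longer exists and
    that are older than `min_age_days`. These are review candidates only."""
    cutoff = now - timedelta(days=min_age_days)
    return sorted(
        s.name for s in snapshots
        if s.source_disk not in live_disks and s.created < cutoff
    )


# Usage with made-up records.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
snaps = [
    SnapshotRecord("orders-db-0401", "disks/orders-db", now - timedelta(days=90)),
    SnapshotRecord("legacy-api-0301", "disks/legacy-api", now - timedelta(days=120)),
    SnapshotRecord("legacy-api-0625", "disks/legacy-api", now - timedelta(days=5)),
]
live = {"disks/orders-db"}
print(triage(snaps, live, now))  # ['legacy-api-0301']
```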

EazyOps: Automating the Solution

That’s when we turned to EazyOps, whose automated snapshot management feature was exactly what we needed. We defined simple retention rules: keep daily snapshots for 30 days, keep weekly snapshots for 90 days, and delete anything older. The setup took minutes.
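
EazyOps's own rule engine isn't shown here, but the policy itself is easy to express in code. A minimal sketch in Python, assuming a hypothetical `Snapshot` record with a unique name and a timezone-aware creation time, and assuming "weekly" means the earliest snapshot of each ISO week (the rules above don't say how weekly copies are chosen). It only computes deletion candidates and calls no cloud API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class Snapshot:
    name: str          # assumed unique
    created: datetime  # timezone-aware creation time


def snapshots_to_delete(snapshots: Iterable[Snapshot], now: datetime,
                        daily_days: int = 30, weekly_days: int = 90) -> List[str]:
    """Return the names of snapshots that fall outside the retention policy.

    Assumed policy:
      * any snapshot younger than `daily_days` is kept;
      * the earliest snapshot of each ISO week is the "weekly" copy,
        kept until it is `weekly_days` old;
      * everything else is a deletion candidate.
    """
    snaps = sorted(snapshots, key=lambda s: s.created)

    # Map each ISO (year, week) to the name of its earliest snapshot.
    weekly = {}
    for snap in snaps:
        weekly.setdefault(snap.created.isocalendar()[:2], snap.name)
    weekly_names = set(weekly.values())

    doomed = []
    for snap in snaps:
        age = now - snap.created
        if age < timedelta(days=daily_days):
            continue  # still inside the daily window
        if snap.name in weekly_names and age < timedelta(days=weekly_days):
            continue  # weekly copy still inside the weekly window
        doomed.append(snap.name)
    return sorted(doomed)


# Usage: 100 consecutive daily snapshots of a hypothetical "db" disk.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
history = [Snapshot(f"db-{i:03d}", now - timedelta(days=i)) for i in range(100)]
doomed = snapshots_to_delete(history, now)
print(len(doomed))  # 62: 30 daily + 8 weekly copies survive
```

Selecting weekly copies by ISO week keeps the rule independent of which weekday a snapshot job happens to run; a fixed-weekday rule would be an equally valid choice.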

The Results: Breathing Room (and Budget)

Within a week, EazyOps had pruned our snapshot graveyard, cutting the snapshot count by more than 90%. Our snapshot costs plummeted, saving us nearly $7,000 per quarter. More importantly, it freed our engineers from manual garbage collection so they could focus on more strategic work.

Lessons Learned: Cloud Hygiene Matters

  • Automation is key: Manual processes don't scale. Automate cloud hygiene tasks like snapshot management.
  • Small leaks sink ships: Even seemingly minor costs, like orphaned snapshots, can add up significantly over time.
  • Proactive monitoring is crucial: Don’t wait for a surprise bill. Implement proactive cost monitoring and alerting.
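
Snapshot costs tend to creep rather than spike, so a day-over-day spike detector can miss them. A minimal sketch of a run-rate check, assuming daily snapshot storage costs are already available as a list of numbers (for example, pulled from a billing export); the `run_rate_alert` name, the 7-day window, and the 30-day month are hypothetical choices.

```python
from typing import Sequence, Tuple


def run_rate_alert(daily_costs: Sequence[float], monthly_budget: float,
                   window: int = 7) -> Tuple[bool, float]:
    """Project a 30-day spend from the recent daily average and compare it to a budget.

    Returns (should_alert, projected_monthly_spend).
    """
    if not daily_costs:
        return False, 0.0  # no data, nothing to project
    recent = daily_costs[-window:]
    projected = sum(recent) / len(recent) * 30
    return projected > monthly_budget, projected


# Hypothetical history: snapshot storage cost creeping up by $1 per day.
costs = [50 + day for day in range(60)]
alert, projected = run_rate_alert(costs, monthly_budget=3000)
print(alert, projected)  # True 3180.0
```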

What's Next: Beyond Snapshots

EazyOps has helped us move beyond reactive cost management to proactive optimization. We're now exploring its other features for rightsizing instances, managing unused resources, and implementing cost-aware policies across our entire GCP environment. The journey to cloud cost efficiency is ongoing, but with EazyOps, we have a powerful ally.

About Shujat

Shujat is a Senior Backend Engineer at EazyOps, working at the intersection of performance engineering, cloud cost optimization, and AI infrastructure. He writes to share practical strategies for building efficient, intelligent systems.