Before “observability,” there was plain old monitoring - mostly about whether our servers or production systems were up or down.

We didn’t have SLOs, composable dashboards, or queryable time-series databases. If you wanted to know what was happening in production, you SSH’d into systems and hoped for the best.

Capacity planning was guesswork. Incident response was manual. And running a large-scale system with those tools? Speaking from experience, it was frankly a pain in the ass.

Fast forward to today: our tools are far more capable, but the pain hasn’t disappeared - it’s just shifted.

Observability now eats up 15-25% of many companies’ infrastructure costs. In one extreme case, I saw a team get hit with a $50,000 monthly bill - just for metrics. We’re awash in dashboards and data, but the real question is: are we any better off?

Building dashboards that actually deliver signal over noise has become a job in itself.

So today, we’re taking a trip down memory lane - exploring the evolution of monitoring and observability, the new pain points that evolution has created, and where we go from here.

With that, let’s welcome today’s guest: Aaron Pacheco, better known as “Checo.” He’s the founder and CEO of Ottermon.ai, formerly at New Relic, and - fun fact - my old “work spouse.”

You can reach him via his LinkedIn profile.