You don’t need Application Performance Monitoring

Klaas van Schelven; November 5, 2024 - 5 min read
[Image: An overweight man with a hamburger in his hands on a scale, looking at measurements]
Sometimes, "measure more" is not the answer.

APM tools like Datadog, New Relic, and Dynatrace make a simple promise: just instrument every system, send us all logs, traces and metrics, and you’ll get a full picture of what’s going on, which will help you optimize performance.

No need to do too much thinking about performance, just send us the data and we’ll tell you what’s wrong. This “kitchen sink” approach aligns well with an industry eager to get easy solutions to hard performance problems.

In this article, I’ll argue that APM tools are a trap. They encourage a reactive approach to performance, mask deeper design issues, and come with real costs that often outweigh the benefits. Instead of empowering developers to build performant code from the start, APM fosters a mindset of “measure first”, fixing bottlenecks only as they appear. This approach can lead to a cycle of alerts, reactive fixes, and scattered inefficiencies that could have been avoided with proactive design choices.

There’s no such thing as reactive design

APM tools revolve around alerts and metrics, which means teams focus on fixing issues only as they become visible in the tool. This trains developers to react to immediate problems rather than build with performance in mind from the start. That is, setting up APM nudges your team into a reactive mindset, where insight only comes once issues have already impacted the system.

APM tools also encourage a focus on bottlenecks over holistic design: they highlight the “worst offenders” in performance in spiffy dashboards and charts, and allow for alert thresholds on specific metrics. This can lead to a graph-driven “hotspot” mentality, where teams jump on the highest peaks in the charts rather than examining the underlying architecture and design.

Going bottleneck-to-bottleneck can’t fix deeper design issues, so those remain unresolved. When we design for performance upfront, the benefits go beyond speed: the application becomes simpler, easier to maintain, and far less dependent on monitoring tools to keep it running smoothly. Instead of deferring performance work to a tool, we can address it by making smart architectural decisions, structuring data efficiently, and minimizing dependencies.
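As one illustration of what “structuring data efficiently” could look like in practice, here is a minimal sketch that counts database queries in a test instead of waiting for a dashboard to flag them. Everything in it is an assumption made for the example: the `author` and `post` tables, the helper names, and the choice of SQLite. It only relies on Python's standard `sqlite3` module.

```python
import sqlite3


def make_db():
    """Build a small in-memory database (hypothetical schema: authors and posts)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    """)
    conn.executemany("INSERT INTO author VALUES (?, ?)",
                     [(i, f"author-{i}") for i in range(1, 6)])
    conn.executemany("INSERT INTO post VALUES (?, ?, ?)",
                     [(i, (i % 5) + 1, f"post-{i}") for i in range(1, 21)])
    conn.commit()
    return conn


def count_selects(conn, fn):
    """Run fn(conn) and return (result, number of SELECT statements executed)."""
    seen = []
    conn.set_trace_callback(seen.append)  # called once per executed statement
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    selects = [s for s in seen if s.lstrip().upper().startswith("SELECT")]
    return result, len(selects)


def titles_one_query_per_author(conn):
    """The "N+1" pattern: one query for the authors, then one more per author."""
    out = {}
    for author_id, name in conn.execute("SELECT id, name FROM author").fetchall():
        rows = conn.execute(
            "SELECT title FROM post WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        out[name] = [title for (title,) in rows]
    return out


def titles_single_join(conn):
    """The same result from a single JOIN."""
    out = {}
    rows = conn.execute(
        "SELECT a.name, p.title FROM author a JOIN post p ON p.author_id = a.id "
        "ORDER BY a.id, p.id"
    ).fetchall()
    for name, title in rows:
        out.setdefault(name, []).append(title)
    return out


if __name__ == "__main__":
    conn = make_db()
    _, n_loop = count_selects(conn, titles_one_query_per_author)
    _, n_join = count_selects(conn, titles_single_join)
    print(f"per-author loop: {n_loop} SELECTs, single join: {n_join} SELECT(s)")
```

A check like this can live next to the code as an ordinary test with a query budget, so the design decision is made (and kept) at development time rather than discovered in production.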

There’s another drawback to focussing on bottlenecks: you’ll end up ignoring anything that’s slow, but not quite a bottleneck. The result is “smeared out slowness”: an application that’s probably still slow, but with no obvious place left to optimize.

There’s also the risk of a certain amount of endorphin chasing: APM tools show immediate improvements after a bottleneck is addressed, reinforcing the habit of tackling visible issues while longer-term architectural adjustments remain deprioritized.

The cost of APM

APM tools come with real costs that often go beyond what’s visible on the surface:

  • Lack of Developer “Flow”: Each time an alert goes off, you have to triage it, taking you out of your flow. Even if you’re smart enough to ignore your APM tool during focused work, you’ll have to pick up the work at some point. At that point you have to understand the code again, rewrite it, and retest it. The cost of rework “at some point” is probably higher than the cost of doing it right the first time.

  • Time to Configure: Setting up and maintaining APM takes time, usually developer-time. That time could be used for actual development and proactive performance improvements.

  • Performance Overhead: Every metric tracked, every log stored, and every trace sent to an external service adds latency. It’s ironic that a tool intended to optimize performance can create its own drag, slowing down the very app it’s meant to help.

  • Financial Cost: APM solutions aren’t cheap, and costs tend to balloon with scale. Teams often find themselves paying high subscription fees for insights that a bit of thoughtful planning could have provided up front. It’s no surprise that the cost of tools like Datadog has become a meme.

[Image: Meme of a person with a wheelbarrow full of money, titled 'OMW to ask for more budget']
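To get a rough feel for the performance overhead mentioned in the list above, here is a small sketch that times a trivial function with and without a per-call tracing wrapper. The `traced` decorator and the `SPANS` buffer are hypothetical stand-ins, not any vendor's actual agent. Real agents typically batch and ship data asynchronously, so treat the numbers as an illustration of in-process bookkeeping only.

```python
import time
from functools import wraps

SPANS = []  # hypothetical stand-in for the buffer an agent would ship elsewhere


def traced(fn):
    """Hypothetical tracing decorator: records the name and duration of each call."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            SPANS.append((fn.__name__, time.perf_counter() - start))
    return wrapper


def add(a, b):
    """A deliberately tiny function, so the wrapper cost dominates."""
    return a + b


traced_add = traced(add)


def time_calls(fn, n=100_000):
    """Return the wall-clock seconds needed to call fn n times."""
    start = time.perf_counter()
    for i in range(n):
        fn(i, 1)
    return time.perf_counter() - start


if __name__ == "__main__":
    plain = time_calls(add)
    instrumented = time_calls(traced_add)
    print(f"plain: {plain:.4f}s, traced: {instrumented:.4f}s")
```

The exact ratio depends on the machine and on how small the wrapped function is. The point is only that each recorded span costs something, and that the cost is paid on every call.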

How did we get here?

So if APM isn’t the answer, why is it so prevalent? There are answers on both the user and the vendor side.

The pitch behind APM is powerful: don’t spend too much time thinking about performance, just send us the data and we’ll tell you what’s wrong. It’s a tempting offer, especially for teams that don’t have the expertise or time to think about performance upfront.

That, and fancy graphs, of course. APM tools come with a lot of fancy graphs and dashboards that make it look like you have full control. And there’s nothing that sells better than a dashboard that says “everything is fine”, or “just fix this one thing and you’re good”. Also: don’t forget the feeling of power that staring at a dashboard gives.

[Image: A screenshot of a Reddit post titled 'Who here feels this way?']
The feeling of power that comes with staring at a dashboard. I think the upvotes are unironic.

From the perspective of vendors, APM is an interesting market because it’s a $50 billion market.

APM products also provide a great “moat”, because it’s hard for your customers to “just do it themselves”. Setting up an end-to-end monitoring system at scale is no small feat, and the more data you collect, the stickier your product becomes. APM tools are built to cover a wide range of use cases, which means the setup is complex and broad by design – perfect for a SaaS product, since customers are unlikely to bring such a solution back in-house.

Here’s the catch: the actual work of thinking about performance in-house, of proactively designing for it, may be a whole lot easier than relying on an expansive external monitoring system.

Microservices: part of the problem?

The rise of APM solutions coincides with the global shift to microservices. In monolithic systems, performance management was simpler: you could profile the code, analyze bottlenecks, and optimize directly. But as applications split into dozens or hundreds of microservices, the complexity increased. Each service has its own dependencies, network connections, and potential bottlenecks, making it hard to track performance without a tool that can provide a “big picture” view.

APM tools are designed to handle this complexity, offering a single pane of glass to monitor all services and dependencies.

But this complexity is self-imposed. By splitting applications, we created the visibility problem APM aims to solve, adding cost and latency in the process. If our architecture requires complex monitoring just to function, perhaps it’s worth rethinking the approach.
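For the single-process case described above, “profile the code” can be as small as the following sketch, which uses only Python's built-in `cProfile` and `pstats` modules. The functions `parse`, `slow_dedupe`, and `handle_request` are made up for the example, and `slow_dedupe` is deliberately quadratic so that it has something to show.

```python
import cProfile
import io
import pstats


def parse(lines):
    """Turn text lines into integers."""
    return [int(x) for x in lines]


def slow_dedupe(values):
    """Deliberately quadratic: a list membership test inside a loop."""
    seen = []
    for v in values:
        if v not in seen:
            seen.append(v)
    return seen


def handle_request(lines):
    """Hypothetical request handler: parse, dedupe, sum."""
    return sum(slow_dedupe(parse(lines)))


def profile_report(fn, *args, top=5):
    """Run fn under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


if __name__ == "__main__":
    lines = [str(i % 500) for i in range(20_000)]
    result, report = profile_report(handle_request, lines)
    print(result)
    print(report)
```

In a monolith, the whole call path shows up in one report on the developer's own machine, with no agent, no network hop, and no subscription involved.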

Closing Thoughts

APM tools promise convenience: full visibility, effortless monitoring, and quick fixes for performance problems. But the reality is more complex. By encouraging a reactive, “measure everything” approach, APM solutions often mask deeper design issues, resulting in an ongoing cycle of alerts, reactive fixes, and scattered inefficiencies.

On top of that, APM comes with real costs – in setup time, in performance overhead, and in significant financial investment. For many applications, a bit of upfront thinking and simpler, proactive design choices may be a far better alternative than an all-encompassing APM tool.

I’m sure there are good use cases for APM: I can rant about it all I want, but you don’t become a $50 billion market without providing some value. I do think it’s overused, and that there’s a lot of value in thinking about performance upfront, rather than relying on a tool to fix it later. I also think that a lot of the value in APM tools is in getting a grip on systems that should have been simpler to start with. Better to build applications that are easy to understand and maintain, rather than relying on a tool to keep them running.

The reason for this rant is a personal one: as the builder of Bugsink, an Error-Tracking tool that’s in a space that a lot of APM tools try to “also” cover, I have to ask myself if I want to go down the APM route. I don’t: I think there’s a lot of value in helping teams stay close to their code, and address issues as they arise, rather than relying on an all-encompassing monitoring tool. APM has its value, but it doesn’t have to be the default solution for everyone.