NetApp IT is using microservices monitoring to optimize applications

December 6, 2021

By Andy Kranjec

NetApp IT relies on microservices to build cloud applications. For us, applications are built from the ground up as a set of services, each running in its own process. It's a tried-and-true approach that many IT organizations use.

Of course, this approach means we have to constantly monitor our microservices to ensure quality and efficiency. We take two approaches here, one focused on the application and one on the platform, and try to marry the two. For our hybrid cloud Kubernetes environment, we use Instana, an application performance monitoring tool, and Splunk, which aggregates information from across the environment. Together, they give us a holistic view of what is happening in the environment.

We actively monitor CloudOne, our hybrid cloud environment, through a robust dashboard. It's a constant work in progress that is always being revamped, but it makes it easy to see how "busy" we are. Metrics such as CPU and memory utilization appear in real time, giving us easily digestible, actionable data. For example, storage allocation data can influence where we place mission-critical applications.

For applications, we can see how quickly their services communicate with one another and with other applications. If those response times are consistently peaking across the board, chances are end users are experiencing performance issues with the application. We track that and make performance improvements as necessary. The dashboard also surfaces errors that can point to problems within the application itself.

There's also a security component. Our monitoring includes security scans that break down which potential vulnerabilities exist and how severe they are. This stops issues before they become incidents.

The dashboard has seven layers of information stacked on top of one another. When an issue arises, we just have to drill down deep enough to find the root cause.

This is all automated and feeds into ServiceNow to provide immediate visibility, so our team is alerted right away if something needs to be addressed quickly. We set thresholds for resource usage, and when an application begins to approach one, we can adjust its resource allocation and investigate why it is using so much.
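As a rough illustration of the threshold logic described above, here is a minimal Python sketch. The resource names, limit values, the 80% "approaching" fraction, and the `check_usage` function are all hypothetical; this is not how Instana, Splunk, or ServiceNow evaluate thresholds, and the hand-off to a ticketing system is only indicated by a comment.

```python
from dataclasses import dataclass

# Hypothetical per-resource limits for one application; real values would
# come from that application's actual resource allocation.
THRESHOLDS = {"cpu_cores": 4.0, "memory_gib": 8.0}

# Hypothetical fraction of a limit at which an app "begins to reach" it.
WARN_FRACTION = 0.8


@dataclass
class Alert:
    app: str
    resource: str
    usage: float
    limit: float
    severity: str  # "warning" when approaching the limit, "critical" at or above it


def check_usage(app: str, usage: dict[str, float]) -> list[Alert]:
    """Compare one application's current usage against THRESHOLDS."""
    alerts = []
    for resource, limit in THRESHOLDS.items():
        value = usage.get(resource)
        if value is None:
            continue  # no sample reported for this resource
        if value >= limit:
            alerts.append(Alert(app, resource, value, limit, "critical"))
        elif value >= WARN_FRACTION * limit:
            alerts.append(Alert(app, resource, value, limit, "warning"))
    return alerts


if __name__ == "__main__":
    sample = {"cpu_cores": 3.5, "memory_gib": 2.0}
    for alert in check_usage("orders-api", sample):
        # In a real pipeline, this is where an incident or ticket would be
        # raised (for example, in ServiceNow); here we only print it.
        print(alert)
```

Alerting at a fraction of the limit, rather than only at the limit itself, is what leaves time to adjust allocation and investigate before the application actually runs out of resources.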

It's a constant flow of actionable data that we use to improve our applications across the board, and it gives us unique insight into what's working in our environment and what can be improved. Because our hybrid cloud is self-service, our microservices monitoring serves a variety of uses. If we have an outage, we can pinpoint its cause more easily. If we want to know how many applications we have running, that's easy to pull.

Of course, this didn't happen overnight. Our monitoring is a living organism that is always changing, and refining it is an iterative process that will likely never end. It's been a journey to get where we are, and that journey will guide where we go next.

Andy Kranjec

Andy Kranjec is the Senior Manager for Infrastructure Operations for NetApp IT, overseeing the organization’s FinOps operations.