Transcript: DevOps Cloud Platform – Future of Enterprise IT Infrastructure

March 31, 2020

DevOps Cloud Platform

Hello, everyone, and a very good evening. Thank you for taking the time today. I’m an enterprise cloud architect on the enterprise architecture team, which means I’m not part of the product, marketing, or sales teams like the other speakers. Usually, I don’t talk much about NetApp products. The topic we want to cover today is what we internally in NetApp IT are doing about DevOps: how we are approaching it, what our strategy is, how we have built our DevOps platform, and what goals and vision we have for it.

I’ve been with NetApp for 14-plus years now. The last five years have been all about cloud and DevOps. I lead the strategy when it comes to which clouds we use, what types of applications we put on each cloud, what kind of user experience we want to give our developers and application teams, and, in general, how we transform our organization from a traditional IT shop into a modern DevOps, digitally-transformed company. That’s essentially been my life for the last few years. I have a lot of slides, so please speak up and ask questions as we go; it gets boring otherwise, and it usually works best as a conversation. This is a very broad topic, and there are a lot of technologies and areas we’ll cover.

Developers Love Speed, Operations Loves No Risk

This is an old slide; a lot of people might have seen it. The number one reason you would want to do DevOps is speed. You want to increase the speed of software delivery in your organization. But not just speed: you also want quality along with it. If you go fast and make a lot of errors and mistakes, that usually doesn’t solve what you’re trying to do. Speed and quality are essentially what you’re after when you do DevOps. It’s all about how fast you can release code. DevOps is all about code: how fast you can release code into production. And when I say code, I mean features and functionality. That’s the essential value the end users get and the business delivers.

Now, the business can deliver things much faster. It can generate revenue faster and improve customer experience faster. Business and developers both love speed, but with speed, a lot of risk comes into play. Going faster, you might not have planned things fully, or you might have missed something here and there. That can cause impact, and that’s what operations doesn’t like.

Two sides of the coin: how do you make both of them happy? That’s where DevOps comes into play. What makes the risk minimal is automation. You might have heard DevOps is all about people, process, and technology, and I’m going to talk about that. Those are still true, but what ultimately gets delivered in the end is end-to-end automation of your software delivery life cycle. If you have the right people, process, tools, and collaboration, they all come together to deliver an end-to-end pipeline: how your code can go all the way from development to production in the least amount of time with the fewest defects, automated end-to-end. That’s what ultimately gets delivered.

That also removes the risk element from the speed. If it’s automated, there’s no human intervention. You’ve done enough QA and testing of your operations, so now you can remove the risk element. DevOps is all about those three things: speed, quality, and automation. How fast you can go into production, how effective your releases are, and how automated your end-to-end software development life cycle is. But that’s the background. The idea is you really don’t want to do DevOps without a strategy.

DevOps and CloudOne: Focused on Transformative Solutions

Within the enterprise architecture team, everything for us starts with strategy. This is a little bit of a precursor to explain how we are approaching it. If we look at our application portfolio, we have about 300 to 400 applications, plus a lot of what I’d call smaller applications. Looking at just the applications, I can easily categorize them into two boxes: the applications I use to run my business, and the applications I use to transform my business.

How we classify them: the run-the-business apps are your common business processes, the applications which help you close your books, report your finances, automate the sales force, HR, and things like that. These are common across companies: company A and company B will use email the same way, or run payroll the same way. The strategy here is to make sure we’re simplifying this type of application. We’re not making them too complex, we’re not customizing them, and ideally, in most cases, we’re picking a SaaS provider, migrating those applications there, and consuming them. Here we are purely the consumers, with little or no code or DevOps needed.

Now, the second bucket of applications: these are your strategic and innovative applications. These are unique to your business, and they help your company gain strategic advantages, in terms of customer experience, your product getting better, or your revenue getting higher. With these types of applications, the strategy is about speed: how can I deliver features and functionality at the pace my business needs? How can I get agility, where I am quick to adopt new things while still delivering the core capabilities of the business? How do I make sure the things which are needed are delivered at the speed we need? We covered that.

Here, the strategy for us is to build. The idea is you have full control of the stack. You are not dependent on any providers like SaaS vendors, and you are able to deliver things at the speed you want. This is where all the modern technologies, containerization, microservices, all of DevOps, come into play. These are the applications where our IT strategy focuses on DevOps. On one side we are consuming, and on the other we are building; two ways to think about it.

Another way to think about this: DevOps, as we will see through the slide deck, is quite an involved process. You really want to put in the effort in terms of people, process, technology, and tools to make sure you are able to do DevOps. You have to pick the place where you put that effort; you can’t really do DevOps for all of them. The idea is, if you are limited in terms of resources or money, these strategic applications are where you get the most value out of the investment you make.

That’s another part of the strategy: deciding which applications make more sense for DevOps and which applications we really don’t want to do DevOps for. The first step for us is to get that out of the way: okay, these are the applications where we are not doing DevOps, and here are the applications where we are going to do DevOps. For the latter, we have built our DevOps platform, which we call CloudOne.

Build vs. Buy

Another way to look at this is build versus buy. Here are the applications we are buying, which are SaaS, and here are the applications we are building on our CloudOne platform. Some examples: email used to be on Exchange in our data centers; now it’s Office 365. We really don’t have to worry about releasing code and features for that; DevOps for that is done by Microsoft. Same thing here: problem, change, and incident management used to be Remedy or whatever on-prem; now it’s all ServiceNow. Our sales force automation would go into SAP, and things like that.

On the other side, on CloudOne, the NetApp Support Site is a good example. It gets about 500,000 customer visits every month and is built with all the latest technologies the cloud provides in terms of containerization, microservices, and DevOps. Here, we control the entire stack. If we have to make changes, releases, and features available to our customers much faster, we can go at the speed the business wants and optimize the stack as much as possible. That goes into our build platform, CloudOne.

Cloud Journey @ NetApp IT

What is the CloudOne platform? If I look at CloudOne from our perspective, the approach we have taken is that of a service provider. IT at NetApp has transformed into a service provider. Everything we deliver here is as a service, so CloudOne is our shared, self-service platform, used not just by one application but by all the applications and developers in our organization. It’s a one-stop shop for all of our services. You can come to the CloudOne platform and consume infrastructure as a service, platform as a service, container as a service, or DevOps as a service. These are all fully managed services we provide to our customers. They really don’t have to get distracted or invest their time in any of these technologies; those are provided by the platform, removing those distractions from our developers and application teams.

We started this five years ago. You can come and get infrastructure services, which would be VM compute and storage. You can bring your own containers, or you can get the entire set of DevOps services, all the CI/CD pipelines and everything, from the platform. For the last 18 months, we have also provided the full developer experience out of this stack. That’s probably the focus of today’s discussion: what that component is and how we have built it.

If you look at it from an architecture perspective, CloudOne is truly a multi-cloud, hybrid cloud platform. We leverage the most popular cloud providers, and we have also converted our internal data centers into a Private Cloud. Essentially, they have the same look and feel and the same model as Amazon or GCP would: everything through APIs and software defined.

One difference here: irrespective of which cloud the workloads or applications get placed on, the storage technologies are always from NetApp, so we have full control of the data regardless of the cloud where the application is running. We use technologies like NetApp Cloud Volumes.

In some cases, we still have NetApp Private Storage, and CloudOne sits on top. That makes up our data fabric layer. Essentially, the idea is that if you have to move a workload from cloud A to cloud B, or if you have production on-prem and DR off-prem, the data fabric is the layer which helps us move workloads and do all those data synchronizations between different clouds … From a technology perspective, our Private Cloud is built on AFF and HCI, and of course, as we already talked about, NetApp storage is available on the different clouds.

From a user experience perspective, users come to our central self-service portal. We have invested the time in building our own portal. The idea is that users don’t have to worry about multiple portals, how to consume from Private Cloud versus Amazon versus Azure. It’s the same self-service portal they go to, irrespective of the service or the cloud they’re trying to consume from. They can pick an item from our catalog and get it delivered in any of these clouds based on how the automation and workflow rules are defined.

We have also done some innovation here to decide which cloud is best for each offering. If it’s a platform-type service, GCP might back that catalog item; if it’s plain infrastructure as a service, it could be Amazon or Private Cloud. Depending on the catalog item selected, IT has decided what the right technology is: either we build the service ourselves through our catalog, or we expose a best-of-breed service from one of these cloud providers and leverage that for those use cases.

It truly lets you use the cloud on your terms and expose the best catalog items to our customers, whether we build that service ourselves or simply pass through the best-of-breed service that’s out there.

Then, last but not least, there is the CloudOne platform itself. Our definition of CloudOne is not just the private data center. As you can see, it has now expanded to all different kinds of clouds. Anywhere we can get resources which are software defined, at the right price point, is fair game to use as resources within the CloudOne environment.

Right now in the Private Cloud, for CaaS as well as DevOps, we are using OpenShift. We are also in the process of looking at alternatives (such as NKS). Kubernetes has come a long way since it started, and now native Kubernetes makes more sense. Currently, we are looking at a couple of options, Rancher and Anthos. By the end of this year, we will be moving away from OpenShift to one of those platforms.

For today’s discussion, we will primarily be focusing on DevOps. The key goal of the DevOps services is to remove all the distractions from developers’ lives. In this environment, we really want them to focus on application architecture and writing code. All the tools, processes, and automation our developers need, we want to provide out of the CloudOne platform. We will see on an upcoming slide how we have built that and what kinds of tools and technologies we have used to deliver that experience.

Guiding Principles

Just to summarize, what are the guiding principles we are after in this journey? This applies not just to DevOps but to the entire platform. If there’s one key takeaway: in IT in general, we are on a path to deliver everything as a service. Any service, whether software, a laptop, or a simple VM, all the way up to a full DevOps CI/CD pipeline, we want to deliver as a service to our end customers. They can come to our service portal and consume that service on demand, in a self-service manner. And when they’re done, they can decommission or remove that service in the same manner. That’s one of the principles we have in this journey.

Second, a cloud-first strategy. As you saw, we are not talking about simply moving everything to public cloud. When we say cloud first, we mean adopting a cloud operating model where we use the combination of everything, SaaS, private cloud, and public cloud, to our best advantage, then place the right workloads on the right type of solution. We talked about the common business processes; our strategy is to align them to SaaS. All the innovative, strategic apps we want to build with microservices and DevOps in a cloud-style manner where everything is elastic, metered, and pay-as-you-go. What we mean by cloud first is a cloud operating model for anything new we do or anything we manage. Cloud first is our strategy.

Third, built-in security. We all know the stories we keep hearing in the news; if you mess up on security, you know what kind of issues you can run into. The whole approach here is to provide security built into the platform. If it’s infrastructure as a service, make sure the VMs and images we have are hardened to the level they’re supposed to be. If it’s CI/CD pipelines, make sure we have security checks for code and vulnerability scanning as a step in those pipelines. As soon as a developer submits or downloads code, if there’s malware with it, it gets flagged.

It also covers what stacks we use, what kinds of measures we build in, what kinds of security and firewall rules we put in place when automated workloads are deployed to different cloud providers, and what best practices for Amazon and Azure, say, locking down your S3 buckets, are put into place through the automation.

Security is not an afterthought. The reason is that the earlier you fix any security issue, the less cost and impact it has on your business. Once something is released or in production, it’s much harder to fix, or, even worse, somebody exploits it and causes you damage. The idea of built-in security is finding those issues much earlier in the cycle and fixing them right there, before they become a nightmare.

Last but not least is the developer. These are the people who are really delivering value to your business, so make sure their time is invested only in coding. They shouldn’t be chasing tickets or tools, standing up tooling or infrastructure, or talking to security architects to get their applications approved. Make sure all of that is covered and delivered by the platform, and all those distractions are removed from their daily life. They can just focus on writing code and delivering value to the business, which is essentially what the business wants them to do.

They love doing it, too, but in the environments we had before CloudOne, or in differently-run environments, 80% of developer time went into chasing different teams rather than delivering functionality and features. Delivery and productivity are what we are after. These are the principles we are trying to achieve, and as we make changes or add things to our services, we make sure we don’t break any of them and keep aligning to them as much as we can.

Cloud Native Computing Foundation: Tech Landscape

So how do you build a CloudOne platform like that? This is the landscape slide from the Cloud Native Computing Foundation, and these are all the modern tools and technologies available to you to build a DevOps platform. It’s all relevant; NetApp is right here, if you can see my cursor moving, and there are a bunch of other great companies here. The idea is that we took a different approach. Instead of starting with what technology we needed, we started by asking: what are the capabilities we want to deliver to our developers? What do they need?

Start with those in mind, and then choose the right technology to deliver those capabilities. There’s no intention to cover all of this; the point is just that it can be quite overwhelming, and there’s a lot of churn. This slide is from last year; this year’s version probably wouldn’t fit on the screen.

Key Platform Capabilities

When it comes to capabilities, we think there are six core capabilities in this environment. That’s how we have built our platform, and we’ll cover them in terms of how a DevOps cycle looks.

DevOps is all about code. It starts with code, so from a platform perspective, the very first capability the platform should provide is managing that code. Where are we going to store it? How are multiple developers going to make changes to it? How are they going to collaborate without stepping on each other? How do we save different versions of the code? What is the branching strategy? What is the versioning strategy: version 1, 1.1, 1.2, 1.3? And how do we track which change was made by which developer, for which user story or feature?

If you have that map, you can move between versions, or you can pinpoint exactly the line of code written for one particular story or feature. This is your foundation. The more robust this environment is, the more capabilities you can deliver, such as running multiple A/B-type releases, when it comes to DevOps.

You can also do rollbacks and roll-forwards, either of a complete version or of just a small feature. So yes, the platform needs to provide a robust code and binary management system as the base.
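As a rough illustration of the traceability described above, here is a minimal Python sketch of a release history that maps every version back to a commit, developer, and user story, and supports rollback. All names and fields here are hypothetical examples, not NetApp's actual tooling.

```python
# Sketch: a release history where every version is traceable to the
# commit, developer, and user story that produced it, with rollback.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Release:
    version: str    # e.g. "1.2" -- semantic-style version string
    commit: str     # VCS commit hash the binary was built from
    developer: str  # who made the change
    story: str      # user story / feature the change implements

@dataclass
class ReleaseHistory:
    releases: list = field(default_factory=list)

    def publish(self, release: Release) -> None:
        self.releases.append(release)

    def current(self) -> Release:
        return self.releases[-1]

    def rollback(self) -> Release:
        # The previous binary is still in the artifact store, so a
        # rollback is effectively just a pointer move.
        self.releases.pop()
        return self.current()

    def blame(self, version: str) -> Release:
        # Trace a released version back to its commit/developer/story.
        return next(r for r in self.releases if r.version == version)
```

A real system gets this from the version control and artifact repository; the point is the mapping, not the storage.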

Now, once you have the code, it needs to make its journey from dev to stage to production, and the idea is that it does so in an automated way. You need automated workflows, commonly called DevOps pipelines, which take the code, perform all the steps needed for it to make that journey, and ultimately release it into production. That’s what I was talking about in terms of automation: how automated that process is, and how defect-free your releases are. This is all about the DevOps pipeline.

The next set of capabilities the platform needs to provide is the CI/CD tools and pipelines which take that code. The two we use are Jenkins and Azure DevOps. These tools constantly watch your code and binary management system. As soon as new code is detected, the pipeline automatically kicks off; no human intervention is needed. It takes the code and runs through all the steps in the pipeline. At the end, either the code makes it to the next level (from dev to stage, or, if it’s a complete pipeline, all the way to production), or a notification goes back to the developer: “Hey, your code didn’t make it through; it failed for this reason.” The developer can start fixing that code without any human hand-off in between.

Tools like Azure DevOps come into play here, and we give them to our developers as part of the platform. As soon as they get onboarded into our environment, the tools and pipelines are already delivered to them. So is the backend automation. Say one of the pipeline steps is to take the code, fire up new infrastructure, and run some tests against it. The automation needed behind the scenes is also provided as part of the platform: Ansible if you are using VMs or infrastructure-type services, Helm charts if you’re using Kubernetes and container-type workloads.

Whichever technology you use, the platform provides all the backend automation needed by those pipelines as part of the package, so our developers don’t have to write that code either. NetApp has also invested heavily in Ansible modules. If one of the pipeline steps is to fire up a volume so you can run a database on it, we really don’t have to call any of our storage admins; Ansible makes direct calls to NetApp technology and gets that delivered as part of the automation needed by the pipeline.
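The automated promote-or-notify behavior described above can be sketched in a few lines of Python. The step names and the notification callback are illustrative assumptions; real pipelines in Jenkins or Azure DevOps define this declaratively rather than in code like this.

```python
# Sketch: run pipeline steps in order with no human intervention.
# On the first failure, stop and notify the developer; otherwise
# the code is promoted to the next stage.
def run_pipeline(steps, notify):
    """steps: list of (name, callable) pairs; each callable returns
    True on success, False on failure. notify: callback for messages."""
    for name, step in steps:
        if not step():
            notify(f"Pipeline failed at step '{name}'")
            return False
    notify("Pipeline succeeded; code promoted to the next stage")
    return True
```

The failure notification is what closes the loop back to whoever committed the change, with no operator in between.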

The third capability is quality tools. One of the key principles in DevOps is finding faults and finding them fast: fail fast. How do you constantly look for security or functionality issues and flag them right up front, not too late in the game, at release or after release? One of the steps in the CI/CD pipeline, once the code is written, is to run a security scan against it to find out whether there is any malware, any known issue, or any liability in something downloaded from a vendor.

As soon as something is flagged, we let the developer know: hey, you have an open issue. It’s okay for now, while you are in the development stage, but we enforce a policy on that code. If it’s not fixed, it won’t make it to stage, or even to production, depending on what type of issue was found. Including those quality tools in the CI/CD process gets you that fail-fast behavior, finding security and functionality issues sooner rather than later.
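The severity-based promotion policy described here could look something like the sketch below. The severity levels and the per-environment thresholds are made-up examples to show the shape of the gate, not actual NetApp policy.

```python
# Sketch: scan findings are tolerated in dev but block promotion to
# stage or production, depending on severity. Thresholds are examples.
SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}

# Maximum finding severity allowed through each gate (assumed values):
# dev tolerates anything; stage blocks high+; prod blocks medium+.
GATE_POLICY = {"dev": "critical", "stage": "medium", "prod": "low"}

def may_promote(findings, target_env):
    """findings: severity strings from the code/vulnerability scan."""
    limit = SEVERITY[GATE_POLICY[target_env]]
    return all(SEVERITY[f] <= limit for f in findings)
```

Tightening the threshold as code moves toward production is what lets a finding be "okay for now" in dev while still stopping the release later.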

The next capability is the platform itself, and for us that’s CloudOne. Ultimately, those pipelines, whether for development, stage, or production, need infrastructure underneath them. The key part is that the infrastructure here is all software defined. It’s consumed through automation tools using APIs. It provides modern stacks like containers and microservices so you can build cloud-aware applications, and it gives our developers the best cloud choices available across different clouds.

The last but not least capability is observability. These are your eyes and ears into the tools and the platform. Essentially, as code flows through the different processes, observability tools tell you whether the change you made, or the release you’re trying to make, meets the mark or not. There are key metrics and KPIs defined which serve as a feedback loop back to the developer: how to fix what isn’t working, or how to optimize to the next level and start the next DevOps cycle.
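As an illustration of the kind of KPIs such a feedback loop might compute, here is a small Python sketch of two common release metrics, deployment frequency and change failure rate. The record format and field names are assumptions for the example; real observability stacks derive these from pipeline and incident data.

```python
# Sketch: compute two illustrative release KPIs from release records.
def release_kpis(releases):
    """releases: list of dicts like {"day": int, "failed": bool}."""
    total = len(releases)
    failures = sum(1 for r in releases if r["failed"])
    days = len({r["day"] for r in releases}) or 1
    return {
        # How often code reaches production.
        "deploys_per_day": total / days,
        # What fraction of changes caused a failure.
        "change_failure_rate": failures / total if total else 0.0,
    }
```

Trending these numbers per team is the feedback loop: moving from weekly to daily releases shows up in the first metric, and lowering defects shows up in the second.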

In terms of capabilities, these are the six we think our developers need. The idea is not to get overwhelmed by the slide we saw before, but to give developers choices. Not unlimited choices, though. If we turned teams loose on the previous slide, and you have, say, a hundred applications, each would come up with its own tools, processes, and technologies for DevOps. I like flexibility, but if you then have to put any governance or processes around it, security scans, change management, release management, tracking and notifications, it would be a nightmare across all of them.

In IT, our approach is to provide all of those as a service to our customers, with the governance and other processes built around them, and a couple of choices within each area: if not five, then at least two in each category, which fit most of our developer use cases. That helps you operationalize it much more easily than dealing with the technology and tool sprawl you get if you just let developers have a go at the previous slide.

Developer Experience: Day 1

In CloudOne, we break the user experience into a couple of days: day one and day two. Everything we’ve talked about is quite a lot, but the good part is that from our developers’ perspective it’s all fully automated, end-to-end. With a few clicks on our self-service portal, they can get the entire capability landscape: the version management system, the pipelines, the automation scripts, all the scans they need, even the VPCs or VM space created in the OpenShift environment or in Amazon, automated for their use cases. We call these workspaces. Those workspaces are delivered to them with full end-to-end integration, and with permissions and authorizations all set up, in 15 minutes.

If you were in a legacy environment trying to do DevOps, this alone could take you weeks if not months. Here, we are ready to start doing DevOps in 15 minutes. The only thing developers have to bring is the code. Once they have the code, the environment is ready for them to start using, and that’s essentially day two. From day two on, whenever they make a code change, the pipeline automatically takes it from dev to stage to production and finally gets the application released, ideally without human intervention end-to-end.

Some applications can do this end-to-end today, and some have to go step-by-step. They may only go up to the CI stage or the CD stage, depending on how much test automation they have done, what technologies they are dealing with, and what delivery times or SLAs they have. Based on that, they make the right investments in terms of how much effort and value they want to get out of DevOps, which comes down to how much automation they do.

Looking at other things we are doing in this environment: we use a lot of public cloud for any temporary-type workload: dev, QA, test, DR. In our strategy, we use public cloud for those, and it can be any cloud. The idea is that we really don’t want to deal with temporary capacity management in our environment. If you want a new development environment, you wrote the code and want an environment to test it against, we just rent the equipment from Amazon for those few hours, days, or weeks.

Once the development stage is completed, these are ephemeral environments; we just delete them. It’s all stateless: your code is in version management, you have the automation scripts, and you can recreate the environment within 15 minutes, so you really don’t need a development environment running all year long that gets used only one-third or one-fifth of the time. In previous deployments, we had app teams with five or six development environments, for different reasons and different teams, running all year round.

By doing this kind of automation and using public cloud, we’ve been able to completely remove that dead weight from our data centers and stop worrying about the depreciation and other cycles involved in maintaining it. We just rent it for the days and amount of time we need it. For something running 24/7, like production, with the right tools and technology selected, Private Cloud makes more sense, so we don’t have to worry about high costs or bills running up in this environment.
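The rent-use-delete lifecycle described above maps naturally onto a context manager, so teardown always happens even if tests fail. In this sketch, the provision and deprovision callables stand in for real automation (Terraform, Ansible, cloud APIs) and are purely illustrative.

```python
# Sketch: an ephemeral environment that is always deleted when done,
# because all state lives in version control, not in the environment.
from contextlib import contextmanager

@contextmanager
def ephemeral_environment(provision, deprovision, spec):
    env = provision(spec)      # e.g. fire up VMs/containers via automation
    try:
        yield env              # run your tests against it
    finally:
        deprovision(env)       # always delete: rent it only while needed
```

The `finally` clause is the whole point: the environment cannot outlive its use and quietly accumulate cost.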

Developer Experience: Day 2 and beyond

Also, from day two onwards, developers don’t even need to come to our cloud platform. With tools like Azure DevOps and Jenkins, which they are used to working with, they can pretty much control the entire pipeline and delivery of code. They write a user story, they submit a code change, and as soon as the change is submitted, the pipeline takes care of it all. If at any point the pipeline fails or succeeds, they get a notification right in the Azure DevOps tool. This is a tool our developers are really familiar with, and everything they want to do in their entire process is now integrated into it.

It kind of abstracts away the complete infrastructure, the clouds, and the technologies we have; even most of the processes, like security, governance, and change, are completely removed from their view. From day two onward, they just work from the tools they’re already using.

NetApp Products in CloudOne

As I said, I don’t talk much about NetApp products. This is the only slide where we do, just to show what NetApp technologies are in play here. For all of our code and container binary management, we use object storage, which is our StorageGRID solution. That’s where all the binaries and images live.

In Amazon and the other public clouds, we use Cloud Volumes Service for essentially all of our storage. Our Private Cloud is built on NetApp HCI and All Flash FAS. I don’t know if you have heard about Trident. It’s an open source offering from NetApp which helps you bring enterprise-grade storage to Kubernetes-type environments. Essentially, Trident is a plug-in which sits alongside OpenShift: any persistent volume claims made in the OpenShift environment get routed to and managed by NetApp technology, either on-prem or in the public cloud, through Trident. Even though it’s open source, it is still fully supported; you can call NetApp and say, hey, I have a problem with it, and the help desk will support you. That’s just behind the scenes of what technologies are at play here.

We have a couple of areas where Cloud Insights comes into play … We use it heavily on the storage side. Across all of our NetApp storage, Cloud Insights is our tool of choice for measuring what kind of storage consumption you’re running into. It’s also heavily used in the infrastructure-as-a-service area. And there are quite a few tools involved in having an overall governance policy: how do you manage container implementations, because a lot of the time those containers live in VMs and OpenShift environments.

Containers have a very flexible scale-up and scale-down model, and you have to charge your customers based on the resources actually consumed. We have other tools in play as well for complete governance, but yes, Cloud Insights is primarily what we use today for metering and measuring the NetApp storage technology.

“DevOps is like a train rolling down a never-ending track.”

With that said, is it done? Are we done yet? Essentially, what we call a DevOps environment is like a train rolling down a never-ending track. That’s the metaphor we use. What we mean by that is, as we move down the track, we keep adding more cars to our train in terms of functionality and features. We added infrastructure as a service, platform as a service, AI, ML. Those are all the cars we are adding, and the passengers who keep coming aboard are the applications which want to consume those services.

DevOps for us is never a destination. In fact, that’s true for everyone. If you are doing DevOps, it’s all about how to build a culture and mindset of continuous improvement. Am I able to do CI, which is continuous integration, or am I able to do CD, continuous deployment? Even if I am able to do both CI and CD, are the KPIs and metrics really good? Instead of releasing weekly, can I release daily or hourly? If I used to have five defects per release, can I lower it to two or one? It’s all about that mindset. How do you get to the next level while the train is constantly moving and you are making all of these changes?
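The two KPIs mentioned here, release frequency and defect ratio, are simple to compute once releases are tracked. This is an illustrative sketch under our own assumptions, not NetApp’s actual tooling; the release records are made-up sample data.

```python
# Illustrative sketch (not NetApp's actual tooling): computing release
# frequency and defect ratio from a simple list of release records.
from datetime import date

releases = [
    {"day": date(2020, 3, 2),  "defects": 1},
    {"day": date(2020, 3, 9),  "defects": 0},
    {"day": date(2020, 3, 16), "defects": 2},
    {"day": date(2020, 3, 23), "defects": 0},
]

span_days = (releases[-1]["day"] - releases[0]["day"]).days or 1
releases_per_week = len(releases) / (span_days / 7)
defect_ratio = sum(r["defects"] for r in releases) / len(releases)

print(f"releases/week: {releases_per_week:.1f}")   # are we weekly, daily, hourly?
print(f"defects per release: {defect_ratio:.2f}")  # can we drive this down?
```

Watching these two numbers over time is what moving "to the next level" means in practice: release frequency should rise while defect ratio falls.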

Change is a constant, so what mindset do you have to have? We saw the slide with the six technology capabilities. We didn’t start with all those capabilities. They kept getting changed or added. They moved in and out of the environment as they made sense. It’s a constantly evolving environment. You have to set yourself up in a way that change is not seen as really hard, that it’s not hard to make a change. At the same time, you’re able to release the features and functionality your business and developers need, at the speed they need. In this journey, there’s never going to be a point where we say we are all done; the idea is the train keeps running down the track.

Levels of DevOps Maturity

With that said, how do you measure where you are in terms of your maturity? We have a five-level model we use internally within NetApp IT, and those are the levels we see over here. There are two ways to look at it. The first is what capabilities we can deliver from the CloudOne platform perspective. Today, we say our CloudOne platform is at level four, and we are able to deliver the capabilities level four calls for.

The second is from an application perspective: is your application able to leverage all the capabilities we can provide? Are you at level one, level two, level three, or level four? The levels are about how fast you can release code into production, how automated your software development life cycle is, and how defect-free your releases are. Those are the metrics and KPIs we use over here.

There’s no one level which fits all applications. If you have an application which is really only releasing features once a year or once every three years, it’s okay for it to be at level two. If you are an application like our Support Site, or Netflix or Amazon, you really want to be at level five, where all of your process is fully automated and there’s no human intervention. You’re constantly optimizing and shrinking your numbers in terms of release velocity, error rates, the time it takes to get value delivered, and things like that. That’s level five.

I’ll try to quickly touch upon what we mean by these levels. At level one, everything is manual. You’re doing big, heavy, risky releases, maybe once a quarter or less often. There’s not much repeatability and predictability in the release. Everything is like the old waterfall model where all the developers submit their changes and we make a big release. If something doesn’t work, we do a painful roll-back and restart the process all over again. This is okay if your app releases something once every two or three years.

At level two, now you are in DevOps. Your pipeline is automated up to the development stage. When we say the development stage, it’s about people being able to do continuous integration, CI. In CI, you are able to do unit testing and functionality testing. You’re able to do code and security scans. Multiple developers can come and make changes at the same time, and at the same time test plans are running and their changes are merging into the release branch. They’re still not releasing in an automated way, but they are ready for release.

Now, at level three, you’re not only able to do CI, the development stage, you’re able to do your staging stage as well. You’re able to do automated user acceptance and load-and-performance testing in addition to the unit and functional testing you were doing at level two. You’re able to deliver an artifact which can be pushed into production. It’s ready to go into production, but you don’t yet have the processes in terms of release management, change management, and all the validations and roll-backs once it’s put into production. There’s a lot of process and technology needed in going from level three to level four.

Now, if you have all those processes integrated into your life cycle, then once the code is released in production, you have automated verifications to tell whether the new code made the mark or whether we need to roll back. If it didn’t make the mark, you have tools which will do an automated roll-back for you. If you are there, you are at level four.
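The level-four idea of "verify, and roll back automatically if the new code didn’t make the mark" can be sketched as a small control loop. This is a hedged illustration with stand-in functions, not a real deployment tool; the check names and the `deploy` helper are our own inventions.

```python
# Hedged sketch of level four: automated post-release verification with
# automated rollback. The checks and rollback are stand-in functions.

def deploy(version: str, checks, rollback) -> str:
    """Return the version left running after automated verification."""
    failed = [name for name, ok in checks(version).items() if not ok]
    if failed:
        rollback()          # no human in the loop
        return "previous"
    return version

def smoke_checks(version: str) -> dict:
    # Stand-ins for health probes, error-rate thresholds, synthetic tests.
    return {"health": True, "error_rate": version != "v2-bad"}

result = deploy("v2-bad", smoke_checks, rollback=lambda: None)
print(result)  # "previous" -- the failing build was rolled back
```

In a real pipeline, `smoke_checks` would query monitoring for error rates and latency, and `rollback` would redeploy the last known-good artifact.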

Now, if you have completely automated your end-to-end CI/CD pipeline, level five is all about having the KPIs and metrics we talked about and getting to the next level of optimization. Amazon and Netflix, which we talked about, release thousands of lines of code into production every hour. There’s no way you can have any human involved in any of those processes. Even without humans, you still need a lot of optimization of the stack to sustain those hourly, or even by-the-minute, changes. That’s what we call optimized. From a CloudOne platform perspective, we are at level four. We have at least two or three of our apps which are already at this level, and a lot of them are making their journey.
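The five levels described above can be paraphrased as a lookup on which pipeline stages are automated. This is our own shorthand summary of the talk, not an official NetApp model, and the stage names are made up for illustration.

```python
# Our paraphrase of the maturity levels: given which pipeline stages are
# automated, what level are you at? Stage names are illustrative shorthand.

def maturity_level(automated: set) -> int:
    if {"ci", "staging", "release", "optimization"} <= automated:
        return 5  # fully automated and continuously optimized
    if {"ci", "staging", "release"} <= automated:
        return 4  # automated release, verification, and rollback
    if {"ci", "staging"} <= automated:
        return 3  # production-ready artifact, release still manual
    if "ci" in automated:
        return 2  # CI only: unit/functional tests, code and security scans
    return 1      # everything manual, big risky releases

print(maturity_level({"ci"}))                        # 2
print(maturity_level({"ci", "staging", "release"}))  # 4
```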

That’s why we like to use the word journey. We have 400 applications. They will all work through their cycles to make it through this platform. We are going to continuously evolve the platform capabilities. At the same time, we will make sure our applications are also climbing up the ladder as it makes sense for the business.

Key Takeaways

We talked about speed being the number one reason you want to do DevOps, but that is not at the expense of quality or risk. It’s not an “or” game. It’s an “and” game, at least in our mind. You need to have a strategy in terms of why you are doing DevOps and where you want to do DevOps. As we saw, it’s quite an involved process. It’s not natural; it won’t happen by itself. You have to make sure you have the tools, processes, technology, automation, all of that in place. So where do you want to invest that money, and which applications are key for you to get value out of that investment?

If you remember the buy-versus-build strategy, a lot of our applications are already on SaaS. We really don’t have to do any DevOps for those. The ones that are innovative and strategic, that’s where our focus is for DevOps. Otherwise, a lot of the time, we would see app teams who, if they have the skills, would like to do DevOps for their environment, but it might not give you the value you are looking for. It might just be a science project. Have everything start with the strategy.

This is all a new operating model. I talked about how we are more of a service provider internally, so how are we delivering everything as a service? Being an internal service provider means managing each service we deliver as a product. We are transitioning all of our technology-delivering teams from technology owners to product owners. We need to think like product owners in a company. Every item we have in our catalog has a product owner assigned to it, who is working day in and day out asking, “How do I improve the efficiency of my product?”

On day one, they might start without, let’s say, charge-back. On day two, they would have charge-back included in the product, or they would want a feature to make it elastic: if we started with two VMs, now I can change, I can go to four VMs. Those kinds of features are what each of our product owners maintains. They have a quarterly roadmap saying, yeah, here are the different capabilities and features we want to deliver on this platform moving forward.
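A charge-back feature like the one described is, at its core, metered consumption times a unit rate. Here is a small illustrative sketch; the resource names and prices are entirely made up and are not NetApp’s actual rates.

```python
# Illustrative charge-back sketch for an elastic service: bill by the
# resources actually consumed. Resource names and unit prices are made up.

RATES = {"vm_hours": 0.05, "storage_gb": 0.02}  # hypothetical unit prices

def charge_back(usage: dict) -> float:
    """Sum consumed units times unit rate; unknown resources cost nothing."""
    return round(sum(RATES.get(k, 0.0) * v for k, v in usage.items()), 2)

# Scaling from two VMs to four simply shows up in the metered hours.
bill = charge_back({"vm_hours": 4 * 24 * 30, "storage_gb": 200})
print(bill)
```

The point of the model is that elasticity and charge-back reinforce each other: because consumption is metered, scaling up or down changes the bill automatically rather than through tickets.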

We are moving away from what we used to be, delivering technology, working in silos, using a lot of tickets, to a completely self-service model where your interaction with the customer is through automation and the self-service portal rather than the old way of working. That means adopting the new operating model. And again, we just talked about how DevOps is a journey, not a destination. In this journey, you need the mindset of constant change and constant improvement. It’s just a new way of IT. Think of it as DNA rather than a destination or a project we want to finish. A mindset of continuous improvement has to be built into the culture throughout your organization.

Thank You! Q&A

That brings us to the end of the slide deck. We have about five minutes for Q&A. I’ve put together some stay-in-touch type contacts. The one I want to point out is NetAppIT.com. We have a lot of user stories up there: DevOps, how we do OCI and StorageGRID, hybrid cloud, infrastructure as a service. All of those are up there, so take the time to look through it. If you want to reach out on Twitter or email, feel free to use those handles and addresses. With that, I would say thank you, and then open it up for any questions you guys would have.

Q: By the way, excellent presentation, and thank you for the way that you laid everything out. Very interested to hear about the tools and the processes and so on. If one of your apps goes down at 3 o’clock in the morning, the scenario we usually hear about, who’s getting paged? Is the developer getting paged, or is the operator getting paged? What I’m getting at is: where do the functions come together within NetApp?

Yeah, that’s a journey, an evolution, too. It depends on what app it is. Some apps are a little different, not only in terms of technology but also in terms of processes, right? I don’t know if you’ve heard of models like NoOps: you built it, you support it, end to end. If you are more at level five, you will see yourself doing more of that, where essentially the developer team is responsible for operations of their apps, too. Then there are also centralized help desks. Most of our apps go to the centralized help desk directly. That’s where we would be getting paged, and we have the resources there.

Then there is what we are also striving to do, and it takes time. The hard part is usually that it takes a bit of work, and you need the right reasons. Once you reach a certain maturity level, you realize this process is not good enough, and you need to make the next level of optimization: make ops part of the dev team, go completely NoOps, and as soon as it breaks, you fix it, you manage it. Then you go with it.

A lot of the time, you have the intelligence in the app to ask: why even break? It’s immutable. You bring up the infrastructure in a cloud provider or whatnot, or just bring up the same containers all over again if it’s truly stateless. You won’t get paged that often, but even if you do, the responsibility will be on the dev side for apps that are high up the maturity ladder.

Thank you for reading. Click here for the recording of this webinar.