Day 2 with Spot Ocean: Optimizing clusters

August 19, 2022

by Scott Stanford, Sr. Automation Engineer

As I’ve written before, Spot Ocean – in conjunction with Amazon Elastic Kubernetes Service (EKS) – has been a boon for NetApp IT. Our ability to rely on spot instances while also making automation simple enables us to deliver more for less. 

That was day one for us and Spot Ocean. 

Day two has been similarly impactful. 

When we first built out our hybrid Kubernetes environment, we expected most workloads to run on-site in a data center, with only a couple of AWS EKS Ocean clusters. Had that remained the case, the care and feeding of the Ocean clusters would have been straightforward, even when a cluster roll or Amazon Machine Image (AMI) update was needed.

When application teams deploy, they can choose which cluster, on-prem or AWS, their applications use. Our team received requests for more options in AWS, and we quickly grew from two Ocean clusters (one for workspaces and dev integration, the other for proof of concept) to eight.

The initial clusters were deployed with a modified version of eksctl. It worked so well for the first two clusters that we used it for the remaining clusters too. As we worked on day-two tasks, we quickly learned that creating the clusters this way incurred additional overhead.

With two clusters, updating the AMI and rolling the clusters was easy: just a few clicks in the Spot.io console. Now that we have 12 clusters, each with multiple Virtual Node Groups (VNGs), updating all of them is more time-consuming.

New ways to manage clusters

Because the clusters were already in use, we could not recreate them with a tool like Terraform. We needed another solution.

We’ve often used Ansible for automation needs, but here Terraform was the better fit. Although we could not manage the existing clusters themselves through Terraform, we could use it to build and manage their VNGs.

Each of our clusters has multiple VNGs – some relying exclusively on spot instances and some solely on on-demand instances. Without a tool like Terraform, updating 24+ VNGs would have been a significant burden.

With this new approach, we were able to simplify the management of the clusters. All Terraform code is stored in Git repositories, and Terraform Cloud is configured to watch those repos for updates.

To perform updates, whether to instance types, number of instances, AMI, or other settings, we commit the changes to the Git repos, apply them through Terraform Cloud, and then roll the clusters using a CLI or API call.
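
As a rough illustration of that last step, here is a minimal Python sketch of how a roll request for one cluster might be assembled. It is not our actual tooling. The endpoint path, the "roll" body fields, the bearer-token header, and the example IDs are assumptions based on the public Spot API and should be checked against the current Spot documentation before use. The function only builds the request, so nothing is sent until you pass it to urllib.request.urlopen.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for the Spot API; verify against the Spot documentation.
SPOT_API_BASE = "https://api.spotinst.io"


def build_roll_request(cluster_id, account_id, token,
                       batch_size_percentage=20, comment="AMI update"):
    """Build (but do not send) a request that starts an Ocean cluster roll.

    The endpoint path and body layout are assumptions, not a verified
    contract. All argument values here are hypothetical placeholders.
    """
    if not 1 <= batch_size_percentage <= 100:
        raise ValueError("batch_size_percentage must be between 1 and 100")

    query = urllib.parse.urlencode({"accountId": account_id})
    url = f"{SPOT_API_BASE}/ocean/aws/k8s/cluster/{cluster_id}/roll?{query}"

    # Roll the nodes in batches so workloads can reschedule gradually.
    body = {
        "roll": {
            "batchSizePercentage": batch_size_percentage,
            "comment": comment,
        }
    }

    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_roll_requests(cluster_ids, account_id, token, **roll_options):
    """Build one roll request per cluster, keyed by cluster ID."""
    return {
        cluster_id: build_roll_request(cluster_id, account_id, token,
                                       **roll_options)
        for cluster_id in cluster_ids
    }
```

To actually start a roll, each request would be passed to urllib.request.urlopen once Terraform Cloud has finished applying the change. Building the requests separately from sending them makes it easy to review what will be rolled across all clusters first.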
