Transcript: NetApp IT’s ONTAP Ansible Integration with CyberArk
Okay, good afternoon, everyone, and welcome to this NetApp on NetApp inform session. I'm going to be your host today; my name is Leyddy Portas and I am a technical partner manager here at NetApp. Just a quick note for those who are new to these inform sessions: we host these every month, so if you're not receiving invitations, just contact me or Elias and we'll be happy to add you to our DL. I'll be posting our emails in the chat in a few minutes.
We have a lot of people on the call today, so if you have any doubts or questions during the presentation and demo, please add them to the Q&A and we'll have them answered as we go. Today's session is entitled NetApp IT's ONTAP Ansible Module Integration with CyberArk, and presenting today we have Victor Ifediora, our Senior Storage Engineer, along with Faisal Salam, who'll be looking after the chat. So Victor, I'll hand it over to you. Thank you for joining us.
Okay. Good afternoon, everyone. My name is Victor Ifediora, and I'm going to be presenting Ansible integration with CyberArk in our NetApp environment. When we started with Ansible, we needed to find a way to protect the passwords we use to authenticate to the devices our playbooks run against, and we decided to use CyberArk to provide adequate security for those passwords. So I'm going to go over how we did the integration and how everything is working so far, and I will also show a demo.
ONTAP Management with Ansible History and Mission
In NetApp IT, especially on the storage team, we started our Ansible journey in late 2018. Some other teams in NetApp had already been using Ansible for automation before we started our own journey. We built a single server in the lab and started experimenting with playbooks. We wrote some very, very simple playbooks, ran them, and saw the advantages of Ansible.
The first thing we did was begin converting some of the scripts we had written in PowerShell or shell script into Ansible playbooks. One part of our mission was to automate provisioning of our storage: volume, LIF, and vserver provisioning. Another was to maintain configuration standards across all our filer infrastructure. We wanted a situation where, if there is configuration drift, we can identify and correct it, and make sure we maintain consistency across all our storage devices, especially for things like NTP configuration, volume parameters, and storage efficiency.
For now, we have been able to accomplish all of this with the different playbooks we have written. We also wanted to act as a reference for NetApp customers seeking to use the NetApp Ansible modules. Those are the goals behind why we started using Ansible in our environment. And to this day, these are some of the things we have accomplished. The list is not exhaustive; there are other things as well.
Completed To Date
We've been able to do what we call the zero build, which enables individuals to build a filer out from scratch. Once you have console access or a network connection to a new filer, you can run most of our playbooks to configure it. Before, it used to take maybe four days to build a filer, but with these playbooks we're almost done in an hour or two.
We have a parameter file that we modify, so all we do is put all the parameters we need in that file, and we use the parameter file to run the playbooks that configure the filer. We also have a playbook that enables us to manage adaptive QoS in our environment. This playbook assigns a QoS policy to our volumes based on the aggregate the volume is on: if it's on an extreme aggregate, that volume gets the extreme QoS policy; if it's on a performance aggregate, it gets the performance policy; and likewise for a value aggregate.
We have standards that we have defined, and we have an aggregate naming convention that we have implemented in our environment. So based on the type of aggregate a volume is on, it is automatically assigned the QoS policy we have defined. This playbook runs roughly on an hourly basis. If a new volume is created in our environment, the playbook will identify that new volume and assign a QoS policy to it. In the same way, if we move a volume from one aggregate to another, it will identify that the volume has been moved and assign the right QoS policy to that particular volume.
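The assignment logic described above can be sketched in a few lines. The aggregate name prefixes and the fallback policy below are hypothetical stand-ins, since the session doesn't show NetApp IT's actual naming convention; only the policy-tier names follow ONTAP's default adaptive QoS tiers:

```python
def qos_policy_for_aggregate(aggregate_name: str) -> str:
    """Pick an adaptive QoS policy tier from the aggregate naming convention.

    The prefixes here are illustrative placeholders for NetApp IT's
    internal aggregate naming standard.
    """
    prefixes = {
        "aggr_ext": "extreme",        # highest-tier aggregates
        "aggr_perf": "performance",   # performance-tier aggregates
        "aggr_val": "value",          # value-tier aggregates
    }
    for prefix, policy in prefixes.items():
        if aggregate_name.startswith(prefix):
            return policy
    return "value"  # conservative default for an unrecognized aggregate name


def assign_policies(volumes: dict) -> dict:
    """Map each volume to a QoS policy based on the aggregate it sits on."""
    return {vol: qos_policy_for_aggregate(aggr) for vol, aggr in volumes.items()}
```

Run hourly, a loop like this also naturally covers the cases mentioned above: a newly created volume simply shows up in the next pass, and a volume moved between aggregates gets re-evaluated against its new aggregate's name.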
We also have a playbook that enforces a default snapshot policy on all our volumes. We had run into the problem of trying to restore a volume, or a file from a volume, and finding out that the volume had no snapshot defined. Because of that, we came up with a playbook that enforces a default snapshot policy on all our volumes. We also have what we call the SVM volume recovery queue playbook. When we offline a volume, we expect the volume to stay offline for two weeks, and after that we delete it; that has been our standard. And we also make sure that there is no IOPS on that volume before we delete it.
But we have run into problems with some application teams: we have offlined and deleted a volume, and after some time the application team comes back to us and says, “Look, my volume is gone, I'm using it.” Because of that, we developed a playbook that sets our recovery queue retention. Initially it was 360 hours; now we make it 720 hours. What this playbook does is go through all the SVMs in our environment and set the volume recovery queue to 720 hours.
So even when we offline and delete a volume, that volume is not actually deleted; it's stored in the recovery queue. If after one month the application team comes back and says, “Look, we still need this volume,” we can go and recover that volume from the queue. So we made sure that every SVM in our environment is set to 720 hours. We also have a playbook that enforces a volume space guarantee of none on all our volumes, both CIFS, NFS and [inaudible] volumes.
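As a rough sketch, the recovery-queue playbook's core step amounts to issuing one ONTAP command per SVM. The `-volume-delete-retention-hours` option below is ONTAP's per-SVM retention setting for the volume recovery queue, but treat the exact command text as illustrative rather than the literal command the playbook issues:

```python
RETENTION_HOURS = 720  # 30 days, raised from the original 360 hours

def retention_commands(svms, hours=RETENTION_HOURS):
    """Build the per-SVM ONTAP CLI commands that set the volume
    recovery-queue retention. `svms` is the list of SVM names an
    inventory pass would discover."""
    return [
        f"vserver modify -vserver {svm} -volume-delete-retention-hours {hours}"
        for svm in svms
    ]
```

In the actual playbook this would be driven by the Ansible ONTAP modules against every SVM returned by an inventory query, so a newly created SVM picks up the 720-hour setting on the next run.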
We also have a playbook that increases inodes. Volumes running out of inodes used to generate the highest number of tickets in our environment, so we came up with a playbook that, whenever we get an alert that a volume is running out of inodes, goes and increases the inode count for that volume by a certain percent.
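The growth step itself is simple arithmetic. The 20% default below is an assumption for illustration, since the session only says the playbook grows the count by "a certain percent":

```python
import math

def increased_inode_count(current_max_files: int, percent: int = 20) -> int:
    """Return the new maximum-files (inode) value after growing the
    current limit by `percent`, rounded up to a whole inode."""
    return math.ceil(current_max_files * (1 + percent / 100))
```

The playbook would then apply the computed value to the alerting volume (for example via the volume's maximum-files setting) rather than just printing it.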
We also have a playbook that enforces our storage efficiency policies. There are different efficiency policies based on the type of disk you're using: SSD gets one type of efficiency policy, HDD gets a different one, and hybrid aggregates get a different one again. This playbook identifies the type of disk a volume is on and enforces the right efficiency policy for that particular volume.
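Structurally this is another lookup keyed on what the playbook discovers about the volume, this time the disk type of its aggregate. The policy names below are placeholders, not NetApp IT's actual policy names:

```python
def efficiency_policy_for_disk(disk_type: str) -> str:
    """Map an aggregate's disk type to an efficiency policy name.

    The policy names are hypothetical; the real playbook would use
    the policies defined in NetApp IT's environment.
    """
    policies = {
        "ssd": "efficiency-ssd",        # all-flash aggregates
        "hdd": "efficiency-hdd",        # spinning-disk aggregates
        "hybrid": "efficiency-hybrid",  # mixed SSD/HDD aggregates
    }
    try:
        return policies[disk_type.lower()]
    except KeyError:
        raise ValueError(f"unknown disk type: {disk_type!r}")
```

Raising on an unknown disk type (rather than silently defaulting) is a deliberate choice here: a new disk type should surface as a playbook failure so the standard gets extended.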
We also have a playbook that sets access to full control for domain admins on all our CIFS shares. We have another one that does SnapMirror cleanup in our environment, and another that provisions SVMs. Before, all these things used to be done manually. We have another playbook that handles auto-grow for all our volumes, so if a volume is running out of space, it will go and increase that particular volume.
We have one that creates what we call an object store: it provisions users and S3 buckets as a service for our CloudOne infrastructure-as-a-service portal. We have another playbook that enforces our broadcast domain standard; it makes sure our broadcast domains are named according to our naming standards, and that the MTU size and everything else is set correctly for each broadcast domain.
We also have another playbook that does a kind of account maintenance in our environment. It helps us identify all the accounts that have not logged in to our filers for a very long time, so we can delete those accounts. So those are the things we have done. There are many other things we have done with Ansible, but these are the ones I can mention for now.
Password Management History and CyberArk Changes
So for today's topic, we are talking about the integration of Ansible with CyberArk. In our history, when we started using Ansible, we used to have the password in clear text in our playbooks, and we know that is a security risk: anybody with access to that particular playbook would know the password you are using. After that, we started using an Ansible password file protected by another password, but the problem is that the password is still stored in clear text, and anybody with access to the password protecting that file can still see it.
Then when we started using Ansible Tower, we used the Ansible Tower database to protect our passwords, until we decided to integrate Ansible with CyberArk. With CyberArk, your password is stored safely in a safe, and it is stored in one central location. CyberArk also helps us rotate our passwords, so we no longer have static passwords. For now, we are rotating the passwords on a weekly basis; soon we may start rotating them on a daily basis.
CyberArk also ensures that your passwords meet compliance, because you can define the type of password you want in CyberArk. And any time we are doing audits on our passwords, CyberArk can provide the reporting. So this is what CyberArk gives us: first, your password is stored safely; second, you have everything in one central place; third, the password is rotated, and it is up to you to decide how often. Also, you can define your password policy in CyberArk to meet your compliance requirements, and you can go there and print any type of report that you want.
How CyberArk Works with Ansible
Now, this is how it works. Here we have our NetApp clusters, here we have our Ansible Tower, where all our playbooks run, and here you have the CyberArk machine. The CyberArk machine has a safe, and it is in the safe that the password is stored. When you configure your playbook to get the password from CyberArk, then when the playbook is about to run, Tower sends a request to the CyberArk server for the password. The CyberArk server retrieves the password from the safe and returns it to Tower. Tower then uses the password to authenticate to the clusters you want to run your playbook against.
Any cluster that is not authenticated, the playbook will not run against. So this is how it works: Tower makes a request to CyberArk for the password it's looking for, CyberArk retrieves the password from the safe and returns it to Tower, and Tower then authenticates to all the clusters in your inventory that you want to run the playbook against.
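Behind the scenes, the Tower lookup talks to CyberArk's Central Credential Provider (CCP) over a REST endpoint. Here is a minimal sketch of that request using only the standard library; the host, safe, and object names are placeholders, and a real deployment would also verify TLS and may present a client certificate, which Tower handles for you:

```python
import json
import urllib.parse
import urllib.request

def ccp_url(host: str, app_id: str, safe: str, obj: str) -> str:
    """Build the CCP GetPassword URL for one stored account."""
    query = urllib.parse.urlencode({"AppID": app_id, "Safe": safe, "Object": obj})
    return f"https://{host}/AIMWebService/api/Accounts?{query}"

def fetch_password(host: str, app_id: str, safe: str, obj: str) -> str:
    """Retrieve the password (the 'Content' field) from the CCP response."""
    with urllib.request.urlopen(ccp_url(host, app_id, safe, obj)) as resp:
        return json.load(resp)["Content"]
```

The AppID in the query string is the application ID discussed in the next section; CCP authorizes the request based on that ID, the calling machine, and the safe's access policy before returning the password.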
Ansible Integration with CyberArk
Now, this is how we accomplished it. There are two sets of configuration: configuration on the Tower side, and configuration on the CyberArk side as well. On the Tower side, the first thing you need to do is create what they call a lookup credential. Ansible Tower comes with an out-of-the-box lookup credential type that you can configure. The lookup credential is what Tower will use to go and look up the credential in CyberArk.
Creating the Lookup Credential
Then, after you create the lookup credential in Tower, you create the Ansible credential that your playbook will use, okay? And then you update that credential to use the lookup credential. So first you create a lookup credential, then you create the Ansible credential that your playbook will use, and then you update the credential you created here with the lookup credential you created there. You also update the lookup credential with the safe metadata; that is, you provide the lookup credential with the path to the safe.
Then the last thing you do is update your job template to use the Ansible credential. On the CyberArk side, the first thing you do is create an application ID with the necessary permissions for access in CyberArk, okay? The application ID is the user that Ansible is going to use to access CyberArk, and it is the one that will have the permissions to retrieve the password.
And you're going to use the application ID created here in the Ansible Tower; I will show you how we did all these things. Then you add the Ansible Tower machines to the application you created in CyberArk. So let's say I created an application ID called Ansible in my CyberArk machine; I would then add all the Ansible Tower machines we are using in our environment. Our Ansible Tower is a three-node cluster, so we added all three nodes to the application.
Then after that, you create a policy for access control on the safe itself, then you add your application to the safe, then you log in to the vault and create or [inaudible] the password you want for that application. So that is how it's done: there is configuration on the Ansible Tower side and configuration on the CyberArk side, okay?
Update Ansible Credentials to Use Lookup Credential
So this is where you create the lookup credential in your Ansible Tower. If you go to credentials, the first thing you give it is a name; here, we called it Storage CyberArk Lookup Credentials. You provide a description, then an organization: we have an organization in our Ansible that we call Storage Engineer. Then the credential type we are using is called CyberArk AIM Central Credential Provider Lookup. This is out of the box with Ansible Tower, and it is what enables you to go and look up a credential in CyberArk.
Then here, we provided Tower with the URL of our CyberArk server. And here, remember I said you have to create an application ID in CyberArk. You assign this application ID all the permissions in CyberArk it needs to be able to retrieve the credentials you use to authenticate your clusters. So this is the first piece of configuration you need on the Ansible side: we give it a name, we choose the CyberArk AIM credential type, we provide the URL of the CyberArk server, and we provide the application ID we created inside CyberArk.
Like I said before, this application ID must have all the necessary permissions in CyberArk to retrieve the password it needs. So this is how you create the lookup credential. You can also apply certificates, but for now we are not doing that; we are doing it with just the application ID.
Sorry, I don't know what I… Lookup credentials. Yeah, okay. Then here, this is the credential we created, the one we created to access the password from CyberArk. Look at the description I gave it: “This is the credential used by ONTAP playbooks.” This is our group, and this is the credential type we are using, the ONTAP credential type we created.
Then this is the user name we are using, and here is where I attach the lookup credential I created for CyberArk. This is where you pass this credential the lookup credential it will use to go to CyberArk and retrieve the password you need. So first we created a lookup credential, then we have a credential to which we pass the lookup credential that Ansible will use to look up the password.
Update the Safe Metadata
Now, this is where you update the Ansible credential to use the lookup credential. Here you provide the metadata for your safe. If you look here, I selected Storage CyberArk Lookup Credentials. The next part is where you provide the safe information for your CyberArk: this is the name of the safe, and this is the object name. At the end of it, we have the application ID.
Update the Job Template to Use CyberArk
The last part of this is to configure your job template with the credential. So here, if you look at this, this is a job template we have configured: the job that sets the recovery queue to 720 hours. We provide the name of the job, the description, when we want it to run, then our inventory, then the credential we actually need to use; this credential contains the lookup credential. Then this is our project, and this is our playbook, okay? Sorry.
Yeah, then this is our playbook, and these are all the filers, all the clusters we want the playbook to run against. So I'm going to go to our Ansible Tower and show a demo. Here, this is our Ansible Tower, okay? Here we have all the different credentials we have configured. This is our Storage CyberArk Lookup Credentials, and this is our ONTAP CyberArk Credentials. The ONTAP CyberArk one is the actual credential that we pass on to the playbook, the one we update our playbooks with.
And we use this one, the Storage CyberArk Lookup Credentials, to update that particular one. Let me open it. Yeah, you see that the other one exists inside here. Let me go back. So this is the credential we pass on to the playbook, and this is the lookup credential that goes to CyberArk to retrieve the password; this one is passed on to that one. So if I go back and open this… Oh, sorry. Yeah, if I open this, you will see that credential inside here, okay?
Then if I go to this one, this is the one I showed before. You see the name, and you see the credential type we chose. Like I said, this is out of the box from Ansible Tower. This is the URL of the CyberArk machine, and here is the application ID that has all the permissions to retrieve the password from our CyberArk machines. I go back here again, then I go to this one. We can look up this; see, it shows us the lookup credential we created. We can say next.
So this is our safe information: this is the name of the safe, and this is the object. Now if I click on test, it tells me “CyberArk AIM Central Credential Provider Lookup: Test Passed,” meaning it's able to retrieve our password from CyberArk. So once you create your lookup credential, you have to provide it with information about where your safe is: you supply the safe name and the object name, which is the application name it's going to use.
So when it goes to the CyberArk machine, it asks for the password that is aligned with the ANS user object, retrieves that password, and uses it to do your authentication. And like I said, we rotate the password every Friday night, on a weekly basis, and later we will start rotating it daily. Again, let me go… So these are some of our playbooks. This one shows they've run successfully, and these are the different inventories, the cluster names they ran against; it was able to retrieve the password. You can see from here that everything is successful: all the filers are authenticated and all the playbooks ran. The playbooks run for almost 15 minutes, against about 25 or 30 clusters.
Let's see. If you look at this one, this playbook is running now. These playbooks run on an hourly basis, and I can click on this to see what is going on here. So it's making some changes; we can go here. These are all the different clusters, and you can see all of them are authenticated. If any one is not authenticated, if there's an error, the color is going to be red. And if you look here, it shows that everything is running. It started at 7:30, it has finished running, and it says it's successful.
Yeah, you can go through your logs… Yeah, it just finished running. So you can always go back to your logs to see whether there is any error, whether it succeeded or failed, or you can have it send you an email if your playbook fails or succeeds, depending on what you want to do. See, if you enable this, on failure it sends email. We already configured this with our storage alert, and we've provided the email address of the people, or the DL, that it will email if anything succeeds or fails.
So now I think that's the end of my presentation. I don't know whether anybody has any questions. We are going into a Q&A session; do you have any questions?
I think the questions have been answered as we have gone along. I can’t see if there’s any more, maybe give it a couple of minutes or so.
Faisal, was there any question on the chat?
There was one. So Peter asked if we use a CyberArk credential provider?
Yeah, that's what we use: the lookup credential provider. There are so many different options, but this is the one we use, yeah.
And Manford was looking for some documentation of what we’ve configured here, and if there was a GitHub page. They said we can share whatever documentation we have.
Okay. Yeah, at the end of this, I will provide the organizers with some of the documentations. Yeah, you can go there and look at them.
Okay, I think… Oh okay, it's not a question. It's just Peter saying that we can download the documentation from the CyberArk Marketplace. Okay, I think we're done then. Well, thank you so much, Victor, and thank you so much, Faisal, for helping with the Q&A. Wait, let me see if there's another question… no. Okay.
Well, thank you everyone for participating today, and I will be sending the recording and the slide deck after the session is over. But yes, please don’t hesitate to get in contact with me or Elias if you have any further questions. Thank you, everyone. Thank you.