Webinar Transcript: NetApp IT’s Storage Security Program

October 5, 2021

Faisal Salam:                    Thank you. All organizations need protection against cyber attacks [00:01:30] and security threats, and investing in these protections is very important. Hello, everyone. My name is Faisal Salam, and today I will be providing an overview of the Storage Security Program. Let’s look at why we envisioned a program that’s dedicated to security. This goes back to FY ’20 when we began to see an increased focus on security at the corporate level. We were being asked by the IT security [00:02:00] team about any security initiatives that we were pursuing specifically on storage. There were also similar questions from our upper management teams. So as we introspected, we found that although we were working on a few items like account management using CyberArk or configuring audit log forwarding to Splunk, there wasn’t an organized effort around storage security. We realized [00:02:30] that we needed to take a hard look at security and implement immediate measures to protect the data that resides on all our storage systems.

Now, let me pause and ask our partners, could you talk about how our customers are looking at security? Are they keen on implementing the latest solutions specific to security, or is that something that they’re not focused on? (silence)

Speaker 3:                        Guys, [00:03:00] you can post your questions on the chat, or you can just unmute yourselves and answer the questions if you want to.

Faisal Salam:                    That’s okay. All right, I’ll keep going. All of what we just-

Leyddy Portas :                They are saying that they’re very focused though. [00:03:30] On the chat, they said they’re very focused.

Faisal Salam:                    Sure, thank you for that. All of what we just saw gave birth to our storage security program, and for us the goal of the program was specifically two things. One, create a roadmap of all items that we considered important and implement all these security initiatives, and two, establish an ongoing dialogue with our IT and product security teams. Now, this would serve as a platform for us [00:04:00] to present our initiatives on an ongoing basis, and it’s important that we obtain their feedback around what they consider important, so that we are better aligned.

As we were working through this initiative, one of our first steps was to list out the various risks and categorize them into tracks. Now we obtained these using a mixture of tools that we have, like ERS or [00:04:30] Active IQ. It could be AIQUM, it could be the NetApp communication emails that customers receive. So we used a mixture of all the sources and we came up with seven tracks. So let’s look at what these tracks are.

So the first track is access management, which deals with enabling things like domain-tunnels and audit log management, to name a few things. The second track is data protection, [00:05:00] which is extremely important. Some of the identified risks were SnapVault standards, making sure they’re the same across the board, and missing SnapVault relationships, where we identified there were volumes in production that may not have a consistent relationship set up with respect to XDP. The third track is automation, which relates to reviewing our current automation standards. It could be [00:05:30] any automation methodology, it could be Ansible, it could be PowerShell, it could be anything, and making sure that they are run in as secure a fashion as possible. The next track is setting up a monitoring framework for security-related events. Now, we have Unified Manager that’s monitoring all our ONTAP systems all the time, generating events and alerts for our operations team to take care of, but we do not have something similar for security. [00:06:00] So it’s prudent that we have an established framework so that we can constantly look at the security aspects of storage, alert, and take actions on them.

The fifth track relates to risks such as identifying infected files, checking if there are shares with personally identifiable data and calling those out. So we look at solutions like antivirus scanning, Cloud Secure, Data Sense, [00:06:30] to name a few. Now, the sixth track is important. That relates to storage encryption, and that covers both in-flight and at-rest encryption that’s available within ONTAP. And the last track there is an ongoing security review that involves our IT security, the product teams, and also webinars such as these, where we talk to our customers. So let me pause [00:07:00] there. Were there any questions around any of these risks, or the tracks, any comments?

Leyddy Portas :                I believe someone in the chat said security is a big concern from the storage side; a ransomware attack on the storage will be a big concern for any institution.

Faisal Salam:                    Sure. We’ll be talking about ransomware type attacks, and what we’re doing to deal with those on an upcoming slide.

Leyddy Portas :                [00:07:30] All right. And also, Lara Matthew said the customers are keen on seeing the latest security options.

Faisal Salam:                    Sure. We’ll go through some of those, thank you.

Leyddy Portas :                Thanks.

Faisal Salam:                    For each of those risks that we just saw, we created many efforts that are shown as part of our larger roadmap here. Now, it [00:08:00] looks like a lot of things, but we started small. We had very few items on this roadmap. And as time went by, we started working on several initiatives, and this is what it looks like today. So the items in green are all the things that we’ve completed, and we’re going to take a look at some of those in more depth.

So we use CyberArk a lot. We use it for things like vaulting the admin [00:08:30] credentials and storing encryption keys, and there’s also an integration that we have between CyberArk and Ansible Tower, and we’ll be looking at more of that in an upcoming slide. Some of the other things that we have on our roadmap are audit log and syslog forwarding. So ONTAP logs management events on the cluster, for example, what request was issued, the user that triggered the request, the user’s access method, [00:09:00] and the time of the request. Now, it’s best practice to forward these logs to a secure retention location, and in our case we are forwarding all of these to Splunk.

Another thing that we have completed is the validation of domain-tunnels. Now, what is this configuration? This is what allows administrators to log into the storage system with their domain credentials. How is this important? As part of our efforts to harden security, this is one of the [00:09:30] preferred methods to log into the storage system using an administrator’s SSO, or domain account. And it does become important to make sure that these are up and working all the time, so what did we do to ensure that we have monitoring turned on for domain-tunnels? We use Zenoss as a monitoring tool that periodically tries to log into our clusters with a particular account, [00:10:00] and that’s a domain account. And if for some reason Zenoss is not able to log in, it detects that as an issue with the domain-tunnel and it then creates a ticket. That ticket goes to our operations team, and then they go through certain troubleshooting steps. At the end of the day, we make sure that the domain-tunnels are up all the time.
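The probe-and-ticket loop described above can be sketched roughly as follows. This is a minimal illustration, not how Zenoss is actually configured: the `Ticket` type, the `check_domain_tunnels` function, and the `try_domain_login` callback are hypothetical names, and the real login attempt (for example over SSH with a domain account) is assumed to be supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Ticket:
    """Hypothetical ticket record handed to the operations team."""
    cluster: str
    summary: str


def check_domain_tunnels(
    clusters: Iterable[str],
    try_domain_login: Callable[[str], bool],
) -> List[Ticket]:
    """Probe each cluster with a domain account and collect a ticket per failure.

    try_domain_login is an assumed callback that returns True when a login
    with the monitoring domain account succeeds on the given cluster.
    """
    tickets: List[Ticket] = []
    for cluster in clusters:
        try:
            ok = try_domain_login(cluster)
        except Exception:
            # Any connection or authentication error counts as a failed probe.
            ok = False
        if not ok:
            tickets.append(
                Ticket(cluster, "Domain login failed; check the domain-tunnel")
            )
    return tickets
```

Running this on a schedule and routing the returned tickets to the operations queue would approximate the behavior described in the talk.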

Another effort is encryption, NAE, NetApp Aggregate Encryption, or NVE, NetApp Volume Encryption. [00:10:30] At-rest ONTAP encryption is enabled on the volumes in our environment. We will look at more details in the upcoming slide. The quarterly security review. Now, we often meet with our product teams that are working to create new products and make our solutions better, and so we receive feedback directly from our product teams as to what they consider important. What are the new solutions that are [00:11:00] available in a new version of ONTAP? We make sure that we have a PoC in one of our lower environments, and if that’s something we are interested in, we go ahead and enable it in our [inaudible 00:11:14].

Account cleanup, as part of our efforts to prune any unused accounts. Now, if you take a look at an ONTAP system, you’d see accounts there, you’d see local accounts, domain [00:11:30] accounts, service accounts. Sometimes you don’t know what these accounts are used for, or who added them. So we had an effort to go through every single cluster, look at all the accounts, and make sure the ones that are not needed are removed. We put in NGs as a preferred method of authentication, and we also hardened the permissions of individual accounts by looking at the roles and modifying the roles accordingly. So [00:12:00] we will look at more details later. Configuring SnapVault backups for production volumes. There was an effort to make sure that all production volumes have an XDP backup set up. We found there were volumes which didn’t, so we’ve configured it. And we’re also working on enabling automation, so if you create a new volume this automation would automatically create an XDP backup for you [00:12:30] onto an offsite backup cluster, so that’s something that we’re working on currently.

Another highlight is the integration between Ansible Tower and CyberArk. Now, most of our automation is run from Ansible Tower. If you’re familiar with Ansible Tower, that’s where you can host all of your Ansible playbooks, which you can then connect to your storage systems and run automation. Now, all of this automation was [00:13:00] using a single account with an unmanaged password. So as part of this integration we are making sure that the automation user password is rotated on a weekly basis, making the account more secure, and every time Tower has to run a playbook it connects to CyberArk and retrieves the most up-to-date password and then runs the automation, thereby making the automation methodology much more secure. [00:13:30] Now, in blue are the efforts that are underway and in orange are the efforts that are planned for the future. So some of the efforts that we are currently working on are auditing NFS exports.

Now, this has two phases. One is, any open exports are risky because you don’t know what clients can connect. Pretty much anybody on the network can connect, and it is a [00:14:00] security hole. So we are working on identifying all open exports, and tying down permissions to specific clients only. The second phase to that is root access. Now, as you know, root access can be controlled from within the export policy rules. So this effort is to go in and find all the export policies that have root access enabled, and [00:14:30] harden them by removing root access where it’s not required.
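The two audit phases described above can be sketched as two filters over export policy rules. This is only an illustration under stated assumptions: the `ExportRule` record is hypothetical, its `clientmatch` and `superuser` fields are modelled loosely on ONTAP export policy rule fields, and the rules are assumed to have already been collected from the clusters by some other means. Treating `0.0.0.0/0` and `::/0` as "open" and any `superuser` value other than `none` as "root access enabled" are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Client matches treated as "open to everyone" in this sketch (an assumption).
OPEN_CLIENTMATCHES = {"0.0.0.0/0", "::/0"}


@dataclass(frozen=True)
class ExportRule:
    """Hypothetical, already-parsed export policy rule."""
    policy: str
    clientmatch: str
    superuser: str  # e.g. "any", "sys", "none"


def find_open_rules(rules: Iterable[ExportRule]) -> List[ExportRule]:
    """Phase one: rules that let any client on the network connect."""
    return [r for r in rules if r.clientmatch in OPEN_CLIENTMATCHES]


def find_root_rules(rules: Iterable[ExportRule]) -> List[ExportRule]:
    """Phase two: rules that grant root (superuser) access to matching clients."""
    return [r for r in rules if r.superuser != "none"]
```

The output of each function would then be reviewed with application owners before tightening the client list or removing root access.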

Another effort that we’re working on is configuring the Unified Manager security dashboard. So if you’ve looked at the new versions of Unified Manager, you find a very nice security dashboard in there that specifically calls out security risks, and it also gives you suggestions on what needs to be done to fix these issues. So we are currently working on [00:15:00] looking at the entire list of risks that are called out and identifying ones that we need to fix. Now, all of this is based off of the NetApp security hardening guide, and that’s what it uses as the source of truth, so that’s something I would highly encourage you and your customers to take a look at. I know that’s a lot to take in, but was there anything in here that’s of interest, or that we need to talk [00:15:30] in depth about? I’ll look at the chat. (silence) I don’t see anything in the chat, so I’ll just keep going.

Leyddy Portas :                Actually, someone has said, can you cover [00:16:00] CyberArk vault.

Faisal Salam:                    I do have a slide on CyberArk, where I talk about how we’ve used CyberArk to rotate or manage the admin credentials specifically. So we’ll look at that in an upcoming slide. Onboard virus scanning is coming back. I don’t believe the onboard virus scanning, this is something you’d have to set up using an external host, [00:16:30] and that’s, again, something that’s upcoming for us, so yes, you’re using FPolicy. Any other questions?

Leyddy Portas :                I think that’s it for now.

Faisal Salam:                    All right. Data at rest encryption. Now, at-rest encryption in ONTAP is [00:17:00] of two types, NVE that’s at the volume level, and that’s available starting 9.1, and then there’s NAE, NetApp Aggregate Encryption, which is available starting 9.6. Now, the way it works, as you see in this diagram, is that the crypto module is what performs the actual encryption and generates the encryption keys for the volumes, and that’s how data access happens once the volumes are encrypted. [00:17:30] Now, in case you’re interested, the crypto module that’s used is FIPS 140-2 level 1 compliant. There are different levels of compliance based on if you’re using a software or a hardware module. So the onboard key manager is level 1 compliant, and that’s the highest attainable level for a software module. If you wanted to go higher than that you would have to either use the NetApp self-encrypting drives, or you’d have to use an external [00:18:00] key manager with a hardware module.

And NetApp IT has taken a phased approach to encryption. So we started by enabling NVE and then NAE, because for existing data a phased approach is the only way you could do it. So we kicked it off in lower environments, and we observed the performance to see if we were hitting any issues. And now we are in the process of enabling it [00:18:30] across the board on higher environments, so our goal is to have every single volume protected by encryption, be it NVE or NAE. And the deployment at a high level consists of three phases. If it is a brand new cluster, it’s very easy. You just turn it on, on the aggregates, and you’re good to go. But if you’re doing existing data, then it can be a bit time-consuming because you’d have to go [00:19:00] in and, as step one, encrypt all the volumes using NVE, which uses the volume-level key.

And once all the volumes in that particular aggregate are encrypted, you can turn on NAE for that particular aggregate, and then you’d have to go back in and re-encrypt all the volumes again, using the aggregate-level key, so it’s a three-step process for existing data. For brand new [00:19:30] empty aggregates, you just need to turn on NAE and all the volumes that are created would be NAE enabled by default. So that’s a bit about encryption. Any comments or questions around encryption? Is this something that’s enabled in all your customer environments, or something they’re looking at?
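The three-step order for existing data, and the single step for an empty aggregate, can be sketched as a small planner. This is a minimal illustration under assumptions: `plan_encryption` is a hypothetical helper, the volume states `"none"`, `"nve"`, and `"nae"` are labels invented for this sketch, and the returned strings are plain descriptions of the steps, not ONTAP commands.

```python
from typing import Dict, List


def plan_encryption(
    aggregate: str, volumes: Dict[str, str], nae_enabled: bool
) -> List[str]:
    """Return the ordered steps to bring one aggregate to NAE.

    volumes maps a volume name to its assumed current state:
    "none" (unencrypted), "nve" (volume key), or "nae" (aggregate key).
    """
    steps: List[str] = []
    # Step 1: every unencrypted volume is first encrypted with a volume-level key.
    for name in sorted(volumes):
        if volumes[name] == "none":
            steps.append(f"encrypt {name} with NVE")
    # Step 2: NAE is turned on for the aggregate once its volumes are encrypted.
    # For an empty aggregate this is the only step.
    if not nae_enabled:
        steps.append(f"enable NAE on {aggregate}")
    # Step 3: volumes not yet on the aggregate key are re-encrypted with it.
    for name in sorted(volumes):
        if volumes[name] != "nae":
            steps.append(f"re-encrypt {name} with the aggregate key")
    return steps
```

For example, `plan_encryption("aggr1", {}, False)` yields only the "enable NAE" step, which matches the brand new aggregate case described above.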

Leyddy Portas :                There’s a question in the chat from, Roger Singh. He said, if you [00:20:00] implement, would it be worthwhile to transfer your knowledge into TRs and best practice guides, especially concerning performance and sizing?

Faisal Salam:                    So I believe there are TRs available out there. There are multiple NetApp TRs around encryption or performance. So this is just the way NetApp IT has implemented it, and our learnings as we went along this journey, like any other customer. So there was a time when [00:20:30] we didn’t have a security program and we were just trying things out, but now we have a very focused approach, and, like every other customer, we are implementing the various solutions that are available from NetApp. And this is our story that we’re sharing here. So TRs are definitely out there. And maybe if we can meet offline, I can share some of those TRs.

Leyddy Portas :                Thank you, [00:21:00] Faisal.

Faisal Salam:                    All right. So, go to the next slide. This is another highlight from our side, which is the ONTAP account cleanup. Now, we touched upon this. So this effort involved reviewing the list of all security login accounts on the admin SVM, and the admin SVM is phase one for us. And once we had a list, we had thousands of accounts [00:21:30] across multiple clusters. There were accounts that were not needed, and so we sat together with our internal stakeholders, and worked with them to figure out if there are accounts that are not needed, or could be moved to the SVM level, because for an application, the data is on the SVM. It makes sense for you to connect to the SVM; you don’t need to connect to the cluster itself.

So we created a list of accounts that [00:22:00] could be purged, and furthermore, we looked at the remainder of the accounts and looked at ways we could harden access, meaning tightening permissions, removing permissions that are not needed for a particular account. If you have an account that you’re using for Harvest, right, you may not need admin-level permissions for that account, or if you’re using AQM. Depending upon the tools that you’re using [00:22:30] to connect, we studied and reviewed the deployment guides, and we made sure that each of those accounts has exactly what it needs and nothing more. So once we had that list, we went ahead and deleted all the accounts that were not needed, and we have a very consistent list of accounts today. Furthermore, we implemented automation that runs every day, and it does a couple of things. It makes sure that we have a consistent [00:23:00] set of accounts across clusters.

It looks at the permissions on those accounts and checks whether somebody tried to elevate permissions. And if that has happened, the automation is simply going to revert it and send us an email, saying that there’s an account with a permission that’s not expected, so I’ve reverted it. It also looks at any new accounts that were created. It deletes the new accounts that were created by an administrator, or however they were created. And [00:23:30] if there are accounts that are missing, it also creates those accounts. Now, if there is a new account that might be a genuine request, where somebody wants to add a new account, that has to go through approvals, and then we put it into the automation so that it’s the automation that’s creating and managing the accounts and not the admins themselves.
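The daily reconciliation described above amounts to comparing an approved account list against what is actually on the cluster. The following is a minimal sketch of that comparison only, assuming each account is reduced to a name and a single role; `reconcile_accounts` and the action verbs are hypothetical, and applying the actions to ONTAP and sending the notification email are left out.

```python
from typing import Dict, List, Tuple

# (verb, account, role) where verb is "delete", "revert_role", or "create".
Action = Tuple[str, str, str]


def reconcile_accounts(
    desired: Dict[str, str], actual: Dict[str, str]
) -> List[Action]:
    """Compute the actions that bring a cluster back to the approved account list.

    desired maps approved account names to their approved role;
    actual maps account names found on the cluster to their current role.
    """
    actions: List[Action] = []
    for account in sorted(actual):
        if account not in desired:
            # Unapproved account, however it was created: remove it.
            actions.append(("delete", account, actual[account]))
        elif actual[account] != desired[account]:
            # Role drifted (for example, elevated): restore the approved role.
            actions.append(("revert_role", account, desired[account]))
    for account in sorted(desired):
        if account not in actual:
            # Approved account is missing: recreate it with the approved role.
            actions.append(("create", account, desired[account]))
    return actions
```

In this model, a genuine new account request is handled by adding it to `desired` after approval, so the automation creates it rather than an administrator.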

So today we have a very consistent list of accounts that’s managed by automation across all our ONTAP clusters. And, [00:24:00] yes, it’s worked very well for us, and there’ve been instances where somebody would have inadvertently tried to delete an account, or elevate permissions, and that was rolled back and we’ve got an email, so we know that the automation is working. Now, there’s something that’s of interest or something that’s been implemented specific to security login accounts on the admin SVM, which is what’s very critical, because [00:24:30] if you have a bad actor that gains access to the admin SVM with an admin role, there’s so many bad things that can happen. So it’s very important that we review the access to the admin SVM, and make sure that’s tied down.

I’ll go to the next slide. [00:25:00] So this is about CyberArk. So we use CyberArk for various things that I touched upon. We use it to store our encryption keys. So when you enable encryption for the first time, and you enable the key manager, there is a passphrase that you use, and there’s a key that’s generated. So you have to securely [00:25:30] store these two bits of information in case you have to recover, or you have to resync the encryption keys, so we’re using CyberArk as a means to secure that information. Now, this slide specifically talks about how we are using CyberArk to manage the admin credentials specifically. So what we found out when we worked with the CyberArk team is there wasn’t an out-of-the-box integration that’s available with [00:26:00] ONTAP, but CyberArk has been in use, at least in our environment, for several years with Unix boxes.

So Unix uses SSH, and we could leverage that for our filers, but there were certain modules that had to be written and tweaked for it to work with the filer. So one of the things that we had to do is there is a reconciler account that you have to create. That’s a backdoor: [00:26:30] if the primary admin account gets out of sync for some reason, then you would have to use that reconciler account to re-sync the credential. So what’s happening is CyberArk connects to the storage system via SSH; that’s the protocol that’s used for the integration. And there is the command security login password that gets executed to rotate the admin password. And that’s pretty much it. That’s all [00:27:00] that’s happening here. There is an SSH connection that is made, and there is the security login password command that gets run.

And once that password is rotated, it is stored securely in the CyberArk vault. Now, how do you access the vault? Now, in CyberArk, there is a concept called a safe, and that’s where all the storage clusters are onboarded. There might be a storage [00:27:30] safe, there could be a Windows safe, there could be a Unix safe, and people that have access to one safe may not have access to the other, so the access is controlled that way. Also, to log into CyberArk, you would have to be granted access, and it is secured with multifactor authentication. So you log into CyberArk with MFA, and once that authentication is successful, then if you have access to the safe you would be able to query for that particular storage system. [00:28:00] You can just use the search with the storage system name, and then you can retrieve your admin credential and then log into the cluster as you normally would.

Now, there are different options that are available within CyberArk specific to the password itself. Say you wanted the password rotated every time someone accessed it, that’s possible. If admin A retrieves the password and uses it, then after 15 minutes [00:28:30] you could have it rotate, or you could have it rotate immediately as soon as the admin releases that account. That’s one option. Another option is you could set it to rotate on a predefined interval, say the credential only rotates every week or it rotates every two weeks. You can set that up. So there are a lot of options available within CyberArk for password management.

You could also set it to lock [00:29:00] the account as long as somebody is accessing it. If admin A is viewing the password of filer A, nobody else can use that same password at the same time, so you have to wait until the admin releases the account for somebody else to look at the same password, so that’s something that’s configurable as well. So there’s a lot of functionality within CyberArk and it’s all licensed. So based on your requirement, you’d have to work with your CyberArk team and [00:29:30] turn on the additional capabilities. So let me just pause and ask if there are any comments or questions around CyberArk specifically. Is this something the majority are using? ( [00:30:00] silence)

All right. CIFS/SMB auditing. ONTAP provides the ability to enable auditing on the SVMs. We all know about auditing at the cluster level, which is when you log in via any protocol [inaudible 00:30:21] SSH, or HTTP, and the commands run against the filers. The filer is constantly logging that into the audit [00:30:30] logs, which you can go in and view. And you also have the ability to forward that to Splunk. Now, here we’re talking about the SVM level, where you have a similar functionality that’s not enabled by default. You would have to run through some steps to set it up. And basically what you’re doing is you’re turning on auditing on the vservers themselves. And once it’s enabled, there are staging volumes, as you can see [00:31:00] in this screenshot over here, the MDV_ [inaudible 00:31:03], so these volumes get created on all the aggregates that are managed or associated with the SVM.

The admin does not have any control over these volumes. They are completely managed by [MW 00:31:17]; you never have to access any of these volumes unless you’re decommissioning the node or doing something like that. So based on what SVMs you have auditing enabled on, each [00:31:30] SVM gets a separate directory in these staging volumes, and that’s the raw data that gets collected. Periodically this raw data is consolidated, and then it is written in an XML or EVTX format to a volume that you specify when you turn on auditing, so that’s what you need access to. There is this auditing volume where you have all the logs, which you can access [00:32:00] via SMB, or you can mount it on a Unix box and access it via NFS.

So what’s worth mentioning here is, in addition to logging all of this data, as best practice you would also want to forward that to a log server. So in our case, we’re using Splunk as a log server, so that you have historical logs that are saved for compliance or other reasons. If you want to go back and check who did what, or if there were some file deletes, you could go and look at that. [00:32:30] So this is something that we are enabling for critical application shares in our environment, and something that I would recommend that you try out for yourself as well.

And to talk a little bit about ransomware-type attacks, NetApp provides NetApp SnapLock Compliance with SnapVault that you could [00:33:00] leverage. So an example of an attack would be when an attacker gains access to a file system, and he encrypts all the data in the file system that’s mounted on the application servers, so all the data’s encrypted. Now, what he also does is, he probably has access to all your ONTAP systems and he has admin privileges on them. So he could log in and he could destroy all the snapshots and the backups. What that means is then you can’t restore [00:33:30] your data from backups or snapshots. But if you had your volumes protected by SnapLock Compliance, then there’s no way anyone could go in and delete or modify any of that, as long as the retention period is in effect. So if there were to be an attack, you could very easily recover the application volumes from the [00:34:00] SnapVault backups that are protected with SnapLock. So that’s something about ransomware and ransomware-related attacks and how you’re protected against that.

So as a result of the storage security program, we’ve been able to foster a great partnership between the storage, the IT and product security teams. And we collaborate with them on the various efforts that we are [00:34:30] working on. And our storage ecosystem is much more secure than it ever was. At this point, I can open it up for any questions. Is SnapLock licensed at a controller level, and do you have to enable the whole aggregate with SnapLock? Yes, you do. It’s enabled on the aggregate. In comparison, with NVE you can [00:35:00] simply enable the encryption. With NVE, you could enable it just at the volume level, if you so desire, but you do lose aggregate deduplication savings if you enable it just at the volume. So that’s why we are enabling it at the aggregate level, using the aggregate key to encrypt all the volumes within, so we also get the aggregate-level dedupe savings. [00:35:30] Are there any other questions?

Leyddy Portas :                There’s a question from Laura Matthews, she’s [00:36:00] saying, is SnapLock available for every type of volume or is it only file shares?

Faisal Salam:                    So once you SnapLock at the aggregate, pretty much any volume that’s created within would get SnapLocked. And you have two types of SnapLock, Enterprise and Compliance. So with Enterprise, even though you set up your lock on it for a specific [00:36:30] time, you can undo it with an admin account, but with Compliance you wouldn’t be able to. With Compliance, once you set it up, say for 20 years, there is nothing you can do for that timeframe. So there are two types of SnapLock that you could enable, and they are licensed.

Leyddy Portas :                Thank you.

Faisal Salam:                    No problem. [00:37:00] So that’s all I really had.

Leyddy Portas :                I don’t see any more questions on the group chat. Well, it’s [00:37:30] time to go. Thank you so much for attending today. And we will be sending the recording out and the slide deck. Thank you so much, Faisal. And please don’t hesitate to contact us if you have any further questions. Thanks, Faisal.

Faisal Salam:                    Thank you everyone.