Storage snapshots capture a moment in time. For many organisations, they provide an easy way to augment traditional backups, ensuring rapid recovery of IT infrastructure.
But since storage snapshots are only as good as the people, processes and technology associated with them, organisations could be risking their data and business by relying on them, writes Johan Scheepers, Commvault systems engineering director for MESAT.
There are five common reasons for snapshot failure – knowing how to overcome them will minimise risk.
Yet the complexity of snapshots, which may be hardware array-based, hypervisor-based (virtualisation) or software-based, poses a number of challenges.
The top five reasons snapshots fail:
* Virtual machines (VMs) and virtualisation;
* Storage integration;
* Infrastructure changes;
* Lack of app integration; and
* The human factor.
IT can upgrade an OS or add a virtual machine. There might be issues with application integration. There could be multiple storage array vendors in an environment, each of which requires its own snapshot solution. IT may use multiple tools to manage snapshots. Addressing each of these situations typically requires updating complex manual scripts, which can easily break. Finally, there’s the human error factor to consider.
Yet snapshots remain a vital part of a data protection strategy.
Why are snapshots useful?
Snapshots can roll a system back to a specific point in time without needing to restore backups. While they are not as robust as traditional backup, they are highly efficient. They offer fast protection – they can be created in seconds with very little impact on the production environment. They can also be more convenient to use than traditional backups in a number of situations, such as when a patch deployment goes bad, or for recovery from a malware infection.
Snapshots have caught on in the enterprise for a number of reasons. They are faster, because recovery is internal to the array, whereas traditional backups have to go through multiple hops to recover the data. Snapshots also offer more accurate point-in-time views of the data, which helps meet recovery time and recovery point objectives. And they can be cost-effective, in that the infrastructure investment in the storage array has already been made and there is little to no additional investment needed to turn on the snapshot capabilities.
Challenges with snapshots often occur when processes are manual.
VMs and virtualisation
Storage virtualisation is mainstream but VMs are one of the main reasons snapshots fail. Why? Well, back when we had physical servers that all had their own disks, IT could revert that disk to a previous state without impacting anything else.
Now, with virtualisation, whole servers are connected to a central point of storage. That central storage array will typically have multiple virtual machines stored on it in the form of virtual hard disks. So, if you are implementing a snapshotting solution at the storage level that is not virtualisation aware, you will be affecting not one but multiple virtual machines on that storage volume. That could have some nasty effects.
For example, if you move a VM to another storage repository then restore a snapshot from an earlier time using a non-virtualisation-aware solution, that VM will be gone. Snapshot solutions thus have to be virtualisation aware.
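To make the risk concrete, here is a minimal sketch of the kind of check a virtualisation-aware tool might run before reverting a whole volume. The `VolumeSnapshot` record and `revert_impact` function are hypothetical names invented for illustration, not part of any product. It assumes the tool knows which VMs were on the volume when the snapshot was taken and which are there now.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeSnapshot:
    """Hypothetical record of which VMs lived on a volume at snapshot time."""
    volume: str
    vm_names: frozenset


def revert_impact(snapshot, current_vm_names):
    """Report what a whole-volume revert would do to each VM.

    rolled_back: VMs in the snapshot and on the volume today (all revert together).
    lost: VMs placed on the volume after the snapshot (their disks would vanish).
    resurrected: VMs since moved away or deleted (stale copies would reappear).
    """
    current = frozenset(current_vm_names)
    return {
        "rolled_back": sorted(snapshot.vm_names & current),
        "lost": sorted(current - snapshot.vm_names),
        "resurrected": sorted(snapshot.vm_names - current),
    }


snap = VolumeSnapshot("vol01", frozenset({"web01", "db01"}))
# Since the snapshot, db01 was moved elsewhere and app01 was moved onto vol01.
impact = revert_impact(snap, {"web01", "app01"})
print(impact)
```

A tool that is not virtualisation aware has no such inventory, so it cannot warn that reverting `vol01` would also wipe out `app01`.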
Digital data is growing at an exponential rate. This impacts snapshots in a number of ways. More data makes it difficult for enterprises to meet their backup SLAs using legacy solutions as the backup windows simply are not long enough to complete the process. Snapshots offer a faster solution.
In addition, since a lot of the data growth is from non-production environments, enterprises are starting to look at using snapshots for copy data management purposes to reduce storage requirements.
However, with storage growing at a tremendous rate, there are changes to infrastructure and that impacts snapshot management. Changes in your storage environment, such as a storage hardware refresh or adding new disk shelves to accommodate capacity requirements, can break scripts.
Storage and infrastructure changes
The number and variety of storage platforms is a major part of the problem. While many storage arrays have similar capabilities, the APIs differ across vendors and even within a single vendor's range, which drives the need for customised scripts for each application and array. This increases complexity, particularly in multi-storage platform environments. Enterprises have to know where all those scripts are, because scripts have to be updated as the environment changes. If you miss a script, it will break a snapshot.
Finally, the number of snapshots that you can manage on an array is important and can vary greatly by vendor. As a result, companies must build to the lowest common denominator.
Ultimately, what you need is a tool that can translate the different APIs and bring them together into one consistent interface. Then, as you upgrade or change vendors, these changes won’t end up impacting operations because the tool knows how to talk to both the old and the new storage.
Cloud adds another dimension. With the growth of cloud strategies and cloud storage, which for some organisations include multiple cloud providers, you need a tool that understands how to move data to and from the cloud, but also within and between clouds.
Ideally, you also want a solution that doesn’t require separate cloud gateways to get data to the cloud because that adds unnecessary cost and complexity.
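The idea of one consistent interface in front of several vendor APIs can be sketched with a simple adapter pattern. Everything here is an assumption made for illustration: the class names, the identifier formats and the snapshot limits are invented, and a real adapter would call the vendor's actual API where the comments indicate. The sketch also shows the "lowest common denominator" effect of differing per-array snapshot limits.

```python
from abc import ABC, abstractmethod


class ArrayAdapter(ABC):
    """One adapter per storage platform; hides that platform's own API."""

    max_snapshots: int  # per-volume limit, which differs by vendor

    @abstractmethod
    def create_snapshot(self, volume: str, label: str) -> str:
        """Create a snapshot and return an identifier for it."""


class VendorAAdapter(ArrayAdapter):
    max_snapshots = 255  # made-up figure for illustration

    def create_snapshot(self, volume, label):
        # A real adapter would call vendor A's API here.
        return f"vendorA:{volume}@{label}"


class VendorBAdapter(ArrayAdapter):
    max_snapshots = 1024  # made-up figure for illustration

    def create_snapshot(self, volume, label):
        # A real adapter would call vendor B's API here.
        return f"vendorB:{volume}/{label}"


class SnapshotManager:
    """Single entry point; callers never see vendor-specific details."""

    def __init__(self):
        self._adapters = {}  # volume name -> adapter for the array hosting it

    def register(self, volume, adapter):
        self._adapters[volume] = adapter

    def snapshot(self, volume, label):
        return self._adapters[volume].create_snapshot(volume, label)

    def common_snapshot_limit(self):
        # A policy applied everywhere must fit the most restrictive array.
        return min(a.max_snapshots for a in self._adapters.values())


manager = SnapshotManager()
manager.register("finance", VendorAAdapter())
manager.register("hr", VendorBAdapter())
ids = [manager.snapshot(v, "nightly") for v in ("finance", "hr")]
limit = manager.common_snapshot_limit()
```

With this shape, replacing an array means writing or swapping one adapter, while the code that schedules snapshots stays untouched.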
Lack of app and hardware integration
App and hardware integration and manual scripts are a challenge because if storage capacity is increased or the application is updated, the script must be updated to keep up with the changes. However, once you are dealing with more than a handful of applications, updating the script can become cumbersome and unmanageable.
You need a solution that is application aware and that can fully automate snapshot execution, with all standard operations built in. The solution needs to handle all the integration points natively, including those for applications, without customers having to build or maintain anything.
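As a rough illustration of what "application aware" can mean in practice, the sketch below pauses an application's writes, takes the snapshot, and always resumes the application afterwards. The pause-and-resume step is a common general practice rather than something this article spells out, and the function names, the application name and the `events` list are all hypothetical stand-ins for real operations.

```python
from contextlib import contextmanager

events = []  # stands in for real side effects so the ordering is visible


@contextmanager
def quiesced(app_name):
    """Pause writes for an application and always resume them afterwards."""
    events.append(f"quiesce {app_name}")
    try:
        yield
    finally:
        # Runs even if the snapshot step raises, so the app is never left paused.
        events.append(f"resume {app_name}")


def take_snapshot(volume):
    # Placeholder for the call into the storage layer.
    events.append(f"snapshot {volume}")


def app_consistent_snapshot(app_name, volume):
    """Built-in sequence that would otherwise live in a hand-written script."""
    with quiesced(app_name):
        take_snapshot(volume)


app_consistent_snapshot("orders-db", "vol01")
```

Keeping this sequence inside the tool, rather than in per-application scripts, is what removes the need to revisit each script whenever storage or the application changes.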
The human factor
When it comes to snapshot challenges, the human factor is often underestimated. Scripts require cross-functional coordination (across storage, OS and app experts), which increases risk. Manual scripting processes increase risk further. Security is also an issue. You need to maintain security boundaries from an access standpoint to, for example, databases and storage, while still maintaining functional processes.
Ideally, what you want is a solution that understands applications, operating systems, hypervisors, storage, and the like, so you don’t have to rely on manual intervention. You also want a solution that can provide control and access to the right people at the right time without giving them access to the underlying infrastructure.
People are just as important as process when it comes to creating snapshots.
The right solution – technology can help improve snapshots
With the right software, organisations can improve snapshot creation and management capabilities, and gain control. This will make backup and recovery more intelligent and more likely to succeed. Organisations today have lots of moving pieces – they need something that is aware of all the apps running and can talk to all the different types of storage, including future storage, and manage all disparate systems.
To get to the next level with storage challenges, look for a comprehensive solution with simplified management.