Understanding Advanced Snapshot Management


Deleting virtual machine snapshots without wasting disk space

Before using snapshots on your VM, it is very important to check the free disk space on the VMFS volume. As a rule of thumb, you should have at least 20% of the virtual machine’s total disk size available as free disk space before using snapshots. This amount can vary depending on the type of server, how long you will keep the snapshots, and whether you plan to use multiple snapshots.

If you plan to use snapshots on servers such as database servers or file servers, the free space needed on the underlying datastore or VMFS volume is much greater than for servers such as web servers or DNS servers, because file and database servers write far more data to disk than most other server types.

More importantly, if you plan to include the memory state of the VM in the snapshot, you also need to allow for extra disk space equal to the amount of RAM assigned to the VM.
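The sizing guidance above can be turned into a quick calculation. The following is only a sketch: the function name and parameters are made up for illustration, and the 20% figure is the rule of thumb from this article, not a hard limit.

```python
def recommended_free_space_gb(disk_gb, ram_gb=0.0, include_memory=False, headroom=0.20):
    """Rough free-space target (in GB) before taking a snapshot.

    disk_gb        -- total size of the VM's virtual disks
    ram_gb         -- RAM assigned to the VM
    include_memory -- True if the snapshot will include the memory state
    headroom       -- fraction of disk size to keep free (rule of thumb: 20%)
    """
    if disk_gb < 0 or ram_gb < 0:
        raise ValueError("sizes must be non-negative")
    space = disk_gb * headroom
    if include_memory:
        # A memory snapshot needs extra space equal to the VM's RAM.
        space += ram_gb
    return space


# Example: a VM with 200 GB of disk and 16 GB of RAM, snapshot with memory state.
print(recommended_free_space_gb(200, ram_gb=16, include_memory=True))
```

For a busy database or file server you would raise `headroom` above 0.20, in line with the caveat above.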

A VM with only one snapshot requires no extra disk space when deleting, or committing, the snapshot, apart from a helper delta file that is created when the deletion starts. This helper delta file holds any changes made to the VM’s disk while the snapshot is being deleted. Its size varies with how long the snapshot takes to delete, but in general it is small, because most snapshots are deleted in less than an hour.

The amount of extra disk space required while deleting multiple snapshots depends on the vSphere version in use, because the way snapshots are merged into the original disk file has changed across vSphere versions.

In early vSphere 4.0 versions and VMware Infrastructure 3, if a VM has 3 active snapshots and a delete operation is performed, the following process occurs:

Snapshot 3 is copied to Snapshot 2, which is then copied to Snapshot 1. Next, Snapshot 1 is copied to the original disk file, and the helper snapshot is copied to the original disk file, as outlined below.

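Figure: Deleting three snapshots in early vSphere 4.0 and VI3, where each snapshot is merged into the previous one before reaching the original disk. (Graphic courtesy of searchvmware.techtarget.com)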

This process requires extra disk space because each snapshot grows as the previous snapshot is added to it. If there isn’t sufficient free disk space on the data store, the snapshots cannot be committed.

In later vSphere 4.0 versions and vSphere 4.1, each snapshot is merged directly into the original disk instead of being merged with the previous snapshot. The figure below shows what happens when a VM has 3 active snapshots and you delete them.

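Figure: Deleting three snapshots in later vSphere 4.0 versions and vSphere 4.1, where each snapshot is merged directly into the original disk. (Graphic courtesy of searchvmware.techtarget.com)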

Because each snapshot is merged directly into the original disk, one at a time, no extra disk space is needed except for the helper file.
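To see why the older, chained process needs more room, here is a rough worst-case model of both approaches. It is a sketch under stated assumptions, not VMware's actual algorithm: it assumes no two snapshots contain the same changed blocks, that no delta file is removed until the whole chain has been committed, and that the original disk is preallocated so it does not grow. The function names are invented for this example, and the helper delta file is ignored.

```python
def extra_space_chained(snapshot_gb):
    """Worst-case extra space (GB) when each snapshot is merged into its parent.

    snapshot_gb lists delta sizes from oldest (Snapshot 1) to newest.
    Assumptions (illustrative only): no overlapping blocks, delta files are
    kept until the whole chain is committed, base disk is preallocated.
    """
    extra = 0.0
    carried = 0.0  # amount of data being pushed down the chain so far
    # Walk from the newest snapshot down to the one just above Snapshot 1.
    for size in reversed(snapshot_gb[1:]):
        carried += size
        extra += carried  # the parent grows by everything merged into it
    return extra


def extra_space_direct(snapshot_gb):
    """Extra space (GB) when each snapshot is merged straight into the base disk."""
    return 0.0  # nothing grows except the small helper delta file


sizes = [10, 10, 10]  # three snapshots of 10 GB each, oldest first
print(extra_space_chained(sizes), extra_space_direct(sizes))
```

With a single snapshot the chained model also returns zero, which matches the single-snapshot case described earlier.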

Eric Siebert gives one very good word of caution about snapshot operations on searchvmware.techtarget.com:

Don’t run a Windows disk defragmentation while the VM has a snapshot running. Defragment operations change many disk blocks and can cause very rapid growth of snapshot files.

How long does it take to delete a snapshot?

When deleting snapshots through the vSphere Client, the task status bar can be misleading. Generally, the task status jumps to 95% complete fairly quickly and then stays at 95% until the entire commit process has completed. vCenter Server has a default 15-minute timeout for all tasks (this can be increased), so vCenter Server may report that the operation has timed out even though your files are still committing.

One simple method for finding out when a task completes is to look at the VM’s directory using the Datastore Browser in the vSphere Client. When the delta files disappear you know that the snapshot deletion has completed.
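If you would rather not keep refreshing the Datastore Browser, the same check can be scripted. The sketch below is an illustration only: it assumes delta files follow the `-delta.vmdk` naming pattern, and `list_files` is a hypothetical callable that you supply to return the file names in the VM's directory (for example a wrapper around `os.listdir` on a path where the datastore is visible).

```python
import time


def delta_files(filenames):
    """Return the snapshot delta files found in a directory listing."""
    return sorted(name for name in filenames if name.endswith("-delta.vmdk"))


def wait_for_commit(list_files, poll_seconds=30, max_polls=None, sleep=time.sleep):
    """Poll the VM directory until no delta files remain.

    list_files -- callable returning the current file names (hypothetical,
                  supplied by the caller)
    Returns True once the delta files are gone, or False if max_polls is
    reached first.
    """
    polls = 0
    while True:
        if not delta_files(list_files()):
            return True
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
        sleep(poll_seconds)
```

A call might look like `wait_for_commit(lambda: os.listdir(vm_dir))`, where `vm_dir` is the path to the VM's folder in your environment.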

There is also a command-line method available in ESXi that you can use to monitor the status of snapshot deletions. It is explained in this VMware KB article.

Snapshots that have been active for a very long time become extremely large and can take a very long time to commit when deleted. The time a snapshot takes to commit varies with the VM’s activity level; it will commit faster if the VM is powered off. The amount of activity on your host’s disk subsystem also affects the time the snapshot takes to commit.

A 100 GB snapshot can take hours to merge into the original disk, which can affect VM and host performance. For this reason you should limit the length of time you keep snapshots and delete them as soon as you no longer need them.
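A back-of-the-envelope estimate makes the "hours" figure concrete. This is a sketch only: the throughput values are hypothetical, and the real commit rate depends on VM activity and how busy the disk subsystem is, as described above.

```python
def estimated_commit_hours(snapshot_gb, throughput_mb_per_s):
    """Estimate commit time in hours for a snapshot of the given size.

    throughput_mb_per_s is the effective merge rate, an assumed value that
    you would have to measure in your own environment.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = snapshot_gb * 1024 / throughput_mb_per_s
    return seconds / 3600


# Compare a 100 GB snapshot at a few assumed effective merge rates.
for rate in (10, 25, 50):
    print(f"{rate} MB/s -> {estimated_commit_hours(100, rate):.1f} hours")
```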

Effect of Snapshots and metadata locks on host performance

Snapshots have a negative impact on the performance of your host and virtual machines in several ways.

When a snapshot is first taken, activity on the VM is paused briefly, and you may see a few ping timeouts on the VM while snapshot creation is in progress. Creating a snapshot also causes metadata updates, which can cause SCSI reservation conflicts that briefly lock your LUN. As a result, the LUN is available exclusively to a single host for a brief period of time.

When a VM has an active snapshot, its performance is degraded because the host writes to delta files differently, and less efficiently, than it does to standard VMDK files.

Also, each time the delta file grows by a 16 MB increment, it causes another metadata lock. This can affect your VMs and hosts. How big the performance impact is depends on how busy your VM and hosts are.
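To get a feel for how often this happens, you can count the growth steps for a delta file of a given size. This is simple arithmetic based on the 16 MB increment mentioned above; the function name is made up for this example.

```python
import math

INCREMENT_MB = 16  # delta files grow in 16 MB steps


def growth_increments(delta_size_mb):
    """Number of 16 MB growth steps (each causing a metadata lock) to reach a size."""
    if delta_size_mb < 0:
        raise ValueError("size must be non-negative")
    return math.ceil(delta_size_mb / INCREMENT_MB)


# A delta file that has grown to 100 GB.
print(growth_increments(100 * 1024))
```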


Deleting/committing a snapshot also creates a metadata lock. In addition, the snapshot you are deleting can greatly reduce the performance of its VM while the delta files are being committed; this is more noticeable if the VM is very busy. To avoid this problem, delete large or numerous snapshots during off-peak hours when the ESXi host is less busy.

Snapshot Best Practices

There are certain things to keep in mind while using snapshots. These are discussed below.

Never expand a disk file with a snapshot running

You should never expand a virtual disk while snapshots are active. Disks are normally expanded using the vmkfstools -X command or the vSphere Client.

In VI3, if you expand a disk using the VI Client while a snapshot is active, the client reports that the task completed successfully, but the disk file is not actually expanded. If the virtual disk is expanded with the vmkfstools command while a snapshot is active, the VM will no longer start, and you will receive an error:

“Cannot open the disk “.vmdk” or one of the snapshot disks it depends on. Reason: The parent virtual disk has been modified since the child was created”

In later versions of vSphere, it is not possible to expand a VM’s virtual disk while a snapshot is running. The vmkfstools command also fails with an error:

“Failed to extend the disk. Failed to lock the file”

The option to resize the disk of a VM (select the VM disk in Edit Settings) is grayed out in the vSphere Client while a snapshot is running. Once the snapshot is deleted, you can resize the virtual disk.

If a VM has an RDM disk attached, the disk size is managed by the physical storage system and not by vSphere. As a result, it is possible to increase the size of an RDM disk while snapshots are active.

Caution: This action can corrupt the RDM disk, so always delete snapshots before increasing the size of an RDM disk.
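If you script disk expansion, it is worth building this rule into the script. The sketch below only constructs the vmkfstools command line and refuses to do so when snapshots exist. The function name is hypothetical, and how you determine `has_snapshots` (for example, from your inventory tooling) is left to you.

```python
def build_expand_command(vmdk_path, new_size, has_snapshots):
    """Return the vmkfstools argument list for expanding a disk.

    vmdk_path     -- path to the virtual disk descriptor file
    new_size      -- target size as accepted by vmkfstools -X, e.g. "60G"
    has_snapshots -- True if the VM currently has any snapshot (caller-supplied)
    """
    if has_snapshots:
        # Never expand a disk while a snapshot is active.
        raise RuntimeError("delete all snapshots before expanding the disk")
    return ["vmkfstools", "-X", new_size, vmdk_path]


# Example with a placeholder path; the command is printed, not executed.
print(" ".join(build_expand_command("/vmfs/volumes/datastore1/vm1/vm1.vmdk", "60G", False)))
```

Returning the argument list instead of running it keeps the check easy to test and leaves execution to whatever tool you already use.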

Excluding virtual disks from using snapshots

If a VM has more than one disk, it is possible to exclude a disk from snapshots. To do this, edit the VM’s settings and change the disk mode to Independent (make sure you select Persistent). The Independent setting lets you control how each disk functions independently; it makes no difference to the disk file or its structure. Once a disk is Independent, it will not be included in any snapshots.

Note: You will not be able to include memory snapshots on a VM that has independent disks. This protects the independent disk in case you revert to a previous snapshot whose memory state includes a running application that was writing to the independent disk. Because the independent disk is not reverted when the other disks are, this could potentially corrupt the data on it.

For VMs that have RDM disks, if the RDM was configured in physical compatibility mode, it will not be included in any VM snapshots. But if the RDM was configured in virtual compatibility mode, it will be included in snapshots.
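The inclusion rules in this section can be summarized as a small decision function. This is an illustrative sketch: the mode strings used here are labels invented for the example, not values from the vSphere API.

```python
def included_in_snapshot(disk_mode="dependent", rdm_mode=None):
    """Return True if a disk would be captured by a VM snapshot.

    disk_mode -- "dependent" or "independent-persistent" (illustrative labels)
    rdm_mode  -- None for a regular virtual disk, or "physical" / "virtual"
                 for an RDM compatibility mode (illustrative labels)
    """
    if rdm_mode == "physical":
        return False  # physical compatibility mode RDMs are never included
    if disk_mode.startswith("independent"):
        return False  # independent disks are excluded from snapshots
    return True       # regular disks and virtual-mode RDMs are included


for disk in [("dependent", None), ("independent-persistent", None),
             ("dependent", "physical"), ("dependent", "virtual")]:
    print(disk, included_in_snapshot(*disk))
```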

About Alex Hunt

I am Manish Kumar Jha, aka Alex Hunt. I currently work at VMware Software India Pvt Ltd as an Operations System Engineer (vCloud Air Operations). I have around 5 years of IT experience, with exposure to VMware vSphere, vCloud Director, RHEL, and modern data center technologies such as Cisco UCS, Cisco Nexus 1000v, and NSX.

6 Responses to Understanding Advanced Snapshot Management

  1. Hi Manish,

    Thank you for the article.

    There are a couple of things that I wanted to verify with you.

    You mentioned that activity on the VM is paused briefly when a snapshot is first taken. If I am taking a non-memory, non-quiesced snapshot, will the VM’s activity still be paused? I guess it is only during a quiesced snapshot that the VM’s I/O is paused, so that a clean snapshot can be taken.

    Secondly, you mentioned that creating a snapshot causes metadata updates, which can cause SCSI reservation conflicts that briefly lock your LUN. I guess that with the introduction of ATS only a sector on the LUN is locked instead of the entire LUN.

    Please share your views on this.

    Thanks


    • Alex Hunt says:

      Hi Vaibhav,

      Yes, the points you mentioned are correct.

      VM activity is paused only if you take a quiesced snapshot.

      ATS was introduced with VAAI in vSphere 4.1 and became the usual locking mechanism for VMFS5 datastores on supporting arrays in vSphere 5.x. Before that, the SCSI-2 disk locking mechanism was used. A SCSI-2 disk lock command locks out other hosts from performing I/O on the entire LUN, while ATS enables modification of the metadata or any other sector on the disk without the use of a full SCSI-2 disk lock.


  2. Tony says:

    Nice deeper dive into snapshot info. This will go in my bookmarks and be passed on as people ask about them.


  3. deepakkanda says:

    Hi Alex,

    Well-written article. I have three questions for you:
    1. Is it possible to limit the growth of the child disk?
    2. Why do packet drops occur during snapshot creation and deletion?
    3. Is it possible to perform svMotion while a VM has snapshots?


    • Alex Hunt says:

      Hi Deepak, the answers to your questions are as follows:

      1: As far as I know, there is no direct setting that limits the growth of the delta disk. The delta disk grows in increments of 16 MB and can grow up to the size of the VMDK in your VM (if there is sufficient space available on the datastore). The only way I can think of to limit the growth is not to let snapshots run for long hours or days/weeks. The more time that elapses after taking the snapshot, the larger the delta disk becomes.

      2: Generally, the packet drop issue happens while committing the snapshot, and only if the VM is very I/O intensive and there is high I/O activity while the snapshot deletion is in progress. The explanation can be summarized as follows:

      How VMware snapshot deletion works

      First, VMware creates a second snapshot, which is a child of the first snapshot. All writes that the VM performs from now on go into the second snapshot. The first snapshot is then committed to the base disk.

      When the first snapshot has been deleted successfully, you are back at the starting problem: a VM with a snapshot attached. This second snapshot will be smaller than the previous one. VMware simply repeats this process until the one remaining snapshot is small enough (16 MB at most). Then VMware freezes I/O to the VM, commits the final snapshot, and unfreezes the VM. Because the snapshot is so small, the time the I/O is frozen remains acceptable.

      However, snapshot deletion can actually fail if the VM writes too much data. In this case, the second snapshot grows faster than VMware can commit the first one to the base disk. This can mean your snapshot effectively grows with every iteration instead of getting smaller!
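The loop described above can be modeled with a few lines of code. This is a simplified sketch under stated assumptions, not VMware's implementation: it assumes constant write and commit rates, and the function name, parameters, and round limit are invented for the example.

```python
def simulate_consolidation(snapshot_mb, write_mb_per_s, commit_mb_per_s,
                           final_threshold_mb=16, max_rounds=50):
    """Model the repeated helper-snapshot commit loop.

    Returns (converged, sizes), where sizes lists the snapshot size at the
    start of each round. Rates are assumed constant, which is a simplification.
    """
    if commit_mb_per_s <= 0:
        raise ValueError("commit rate must be positive")
    sizes = [snapshot_mb]
    current = snapshot_mb
    for _ in range(max_rounds):
        if current <= final_threshold_mb:
            return True, sizes  # small enough: freeze I/O and commit the rest
        commit_seconds = current / commit_mb_per_s
        # While the current snapshot commits, new writes fill the helper snapshot.
        current = write_mb_per_s * commit_seconds
        sizes.append(current)
    return False, sizes  # never got small enough within max_rounds


# A 10 GB snapshot on a VM writing 5 MB/s while commits run at 50 MB/s.
converged, sizes = simulate_consolidation(10240, 5, 50)
print(converged, [round(s, 2) for s in sizes])
```

In this model each round shrinks the snapshot by the ratio of write rate to commit rate, so the loop converges only when the VM writes more slowly than the host can commit.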

      Now, coming to the third question: the answer is yes. A VM with snapshots can be svMotion’ed if your environment is running vSphere 5.x. Before 5.x this was not possible.

