cloudstack-dev mailing list archives

From Darren Shepherd <darren.s.sheph...@gmail.com>
Subject Re: [DISCUSS] Pluggable VM snapshot related operations?
Date Tue, 08 Oct 2013 21:54:28 GMT
A hypervisor snapshot will also snapshot memory, so deciding
whether to take the hypervisor snapshot based on the quiesce option
does not seem right.

Sorry for all the questions; I'm trying to understand whether this
functionality makes sense at this point in the code, or whether there
is a different approach.  This is what I'm seeing.  What if we state
it this way:

1) VM snapshots, AFAIK, are not backed up today and exist solely on
primary storage.  What if we added a backup phase to VM snapshots that
storage providers could optionally support, to back up the VM snapshot
volumes?

2) Additionally, you want to be able to back up multiple disks at once,
independently of VM snapshots.  Why don't we add the ability to put
multiple volumeIds in the snapshot command, so that a storage provider
that supports it receives a batch of volumeIds?
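Proposal 2 can be sketched as a snapshot command that carries a list of volume IDs, with a fallback to one snapshot per volume when the provider cannot batch. This is a minimal sketch under assumptions: `SnapshotProvider`, `supportsBatchSnapshot`, `snapshotVolumes`, `snapshotVolume` and `BatchSnapshotDispatcher` are hypothetical names, not CloudStack APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical provider contract; not an existing CloudStack interface.
interface SnapshotProvider {
    // Whether the provider can snapshot several volumes in one operation.
    boolean supportsBatchSnapshot();

    // Snapshot all given volumes together; returns the resulting snapshot IDs.
    List<String> snapshotVolumes(List<Long> volumeIds);

    // Snapshot a single volume; returns its snapshot ID.
    String snapshotVolume(long volumeId);
}

// Hypothetical handler for a snapshot command carrying several volumeIds.
final class BatchSnapshotDispatcher {
    private BatchSnapshotDispatcher() {}

    static List<String> takeSnapshots(SnapshotProvider provider, List<Long> volumeIds) {
        if (provider.supportsBatchSnapshot()) {
            // The provider gets the whole batch and may snapshot it in one shot.
            return provider.snapshotVolumes(volumeIds);
        }
        // Fallback: one snapshot per volume, with no cross-volume consistency.
        List<String> snapshotIds = new ArrayList<>();
        for (long volumeId : volumeIds) {
            snapshotIds.add(provider.snapshotVolume(volumeId));
        }
        return snapshotIds;
    }
}
```

Providers that cannot batch keep working unchanged; only those that opt in see the whole list.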

Now I know we talked about 2 and there were some concerns about it
(mostly from me), but I think we could work through those concerns
(forgot what they were...).  Right now I just get the feeling we are
shoehorning some functionality into VM snapshot that isn't quite the
right fit.  The "no quiesce" flow just doesn't seem to make sense to
me.

Darren

On Tue, Oct 8, 2013 at 2:05 PM, SuichII, Christopher
<Chris.Suich@netapp.com> wrote:
> Whether the hypervisor snapshot happens depends on whether the 'quiesce' option is specified
> with the snapshot request. If a user doesn't care about the consistency of their backup, then
> the hypervisor snapshot/quiesce step can be skipped altogether. This, of course, is not the
> case if the default provider is being used; in that case a hypervisor snapshot is the only
> way of creating a backup, since it can't be offloaded to the storage driver.
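The rule described above (the 'quiesce' option decides whether the hypervisor snapshot is taken, except with the default provider, which always needs one) fits in a small predicate. A hypothetical sketch; `QuiescePolicy` and `needsHypervisorSnapshot` are illustrative names only.

```java
// Hypothetical helper expressing the rule above; the names are illustrative.
final class QuiescePolicy {
    private QuiescePolicy() {}

    /**
     * @param quiesceRequested whether the snapshot request carried the 'quiesce' option
     * @param defaultProvider  whether the VM's volumes use the default storage provider
     * @return true if a hypervisor snapshot has to be taken
     */
    static boolean needsHypervisorSnapshot(boolean quiesceRequested, boolean defaultProvider) {
        if (defaultProvider) {
            // The default provider cannot offload the snapshot to the storage
            // driver, so the hypervisor snapshot is the only way to create it.
            return true;
        }
        // Otherwise the hypervisor snapshot only serves to quiesce the VM,
        // and users who don't need consistency can skip it.
        return quiesceRequested;
    }
}
```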
>
> --
> Chris Suich
> chris.suich@netapp.com
> NetApp Software Engineer
> Data Center Platforms – Cloud Solutions
> Citrix, Cisco & Red Hat
>
> On Oct 8, 2013, at 4:57 PM, Darren Shepherd <darren.s.shepherd@gmail.com>
>  wrote:
>
>> Who is going to decide whether the hypervisor snapshot should actually
>> happen or not? Or how?
>>
>> Darren
>>
>> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
>> <Chris.Suich@netapp.com> wrote:
>>>
>>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd <darren.s.shepherd@gmail.com> wrote:
>>>
>>>> So in the implementation, when we say "quiesce", is that actually being
>>>> implemented as a VM snapshot (memory and disk)?  And then when you say
>>>> "unquiesce", are you talking about deleting the VM snapshot?
>>>
>>> If the VM snapshot is not going to the hypervisor (i.e. it is offloaded to the storage
>>> driver), then yes, the quiesce will actually be a hypervisor snapshot. Just to be clear,
>>> the unquiesce is not quite a delete: it is a collapse of the VM snapshot and the active
>>> VM back into one file.
>>>
>>>>
>>>> In NetApp, what are you snapshotting?  The whole NetApp volume (I
>>>> don't know the correct term), a file on NFS, an iSCSI volume?  I don't
>>>> know a whole heck of a lot about NetApp's snapshot capabilities.
>>>
>>> Essentially we are using internal APIs to create file-level backups; don't worry
>>> too much about the terminology.
>>>
>>>>
>>>> I know storage solutions can snapshot better and faster than
>>>> hypervisors can with COW files.  I've personally always been
>>>> perplexed about the best way to implement it.  For storage
>>>> solutions that are block based, it's really easy to have the storage
>>>> do the snapshot.  For shared file systems like NFS, it seems way
>>>> more complicated, as you don't want to snapshot the entire filesystem
>>>> in order to snapshot one file.
>>>
>>> With filesystems like NFS, things are certainly more complicated, but that is
>>> taken care of by our controller's operating system, Data ONTAP, and we simply use
>>> APIs to communicate with it.
>>>
>>>>
>>>> Darren
>>>>
>>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
>>>> <Chris.Suich@netapp.com> wrote:
>>>>> I can comment on the second half.
>>>>>
>>>>> Through storage operations, storage providers can create backups much faster than
>>>>> hypervisors can, and over time their snapshots are more efficient than the snapshot
>>>>> chains that hypervisors create. It is true that a VM snapshot taken at the storage
>>>>> level is slightly different, as it would be pseudo-quiesced and would not have its
>>>>> memory snapshotted. The pseudo-quiesce is accomplished through hypervisor snapshots:
>>>>>
>>>>> 1) VM snapshot request (let's say for VM 'A')
>>>>> 2) Create hypervisor snapshot (optional)
>>>>>    - VM 'A' is snapshotted, creating active VM 'A*'
>>>>>    - All disk traffic now goes to VM 'A*', and 'A' is a snapshot of 'A*'
>>>>> 3) Storage driver(s) take snapshots of each volume
>>>>> 4) Undo hypervisor snapshot (optional)
>>>>>    - VM snapshot 'A' is merged back into VM 'A*', so the hypervisor snapshot
>>>>>      no longer exists
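The four steps above could be orchestrated roughly as follows. `HypervisorOps`, `StorageDriverOps` and `StorageVmSnapshotFlow` are hypothetical names; the `try`/`finally`, which collapses the hypervisor snapshot even if a storage snapshot fails, is a design assumption of this sketch rather than something stated in the thread.

```java
import java.util.List;

// Hypothetical hypervisor operations used in steps 2 and 4.
interface HypervisorOps {
    void createSnapshot(String vm);   // step 2: 'A' becomes the snapshot, 'A*' is active
    void collapseSnapshot(String vm); // step 4: merge 'A' back into 'A*'
}

// Hypothetical storage-side operation used in step 3.
interface StorageDriverOps {
    void snapshotVolume(String volume);
}

final class StorageVmSnapshotFlow {
    private StorageVmSnapshotFlow() {}

    static void takeVmSnapshot(String vm, List<String> volumes, boolean quiesce,
                               HypervisorOps hypervisor, StorageDriverOps storage) {
        if (quiesce) {
            hypervisor.createSnapshot(vm);
        }
        try {
            for (String volume : volumes) {
                storage.snapshotVolume(volume);
            }
        } finally {
            // Always undo the temporary hypervisor snapshot, even on failure.
            if (quiesce) {
                hypervisor.collapseSnapshot(vm);
            }
        }
    }
}
```

With `quiesce` false, only step 3 runs, which matches the "faster snapshots over consistency" case in the notes that follow.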
>>>>>
>>>>> Now, a couple of notes:
>>>>> - The reason this is optional is that not all users necessarily care about the memory
>>>>>   or disk consistency of their VMs, and some would prefer faster snapshots to consistency.
>>>>> - Preemptively, yes, we are actually taking hypervisor snapshots, which means there
>>>>>   isn't actually a performance gain from taking storage snapshots when quiescing the VM.
>>>>>   However, the performance gain will come both during restoring the VM and during
>>>>>   normal operations, as described above.
>>>>>
>>>>> Although you can think of it as a poor man's VM snapshot, I would think of it more as
>>>>> a consistent multi-volume snapshot. Again, the difference is that this snapshot is not
>>>>> truly quiesced the way a hypervisor snapshot would be.
>>>>>
>>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd <darren.s.shepherd@gmail.com> wrote:
>>>>>
>>>>>> My only comment is that having the return type as boolean and using
>>>>>> that to indicate quiesce behaviour seems obscure and will probably lead
>>>>>> to a problem later.  You're basically saying the result of
>>>>>> takeVMSnapshot will only ever need to communicate back whether
>>>>>> unquiesce needs to happen.  Maybe some result object would be more
>>>>>> extensible.
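One way to read the result-object suggestion: wrap the unquiesce flag in a small class so that later fields can be added without changing the driver method's signature. `VMSnapshotResult`, its factory methods and the optional provider snapshot ID are hypothetical illustrations, not an agreed design.

```java
import java.util.Optional;

// Hypothetical result type for a driver's takeVMSnapshot call.
final class VMSnapshotResult {
    private final boolean needsUnquiesce;
    private final String providerSnapshotId; // null when the provider has none

    private VMSnapshotResult(boolean needsUnquiesce, String providerSnapshotId) {
        this.needsUnquiesce = needsUnquiesce;
        this.providerSnapshotId = providerSnapshotId;
    }

    // The storage snapshot is done, so the temporary hypervisor snapshot can be collapsed.
    static VMSnapshotResult handledByStorage(String providerSnapshotId) {
        return new VMSnapshotResult(true, providerSnapshotId);
    }

    // The hypervisor snapshot is itself the VM snapshot and must be kept.
    static VMSnapshotResult keptHypervisorSnapshot() {
        return new VMSnapshotResult(false, null);
    }

    boolean needsUnquiesce() { return needsUnquiesce; }

    Optional<String> providerSnapshotId() { return Optional.ofNullable(providerSnapshotId); }
}
```

Adding, say, a timing or error field later only touches this class, whereas a bare boolean would force an interface change on every driver.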
>>>>>>
>>>>>> Actually, I think I have more comments.  This seems a bit odd to me.
>>>>>> Why would a storage driver in ACS implement VM snapshot
>>>>>> functionality?  VM snapshot is really a hypervisor-orchestrated
>>>>>> operation, so it seems like we're trying to implement a poor man's VM
>>>>>> snapshot.  Maybe if I understood what NetApp was trying to do it would
>>>>>> make more sense, but it's all odd.  To do a proper VM snapshot you need
>>>>>> to snapshot memory and disk at the exact same time.  How are we going
>>>>>> to do that if ACS is orchestrating the VM snapshot and delegating to
>>>>>> storage providers?  It's not like you are going to pause the VM... or
>>>>>> are you?
>>>>>>
>>>>>> Darren
>>>>>>
>>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <Edison.su@citrix.com> wrote:
>>>>>>> I created a design document page at
>>>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+snapshot+related+operations
>>>>>>> Feel free to add items to it.
>>>>>>> A new branch, "pluggable_vm_snapshot", has also been created.
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
>>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
>>>>>>>> To: <dev@cloudstack.apache.org>
>>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>>>>>>>>
>>>>>>>> I'm a fan of option 2: it gives us the most flexibility (as you stated). Storage
>>>>>>>> providers are given the option to completely override the way VM snapshots work
>>>>>>>> AND the opportunity to work within the default VM snapshot workflow.
>>>>>>>>
>>>>>>>> I believe this option should satisfy your concern, Mike. The snapshot and quiesce
>>>>>>>> strategy would be in charge of communicating with the hypervisor. Storage providers
>>>>>>>> should be able to leverage the default strategies and simply perform the storage
>>>>>>>> operations.
>>>>>>>>
>>>>>>>> I don't think it should be much of an issue that new methods on the storage driver
>>>>>>>> interface may not apply to every driver. In fact, that is already the case. Some
>>>>>>>> methods, such as un/maintain(), attachToXXX() and takeSnapshot(), are already not
>>>>>>>> implemented by every driver: they just return false when asked if they can handle
>>>>>>>> the operation.
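The pattern mentioned above, where a driver reports whether it can handle an operation and unsupported ones simply return false, might be used like this to pick a driver for a new VM snapshot operation. `DriverOperation`, `CapabilityAwareDriver`, `canHandle` and `DriverSelector` are hypothetical names; CloudStack's actual driver interfaces differ.

```java
import java.util.List;

// Hypothetical set of operations a driver may or may not support.
enum DriverOperation { MAINTAIN, ATTACH, TAKE_SNAPSHOT, TAKE_VM_SNAPSHOT }

// Hypothetical driver contract: unsupported operations just report false.
interface CapabilityAwareDriver {
    String name();
    boolean canHandle(DriverOperation op);
}

final class DriverSelector {
    private DriverSelector() {}

    // Returns the first driver that accepts the operation, else the fallback.
    static CapabilityAwareDriver select(List<CapabilityAwareDriver> drivers,
                                        DriverOperation op,
                                        CapabilityAwareDriver fallback) {
        for (CapabilityAwareDriver driver : drivers) {
            if (driver.canHandle(op)) {
                return driver;
            }
        }
        return fallback;
    }
}
```

Drivers that never implement the new method keep answering false, so adding it to the interface does not force every vendor to support it.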
>>>>>>>>
>>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski <mike.tutkowski@solidfire.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Well, my first thought on this is that the storage driver should not
>>>>>>>>> be telling the hypervisor to do anything. It should be responsible for
>>>>>>>>> creating/deleting volumes, snapshots, etc. on its storage system only.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <Edison.su@citrix.com> wrote:
>>>>>>>>>
>>>>>>>>>> In 4.2, we added VM snapshots for VMware/XenServer. The current
>>>>>>>>>> workflow is as follows:
>>>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot ->
>>>>>>>>>> send CreateVMSnapshotCommand to the hypervisor to create the VM snapshot.
>>>>>>>>>>
>>>>>>>>>> If anybody wants to change the workflow, they need to either change
>>>>>>>>>> VMSnapshotManagerImpl directly or subclass VMSnapshotManagerImpl.
>>>>>>>>>> Neither is ideal, as VMSnapshotManagerImpl should be able to handle
>>>>>>>>>> different ways of taking a VM snapshot instead of hard-coding one.
>>>>>>>>>>
>>>>>>>>>> The requirements for pluggable VM snapshots come from:
>>>>>>>>>> - Storage vendors may have their own optimizations (NetApp, for example).
>>>>>>>>>> - A VM snapshot can be implemented in a totally different way (for
>>>>>>>>>>   example, I could just send a command to the guest VM telling my
>>>>>>>>>>   application to flush its disk and hold disk writes, then ask the
>>>>>>>>>>   hypervisor to take a volume snapshot).
>>>>>>>>>>
>>>>>>>>>> If we agree on enabling pluggable VM snapshots, then we can move on to
>>>>>>>>>> discussing how to implement them.
>>>>>>>>>>
>>>>>>>>>> The possible options:
>>>>>>>>>> 1. Coarse-grained interface. Add a VMSnapshotStrategy interface,
>>>>>>>>>> which has the following methods:
>>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>>> boolean revertVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>>> boolean deleteVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>>>
>>>>>>>>>> The workflow will be: createVMSnapshot api -> VMSnapshotManagerImpl:
>>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot.
>>>>>>>>>> VMSnapshotManagerImpl will manage the VM state and do the sanity checks,
>>>>>>>>>> then hand over to VMSnapshotStrategy.
>>>>>>>>>> A VMSnapshotStrategy implementation may just send a
>>>>>>>>>> Create/revert/delete VMSnapshotCommand to the hypervisor host, or perform
>>>>>>>>>> any special operations it needs.
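Option 1 expressed in Java, with a manager that keeps the sanity check and delegates everything else to the strategy. The strategy mirrors the three methods proposed above; `VMSnapshot` here is a simplified stand-in class, and `VmState`, `VMSnapshotManager` and the "no snapshots of destroyed VMs" check are illustrative assumptions, not CloudStack code.

```java
// Simplified, hypothetical stand-ins for CloudStack's VM state and VMSnapshot.
enum VmState { RUNNING, STOPPED, DESTROYED }

final class VMSnapshot {
    final long vmId;
    final String name;

    VMSnapshot(long vmId, String name) {
        this.vmId = vmId;
        this.name = name;
    }
}

// Option 1: the coarse-grained strategy from the proposal.
interface VMSnapshotStrategy {
    VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
    boolean revertVMSnapshot(VMSnapshot vmSnapshot);
    boolean deleteVMSnapshot(VMSnapshot vmSnapshot);
}

// Hypothetical manager: validates state, then hands over to the strategy.
final class VMSnapshotManager {
    private final VMSnapshotStrategy strategy;

    VMSnapshotManager(VMSnapshotStrategy strategy) {
        this.strategy = strategy;
    }

    VMSnapshot createVMSnapshot(VMSnapshot request, VmState state) {
        // The sanity check lives in the manager, so every strategy gets it for free.
        if (state == VmState.DESTROYED) {
            throw new IllegalStateException("cannot snapshot a destroyed VM");
        }
        return strategy.takeVMSnapshot(request);
    }
}
```

A vendor plugs in by supplying its own `VMSnapshotStrategy`; the manager code does not change.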
>>>>>>>>>>
>>>>>>>>>> 2. Fine-grained interface. Not only add a VMSnapshotStrategy
>>>>>>>>>> interface, but also add certain methods on the storage driver.
>>>>>>>>>> The VMSnapshotStrategy interface will be the same as in option 1.
>>>>>>>>>> The following methods will be added to the storage driver:
>>>>>>>>>> /* volumesBelongToVM is the list of the VM's volumes that were created
>>>>>>>>>>    on this storage. The storage vendor can either take one snapshot of
>>>>>>>>>>    these volumes in one shot, or take a snapshot of each volume separately.
>>>>>>>>>>    The pre-condition: the VM is quiesced.
>>>>>>>>>>    It returns a boolean indicating whether the VM needs to be unquiesced.
>>>>>>>>>>    In the default storage driver, it will return false.
>>>>>>>>>> */
>>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>>>         VMSnapshot vmSnapshot);
>>>>>>>>>> boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>>>         VMSnapshot vmSnapshot);
>>>>>>>>>> boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>>>         VMSnapshot vmSnapshot);
>>>>>>>>>>
>>>>>>>>>> The workflow will be: createVMSnapshot api -> VMSnapshotManagerImpl:
>>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
>>>>>>>>>> driver: takeVMSnapshot. In the implementation of VMSnapshotStrategy's
>>>>>>>>>> takeVMSnapshot, the pseudo code looks like:
>>>>>>>>>>    HypervisorHelper.quiesceVM(vm)
>>>>>>>>>>    val volumes = vm.getVolumes()
>>>>>>>>>>    // group the VM's volumes by the storage driver that owns them
>>>>>>>>>>    val volumesByDriver = volumes.groupBy(volume => volume.getDriver())
>>>>>>>>>>    // call every driver before combining the results, so that one driver
>>>>>>>>>>    // returning false does not skip the snapshots of the others
>>>>>>>>>>    val results = volumesByDriver.map { case (driver, vols) =>
>>>>>>>>>>      driver.takeVMSnapshot(vols, vmSnapshot)
>>>>>>>>>>    }
>>>>>>>>>>    // unquiesce only if every driver asked for it
>>>>>>>>>>    val needUnquiesce = results.forall(identity)
>>>>>>>>>>    if (needUnquiesce) {
>>>>>>>>>>      HypervisorHelper.unquiesce(vm)
>>>>>>>>>>    }
>>>>>>>>>>
>>>>>>>>>> By default, quiesceVM in HypervisorHelper will actually take a VM
>>>>>>>>>> snapshot through the hypervisor.
>>>>>>>>>> Does the above logic make sense?
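The option 2 flow translated into Java, as a sketch with simplified stand-in types: `VolumeInfo`, `StorageDriver`, `HypervisorHelper` and `DefaultVMSnapshotStrategy` here are hypothetical, and VMs and snapshots are plain strings. Each driver is called before its result is combined, because writing `needUnquiesce && driver.takeVMSnapshot(...)` would short-circuit and skip every driver after the first one that returns false.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; the real CloudStack types differ.
interface VolumeInfo {
    StorageDriver getDriver();
}

interface StorageDriver {
    // Returns true if the VM should be unquiesced afterwards (default driver: false).
    boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM, String vmSnapshot);
}

interface HypervisorHelper {
    void quiesceVM(String vm);
    void unquiesceVM(String vm);
}

final class DefaultVMSnapshotStrategy {
    private DefaultVMSnapshotStrategy() {}

    static void takeVMSnapshot(String vm, List<VolumeInfo> volumes, String vmSnapshot,
                               HypervisorHelper helper) {
        helper.quiesceVM(vm);

        // Group the VM's volumes by the driver that owns them, keeping first-seen order.
        Map<StorageDriver, List<VolumeInfo>> byDriver = new LinkedHashMap<>();
        for (VolumeInfo volume : volumes) {
            byDriver.computeIfAbsent(volume.getDriver(), d -> new ArrayList<>()).add(volume);
        }

        boolean needUnquiesce = true;
        for (Map.Entry<StorageDriver, List<VolumeInfo>> entry : byDriver.entrySet()) {
            // Invoke the driver unconditionally, then fold its answer in.
            boolean driverWantsUnquiesce = entry.getKey().takeVMSnapshot(entry.getValue(), vmSnapshot);
            needUnquiesce = needUnquiesce && driverWantsUnquiesce;
        }

        if (needUnquiesce) {
            helper.unquiesceVM(vm);
        }
    }
}
```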
>>>>>>>>>>
>>>>>>>>>> The pro of option 1 is that it's simple: there is no need to change the
>>>>>>>>>> storage driver interfaces. The con is that each storage vendor needs to
>>>>>>>>>> implement a strategy, and they may end up doing the same thing.
>>>>>>>>>> The pro of option 2 is that storage drivers won't need to worry
>>>>>>>>>> about how to quiesce/unquiesce the VM. The con is that it adds
>>>>>>>>>> these methods to every storage driver, so it assumes that this
>>>>>>>>>> workflow will work for everybody.
>>>>>>>>>>
>>>>>>>>>> So which option should we take? Or if you have other options, please
>>>>>>>>>> let us know.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Mike Tutkowski*
>>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>>>>>>> e: mike.tutkowski@solidfire.com
>>>>>>>>> o: 303.746.7302
>>>>>>>>> Advancing the way the world uses the
>>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>>>>>>>> *(tm)*
>>>>>>>
>>>>>
>>>
>
