cloudstack-dev mailing list archives

From Darren Shepherd <darren.s.sheph...@gmail.com>
Subject Re: [DISCUSS] Pluggable VM snapshot related operations?
Date Thu, 10 Oct 2013 04:12:56 GMT
Edison,

I would lean toward doing the coarse-grained interface only.  I'm having
a hard time seeing how the whole flow is generic and makes sense for
everyone.  Starting with the coarse-grained interface has the advantage
that you avoid possible up-front over-engineering/over-design that
could wreak havoc down the line.  If you implement the
VMSnapshotStrategy and find that it really is useful to other
implementations, you can then implement the fine-grained interface later
to allow others to benefit from it.
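
For reference, a minimal sketch of what going coarse-grained-only could
look like. The three method signatures follow option 1 as proposed
earlier in the thread (the lowercase deleteVMSnapshot and the primitive
boolean returns are assumptions). Everything else is hypothetical and is
not CloudStack code: the VmSnapshotSketch wrapper, the stub VMSnapshot
class, the HypervisorStrategy default that only records what it would
send, and the cut-down VMSnapshotManager.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: every type here is a hypothetical stand-in, not CloudStack code. */
final class VmSnapshotSketch {

    /** Stand-in for the VMSnapshot entity mentioned in the thread. */
    static final class VMSnapshot {
        final long vmId;
        final String name;

        VMSnapshot(long vmId, String name) {
            this.vmId = vmId;
            this.name = name;
        }
    }

    /** Option 1 from the thread: three coarse-grained operations, nothing else. */
    interface VMSnapshotStrategy {
        VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
        boolean revertVMSnapshot(VMSnapshot vmSnapshot);
        boolean deleteVMSnapshot(VMSnapshot vmSnapshot);
    }

    /** Hypothetical default: records the command it would send to the hypervisor host. */
    static final class HypervisorStrategy implements VMSnapshotStrategy {
        final List<String> sentCommands = new ArrayList<>();

        @Override
        public VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot) {
            sentCommands.add("create:" + vmSnapshot.name);
            return vmSnapshot;
        }

        @Override
        public boolean revertVMSnapshot(VMSnapshot vmSnapshot) {
            sentCommands.add("revert:" + vmSnapshot.name);
            return true;
        }

        @Override
        public boolean deleteVMSnapshot(VMSnapshot vmSnapshot) {
            sentCommands.add("delete:" + vmSnapshot.name);
            return true;
        }
    }

    /** Cut-down manager: does the sanity check, then hands over to the strategy. */
    static final class VMSnapshotManager {
        private final VMSnapshotStrategy strategy;

        VMSnapshotManager(VMSnapshotStrategy strategy) {
            this.strategy = strategy;
        }

        VMSnapshot createVMSnapshot(VMSnapshot vmSnapshot) {
            if (vmSnapshot == null || vmSnapshot.name.isEmpty()) {
                throw new IllegalArgumentException("snapshot needs a name");
            }
            return strategy.takeVMSnapshot(vmSnapshot);
        }
    }

    public static void main(String[] args) {
        HypervisorStrategy strategy = new HypervisorStrategy();
        VMSnapshotManager manager = new VMSnapshotManager(strategy);
        manager.createVMSnapshot(new VMSnapshot(42L, "before-upgrade"));
        System.out.println(strategy.sentCommands);
    }
}
```

A storage vendor would plug in by supplying its own VMSnapshotStrategy;
the manager never needs to know whether a hypervisor snapshot, a storage
snapshot, or both happened underneath.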

Darren

On Wed, Oct 9, 2013 at 8:54 PM, Mike Tutkowski
<mike.tutkowski@solidfire.com> wrote:
> Hey guys,
>
> I haven't been giving this thread much attention, but am reviewing it
> somewhat now.
>
> I'm not really clear how this would work if, say, a VM has two data disks
> and they are not being provided by the same vendor.
>
> Can someone clarify that for me?
>
> My understanding for how this works today is that it doesn't matter. For
> XenServer, a VDI is on an SR, which could be supported by storage vendor X.
> Another VDI could be on another SR, supported by storage vendor Y.
>
> In this case, a new VDI appears on each SR after a hypervisor snapshot.
>
> Same idea for VMware.
>
> I don't really know how (or if) this works for KVM.
>
> I'm not clear how this multi-vendor situation would play out in this
> pluggable approach.
>
> Thanks!
>
>
> On Tue, Oct 8, 2013 at 4:43 PM, Edison Su <Edison.su@citrix.com> wrote:
>
>>
>>
>> > -----Original Message-----
>> > From: Darren Shepherd [mailto:darren.s.shepherd@gmail.com]
>> > Sent: Tuesday, October 08, 2013 2:54 PM
>> > To: dev@cloudstack.apache.org
>> > Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>> >
>> > A hypervisor snapshot will snapshot memory also.  So determining whether
>> Memory is optional for a hypervisor vm snapshot, a.k.a. "Disk-only
>> snapshots":
>> http://support.citrix.com/proddocs/topic/xencenter-61/xs-xc-vms-snapshots-about.html
>> It's supported by xenserver/kvm/vmware.
>>
>> > to do the hypervisor snapshot from the quiesce option does not seem
>> > proper.
>> >
>> > Sorry for all the questions, I'm trying to get to the point of
>> > understanding if this functionality makes sense at this point of code
>> > or if maybe there is a different approach.  This is what I'm seeing,
>> > what if we state it this way
>> >
>> > 1) VM snapshots, AFAIK, are not backed up today and exist solely on
>> > primary.  What if we added a backup phase to VM snapshots that can be
>> > optionally supported by the storage providers to possibly backup the
>> > VM snapshot volumes.
>> It's not about backing up a VM snapshot, it's about how to take a VM
>> snapshot. Usually, take/revert VM snapshot is handled by the hypervisor
>> itself, but in the NetApp (or other storage vendor) case, they want to
>> change the default behavior of the hypervisor-based VM snapshot.
>>
>> Some examples:
>> 1. take hypervisor based vm snapshots, on primary storage, hypervisor will
>> maintain the snapshot chain.
>> 2. take vm snapshot through NetApp:
>>      a. first, quiesce VM if user specified. There is no separate API to
>> quiesce VM on the hypervisor, so here we will
>> take a VM snapshot through hypervisor API call, hypervisor will take
>> volume snapshot  on each volume of the VM. Let's say, on the primary
>> storage, the disk chain looks like:
>>            base-image
>>                     |
>>                     V
>>                 Parent disk
>>             /                         \
>>           V                            V
>>         Current disk        snapshot-a
>>      b. from snapshot-a, find out its parent disk, then take a snapshot
>> through NetApp
>>      c. un-quiesce the VM: go to the hypervisor and delete snapshot
>> "snapshot-a". The hypervisor should be able to consolidate the current
>> disk and "parent disk" into one disk, thus from the hypervisor's point of
>> view there is always, at most, only one snapshot for the VM.
>>     For revert VM snapshot, as long as the VM is stopped, NetApp can
>> revert the snapshot created on NetApp storage easily and efficiently.
>>    The benefit of this whole process, as Chris pointed out, is that if the
>> snapshot chain is quite long, a hypervisor-based VM snapshot will take a
>> performance hit.
>>
>> >
>> > 2) Additionally you want to be able to backup multiple disks at once,
>> > regardless of VM snapshot.  Why don't we add the ability to put
>> > volumeIds in snapshot cmd that if the storage provider supports it will
>> > get a batch of volumeIds.
>> >
>> > Now I know we talked about 2 and there were some concerns about it
>> > (mostly from me), but I think we could work through those concerns
>> > (forgot what they were...).  Right now I just get the feeling we are
>> > shoehorning some functionality into VM snapshot that isn't quite the
>> > right fit.  The "no quiesce" flow just doesn't seem to make sense to me.
>>
>>
>> Not sure whether the NetApp-proposed workflow above makes sense to you or
>> to anybody else. If this workflow is only specific to NetApp, then we
>> don't need to enforce the whole process for everybody.
>>
>> >
>> > Darren
>> >
>> > On Tue, Oct 8, 2013 at 2:05 PM, SuichII, Christopher
>> > <Chris.Suich@netapp.com> wrote:
>> > > Whether the hypervisor snapshot happens depends on whether the
>> > > 'quiesce' option is specified with the snapshot request. If a user
>> > > doesn't care about the consistency of their backup, then the
>> > > hypervisor snapshot/quiesce step can be skipped altogether. This of
>> > > course is not the case if the default provider is being used, in which
>> > > case a hypervisor snapshot is the only way of creating a backup since
>> > > it can't be offloaded to the storage driver.
>> > >
>> > > --
>> > > Chris Suich
>> > > chris.suich@netapp.com
>> > > NetApp Software Engineer
>> > > Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>> > >
>> > > On Oct 8, 2013, at 4:57 PM, Darren Shepherd
>> > > <darren.s.shepherd@gmail.com>
>> > >  wrote:
>> > >
>> > >> Who is going to decide whether the hypervisor snapshot should
>> > >> actually happen or not? Or how?
>> > >>
>> > >> Darren
>> > >>
>> > >> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
>> > >> <Chris.Suich@netapp.com> wrote:
>> > >>>
>> > >>> --
>> > >>> Chris Suich
>> > >>> chris.suich@netapp.com
>> > >>> NetApp Software Engineer
>> > >>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>> > >>>
>> > >>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd
>> > <darren.s.shepherd@gmail.com> wrote:
>> > >>>
>> > >>>> So in the implementation, when we say "quiesce" is that actually
>> > >>>> being implemented as a VM snapshot (memory and disk).  And then
>> > >>>> when you say "unquiesce" you are talking about deleting the VM
>> > >>>> snapshot?
>> > >>>
>> > >>> If the VM snapshot is not going to the hypervisor, then yes, it will
>> > >>> actually be a hypervisor snapshot. Just to be clear, the unquiesce is
>> > >>> not quite a delete - it is a collapse of the VM snapshot and the
>> > >>> active VM back into one file.
>> > >>>
>> > >>>>
>> > >>>> In NetApp, what are you snapshotting?  The whole netapp volume (I
>> > >>>> don't know the correct term), a file on NFS, an iscsi volume?  I
>> > >>>> don't know a whole heck of a lot about the netapp snapshot
>> > >>>> capabilities.
>> > >>>
>> > >>> Essentially we are using internal APIs to create file level backups
>> > >>> - don't worry too much about the terminology.
>> > >>>
>> > >>>>
>> > >>>> I know storage solutions can snapshot better and faster than
>> > >>>> hypervisors can with COW files.  I've personally just always been
>> > >>>> perplexed on what's the best way to implement it.  For storage
>> > >>>> solutions that are block based, it's really easy to have the storage
>> > >>>> doing the snapshot.  For shared file systems, like NFS, it seems
>> > >>>> way more complicated as you don't want to snapshot the entire
>> > >>>> filesystem in order to snapshot one file.
>> > >>>
>> > >>> With filesystems like NFS, things are certainly more complicated,
>> > >>> but that is taken care of by our controller's operating system, Data
>> > >>> ONTAP, and we simply use APIs to communicate with it.
>> > >>>
>> > >>>>
>> > >>>> Darren
>> > >>>>
>> > >>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
>> > >>>> <Chris.Suich@netapp.com> wrote:
>> > >>>>> I can comment on the second half.
>> > >>>>>
>> > >>>>> Through storage operations, storage providers can create backups
>> > >>>>> much faster than hypervisors and over time, their snapshots are more
>> > >>>>> efficient than the snapshot chains that hypervisors create. It is
>> > >>>>> true that a VM snapshot taken at the storage level is slightly
>> > >>>>> different as it would be pseudo-quiesced, not have its memory
>> > >>>>> snapshotted. This is accomplished through hypervisor snapshots:
>> > >>>>>
>> > >>>>> 1) VM snapshot request (let's say VM 'A')
>> > >>>>> 2) Create hypervisor snapshot (optional)
>> > >>>>>    -VM 'A' is snapshotted, creating active VM 'A*'
>> > >>>>>    -All disk traffic now goes to VM 'A*' and A is a snapshot of 'A*'
>> > >>>>> 3) Storage driver(s) take snapshots of each volume
>> > >>>>> 4) Undo hypervisor snapshot (optional)
>> > >>>>>    -VM snapshot 'A' is rolled back into VM 'A*' so the hypervisor
>> > >>>>>     snapshot no longer exists
>> > >>>>>
>> > >>>>> Now, a couple notes:
>> > >>>>> -The reason this is optional is that not all users necessarily care
>> > >>>>> about the memory or disk consistency of their VMs and would prefer
>> > >>>>> faster snapshots to consistency.
>> > >>>>> -Preemptively, yes, we are actually taking hypervisor snapshots,
>> > >>>>> which means there isn't actually a performance gain from taking
>> > >>>>> storage snapshots when quiescing the VM. However, the performance
>> > >>>>> gain will come both during restoring the VM and during normal
>> > >>>>> operations as described above.
>> > >>>>>
>> > >>>>> Although you can think of it as a poor man's VM snapshot, I would
>> > >>>>> think of it more as a consistent multi-volume snapshot. Again, the
>> > >>>>> difference being that this snapshot was not truly quiesced like a
>> > >>>>> hypervisor snapshot would be.
>> > >>>>>
>> > >>>>> --
>> > >>>>> Chris Suich
>> > >>>>> chris.suich@netapp.com
>> > >>>>> NetApp Software Engineer
>> > >>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>> > >>>>>
>> > >>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd
>> > <darren.s.shepherd@gmail.com> wrote:
>> > >>>>>
>> > >>>>>> My only comment is that having the return type as boolean and
>> > >>>>>> using that to indicate quiesce behaviour seems obscure and will
>> > >>>>>> probably lead to a problem later.  You're basically saying the
>> > >>>>>> result of the takeVMSnapshot will only ever need to communicate
>> > >>>>>> back whether unquiesce needs to happen.  Maybe some result object
>> > >>>>>> would be more extensible.
>> > >>>>>>
>> > >>>>>> Actually, I think I have more comments.  This seems a bit odd to me.
>> > >>>>>> Why would a storage driver in ACS implement a VM snapshot
>> > >>>>>> functionality?  VM snapshot is really a hypervisor orchestrated
>> > >>>>>> operation.  So it seems like we're trying to implement a poor man's
>> > >>>>>> VM snapshot.  Maybe if I understood what NetApp was trying to do
>> > >>>>>> it would make more sense, but it's all odd.  To do a proper VM
>> > >>>>>> snapshot you need to snapshot memory and disk at the exact same
>> > >>>>>> time.  How are we going to do that if ACS is orchestrating the VM
>> > >>>>>> snapshot and delegating to storage providers?  It's not like you
>> > >>>>>> are going to pause the VM.... or are you?
>> > >>>>>>
>> > >>>>>> Darren
>> > >>>>>>
>> > >>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <Edison.su@citrix.com>
>> > wrote:
>> > >>>>>>> I created a design document page at
>> > >>>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+snapshot+related+operations,
>> > >>>>>>> feel free to add items on it.
>> > >>>>>>> And a new branch "pluggable_vm_snapshot" is created.
>> > >>>>>>>
>> > >>>>>>>> -----Original Message-----
>> > >>>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
>> > >>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
>> > >>>>>>>> To: <dev@cloudstack.apache.org>
>> > >>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>> > >>>>>>>>
>> > >>>>>>>> I'm a fan of option 2 - this gives us the most flexibility (as
>> > >>>>>>>> you stated). The option is given to completely override the way
>> > >>>>>>>> VM snapshots work AND storage providers are given the
>> > >>>>>>>> opportunity to work within the default VM snapshot workflow.
>> > >>>>>>>>
>> > >>>>>>>> I believe this option should satisfy your concern, Mike. The
>> > >>>>>>>> snapshot and quiesce strategy would be in charge of
>> > >>>>>>>> communicating with the hypervisor. Storage providers should be
>> > >>>>>>>> able to leverage the default strategies and simply perform the
>> > >>>>>>>> storage operations.
>> > >>>>>>>>
>> > >>>>>>>> I don't think it should be much of an issue that a new method on
>> > >>>>>>>> the storage driver interface may not apply to everyone. In fact,
>> > >>>>>>>> that is already the case. Some methods such as un/maintain(),
>> > >>>>>>>> attachToXXX() and takeSnapshot() are already not implemented by
>> > >>>>>>>> every driver - they just return false when asked if they can
>> > >>>>>>>> handle the operation.
>> > >>>>>>>>
>> > >>>>>>>> --
>> > >>>>>>>> Chris Suich
>> > >>>>>>>> chris.suich@netapp.com
>> > >>>>>>>> NetApp Software Engineer
>> > >>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>> > >>>>>>>>
>> > >>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski
>> > >>>>>>>> <mike.tutkowski@solidfire.com>
>> > >>>>>>>> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> Well, my first thought on this is that the storage driver
>> > >>>>>>>>> should not be telling the hypervisor to do anything. It should
>> > >>>>>>>>> be responsible for creating/deleting volumes, snapshots, etc. on
>> > >>>>>>>>> its storage system only.
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <Edison.su@citrix.com> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>> In 4.2, we added VM snapshot for VMware/XenServer. The
>> > >>>>>>>>>> current workflow is the following:
>> > >>>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl:
>> > >>>>>>>>>> creatVMSnapshot -> send CreateVMSnapshotCommand to
>> > >>>>>>>>>> hypervisor to create vm snapshot.
>> > >>>>>>>>>>
>> > >>>>>>>>>> If anybody wants to change the workflow, they need to either
>> > >>>>>>>>>> change VMSnapshotManagerImpl directly or subclass
>> > >>>>>>>>>> VMSnapshotManagerImpl. Neither is the ideal choice, as
>> > >>>>>>>>>> VMSnapshotManagerImpl should be able to handle different ways
>> > >>>>>>>>>> to take a vm snapshot, instead of hard coding one.
>> > >>>>>>>>>>
>> > >>>>>>>>>> The requirements for the pluggable VM snapshot come from:
>> > >>>>>>>>>> Storage vendors may have their own optimizations, such as NetApp.
>> > >>>>>>>>>> VM snapshot can be implemented in a totally different way (for
>> > >>>>>>>>>> example, I could just send a command to the guest VM, to tell my
>> > >>>>>>>>>> application to flush disk and hold disk writes, then come to the
>> > >>>>>>>>>> hypervisor to take a volume snapshot).
>> > >>>>>>>>>>
>> > >>>>>>>>>> If we agree on enabling pluggable VM snapshot, then we can move
>> > >>>>>>>>>> on to discuss how to implement it.
>> > >>>>>>>>>>
>> > >>>>>>>>>> The possible options:
>> > >>>>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy
>> > >>>>>>>>>> interface, which has the following methods:
>> > >>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>> > >>>>>>>>>> Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
>> > >>>>>>>>>> Boolean deleteVMSnapshot(VMSnapshot vmSnapshot);
>> > >>>>>>>>>>
>> > >>>>>>>>>> The work flow will be: createVMSnapshot api ->
>> > >>>>>>>>>> VMSnapshotManagerImpl: creatVMSnapshot ->
>> > >>>>>>>>>> VMSnapshotStrategy: takeVMSnapshot
>> > >>>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the sanity
>> > >>>>>>>>>> check, then will hand over to VMSnapshotStrategy.
>> > >>>>>>>>>> In a VMSnapshotStrategy implementation, it may just send a
>> > >>>>>>>>>> Create/revert/delete VMSnapshotCommand to the hypervisor host,
>> > >>>>>>>>>> or do any special operations.
>> > >>>>>>>>>>
>> > >>>>>>>>>> 2. fine-grained interface. Not only add a VMSnapshotStrategy
>> > >>>>>>>>>> interface, but also add certain methods on the storage driver.
>> > >>>>>>>>>> The VMSnapshotStrategy interface will be the same as option 1.
>> > >>>>>>>>>> Will add the following methods on storage driver:
>> > >>>>>>>>>> /* volumesBelongToVM is the list of volumes of the VM that were
>> > >>>>>>>>>>    created on this storage. The storage vendor can either take
>> > >>>>>>>>>>    one snapshot for these volumes in one shot, or take a snapshot
>> > >>>>>>>>>>    for each volume separately.
>> > >>>>>>>>>>    The pre-condition: vm is unquiesced.
>> > >>>>>>>>>>    It will return a Boolean to indicate whether the vm needs to
>> > >>>>>>>>>>    be unquiesced or not.
>> > >>>>>>>>>>    In the default storage driver, it will return false.
>> > >>>>>>>>>> */
>> > >>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>> > >>>>>>>>>>     VMSnapshot vmSnapshot);
>> > >>>>>>>>>> Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>> > >>>>>>>>>>     VMSnapshot vmSnapshot);
>> > >>>>>>>>>> Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>> > >>>>>>>>>>     VMSnapshot vmSnapshot);
>> > >>>>>>>>>>
>> > >>>>>>>>>> The work flow will be: createVMSnapshot api ->
>> > >>>>>>>>>> VMSnapshotManagerImpl: creatVMSnapshot ->
>> > >>>>>>>>>> VMSnapshotStrategy: takeVMSnapshot ->
>> > >>>>>>>>>> storage driver: takeVMSnapshot
>> > >>>>>>>>>> In the implementation of VMSnapshotStrategy's takeVMSnapshot,
>> > >>>>>>>>>> the pseudo code looks like:
>> > >>>>>>>>>>    HypervisorHelper.quiesceVM(vm);
>> > >>>>>>>>>>    val volumes = vm.getVolumes();
>> > >>>>>>>>>>    val maps = new Map[driver, list[VolumeInfo]]();
>> > >>>>>>>>>>    volumes.foreach(volume => maps.put(volume.getDriver(),
>> > >>>>>>>>>>        volume :: maps.get(volume.getDriver())))
>> > >>>>>>>>>>    var needUnquiesce = true;
>> > >>>>>>>>>>    maps.foreach((driver, volumes) => needUnquiesce =
>> > >>>>>>>>>>        needUnquiesce && driver.takeVMSnapshot(volumes))
>> > >>>>>>>>>>    if (needUnquiesce) {
>> > >>>>>>>>>>        HypervisorHelper.unquiesce(vm);
>> > >>>>>>>>>>    }
>> > >>>>>>>>>>
>> > >>>>>>>>>> By default, the quiesceVM in HypervisorHelper will actually
>> > >>>>>>>>>> take a vm snapshot through the hypervisor.
>> > >>>>>>>>>> Does the above logic make sense?
>> > >>>>>>>>>>
>> > >>>>>>>>>> The pro of option 1 is that it's simple, no need to change
>> > >>>>>>>>>> storage driver interfaces. The con is that each storage
>> > >>>>>>>>>> vendor needs to implement a strategy, and maybe they will all
>> > >>>>>>>>>> do the same thing.
>> > >>>>>>>>>> The pro of option 2 is that the storage driver won't need to
>> > >>>>>>>>>> worry about how to quiesce/unquiesce the vm. The con is that it
>> > >>>>>>>>>> will add these methods on each storage driver, so it assumes
>> > >>>>>>>>>> that this work flow will work for everybody.
>> > >>>>>>>>>>
>> > >>>>>>>>>> So which option should we take? Or if you have other options,
>> > >>>>>>>>>> please let us know.
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>> --
>> > >>>>>>>>> *Mike Tutkowski*
>> > >>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>> > >>>>>>>>> e: mike.tutkowski@solidfire.com
>> > >>>>>>>>> o: 303.746.7302
>> > >>>>>>>>> Advancing the way the world uses the
>> > >>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>> > >>>>>>>>> *(tm)*
>> > >>>>>>>
>> > >>>>>
>> > >>>
>> > >
>>
>
>
>
> --
> *Mike Tutkowski*
> *Senior CloudStack Developer, SolidFire Inc.*
> e: mike.tutkowski@solidfire.com
> o: 303.746.7302
> Advancing the way the world uses the
> cloud<http://solidfire.com/solution/overview/?video=play>
> *™*
