cloudstack-dev mailing list archives

From Anthony Xu <Xuefei...@citrix.com>
Subject RE: XenServer & VM Snapshots
Date Mon, 03 Dec 2012 19:44:40 GMT
VM snapshot is a nice feature to have, but I think it has the same issue as volume snapshot
when VM snapshot is backed up to secondary storage.

There is a lot of room to improve vdi-copy. Right now the slowness is not caused by coalesce.
vdi-copy currently goes through all the layers of blktap2, and the context switches consume
most of the CPU cycles. If vdi-copy could work on the VHD chain directly, it would be much faster.

If you compare the performance of vdi-copy and "vhd-util coalesce" on the same VHD chain, you'll
see how much it can improve; the gap is large.
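
A minimal sketch of how such a comparison could be timed, assuming both operations can be started as ordinary commands on the host. The helper names time_command and compare are hypothetical, and the two entries below are placeholders (they only start the Python interpreter); on a XenServer host they would be replaced with the real vdi-copy and "vhd-util coalesce" invocations, whose arguments are not given in this thread.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def compare(commands):
    """Time each labelled command once and return {label: seconds}."""
    return {label: time_command(argv) for label, argv in commands.items()}


if __name__ == "__main__":
    # Placeholder commands (assumption): swap in the real invocations,
    # run against the same VHD chain, before drawing any conclusion.
    results = compare({
        "copy (placeholder)": [sys.executable, "-c", "pass"],
        "coalesce (placeholder)": [sys.executable, "-c", "pass"],
    })
    for label, seconds in results.items():
        print(f"{label}: {seconds:.2f}s")
```

A single run of each command is only a rough indication; repeating the runs and working on a copy of the VHD chain would give a fairer comparison.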


Anthony

> -----Original Message-----
> From: Mice Xia [mailto:mice_xia@tcloudcomputing.com]
> Sent: Monday, December 03, 2012 11:18 AM
> To: cloudstack-dev@incubator.apache.org
> Subject: Re: XenServer & VM Snapshots
> 
> 
> Anthony,
> 
> This is one of the reasons that I'm working on VM snapshots on PS
> (instead of volume snapshots).
> 
> I don't think it's easy to improve vdi-copy, considering it needs to
> coalesce incremental snapshots and verify the result.
> 
> mice
> 
> -----Original Message-----
> From: Anthony Xu [mailto:Xuefei.Xu@citrix.com]
> Sent: 2012-12-4 (Tuesday) 3:08
> To: cloudstack-dev@incubator.apache.org
> Subject: RE: XenServer & VM Snapshots
> 
> You are right, vdi-copy is slow. We have reported this to the XenServer
> team and they are working on it, but no timeline or roadmap has been
> provided so far.
> 
> 
> Anthony
> 
> > -----Original Message-----
> > From: Mice Xia [mailto:mice_xia@tcloudcomputing.com]
> > Sent: Monday, December 03, 2012 11:05 AM
> > To: cloudstack-dev@incubator.apache.org
> > Subject: Re: XenServer & VM Snapshots
> >
> > Taking a volume snapshot is slow if your volume is huge. The reason is
> > that vdi-copy, which is used to back up the snapshot to SS, has a
> > performance problem.
> >
> > You can't speed it up much for a full snapshot. Perhaps you can try
> > increasing dom0 memory, or adjust the ratio between full snapshots and
> > incremental snapshots to reduce the number of full snapshots.
> >
> > Mice
> >
> >
> > -----Original Message-----
> > From: Matthew Hartmann [mailto:mhartmann@tls.net]
> > Sent: 2012-12-4 (Tuesday) 2:31
> > To: cloudstack-users@incubator.apache.org
> > Cc: 'Cloudstack Developers'
> > Subject: RE: XenServer & VM Snapshots
> >
> > Anthony:
> >
> > Thank you for the prompt and informative reply.
> >
> > > I'm pretty sure mount and copy are using the same XenServer host.
> >
> > The behavior I have witnessed with CS 3.0.2 is that it doesn't always do
> > the mount & copy on the same host. Out of the 12 tests I've performed,
> > only once was the mount & copy performed on the same host that the VM
> > was running on.
> >
> > > I think the issue is the backup takes a long time because the data
> > > volume is big and network rate is low.
> > > You can increase "BackupSnapshotWait" in global configuration table
> > > to let the backup operation finish.
> >
> > I increased this in global settings from the default of 9 hours to 16
> > hours. The snapshot still doesn't complete in time; on average it
> > copies about 460G before it times out. I'm pretty confident the network
> > rate isn't the bottleneck, as ISOs and imported VHDs install quickly. We
> > have the Secondary Storage server set as the only internal site allowed
> > to host files. I upload my ISO or VHD to the Secondary Storage server
> > and install using the SSVM, which completes in a very timely manner.
> > With a 1Gb network link, 1TB should copy in roughly 2 hours (if the link
> > is saturated by the copy process); I've only found snapshotting
> > (template creation appears to work flawlessly) to take an insanely long
> > time to complete.
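
The figures in the paragraph above can be checked with a short calculation. This is a rough sketch, assuming decimal units (1G = 10^9 bytes, 1Gb/s = 10^9 bits/s), no protocol overhead, and that the ~460G was copied over the full 16-hour timeout; the function names are hypothetical.

```python
def ideal_copy_hours(size_bytes, link_bits_per_second=1e9):
    """Hours needed to move size_bytes over a fully saturated link."""
    return size_bytes * 8 / link_bits_per_second / 3600


def effective_mbit_per_second(copied_bytes, elapsed_hours):
    """Average throughput in Mbit/s for copied_bytes over elapsed_hours."""
    return copied_bytes * 8 / (elapsed_hours * 3600) / 1e6


# 1TB over a saturated 1Gb link (assumption: no overhead).
print(round(ideal_copy_hours(1e12), 2))                 # 2.22 hours

# ~460G copied before the 16-hour timeout (assumption: full 16 hours used).
print(round(effective_mbit_per_second(460e9, 16), 1))   # 63.9 Mbit/s
```

Under these assumptions the snapshot backup averages roughly 64 Mbit/s, a small fraction of the gigabit link, which is consistent with the network not being the limiting factor.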
> >
> > Is there anything else I can do to increase performance, or logs I
> > should check?
> >
> > Cheers,
> >
> > Matthew
> >
> >
> > Matthew Hartmann
> > Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net
> >
> > TLS.NET, Inc.
> > http://www.tls.net
> >
> >
> > -----Original Message-----
> > From: Anthony Xu [mailto:Xuefei.Xu@citrix.com]
> > Sent: Monday, December 03, 2012 12:31 PM
> > To: Cloudstack Users
> > Cc: Cloudstack Developers
> > Subject: RE: XenServer & VM Snapshots
> >
> > Hi Matthew,
> >
> > Your analysis is correct except for the following:
> >
> > > I must mention that the same Compute Node that ran sparse_dd or
> > > mounted Secondary Storage is not always the same. It appears the
> > > Management Server is simply round-robining through the list of
> > > Compute Nodes and using the first one that is available.
> >
> > I'm pretty sure mount and copy are using the same XenServer host.
> >
> > I think the issue is the backup takes a long time because the data
> > volume is big and network rate is low.
> > You can increase "BackupSnapshotWait" in global configuration table
> > to let the backup operation finish.
> >
> >
> > Since CS takes advantage of the XenServer image format, VHD, and uses
> > VHD to do snapshot and clone, it requires the snapshot to be backed up
> > through a XenServer host.
> > The ideal solution for this issue might be to leverage the storage
> > server's snapshot and clone functionality. The snapshot backup is then
> > executed by the storage host, which relieves some of the limitations.
> > Currently CS doesn't support this. It should not be hard to support
> > after Edison finishes the storage framework change; it should be just
> > another storage plug-in.
> > When CS uses the storage server's snapshot and clone functions, CS needs
> > to consider the storage server's limits on the number of snapshots and
> > the number of volumes.
> >
> >
> > Anthony
> >
> > From: Matthew Hartmann [mailto:mhartmann@tls.net]
> > Sent: Monday, December 03, 2012 9:08 AM
> > To: Cloudstack Users
> > Cc: Cloudstack Developers
> > Subject: XenServer & VM Snapshots
> >
> > Hello! I'm hoping someone can help me troubleshoot the following issue:
> >
> > I have a client who has a 960G data volume which contains their VM's
> > Exchange Data Store. When starting a snapshot, I found that a process
> > titled "sparse_dd" is started on one of my Compute Nodes. I found that
> > this process then sends its output through another Compute Node's xapi
> > before placing it into the "snapshot store" on Secondary Storage. It
> > appears that this is part of the bottleneck, as all of our systems are
> > connected via gigabit link and it should not take 15+ hours to create a
> > snapshot. The following is the behavior that I have analyzed from
> > within my environment:
> >
> >
> > 1)     Snapshot is started (either Manual or Scheduled).
> >
> > 2)     Compute Node 1 "processes the snapshot" by exposing the VDI,
> >        from which "sparse_dd" then creates a "thin provisioned" snapshot.
> >
> > 3)     The output of sparse_dd is delivered over HTTP to xapi on Compute
> >        Node 2, where the Management Server mounted Secondary Storage.
> >
> > 4)     Compute Node 2 (receiving the snapshot via xapi) stores the
> >        snapshot in the Secondary Storage mount point.
> >
> > Based on the behavior, I have devised the following logic that I believe
> > CloudStack is utilizing:
> >
> >
> > 1)     CloudStack creates a "snapshot VDI" via the XenServer Pool
> >        Master's API.
> >
> > 2)     CloudStack finds a Compute Node that can mount Secondary Storage.
> >
> > 3)     CloudStack finds a Compute Node that can run "sparse_dd".
> >
> > 4)     CloudStack uses an available Compute Node to output the VDI to
> >        xapi on the Compute Node that mounted Secondary Storage.
> >
> > I must mention that the same Compute Node that ran sparse_dd or mounted
> > Secondary Storage is not always the same. It appears the Management
> > Server is simply round-robining through the list of Compute Nodes and
> > using the first one that is available.
> >
> > Does anyone have any input on the issue I'm having or analysis of how
> > CloudStack/XenServer snapshots operate?
> >
> > Thanks!
> >
> > Cheers,
> >
> > Matthew
> >
> >
> >
> > Matthew Hartmann
> > Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net
> 
