From: Marc Cirauqui
Date: Tue, 4 Dec 2012 08:14:28 +0100
Subject: Re: XenServer & VM Snapshots
To: cloudstack-users
Cc: Cloudstack Developers

If I may: we've detected very poor performance when executing snapshots. We
think it's due to XenServer's API; I don't know exactly how or why, but the
API is very slow and runs one task at a time (if it does any parallelization
at all, it's negligible). Do you know of a way to improve I/O rates on the
XS side?

thx.

On Mon, Dec 3, 2012 at 8:07 PM, Matthew Hartmann wrote:

> Thank you Anthony! :)
>
> Cheers,
>
> Matthew
>
>
> Matthew Hartmann
> Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net
>
> TLS.NET, Inc.
> http://www.tls.net
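P.S. One way to check whether xapi really is serializing work: watch its
pending task list while a couple of snapshots run. This is only a rough
sketch using the XenAPI Python binding from the XenServer SDK; the host and
credentials below are placeholders for your pool master:

    import time
    import XenAPI  # XenServer SDK Python binding

    session = XenAPI.Session("https://xs-pool-master")    # placeholder
    session.xenapi.login_with_password("root", "secret")  # placeholder
    try:
        for _ in range(30):  # sample for ~5 minutes
            recs = session.xenapi.task.get_all_records()
            pending = [r for r in recs.values() if r["status"] == "pending"]
            # If this stays at 0 or 1 while several snapshot jobs are
            # queued, xapi is effectively running one task at a time.
            print(len(pending), [r["name_label"] for r in pending])
            time.sleep(10)
    finally:
        session.xenapi.session.logout()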
>
> -----Original Message-----
> From: Anthony Xu [mailto:Xuefei.Xu@citrix.com]
> Sent: Monday, December 03, 2012 1:59 PM
> To: 'Cloudstack Developers'; cloudstack-users@incubator.apache.org
> Subject: RE: XenServer & VM Snapshots
>
> CS 3.0.2 is a very old version.
>
> I'm pretty sure mount & copy happen on the same host in 3.0.4 and 3.0.5.
> If mount & copy can land on different hosts, the issue you describe is
> very likely to happen. I haven't heard of this issue from QA or from
> users.
>
> I just checked the vmopsSnapshot plug-in for XenServer, at
> /etc/xapi.d/plugins, which mounts secondary storage just before running
> sparse_dd.
>
> I recommend you upgrade to a newer version.
>
> If you still see the issue, please post the related management server
> log and /var/log/SMlog from the XenServer host.
>
>
> Anthony
>
>
> > -----Original Message-----
> > From: Matthew Hartmann [mailto:mhartmann@tls.net]
> > Sent: Monday, December 03, 2012 10:31 AM
> > To: cloudstack-users@incubator.apache.org
> > Cc: 'Cloudstack Developers'
> > Subject: RE: XenServer & VM Snapshots
> >
> > Anthony:
> >
> > Thank you for the prompt and informative reply.
> >
> > > I'm pretty sure mount and copy are using the same XenServer host.
> >
> > The behavior I have witnessed with CS 3.0.2 is that it doesn't always
> > do the mount & copy on the same host. Out of the 12 tests I've
> > performed, only once was the mount & copy performed on the same host
> > that the VM was running on.
> >
> > > I think the issue is the backup takes a long time because the data
> > > volume is big and the network rate is low.
> > > You can increase "BackupSnapshotWait" in the global configuration
> > > table to let the backup operation finish.
> >
> > I increased this in global settings from the default of 9 hours to 16
> > hours. The snapshot still doesn't complete in time; on average it
> > copies about ~460G before it times out. I'm pretty confident the
> > network rate isn't the bottleneck, as ISOs and imported VHDs install
> > quickly. We have the Secondary Storage server set as the only internal
> > site allowed to host files; I upload my ISO or VHD to the Secondary
> > Storage server and install using the SSVM, which completes in a very
> > timely manner. With a 1Gb network link, 1TB should copy in roughly
> > 2 hours (if the link is saturated by the copy process). Only
> > snapshotting takes an insanely long time to complete; template
> > creation appears to work flawlessly.
> >
> > Is there anything else I can do to increase performance, or logs I
> > should check?
> >
> > Cheers,
> >
> > Matthew
> >
> >
> > Matthew Hartmann
> > Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net
> >
> > TLS.NET, Inc.
> > http://www.tls.net
> >
> >
> > -----Original Message-----
> > From: Anthony Xu [mailto:Xuefei.Xu@citrix.com]
> > Sent: Monday, December 03, 2012 12:31 PM
> > To: Cloudstack Users
> > Cc: Cloudstack Developers
> > Subject: RE: XenServer & VM Snapshots
> >
> > Hi Matthew,
> >
> > Your analysis is correct except for the following:
> >
> > > I must mention that the same Compute Node that ran sparse_dd or
> > > mounted Secondary Storage is not always the same. It appears the
> > > Management Server is simply round-robining through the list of
> > > Compute Nodes and using the first one that is available.
> >
> > I'm pretty sure mount and copy are using the same XenServer host.
> >
> > I think the issue is that the backup takes a long time because the
> > data volume is big and the network rate is low.
> > You can increase "BackupSnapshotWait" in the global configuration
> > table to let the backup operation finish.
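> > If you'd rather change it through the API than the UI, something like
> > this works. This is only a rough sketch: it assumes the unauthenticated
> > integration API port (8096) is enabled on the management server, and
> > that the setting behind "BackupSnapshotWait" is named
> > backup.snapshot.wait with the value in seconds; please verify both on
> > your version.
> >
> >     import urllib.request
> >
> >     # 16 hours = 57600 seconds; the port and key name are assumptions,
> >     # see the note above.
> >     url = ("http://mgmt-server:8096/client/api"
> >            "?command=updateConfiguration"
> >            "&name=backup.snapshot.wait"
> >            "&value=57600")
> >     print(urllib.request.urlopen(url).read())
> >
> > Note that most global settings only take effect after a management
> > server restart.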
> >
> >
> > Since CS takes advantage of the XenServer image format (VHD) and uses
> > VHD to do snapshot and clone, it requires the snapshot to be backed up
> > through a XenServer host.
> > The ideal solution for this issue might be to leverage the storage
> > server's own snapshot and clone functionality; then the snapshot
> > backup would be executed by the storage host, relieving some of this
> > limitation. Currently CS doesn't support this, but it should not be
> > hard to support after Edison finishes the storage framework change; it
> > should just be another storage plug-in.
> > When CS uses a storage server's snapshot and clone functions, CS needs
> > to consider the storage server's limits on the number of snapshots and
> > the number of volumes.
> >
> >
> > Anthony
> >
> >
> > From: Matthew Hartmann [mailto:mhartmann@tls.net]
> > Sent: Monday, December 03, 2012 9:08 AM
> > To: Cloudstack Users
> > Cc: Cloudstack Developers
> > Subject: XenServer & VM Snapshots
> >
> > Hello! I'm hoping someone can help me troubleshoot the following
> > issue:
> >
> > I have a client with a 960G data volume which contains their VM's
> > Exchange Data Store. When starting a snapshot, I found that a process
> > titled "sparse_dd" is started on one of my Compute Nodes. The output
> > of "sparse_dd" is then sent through another Compute Node's xapi before
> > being placed into the "snapshot store" on Secondary Storage. This
> > appears to be part of the bottleneck, as all of our systems are
> > connected via gigabit links and it should not take 15+ hours to create
> > a snapshot. The following is the behavior I have analyzed from within
> > my environment:
> >
> > 1) Snapshot is started (either manually or on schedule).
> >
> > 2) Compute Node 1 "processes the snapshot" by exposing the VDI, from
> >    which "sparse_dd" then creates a "thin provisioned" snapshot.
> >
> > 3) The output of sparse_dd is delivered over HTTP to xapi on Compute
> >    Node 2, where the Management Server mounted Secondary Storage.
> >
> > 4) Compute Node 2 (receiving the snapshot via xapi) stores the
> >    snapshot in the Secondary Storage mount point.
> >
> > Based on this behavior, I have devised the following logic that I
> > believe CloudStack is using:
> >
> > 1) CloudStack creates a "snapshot VDI" via the XenServer Pool Master's
> >    API.
> >
> > 2) CloudStack finds a Compute Node that can mount Secondary Storage.
> >
> > 3) CloudStack finds a Compute Node that can run "sparse_dd".
> >
> > 4) CloudStack uses the available Compute Node to output the VDI to
> >    xapi on the Compute Node that mounted Secondary Storage.
> >
> > I must mention that the Compute Node that ran sparse_dd and the one
> > that mounted Secondary Storage are not always the same. It appears the
> > Management Server is simply round-robining through the list of Compute
> > Nodes and using the first one that is available.
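> > Back-of-envelope, the numbers support this. A rough sketch (pure
> > arithmetic, ignoring protocol overhead; the two-hop doubling assumes
> > both hops share the same gigabit links):
> >
> >     # Rough transfer-time estimate for the snapshot copy path.
> >     volume_bytes = 960 * 1024**3   # the 960G data volume
> >     link_bps = 10**9               # 1 Gbit/s link
> >
> >     # Direct copy to Secondary Storage:
> >     one_hop = volume_bytes * 8 / link_bps
> >     # Relayed through a second node's xapi on the same links:
> >     two_hop = 2 * one_hop
> >     print(one_hop / 3600, two_hop / 3600)  # ~2.3 h vs ~4.6 h
> >
> > Even the two-hop path should finish in well under 5 hours, nowhere
> > near the 15+ hours we're seeing.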
> >
> > Does anyone have any input on the issue I'm having, or on how
> > CloudStack/XenServer snapshots operate?
> >
> > Thanks!
> >
> > Cheers,
> >
> > Matthew
> >
> >
> > Matthew Hartmann
> > Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net
> >
> > TLS.NET, Inc.
> > http://www.tls.net