From: Nitin Mehta <Nitin.Mehta@citrix.com>
To: "cloudstack-users@incubator.apache.org"
CC: Cloudstack Developers
Date: Tue, 4 Dec 2012 13:46:43 +0530
Subject: Re: XenServer & VM Snapshots
On 04-Dec-2012, at 12:44 PM, Marc Cirauqui wrote:

> If I may, we've detected very poor performance executing snapshots. We think it's due to XenServer's API; I don't know how or why, but the API is very slow and runs one task at a time (if it does any parallelization, it's almost nothing). Do you know if there's a way to improve IO rates on the XS side? thx.

Marc - I think there has been some work on the CS side as well to improve the performance a bit. You can tweak the parameter concurrent.snapshots.threshold.perhost to achieve somewhat better performance depending on the workload. More info @ https://cwiki.apache.org/confluence/display/CLOUDSTACK/Snapshot+improvements+FS

On Mon, Dec 3, 2012 at 8:07 PM, Matthew Hartmann <mhartmann@tls.net> wrote:

Thank you Anthony! :)

Cheers, Matthew

Matthew Hartmann
Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net
TLS.NET, Inc. http://www.tls.net

-----Original Message-----
From: Anthony Xu [mailto:Xuefei.Xu@citrix.com]
Sent: Monday, December 03, 2012 1:59 PM
To: 'Cloudstack Developers'; cloudstack-users@incubator.apache.org
Subject: RE: XenServer & VM Snapshots

CS 3.0.2 is too old a version. I'm pretty sure mount & copy happen on the same host in 3.0.4 and 3.0.5. If mount & copy are on different hosts, the issue is very likely to happen.
I didn't hear about this issue from QA or users. I just checked the vmopsSnapshot plug-in for XenServer, at /etc/xapi.d/plugins, which mounts secondary storage just before sparse_dd runs. I recommend you upgrade to a newer version. If you still see the issue, please post the related management server log and /var/log/SMlog from the XenServer host.

Anthony

-----Original Message-----
From: Matthew Hartmann [mailto:mhartmann@tls.net]
Sent: Monday, December 03, 2012 10:31 AM
To: cloudstack-users@incubator.apache.org
Cc: 'Cloudstack Developers'
Subject: RE: XenServer & VM Snapshots

Anthony:

Thank you for the prompt and informative reply.

> I'm pretty sure mount and copy are using the same XenServer host.

The behavior I have witnessed with CS 3.0.2 is that it doesn't always do the mount & copy on the same host. Out of the 12 tests I've performed, only once was the mount & copy performed on the same host that the VM was running on.

> I think the issue is the backup takes a long time because the data volume is big and the network rate is low. You can increase "BackupSnapshotWait" in the global configuration table to let the backup operation finish.

I increased this in global settings from the default of 9 hours to 16 hours. The snapshot still doesn't complete in time; on average it copies about ~460G before it times out. I'm pretty confident the network rate isn't the bottleneck, as ISOs and imported VHDs install quickly. We have the Secondary Storage server set as the only internal site allowed to host files. I upload my ISO or VHD to the Secondary Storage server and install using the SSVM, which completes in a very timely manner. With a 1Gb network link, 1TB should copy in roughly 2 hours (if the link is saturated by the copy process); I've only found snapshotting (template creation appears to work flawlessly) to take an insanely long time to complete.

Is there anything else I can do to increase performance, or logs I should check?
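A quick back-of-envelope check of these numbers, using only the figures quoted in the thread (a 1 Gb/s link, ~460 GB copied in the 16-hour window, a 960 GB volume):

```python
GB = 1e9  # decimal gigabyte, in bytes

# Ideal case: 1 TB over a saturated 1 Gb/s link.
wire_speed = 1e9 / 8                     # bytes/sec, ~125 MB/s
ideal_hours = 1000 * GB / wire_speed / 3600
print(f"1 TB at wire speed: ~{ideal_hours:.1f} h")      # ~2.2 h

# Observed case: ~460 GB copied before the 16-hour BackupSnapshotWait
# timeout implies an effective rate of only ~8 MB/s.
observed_rate = 460 * GB / (16 * 3600)   # bytes/sec
needed_hours = 960 * GB / observed_rate / 3600
print(f"960 GB at the observed rate: ~{needed_hours:.0f} h")  # ~33 h
```

At the observed ~8 MB/s, the 960 GB volume would need roughly 33 hours regardless of BackupSnapshotWait, which suggests the effective copy rate, not the wall-clock limit, is the real problem.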
Cheers, Matthew

Matthew Hartmann
Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net
TLS.NET, Inc. http://www.tls.net

-----Original Message-----
From: Anthony Xu [mailto:Xuefei.Xu@citrix.com]
Sent: Monday, December 03, 2012 12:31 PM
To: Cloudstack Users
Cc: Cloudstack Developers
Subject: RE: XenServer & VM Snapshots

Hi Matthew,

Your analysis is correct except for the following:

> I must mention that the same Compute Node that ran sparse_dd or mounted Secondary Storage is not always the same. It appears the Management Server is simply round-robining through the list of Compute Nodes and using the first one that is available.

I'm pretty sure mount and copy are using the same XenServer host.

I think the issue is that the backup takes a long time because the data volume is big and the network rate is low. You can increase "BackupSnapshotWait" in the global configuration table to let the backup operation finish.

Since CS takes advantage of the XenServer image format (VHD) to do snapshot and clone, it requires snapshots to be backed up through a XenServer host. The ideal solution for this issue might be to leverage the storage server's own snapshot and clone functionality; then the snapshot backup would be executed by the storage host, relieving some of this limitation. Currently CS doesn't support this, but it should not be hard to add after Edison finishes the storage framework change - it should be just another storage plug-in. When CS uses the storage server's snapshot and clone functions, CS will need to consider the storage server's limits on the number of snapshots and the number of volumes.

Anthony

From: Matthew Hartmann [mailto:mhartmann@tls.net]
Sent: Monday, December 03, 2012 9:08 AM
To: Cloudstack Users
Cc: Cloudstack Developers
Subject: XenServer & VM Snapshots

Hello!

I'm hoping someone can help me troubleshoot the following issue: I have a client who has a 960G data volume which contains their VM's Exchange Data Store. When starting a snapshot, I found that a process is started on one of my Compute Nodes titled "sparse_dd".
I found that this process then sends the output of "sparse_dd" through another Compute Node's xapi before placing it into the "snapshot store" on Secondary Storage. It appears that this is part of the bottleneck, as all of our systems are connected via gigabit links and it should not take 15+ hours to create a snapshot.

The following is the behavior that I have observed within my environment:

1) Snapshot is started (either manually or scheduled).
2) Compute Node 1 "processes the snapshot" by exposing the VDI, from which "sparse_dd" then creates a "thin provisioned" snapshot.
3) The output of sparse_dd is delivered over HTTP to xapi on Compute Node 2, where the Management Server mounted Secondary Storage.
4) Compute Node 2 (receiving the snapshot via xapi) stores the snapshot in the Secondary Storage mount point.

Based on this behavior, I have devised the following logic that I believe CloudStack is utilizing:

1) CloudStack creates a "snapshot VDI" via the XenServer Pool Master's API.
2) CloudStack finds a Compute Node that can mount Secondary Storage.
3) CloudStack finds a Compute Node that can run "sparse_dd".
4) CloudStack uses the available Compute Node to output the VDI to xapi on the Compute Node that mounted Secondary Storage.

I must mention that the Compute Node that ran sparse_dd and the one that mounted Secondary Storage are not always the same. It appears the Management Server is simply round-robining through the list of Compute Nodes and using the first one that is available.

Does anyone have any input on the issue I'm having, or analysis of how CloudStack/XenServer snapshots operate? Thanks!
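If the management server really did pick each host independently with a simple round-robin, the mount host and the sparse_dd host would almost never coincide, which matches the 1-in-12 observation above. A toy model of that selection logic (illustrative only, not CloudStack code; host names are made up):

```python
from itertools import cycle

# Toy model: the management server walks one shared round-robin list
# and takes "the first one that is available" for each role.
hosts = ["node1", "node2", "node3"]
picker = cycle(hosts)

def next_available_host():
    # Real CloudStack also checks host state; here every host is "available".
    return next(picker)

mismatches = 0
for _ in range(12):  # the 12 test snapshots from the thread
    mount_host = next_available_host()     # mounts Secondary Storage
    sparse_dd_host = next_available_host() # runs sparse_dd
    if mount_host != sparse_dd_host:
        mismatches += 1
print(f"{mismatches}/12 snapshots used different hosts")
```

With naive independent picks the two roles land on different hosts every time, forcing the extra sparse_dd-to-xapi network hop described above.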
Cheers, Matthew

Matthew Hartmann
Systems Administrator | V: 812.378.4100 x 850 | E: mhartmann@tls.net
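As a footnote to Nitin's suggestion near the top of the thread: globals such as concurrent.snapshots.threshold.perhost and BackupSnapshotWait can also be changed via the updateConfiguration API call rather than the UI. A minimal sketch of building a signed request; the management server address and keys are placeholders, while the signing itself (HMAC-SHA1 over the sorted, lowercased query string) is CloudStack's standard scheme:

```python
import base64
import hashlib
import hmac
import urllib.parse

def signed_cloudstack_url(endpoint, params, api_key, secret_key):
    """Build a signed CloudStack API URL using the standard signing scheme."""
    params = dict(params, apikey=api_key, response="json")
    # Sort parameters by name, URL-encode values, then sign the
    # lowercased query string with HMAC-SHA1 and base64-encode it.
    query = "&".join(
        f"{k}={urllib.parse.quote(str(v), safe='')}"
        for k, v in sorted(params.items())
    )
    digest = hmac.new(secret_key.encode(), query.lower().encode(),
                      hashlib.sha1).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    return f"{endpoint}?{query}&signature={signature}"

# Placeholder endpoint and keys; raise the per-host snapshot limit to 4.
url = signed_cloudstack_url(
    "http://mgmt-server:8080/client/api",
    {"command": "updateConfiguration",
     "name": "concurrent.snapshots.threshold.perhost",
     "value": "4"},
    api_key="YOUR_API_KEY",
    secret_key="YOUR_SECRET_KEY",
)
print(url)  # fetch this URL (as an admin) to apply the change
```

The sketch only builds the URL; issuing it requires admin credentials, and some globals need a management-server restart to take effect.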