Date: Thu, 25 Jul 2013 23:35:49 +0000 (UTC)
From: "ASF subversion and git services (JIRA)"
To: cloudstack-issues@incubator.apache.org
Reply-To: dev@cloudstack.apache.org
Subject: [jira] [Commented] (CLOUDSTACK-3163) KVM Virtual Router startup time is painfully long

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720239#comment-13720239 ]

ASF subversion and git services commented on CLOUDSTACK-3163:
-------------------------------------------------------------

Commit 17a675942cbd1f86a3441ec8299517f660656694 in branch refs/heads/master from [~yasker]
[ https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;h=17a6759 ]

Bring back vm_data.sh which
deleted by a KVM related commit

The following commit removed vm_data.sh, but the file shared by Xen as well.
Bring the file back.

    commit 28855b4987c9274d15a539b9d7ae26c0073b0651
    Author: Marcus Sorensen
    Date: Wed Jul 24 13:58:17 2013 -0600

        Summary: Get away from dozens of ssh/scp calls for KVM vm_data push

        Detail: userdata and vm metadata take a long time to program on KVM routers.
        This does it all in one go, processed on the router.

        BUG-ID: CLOUDSTACK-3163
        Tested-by: Wido
        Signed-off-by: Marcus Sorensen 1374695897 -0600

> KVM Virtual Router startup time is painfully long
> -------------------------------------------------
>
> Key: CLOUDSTACK-3163
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3163
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the default.)
> Components: KVM
> Affects Versions: pre-4.0.0
> Environment: CloudPlatform 3.0.3, but I don't see any changes to the relevant code (I think) on master
> Reporter: Andrew Bayer
> Assignee: Marcus Sorensen
> Priority: Critical
> Fix For: 4.2.0
>
>
> When you've got a couple thousand instances, spread across 10 or so pods, virtual router startup time is near crippling - actually, if you don't enable the option to have virtual routers only populated with instances in their pod, it *is* crippling, in that the virtual routers don't finish starting before the management server decides they've timed out and tries to start a new one.
> This seems to be the result of a few painful inefficiencies:
> - The same codepath is followed whether you're adding a new instance to an already running VR, or adding two hundred already running instances to a new VR. So each ssh/scp/sed/cp/chmod/etc command is replicated for each instance, rather than finding efficiencies by doing things across the whole set of instances.
> - But what really eats up the time is the population of vm data - for each piece of vm data (which, from a rough look at the code, seems to be something like 10 or 11 data files), there are something like 7 ssh calls and an scp call. So that means that per instance, we have somewhere around 80 to 90 ssh/scp calls, plus the single ssh call for dhcp_entry.sh. So with 200 instances, that's 16,000 to 18,000 ssh/scp calls on a single VR, with all the overhead entailed in opening that many ssh connections, starting bash, etc, etc... Given that in my experience, a VR with ~200 instances takes ~90 minutes to start up (I may be misremembering slightly - it could be ~200 instances takes closer to 60 minutes, and ~300 takes closer to 90), that works out to 0.3 seconds or so per ssh/scp, which doesn't seem implausible to me.
> So, this shouldn't be this way. At a minimum, there's no reason not to offload the whole process from a script run on the host making repeated ssh calls to the VR to a script on the VR that gets called from the host, albeit possibly a temporary one that's generated on the fly and copied over to the VR. That alone would probably save most of the VR startup time, just by dropping the number of ssh/scp connections per instance from 80-90 to 3 (dhcp_entry.sh call, scp of temporary script, execution of temporary script).
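
The connection-count estimate in the issue description can be checked with a few lines of arithmetic. The following is only a rough sketch in Python: the inputs (10 or 11 data files, 7 ssh calls plus 1 scp per file, one dhcp_entry.sh call, ~90 minutes for ~200 instances, 3 connections per instance in the proposed design) are the reporter's approximate figures, and the function names are made up for illustration, not taken from the CloudStack code.

```python
# Rough model of ssh/scp connection counts for populating a virtual router.
# All inputs are the approximate figures quoted in the issue description,
# not measurements; the function names are hypothetical.


def per_instance_calls(data_files: int,
                       ssh_per_file: int = 7,
                       scp_per_file: int = 1,
                       dhcp_calls: int = 1) -> int:
    """Connections needed for one instance in the per-file codepath."""
    return data_files * (ssh_per_file + scp_per_file) + dhcp_calls


def seconds_per_call(startup_minutes: float, total_calls: int) -> float:
    """Average wall-clock time per connection for a given startup time."""
    return startup_minutes * 60 / total_calls


if __name__ == "__main__":
    instances = 200
    startup_minutes = 90  # reporter's recollection for ~200 instances

    for data_files in (10, 11):
        total = instances * per_instance_calls(data_files)
        avg = seconds_per_call(startup_minutes, total)
        print(f"{data_files} data files: {total} connections, "
              f"{avg:.2f} s per connection")

    # Proposed design: dhcp_entry.sh call, scp of a temporary script,
    # and one execution of that script, i.e. 3 connections per instance.
    print(f"proposed: {instances * 3} connections")
```

With these inputs the per-instance path comes to 16,200 to 17,800 connections for 200 instances, about a third of a second each over 90 minutes, against 600 connections under the proposed three-step approach.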