cloudstack-dev mailing list archives

From Marcus Sorensen
Subject Re: [jira] [Commented] (CLOUDSTACK-3163) KVM Virtual Router startup time is painfully long
Date Thu, 18 Jul 2013 21:03:28 GMT
... and each calls ssh and/or scp several times. Off the top
of my head, it seems like we could serialize that cmd.getVmData()
output to maybe JSON or something, get it up on the router in one
call, and then process it there in a python script.
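
A minimal sketch of that idea, not the actual CloudStack code. It assumes cmd.getVmData() yields (folder, file, content) triples; the function names serialize_vm_data and apply_vm_data, the JSON field names, and the base directory are all hypothetical. The host packs everything into one JSON string, which would go to the router in a single transfer, and the router-side script writes all the files locally with no further ssh calls.

```python
import json
import os
import tempfile


def serialize_vm_data(entries):
    """Host side: pack all (folder, file, content) triples into one JSON string."""
    return json.dumps(
        [{"folder": f, "file": n, "content": c} for f, n, c in entries]
    )


def apply_vm_data(payload, base_dir):
    """Router side: write every entry to base_dir/<folder>/<file>."""
    written = []
    for entry in json.loads(payload):
        target_dir = os.path.join(base_dir, entry["folder"])
        os.makedirs(target_dir, exist_ok=True)
        path = os.path.join(target_dir, entry["file"])
        with open(path, "w") as handle:
            # Treat a missing value as an empty file rather than failing.
            handle.write(entry["content"] or "")
        written.append(path)
    return written


if __name__ == "__main__":
    # Made-up sample entries, written to a throwaway directory.
    sample = [
        ("userdata", "user-data", "hello"),
        ("metadata", "instance-id", "i-2-10-VM"),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for path in apply_vm_data(serialize_vm_data(sample), tmp):
            print(path)
```

With this shape the number of ssh/scp round trips no longer grows with the number of data files or instances; only the size of the payload does.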

On Thu, Jul 18, 2013 at 7:08 AM, Wido den Hollander (JIRA) wrote:
> Wido den Hollander commented on CLOUDSTACK-3163:
> ------------------------------------------------
> So, I took a quick peek at how this works and I see it does about 13 calls on my setup, of which 11 are calling "" with different parameters.
> I think that can be brought back to one call, bringing the total (in my setup) back to 3 instead of 13.
> I'll see if I can find the time to test this out.
>> KVM Virtual Router startup time is painfully long
>> -------------------------------------------------
>>                 Key: CLOUDSTACK-3163
>>                 URL:
>>             Project: CloudStack
>>          Issue Type: Bug
>>      Security Level: Public(Anyone can view this level - this is the default.)
>>          Components: KVM
>>    Affects Versions: pre-4.0.0
>>         Environment: CloudPlatform 3.0.3, but I don't see any changes to the relevant code (I think) on master
>>            Reporter: Andrew Bayer
>>            Priority: Critical
>> When you've got a couple thousand instances, spread across 10 or so pods, virtual router startup time is near crippling - actually, if you don't enable the option to have virtual routers only populated with instances in their pod, it *is* crippling, in that the virtual routers don't finish starting before the management server decides they've timed out and tries to start a new one.
>> This seems to be the result of a few painful inefficiencies:
>> - The same codepath is followed whether you're adding a new instance to an already running VR, or adding two hundred already running instances to a new VR. So each ssh/scp/sed/cp/chmod/etc command is replicated for each instance, rather than finding efficiencies by doing things across the whole set of instances.
>> - But what really eats up the time is the population of vm data - for each piece of vm data (which, from a rough look at the code, seems to be something like 10 or 11 data files), there are something like 7 ssh calls and an scp call. So that means that per instance, we have somewhere around 80 to 90 ssh/scp calls, plus the single ssh call for
So with 200 instances, that's 16,000 to 18,000 ssh/scp calls on a single VR, with all the overhead entailed in opening that many ssh connections, starting bash, and so on. Given that in my experience a VR with ~200 instances takes ~90 minutes to start up (I may be misremembering slightly: it could be that ~200 instances takes closer to 60 minutes and ~300 closer to 90), that works out to roughly 0.3 seconds per ssh/scp call, which doesn't seem implausible to me.
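
A quick check of the arithmetic, using only the figures quoted in the report (80 to 90 ssh/scp calls per instance, 200 instances, roughly 90 minutes of startup time). The helper names are made up for illustration, and the numbers are the reporter's rough estimates rather than measurements.

```python
def total_calls(instances, calls_per_instance):
    """Total ssh/scp calls when every instance is processed separately."""
    return instances * calls_per_instance


def seconds_per_call(startup_minutes, calls):
    """Average wall-clock time per call if the calls account for the whole startup."""
    return startup_minutes * 60 / calls


if __name__ == "__main__":
    for per_instance in (80, 90):
        calls = total_calls(200, per_instance)
        print(per_instance, calls, seconds_per_call(90, calls))
```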
>> So, this shouldn't be this way. At a minimum, there's no reason not to offload the whole process from a script run on the host making repeated ssh calls to the VR to a script on the VR that gets called from the host, albeit possibly a temporary one that's generated on the fly and copied over to the VR. That alone would probably save most of the VR startup time, just by dropping the number of ssh/scp connections per instance from 80-90 to 3 (
call, scp of temporary script, execution of temporary script).
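
The "temporary script" approach could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names, the (folder, file, content) entry layout, the base directory, and the user@host target string are hypothetical, and the third call mentioned above (whose name is missing from this message) is not modelled. Nothing here opens a connection; plan_commands only returns the scp and ssh argument lists that a caller would hand to something like subprocess.run.

```python
import shlex


def build_batch_script(entries, base_dir):
    """Return one bash script that writes every (folder, file, content) entry."""
    lines = ["#!/bin/bash", "set -e"]
    for folder, name, content in entries:
        target_dir = f"{base_dir}/{folder}"
        target_file = f"{target_dir}/{name}"
        lines.append(f"mkdir -p {shlex.quote(target_dir)}")
        # printf '%s' writes the content verbatim, without a trailing newline.
        lines.append(
            f"printf '%s' {shlex.quote(content)} > {shlex.quote(target_file)}"
        )
    return "\n".join(lines) + "\n"


def plan_commands(target, port, local_script, remote_script):
    """Return the two commands needed: copy the script over, then run it."""
    return [
        ["scp", "-P", str(port), local_script, f"{target}:{remote_script}"],
        ["ssh", "-p", str(port), target, f"bash {shlex.quote(remote_script)}"],
    ]


if __name__ == "__main__":
    # Made-up sample entries and paths.
    sample = [
        ("metadata", "instance-id", "i-2-10-VM"),
        ("userdata", "user-data", "it's data"),
    ]
    print(build_batch_script(sample, "/tmp/vmdata"))
    for argv in plan_commands("root@192.0.2.10", 22, "/tmp/batch.sh", "/tmp/batch.sh"):
        print(argv)
```

However many entries go into the script, the host still makes one scp and one ssh call, which is the point of the proposal.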