incubator-cloudstack-dev mailing list archives

From Nitin Mehta <Nitin.Me...@citrix.com>
Subject Re: [DISCUSS] Scaling up CPU and RAM for running VMs
Date Tue, 22 Jan 2013 13:54:22 GMT
Chiradeep - thanks for your questions. Please find my answers inline. I
might not have the best solutions for some of them, so I am looking for
some guidance as well.
All these interactions make for good test cases, so the QA for this feature
should test them.


On 22/01/13 12:18 PM, "Chiradeep Vittal" <Chiradeep.Vittal@citrix.com>
wrote:

>What usage events will be generated to ensure that the proper billing is
>done?

The current usage implementation won't be able to handle this. I will
introduce a new event that covers both dynamic scaling and static scaling
(for HVs that do not support dynamic scaling).
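
To illustrate why a dedicated event matters for billing: usage has to be
split at the moment the VM's CPU count changes. The sketch below is only an
assumption-laden illustration. ScaleEvent and UsageCalculator are
hypothetical names, not CloudStack classes, and plain hours and core counts
stand in for real usage records.

```java
import java.util.List;

// Hypothetical usage event: from `hour` onwards the VM runs with `cpuCores`.
final class ScaleEvent {
    final long hour;
    final int cpuCores;

    ScaleEvent(long hour, int cpuCores) {
        this.hour = hour;
        this.cpuCores = cpuCores;
    }
}

final class UsageCalculator {
    // Sums core-hours up to endHour. Assumes events are sorted by hour and
    // that the first event marks the VM start.
    static long coreHours(List<ScaleEvent> events, long endHour) {
        long total = 0;
        for (int i = 0; i < events.size(); i++) {
            ScaleEvent current = events.get(i);
            // Each event's interval ends at the next event, or at endHour.
            long until = (i + 1 < events.size()) ? events.get(i + 1).hour : endHour;
            total += (until - current.hour) * current.cpuCores;
        }
        return total;
    }
}
```

For example, a VM that starts with 2 cores at hour 0 and is scaled to 4
cores at hour 10 accrues 10*2 + 14*4 = 76 core-hours by hour 24. Without an
event at hour 10 the whole day would be billed at one size.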

>We need more details about the actual classes being affected and the
>layers at which the orchestration is being done.

I will put that information in the FS. It will be based on the flowchart
in the FS.

>What are the possible interactions that you are taking care of?
> - user powers off the vm during the operation

Need to explore how the HV handles it. The HV should handle it
gracefully; otherwise it's a bug in the HV.
If the HV handles it cleanly, then CS can surface it accordingly and take
corrective actions such as updating the DB.

> - race conditions during calculation of sufficient capacity

It will be implemented along the same lines as the allocators. As soon as
we find a suitable destination we lock the capacity, and we release it in
case of failure. All of this goes through state machine transitions, so it
will be done cleanly.
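
The reserve-then-release idea can be sketched roughly as follows. This is a
minimal illustration under stated assumptions, not the actual allocator
code: HostCapacity, Hypervisor and ScaleUpOperation are hypothetical names,
and capacity is reduced to a single MHz counter.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a host's capacity record (not a CloudStack class).
final class HostCapacity {
    private final long totalMhz;
    private final AtomicLong reservedMhz = new AtomicLong();

    HostCapacity(long totalMhz) {
        this.totalMhz = totalMhz;
    }

    // Atomically reserves extra capacity; returns false if the host is full.
    // The compare-and-set loop keeps two concurrent requests from both
    // passing the check against the same free capacity.
    boolean tryReserve(long extraMhz) {
        while (true) {
            long current = reservedMhz.get();
            if (current + extraMhz > totalMhz) {
                return false;
            }
            if (reservedMhz.compareAndSet(current, current + extraMhz)) {
                return true;
            }
        }
    }

    void release(long extraMhz) {
        reservedMhz.addAndGet(-extraMhz);
    }

    long reserved() {
        return reservedMhz.get();
    }
}

// Hypothetical hypervisor call that may fail.
interface Hypervisor {
    void scaleUp(long extraMhz) throws Exception;
}

final class ScaleUpOperation {
    // Reserves capacity first, then calls the hypervisor; the reservation is
    // released if the hypervisor reports a failure.
    static boolean run(HostCapacity host, Hypervisor hv, long extraMhz) {
        if (!host.tryReserve(extraMhz)) {
            return false;
        }
        try {
            hv.scaleUp(extraMhz);
            return true;
        } catch (Exception e) {
            host.release(extraMhz);
            return false;
        }
    }
}
```

The point of reserving before calling the HV is that a second request
arriving in between sees the reduced free capacity; a failed upgrade then
hands the capacity back so the host's accounting stays consistent.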

> - failure during live migration

As you can see in the flowchart in the FS, I won't be touching live
migration itself, only leveraging the current implementation.
I am assuming that we already handle it gracefully, but I will definitely
test it.

> - HA event during live migration / upgrade

I am assuming that we already handle HA events during live migration,
since this scenario exists in the current implementation.
I will try to leverage the same handling for HA events during VM upgrade.

> - scheduled snapshot of volumes during the operation

For VMware, the entire VM is locked by the HV and this can be an issue. I
will leverage the current handling of existing interactions, such as
scheduled snapshot events during live migration, and replicate it here.

> - attach / detach during the operation

Same thing as the above.

> - hypervisor fails the upgrade.

The HV should handle it gracefully; otherwise it's a bug in the HV.
If the HV handles it cleanly, then CS can surface it accordingly and take
corrective actions such as updating the DB.



>
>The idea is to not handle every possible scenario, but to ensure that the
>vm (and system) is in a sane recoverable state after the unexpected
>interaction.
> 
>
>On 1/21/13 10:01 PM, "Koushik Das" <koushik.das@citrix.com> wrote:
>
>>See inline for 1.
>>
>>-----Original Message-----
>>From: Hari Kannan [mailto:hari.kannan@citrix.com]
>>Sent: Tuesday, January 22, 2013 10:51 AM
>>To: cloudstack-dev@incubator.apache.org
>>Subject: RE: [DISCUSS] Scaling up CPU and RAM for running VMs
>>
>>Hello Nitin, Koushik,
>>
>>I'm following up on this feature - is the FS located here still
>>accurate/up to date?
>>
>>I also wish to get clarification on a couple of things:
>>
>>1.      There is a reference - open issue 1: "Ability to mark the VM for
>>scale up at creation time " - what is the intent behind this capability?
>>Why can't every VM be capable of scaling? Also, given the capability of
>>scaling up is actually a property of {OS, Hypervisor} what would be the
>>intent behind having this as a property of a service offering? How was
>>this "closed"?
>>
>>[Koushik] For all HVs the ability to dynamically increase RAM/CPU needs
>>to be explicitly enabled. This may mean that for some/all HVs there may
>>be some overhead in terms of performance/capacity planning etc. (came
>>across the following for VMware:
>>http://www.yellow-bricks.com/2012/01/16/enabling-hot-add-by-default-cc-gabvirtualworld/).
>>As a starting point I would like to have it enabled by
>>default for all VMs. But later it may be required to attach some premium
>>with this kind of offering.
>>
>>2.      We also know that XS and KVM support this for Linux (max needs to
>>be pre-defined) - so, I assume we are supporting both these platforms, in
>>addition to VMware?
>>3.      In case there is no capacity in cluster to scale up, just making
>>sure that the existing VM will not have any impact, right?
>>
>>Hari
>>
>>-----Original Message-----
>>From: Marcus Sorensen [mailto:shadowsor@gmail.com]
>>Sent: Thursday, December 20, 2012 9:47 AM
>>To: cloudstack-dev@incubator.apache.org
>>Subject: Re: [DISCUSS] Scaling up CPU and RAM for running VMs
>>
>>Oh, if it's not already obvious, we're onboard for collaborating on this
>>feature and can help implement the KVM hypervisor portions. :-)
>>
>>
>>On Thu, Dec 20, 2012 at 8:44 AM, Marcus Sorensen
>><shadowsor@gmail.com> wrote:
>>
>>>
>>>
>>>
>>> On Thu, Dec 20, 2012 at 4:52 AM, Koushik Das
>>> <koushik.das@citrix.com> wrote:
>>>
>>>> See inline
>>>>
>>>> Thanks,
>>>> Koushik
>>>>
>>>> > -----Original Message-----
>>>> > From: Chip Childers [mailto:chip.childers@sungard.com]
>>>> > Sent: Wednesday, December 19, 2012 7:55 PM
>>>> > To: cloudstack-dev@incubator.apache.org
>>>> > Subject: Re: [DISCUSS] Scaling up CPU and RAM for running VMs
>>>> >
>>>> > On Wed, Dec 19, 2012 at 3:34 AM, Koushik Das
>>>> > <koushik.das@citrix.com>
>>>> > wrote:
>>>> > > See inline
>>>> > >
>>>> > >> -----Original Message-----
>>>> > >> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
>>>> > >> Sent: Tuesday, December 18, 2012 10:35 PM
>>>> > >> To: cloudstack-dev@incubator.apache.org
>>>> > >> Subject: Re: [DISCUSS] Scaling up CPU and RAM for running VMs
>>>> > >>
>>>> > >> The FS looks good and addresses the things I'd want it to
>>>> > >> (scaling should be limited to within cluster, use offerings).
>>>> > >>
>>>> > >> As you mention, there's a real problem surrounding no support
>>>> > >> for scaling down CPU, and it's just as much a problem with the
>>>> > >> guests as it is with hvms at the moment, it seems. This makes it
>>>> > >> hard to just set a VM as a dynamic one, since at some point
>>>> > >> you'll likely trigger it to scale up and have to reboot to get
>>>> > >> back down. My suggestion if this goes through however is that
>>>> > >> instead of marking a vm for auto scale, we can either attach
>>>> > >> multiple compute offerings (with a priority or "level") to a VM,
>>>> > >> along with triggers (we can't really trigger on memory, but
>>>> > >> perhaps cpu utilization over a specific time, e.g. if cpu is at
>>>> > >> 80% for x time, fall back to the next offering), or we can
>>>> > >> create a specific single compute offering that allows you to
>>>> > >> specify a min and max memory, cpu, and a trigger at which it
>>>> > >> scales (this latter one is my preference).
>>>> > >>
>>>> > >> The whole thing is problematic though, because people can
>>>> > >> inadvertently trigger their VM to scale up when they're
>>>> > >> installing updates or compiling or something and then have to
>>>> > >> reboot to come back down. If we can't take away resources
>>>> > >> without manual intervention, we shouldn't add them. For this
>>>> > >> reason I'd like to see the focus (at least initially) on simply
>>>> > >> being able to change to larger compute offerings while the VM is
>>>> > >> up. With this in place, if someone really wants to autoscale,
>>>> > >> they can use the api in a combination of fetching the VM stats
>>>> > >> and the existing changeServiceForVirtualMachine. Or we can put
>>>> > >> that in, but I think any implementation will be a poor
>>>> > >> experience without being able to go both ways.
>>>> > >>
>>>> > >
>>>> > > This is a good suggestion but as you have mentioned first
>>>> > > priority is to have the basic stuff working (increasing CPU/RAM
>>>> > > for running VMs).
>>>> > > Also another thing is that HVs (at least Vmware) require that a
>>>> > > VM is configured appropriately when it is stopped in order to
>>>> > > support increasing CPU/RAM while it is running. We can either do
>>>> > > this for all VMs irrespective of the fact whether the CPU/RAM is
>>>> > > going to be actually increased OR do it only for selective VMs
>>>> > > (maybe based on compute offering). If this is going to be common
>>>> > > across all HVs the latter can be done.
>>>>
>>>
>>> I think it could be done either way. The straightforward way is via
>>> offering that allows for max/current CPU and max/current RAM to be
>>> entered (basically exposing how the hypervisor settings themselves
>>> work). But you could also do a global setting of some sort that says
>>> 'set everything to a max of X CPU and Y RAM', so that every service
>>> offering can be upgraded live. As you mention, it will require at
>>> least a restart of the VMs to apply, so perhaps users could just
>>> switch service offerings anyway. It could be handy to allow people to
>>> upgrade service offering when it was unplanned for, though.
>>>
>>>
>>>> > >
>>>> > >> I don't know, maybe I'm off in left field here, I'd be
>>>> > >> interested in hearing the thoughts of others.
>>>> > >>
>>>> > >> You mention 'upgradeVirtualMachine'; it should be mentioned
>>>> > >> that on the customer-facing API this is called
>>>> > >> 'changeServiceForVirtualMachine', just to reduce confusion.
>>>> > >>
>>>> > >
>>>> > > upgradeVirtualMachine is an existing command (see
>>>> > > UpgradeVMCmd.java), was planning to reuse it. But yes if the name
>>>> > > sounds confusing we can deprecate it and create a new command with
>>>> > > the name you have suggested.
>>>> > >
>>>> >
>>>> > Please don't break backward compatibility without the whole list
>>>> > discussing the implications on a dedicated thread.  We had
>>>> > previously agreed that we were going to maintain API compatibility
>>>> > between 4.0.0-incubating and our next feature release.  If we break
>>>> > it, we have to release as 5.0.0-incubating instead of
>>>> > 4.1.0-incubating.
>>>>
>>>> In that case will add a new async API changeServiceForVirtualMachine
>>>> (or if anyone else comes up with a better name) which will work for
>>>> both running and stopped VMs. upgradeVirtualMachine would continue to
>>>> exist till
>>>> 5.0.0 happens.
>>>>
>>>
>>> Would this break backward compatibility? If an API call goes from
>>> upgrading VMs only while they're off, and still upgrades VMs only
>>> while they're off, but also upgrades VMs with a newer, specific
>>> service offering type while they're on, does that break backward
>>> compatibility? Or let's say we simply removed the check to make sure
>>> the VM was off, and instead just checked if the VM was started with
>>> the newer compatible settings... would that break backward
>>> compatibility? The call still does what it did before when used as
>>> before (changes service offering while the VM is off).
>>>
>>> Regarding upgradeVirtualMachine, I saw no mention of it in the API
>>> docs, and found that in the code, changeServiceForVirtualMachine was
>>> mapped to UpgradeVMCmd.java, which is why I mentioned the confusion.
>>> 'upgradeVirtualMachine' only exists as an internal method of the
>>> userVmService. See the file "client/tomcatconf/commands.properties.in"
>>>
>>> changeServiceForVirtualMachine=com.cloud.api.commands.UpgradeVMCmd
>>>
>>>
>>>
>>>>
>>>> >
>>>> > >>
>>>> > >> On Tue, Dec 18, 2012 at 9:18 AM, Koushik Das
>>>> > >> <koushik.das@citrix.com> wrote:
>>>> > >>
>>>> > >> > Created first draft of the FS
>>>> > >> >
>>>> > >> > https://cwiki.apache.org/confluence/display/CLOUDSTACK/Dynamic+scaling+of+CPU+and+RAM
>>>> > >> > Also created jira issue
>>>> > >> > https://issues.apache.org/jira/browse/CLOUDSTACK-658
>>>> > >> >
>>>> > >> > Comments? There is an 'open issue' section where I have
>>>> > >> > mentioned some issues that need to be closed
>>>> > >> >
>>>> > >> > Thanks,
>>>> > >> > Koushik
>>>> > >> >
>>>> > >> > > -----Original Message-----
>>>> > >> > > From: Koushik Das [mailto:koushik.das@citrix.com]
>>>> > >> > > Sent: Saturday, December 15, 2012 11:14 PM
>>>> > >> > > To: cloudstack-dev@incubator.apache.org
>>>> > >> > > Subject: [DISCUSS] Scaling up CPU and RAM for running VMs
>>>> > >> > >
>>>> > >> > > Currently CS supports changing CPU and RAM for stopped VM.
>>>> > >> > > This is achieved by changing compute offering of the VM
>>>> > >> > > (with new CPU and RAM values) and then starting it. I am
>>>> > >> > > planning to extend the same for running VM as well.
>>>> > >> > > Initially planning to do it for Vmware where CPU and RAM
>>>> > >> > > can be dynamically increased. Support of other HVs can
>>>> > >> > > also be added if they support increasing CPU/RAM.
>>>> > >> > >
>>>> > >> > > Assuming that in the updated compute offering only CPU and
>>>> > >> > > RAM has changed, the deployment planner can either select
>>>> > >> > > the same host in which case the values are dynamically
>>>> > >> > > scaled up OR a different one in which case the operation
>>>> > >> > > fails. In future if there is support for live migration
>>>> > >> > > (provided HV supports it) then another option in the
>>>> > >> > > latter case could be to migrate the VM first and then
>>>> > >> > > scale it up.
>>>> > >> > >
>>>> > >> > > I will start working on the FS and share it out sometime
>>>> > >> > > next week.
>>>> > >> > >
>>>> > >> > > Comments/suggestions?
>>>> > >> > >
>>>> > >> > > Thanks,
>>>> > >> > > Koushik
>>>> > >> >
>>>> > >
>>>>
>>>
>>>

