cloudstack-users mailing list archives

From ilya <>
Subject Re: [ANNOUNCE] Open source distributed virtual machine scheduling platform
Date Tue, 03 May 2016 20:30:10 GMT
Rafael and Gabriel,

Firstly, thanks for working on this initiative.

We have also realized that the current CloudStack allocation algorithms are
rather limited, so AutonomicCS is very timely.

The project looks very promising, and it is something I would like to try
out in my environments as it gains production-level stability; I have an
internal CI lab with a few hundred nested KVM hypervisors to test on.

Many of us in the community put a lot of effort into getting IPMI specs
and support into CloudStack. We will be merging IPMI support in our
environment shortly.

In addition, as you mentioned earlier, WOL and OS-level shutdowns will
work most of the time, but they are not ideal when you have enterprise-grade
hardware with IPMI support (which is becoming the de facto standard even
with whitebox hardware).
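
As an aside, the WOL mechanism itself is simple: the "magic packet" is six
0xFF bytes followed by the target MAC address repeated 16 times, typically
broadcast over UDP port 9. A minimal, self-contained Java sketch (the class
name, helper names, and addresses below are made up for illustration):

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

// Hypothetical sketch of wake-on-LAN: build the magic packet and
// broadcast it. Not actual CloudStack or AutonomicCS code.
public class WakeOnLan {

    static byte[] buildMagicPacket(String mac) {
        String[] hex = mac.split("[:\\-]");        // accept aa:bb:... or aa-bb-...
        byte[] packet = new byte[6 + 16 * 6];
        for (int i = 0; i < 6; i++) {
            packet[i] = (byte) 0xFF;               // synchronization stream
        }
        for (int rep = 0; rep < 16; rep++) {       // MAC repeated 16 times
            for (int i = 0; i < 6; i++) {
                packet[6 + rep * 6 + i] = (byte) Integer.parseInt(hex[i], 16);
            }
        }
        return packet;
    }

    static void wake(String broadcastAddress, String mac) throws Exception {
        byte[] payload = buildMagicPacket(mac);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setBroadcast(true);
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(broadcastAddress), 9));
        }
    }
}
```

Because it is a layer-2/broadcast mechanism, the sender must sit on the same
LAN segment as the target NIC, which is exactly the limitation discussed in
this thread.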

The CloudStack IPMI feature Rohit worked on is very extensive, to the
point that you can switch the IPMI driver to use WOL or shutdown
commands and abstract the operations with shell scripts entirely (Rohit,
please keep me honest).

With that said, please kindly consider integrating with the IPMI interface
Rohit mentioned, or make the WOL/power-off mechanism pluggable.


On 5/3/16 1:00 PM, Rafael Weingärtner wrote:
> Hi Rohit, thanks ;)
> I will answer your questions in line.
> I did not look at the code but I'm curious on how you're powering off
> hosts, I think with my out-of-band management PR you can use the oobm
> subsystem to perform power management operations for IPMI 2.0 enabled hosts.
> A: when we developed the first version (around October 2015), Apache
> CloudStack (ACS) did not have support for activating and deactivating
> hosts, and it still does not; you are working on that for Shapeblue,
> right? If there
> was something at that time, it would have been great. Therefore, we had to
> develop something to allow us to power on/off hosts (that was not our
> focus, but we needed it). So, we created the simplest solution possible
> (just to suffice our needs). Our cloud computing environment is created
> using pretty outdated servers, half of them do not have support for IPMI.
> Therefore, to shut down hosts, we use the hypervisors API. We noticed that
> most of the hypervisors have a shutdown command in their APIs; that is why
> we used it. We could not use many resources (time and energy) on developing
> that for every hypervisor ACS supports, so we did it only for XenServer to
> be used as a proof of concept (POC); to add the support to other
> hypervisors it would be a matter of implementing an interface.
> Even though we did the “shutdown“ using the hypervisor API, it would be
> nice to have it also through the IPMI interface; it is rare, but we have
> seen servers hung during the shutdown process.
> Then, to activate (start) servers, we used the wake on LAN (WOL) protocol.
> We found that to be the easiest way to activate servers on a LAN (there are
> some requirements to do that, given that it uses layer 2 of the OSI
> model to send the commands). However, once again, our environment did not
> help much. One of our servers did not support WOL, but fortunately it had
> IPMI support. Therefore, to start servers, we use either IPMI or WOL,
> depending on a flag that we add to the “” table.
> Did the explanation help? You are welcome to look at the code, we think it
> is more or less clear and documented.
> Also curious how you implemented the heuristics and wrote tests (esp.
> integration ones), some of us had a related discussion about such a feature
> and we looked at this paper from VMware DRS team:
> A: well, the heuristics are written in Java; we have an interface with a
> set of methods that have to be implemented and that can be used by our
> agents; also, we have a set of basic classes to support the development of
> new heuristics. We have created only two simple heuristics to be used as a
> proof of concept of the whole architecture we have created. Our first goal
> was to formalize and finish the whole architecture; after that, we could
> work on some more interesting things. Right now we are working on
> techniques to mix (add) neural or Bayesian networks into our heuristics; we
> intend to use those techniques to improve our VM mapping algorithms or the
> ranking of hosts.
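
To make the heuristic-interface idea above concrete, here is a rough Java
sketch; the names are hypothetical and surely differ from the real
AutonomicCS interface, but they show the shape: a heuristic ranks candidate
hosts and the agents consume the ranking.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of a scheduling-heuristic interface.
public class HeuristicSketch {

    static class Host {
        final String id;
        final double cpuUsage; // fraction of CPU in use, 0.0 .. 1.0
        Host(String id, double cpuUsage) {
            this.id = id;
            this.cpuUsage = cpuUsage;
        }
    }

    interface AllocationHeuristic {
        /** Return candidate hosts ordered from most to least preferred. */
        List<Host> rankHosts(List<Host> candidates);
    }

    /** Example heuristic: prefer the least-loaded host (spreads load). */
    static class LeastLoadedFirst implements AllocationHeuristic {
        public List<Host> rankHosts(List<Host> candidates) {
            return candidates.stream()
                    .sorted(Comparator.comparingDouble(h -> h.cpuUsage))
                    .collect(Collectors.toList());
        }
    }
}
```

The appeal of this shape is that a consolidation heuristic (pack VMs onto few
hosts so the rest can be powered off) is just another implementation of the
same interface, which is how a neural- or Bayesian-network ranker could later
be slotted in.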
> We have not read the VMware’s paper (we have based our whole proposals
> solely on academic work until now); I have just glanced at it, and it seems
> interesting; though I would need much more time and a deeper reading to be
> able to comment on it.
> The testing is done in a test environment we have; we isolate and control
> the variables of the environment and everything that can affect the agents'
> behavior; then, we test every functionality and the agents' behavior. The
> process of testing for the first release was very manual. However, now
> that we know the whole framework works, we are covering it with test cases
> (unit and integration); then, testing a heuristic will simply be a matter
> of writing test cases for it.
> Even with test cases, for every experiment we do or release that we close,
> we execute a thorough batch of tests to check that everything is working;
> sadly, those tests are still executed manually today.
> I can say that the fun is going to start now. I find it much more
> interesting to create methods/heuristics to manage the environment than to
> create the structure that uses the heuristics.
> Do you have any other questions?
> On Tue, May 3, 2016 at 12:18 PM, Rohit Yadav <>
> wrote:
>> Nice feature :)
>> I did not look at the code but I'm curious on how you're powering off
>> hosts, I think with my out-of-band management PR you can use the oobm
>> subsystem to perform power management operations for IPMI 2.0 enabled hosts.
>> Also curious how you implemented the heuristics and wrote tests (esp.
>> integration ones), some of us had a related discussion about such a feature
>> and we looked at this paper from VMware DRS team:
>> Regards,
>> Rohit Yadav
>> 53 Chandos Place, Covent Garden, London WC2N 4HS, UK
>> @shapeblue
>> On Apr 27 2016, at 2:29 am, Gabriel Beims Bräscher <>
>> wrote:
>> Hello CloudStack community members (@dev and @users),
>> This email is meant to announce the publication of a project on Github that
>> provides a distributed virtual machine scheduling platform that can be
>> easily integrated with Apache CloudStack (ACS). The project is available at
>> [1], you can find a detailed explanation of the idea of the project, its
>> aspirations, basic concepts, installation and uninstallation processes and
>> other information at [2]. Also, if you want to know more about the
>> Autonomiccs and its creators, you can access the link [3].
>> The code that was opened at Github is part of a bigger system that has the
>> goal of managing a cloud computing environment autonomously. All of that is
>> being developed and used in my Ph.D. thesis and the master's theses of
>> some colleagues. The formalization of that component will be published at
>> the 12th IEEE World Congress on Services (SERVICES 2016) in San Francisco,
>> USA.
>> You can see the stats of our code at [4] and [5]. Right now we only have
>> ~40% code test coverage. However, we intend to increase that value to
>> ~60% by next week and to ~90% by the end of June.
>> To give you a picture of what we are preparing for the future, we can
>> highlight the following goals for this year (you can find other short-term
>> goals at [6]):
>>    - Integrate our platform [1] with a multi-agent system (MAS) platform,
>>      in order to facilitate the development of agents. Currently, we are
>>      using Spring Integration to “emulate” an agent life cycle; that can
>>      become a problem when we need to add more agents and they start to
>>      communicate with each other. Therefore, we will integrate the
>>      platform in [1] with JADE [7];
>>    - Today the metrics about resource usage are not properly gathered by
>>      ACS; in order to develop more accurate predictions, we need to store
>>      resource usage metrics. Also, those metrics have to be gathered in a
>>      distributed way without causing service degradation. For that and a
>>      few other reasons (you can send us an email so we can provide more
>>      details), we are developing an autonomic monitoring platform that
>>      will integrate with the system available in [1];
>>    - We also foresee the need for a better way to visualize the cloud
>>      environment: a way to detect hot spots (pods and hosts) with higher
>>      resource usage trends (VM trends). We intend to replace the rustic
>>      table-based view of the environment with one better suited for
>>      humans (this is a surprise that we intend to present at the CCCBR).
>> We hope you like the software and that it meets your expectations. If it
>> does not meet all of your needs, let's work together to improve it. If
>> you have any doubts or suggestions, please send us an email; we will
>> reply as fast as we can. Also, criticism that helps us improve the
>> platform is very welcome.
>> [1]
>> [2]
>> [3]
>> [4]
>> [5]
>> [6]
>> [7]
>> Cheers, Gabriel.
