cloudstack-users mailing list archives

From Rafael Weingärtner <rafaelweingart...@gmail.com>
Subject Re: [ANNOUNCE] Open source distributed virtual machine scheduling platform
Date Tue, 03 May 2016 22:08:42 GMT
You are welcome, Ilya.
We are very glad that there are people interested in testing/using it.

We have indeed seen Rohit's PRs introducing changes that will enable ACS
to activate and deactivate servers. I also think that using IPMI is a very
interesting approach. Unfortunately, we needed something like this last
year; that is why we created our own. We even thought about opening a PR to
donate the code to ACS, but we did not have much time to do that.

Also, as I said in the previous email, we developed only what we needed,
and we tried to spend as little effort as possible doing it. We have bigger
problems to focus on, and we are not a huge team right now.

Of course, as soon as ACS incorporates those changes, we will adapt our
code to use ACS’s functions.




On Tue, May 3, 2016 at 5:30 PM, ilya <ilya.mailing.lists@gmail.com> wrote:

> Rafael and Gabriel,
>
> Firstly, thanks for working on this initiative.
>
> We also realized that the current CloudStack allocation algorithms are
> rather limited, so Autonomiccs is very timely.
>
> The project looks very promising, and it's something I'd like to try out
> in my environments as it gains production-level stability; I have an
> internal CI lab with a few hundred nested KVM hypervisors to test on.
>
>
> Many of us in the community put a lot of effort into getting IPMI specced
> and supported in CloudStack. We will be merging IPMI support into our
> environment shortly.
>
> In addition, as you mentioned earlier, WOL and OS-level shutdowns will
> work most of the time, but they aren't ideal when you have enterprise-grade
> hardware with IPMI support (which is becoming the de facto standard even
> on whitebox servers).
>
> The CloudStack IPMI feature Rohit worked on is very extensive, to the
> point that you can switch the IPMI driver to use WOL or shutdown commands
> and abstract the operations entirely with shell scripts (Rohit, please
> keep me honest).
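>
> To sketch what such an abstraction could look like (illustrative Java
> only; the class name below is hypothetical, and the actual oobm feature
> defines its own driver interface), a driver could simply shell out to
> ipmitool:
>
>     import java.io.IOException;
>
>     /** Hypothetical pluggable power driver that wraps the ipmitool CLI. */
>     public class IpmiToolPowerDriver {
>
>         public void powerOff(String bmcHost, String user, String password)
>                 throws IOException, InterruptedException {
>             // "chassis power off" is a standard ipmitool command.
>             Process p = new ProcessBuilder("ipmitool", "-I", "lanplus",
>                     "-H", bmcHost, "-U", user, "-P", password,
>                     "chassis", "power", "off").inheritIO().start();
>             if (p.waitFor() != 0) {
>                 throw new IOException("ipmitool exited with " + p.exitValue());
>             }
>         }
>     }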
>
> With that said, please consider integrating with the IPMI interface Rohit
> mentioned, or making the WOL/power-off mechanism pluggable.
>
> Thanks,
> ilya
>
> On 5/3/16 1:00 PM, Rafael Weingärtner wrote:
> > Hi Rohit, thanks ;)
> >
> > I will answer your questions in line.
> >
> >
> > I did not look at the code but I'm curious about how you're powering off
> > hosts; I think with my out-of-band management PR you can use the oobm
> > subsystem to perform power management operations for IPMI 2.0 enabled
> > hosts.
> >
> > A: when we developed the first version (around October 2015), Apache
> > CloudStack (ACS) did not have support for activating and deactivating
> > hosts, and it still does not; you are working on that for ShapeBlue,
> > right? If there had been something available at that time, it would have
> > been great. Therefore, we had to develop something to allow us to power
> > hosts on and off (that was not our focus, but we needed it). So, we
> > created the simplest solution possible (just enough to meet our needs).
> > Our cloud computing environment is built from pretty outdated servers,
> > and half of them do not support IPMI. Therefore, to shut down hosts, we
> > use the hypervisor's API. We noticed that most hypervisors have a
> > shutdown command in their APIs; that is why we used it. We could not
> > spend many resources (time and energy) developing that for every
> > hypervisor ACS supports, so we implemented it only for XenServer as a
> > proof of concept (POC); adding support for other hypervisors is a matter
> > of implementing an interface (see the sketch below).
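> >
> > To give an idea of the shape of that interface (a minimal sketch with
> > hypothetical names, not the exact classes in our code base):
> >
> >     /** One implementation per hypervisor type (only XenServer today). */
> >     public interface HypervisorPowerOperations {
> >
> >         /** True if this implementation can manage the given hypervisor. */
> >         boolean supports(String hypervisorType);
> >
> >         /** Shuts the host down gracefully through the hypervisor API. */
> >         void shutdownHost(String hostAddress) throws Exception;
> >     }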
> >
> > Even though we did the “shutdown” using the hypervisor API, it would be
> > nice to have it also through the IPMI interface; it is rare, but we have
> > seen servers hang during the shutdown process.
> >
> > Then, to activate (start) servers, we used the Wake-on-LAN (WOL)
> > protocol. We found that to be the easiest way to activate servers on a
> > LAN (there are some requirements for doing that, given that it uses
> > layer 2 of the OSI model to send the commands). However, once again, our
> > environment did not help much. One of our servers did not support WOL,
> > but gladly it had IPMI support. Therefore, to start servers we use
> > either IPMI or WOL, depending on a flag that we added to the
> > “cloud.host” table.
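> >
> > For reference, sending a WOL “magic packet” is straightforward; here is
> > a minimal Java sketch (illustrative, not our production code):
> >
> >     import java.net.DatagramPacket;
> >     import java.net.DatagramSocket;
> >     import java.net.InetAddress;
> >
> >     public class WakeOnLan {
> >         /** Sends the magic packet: 6 bytes of 0xFF followed by the
> >          *  host's 6-byte MAC repeated 16 times, broadcast over UDP. */
> >         public static void wake(String broadcastIp, byte[] mac)
> >                 throws Exception {
> >             byte[] payload = new byte[6 + 16 * mac.length];
> >             for (int i = 0; i < 6; i++) {
> >                 payload[i] = (byte) 0xFF;
> >             }
> >             for (int i = 6; i < payload.length; i += mac.length) {
> >                 System.arraycopy(mac, 0, payload, i, mac.length);
> >             }
> >             try (DatagramSocket socket = new DatagramSocket()) {
> >                 socket.setBroadcast(true);
> >                 socket.send(new DatagramPacket(payload, payload.length,
> >                         InetAddress.getByName(broadcastIp), 9));
> >             }
> >         }
> >     }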
> >
> >
> > Did the explanation help? You are welcome to look at the code; we think
> > it is more or less clear and documented.
> >
> > Also curious how you implemented the heuristics and wrote tests (esp.
> > integration ones); some of us had a related discussion about such a
> > feature and we looked at this paper from the VMware DRS team:
> > http://www.waldspurger.org/carl/papers/drs-vmtj-mar12.pdf
> >
> > A: well, the heuristics are written in Java; we have an interface with a
> > set of methods that have to be implemented and that can be used by our
> > agents; we also have a set of base classes to support the development of
> > new heuristics. We have created only two simple heuristics so far, as a
> > proof of concept of the whole architecture we have created. Our first
> > goal was to formalize and finish the whole architecture; after that, we
> > could work on some more interesting things. Right now we are working on
> > techniques to mix (add) neural or Bayesian networks into our heuristics;
> > we intend to use those techniques to improve our VM mapping algorithms
> > and the ranking of hosts. A rough sketch of the interface is shown below.
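> >
> > Conceptually (hypothetical names and placeholder types, not the exact
> > ones in our repository), it looks like this:
> >
> >     import java.util.List;
> >     import java.util.Map;
> >
> >     /* Placeholder domain types; the real ones live in the platform. */
> >     class Host { String id; double cpuUsage; double memoryUsage; }
> >     class Vm { String id; }
> >
> >     /** A heuristic that agents consult to manage a cluster. */
> >     interface ClusterHeuristic {
> >
> >         /** Orders candidate hosts, e.g. most loaded first when
> >          *  consolidating so that idle hosts can be powered off. */
> >         List<Host> rankHosts(List<Host> candidates);
> >
> >         /** Maps each VM to the host it should be migrated to. */
> >         Map<Vm, Host> mapVirtualMachines(List<Vm> vms, List<Host> ranked);
> >     }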
> >
> > We have not read VMware’s paper (we have based our whole proposal solely
> > on academic work until now); I have just glanced at it, and it seems
> > interesting, though I would need much more time and a deeper reading to
> > be able to comment on it.
> >
> > The testing is done in a test environment we have; we isolate and
> > control the variables of the environment and everything that can affect
> > the agents' behavior, and then we test every functionality and the agent
> > behavior. The testing process for the first release was very manual.
> > However, now that we know the whole framework works, we are covering it
> > with test cases (unit and integration); then, testing a heuristic will
> > be a matter of writing test cases for it.
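> >
> > For instance, such a test could look roughly like this (JUnit 4, with a
> > toy ranking function; purely illustrative):
> >
> >     import static org.junit.Assert.assertEquals;
> >
> >     import java.util.Arrays;
> >     import java.util.Collections;
> >     import java.util.List;
> >
> >     import org.junit.Test;
> >
> >     public class ConsolidationHeuristicTest {
> >
> >         /** Toy heuristic: rank hosts by load, most loaded first. */
> >         static List<Double> rankByLoadDescending(List<Double> loads) {
> >             Collections.sort(loads, Collections.reverseOrder());
> >             return loads;
> >         }
> >
> >         @Test
> >         public void mostLoadedHostComesFirst() {
> >             List<Double> ranked = rankByLoadDescending(
> >                     Arrays.asList(0.2, 0.9, 0.5));
> >             assertEquals(0.9, ranked.get(0), 1e-9);
> >         }
> >     }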
> >
> > Even with test cases, for every experiment we run or release we close,
> > we execute a thorough batch of tests to check that everything is
> > working; sadly, those tests are executed manually today.
> >
> > I can say that the fun is going to start now. I find it much more
> > interesting to create methods/heuristics to manage the environment than
> > to create the structure that uses the heuristics.
> >
> > Do you have any other questions?
> >
> > On Tue, May 3, 2016 at 12:18 PM, Rohit Yadav <rohit.yadav@shapeblue.com>
> > wrote:
> >
> >> Nice feature :)
> >>
> >> I did not look at the code but I'm curious about how you're powering off
> >> hosts; I think with my out-of-band management PR you can use the oobm
> >> subsystem to perform power management operations for IPMI 2.0 enabled
> >> hosts.
> >>
> >> Also curious how you implemented the heuristics and wrote tests (esp.
> >> integration ones); some of us had a related discussion about such a
> >> feature and we looked at this paper from the VMware DRS team:
> >> http://www.waldspurger.org/carl/papers/drs-vmtj-mar12.pdf
> >>
> >> Regards,
> >>
> >> Rohit Yadav
> >>
> >> rohit.yadav@shapeblue.com
> >> www.shapeblue.com
> >> 53 Chandos Place, Covent Garden, London WC2N 4HS, UK
> >> @shapeblue
> >> On Apr 27 2016, at 2:29 am, Gabriel Beims Bräscher <gabrascher@gmail.com>
> >> wrote:
> >>
> >> Hello CloudStack community members (@dev and @users),
> >>
> >> This email announces the publication of a project on GitHub that
> >> provides a distributed virtual machine scheduling platform which can be
> >> easily integrated with Apache CloudStack (ACS). The project is
> >> available at [1]; you can find a detailed explanation of the idea
> >> behind the project, its aspirations, basic concepts, installation and
> >> uninstallation processes, and other information at [2]. Also, if you
> >> want to know more about Autonomiccs and its creators, you can access
> >> link [3].
> >>
> >> The code that was opened on GitHub is part of a bigger system whose
> >> goal is to manage a cloud computing environment autonomously. All of
> >> that is being developed and used in my Ph.D. thesis and in the master's
> >> theses of some colleagues. The formalization of that component will be
> >> published at the 12th IEEE World Congress on Services (SERVICES 2016)
> >> in San Francisco, USA.
> >>
> >> You can see the stats of our code at [4] and [5]. Right now we only
> >> have ~40% code test coverage. However, we intend to increase that value
> >> to ~60% by next week and to ~90% by the end of June.
> >>
> >> To give you a picture of what we are preparing for the future, we can
> >> highlight the following goals for this year (you can find other
> >> short-term goals at [6]):
> >>
> >>    - Integrate our platform [1] with a multi-agent system (MAS)
> >>      platform, in order to facilitate the development of agents.
> >>      Currently, we are using Spring Integration to “emulate” an agent
> >>      life cycle; that can become a problem when we need to add more
> >>      agents and they start to communicate with each other. Therefore,
> >>      we will integrate the platform in [1] with JADE [7];
> >>
> >>    - Today the metrics on resource usage are not properly gathered by
> >>      ACS; in order to develop more accurate predictions, we need to
> >>      store resource usage metrics. Also, those metrics have to be
> >>      gathered in a distributed way without causing service degradation.
> >>      For that and a few other reasons (you can send us an email and we
> >>      will provide more details), we are developing an autonomic
> >>      monitoring platform that will integrate with the system available
> >>      in [1];
> >>
> >>    - We also foresee the need for a better way to visualize the cloud
> >>      environment: a way to detect hot spots (pods and hosts) with
> >>      higher resource usage trends (VM trends). We see the need to
> >>      replace the rustic table-based view of the environment with one
> >>      better suited for humans (this is a surprise that we intend to
> >>      present at the CCCBR).
> >>
> >> We hope you like the software and that it meets your expectations. If
> >> it does not satisfy all of your needs, let’s work together to improve
> >> it. If you have any questions or suggestions, please send us an email;
> >> we will reply as fast as we can. Also, criticism that can help us
> >> improve the platform is very welcome.
> >>
> >> [1] https://github.com/Autonomiccs/autonomiccs-platform
> >>
> >> [2] https://github.com/Autonomiccs/autonomiccs-platform/wiki
> >>
> >> [3] http://autonomiccs.com.br/
> >>
> >> [4] http://jenkins.autonomiccs.com.br/
> >>
> >> [5] http://sonar.autonomiccs.com.br/
> >>
> >> [6] https://github.com/Autonomiccs/autonomiccs-platform#project-evolution
> >>
> >> [7] http://jade.tilab.com/
> >>
> >> Cheers, Gabriel.
> >>
> >
> >
> >
>



-- 
Rafael Weingärtner
