cloudstack-dev mailing list archives

From Funs Kessen <FKes...@schubergphilis.com>
Subject RE: [DISCUSS] OOM killer and Routing/System VM's = :(
Date Fri, 06 Sep 2013 09:35:53 GMT
Hi Alex and Chiradeep,

@Alex: Yes, it would work, but it also means that everybody would have to implement this on a
machine that runs syslog, and that it is not part of CloudStack. I think it would be wonderful
if the SystemVM, being an entity within CloudStack, were self-sustaining in combination with
CloudStack and did not depend on external scripts that make API calls. For the short term it
might be a viable solution, but in the long term, wouldn't it feel kind of hack-ish?

@Chiradeep: I agree. It was also not acceptable to some of the guys on a Linux kernel IRC
channel, and they had fair points, although I do believe people should have the option to
choose. They pointed me towards kcrash, as I mentioned before. Yesterday I tested kcrash
and it works. It means that a bit of memory is used to load a crash kernel and an "adapted"
init that does a poweroff as soon as the crash kernel boots. It also means we can save the
core and analyze why it crashed before powering off, if required. The watchdog functionality
is something I found too, but somehow I didn't feel comfortable with it. I'll have a deeper
look at it to see if it does the trick, so thanks for bringing it up!

Cheers,

Funs


-----Original Message-----
From: Alex Huang [mailto:Alex.Huang@citrix.com] 
Sent: Friday, September 6, 2013 2:05
To: dev@cloudstack.apache.org; Marcus Sorensen
Cc: Roeland Kuipers; int-cloud
Subject: RE: [DISCUSS] OOM killer and Routing/System VM's = :(

If I recall correctly, the OOM killer actually prints something into syslog, so a cron job
that watches syslog and simply shuts down the VM should work.
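
A minimal sketch of such a cron job, in Python. The log path, the message pattern, and the `poweroff` command are assumptions for illustration and are not specified anywhere in this thread; the exact OOM wording differs between kernel versions, so the pattern would need checking against the system VM's kernel. The function names are hypothetical.

```python
import re
import subprocess
from typing import Iterable

# Assumption: kernel OOM messages usually contain one of these phrases.
# Verify against the actual kernel log output before relying on it.
OOM_PATTERN = re.compile(r"Out of memory:|invoked oom-killer")


def saw_oom(lines: Iterable[str]) -> bool:
    """Return True if any log line looks like an OOM-killer message."""
    return any(OOM_PATTERN.search(line) for line in lines)


def check_and_poweroff(log_path="/var/log/syslog",
                       poweroff_cmd=("poweroff",),
                       run=subprocess.call) -> bool:
    """Scan the log once; power the machine off if an OOM event is found.

    `run` is injectable so the logic can be exercised without actually
    shutting anything down.
    """
    with open(log_path, errors="replace") as fh:
        if saw_oom(fh):
            run(list(poweroff_cmd))
            return True
    return False
```

Cron would call `check_and_poweroff()` every minute or so. Note that this sketch rescans the whole file on every run, so a real version would have to remember how far it had read (or rely on the log starting fresh after the VM is restarted) to avoid powering off again on an old message.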

--Alex

> -----Original Message-----
> From: Chiradeep Vittal [mailto:Chiradeep.Vittal@citrix.com]
> Sent: Thursday, September 5, 2013 12:48 PM
> To: dev@cloudstack.apache.org; Marcus Sorensen
> Cc: Roeland Kuipers; int-cloud
> Subject: Re: [DISCUSS] OOM killer and Routing/System VM's = :(
> 
> Maintaining a custom kernel is a big hassle, even if it is a few lines 
> of code change.
> Can we do something in userspace? What about the software watchdog 
> that is available?
> Along the lines of: http://goo.gl/oO3Lzr 
> http://linux.die.net/man/8/watchdog
> 
> 
> On 9/5/13 7:13 AM, "Funs Kessen" <FKessen@schubergphilis.com> wrote:
> 
> >
> >> Well, you can't, as far as I can tell from the source of panic.c. So
> >> I'm thinking of investigating adding -1 as an option and seeing if I
> >> can push halt in. Let's hope the guys that do kernel stuff find this
> >> useful too.
> >>
> >So it seems the patch I conjured up for panic.c is seen as not so
> >useful. There is, however, another way to achieve the same result:
> >we load a crash kernel with our own .sh script as init to do our
> >bidding.
> >
> >Would that be a plan?
> >
> >Cheers,
> >
> >Funs
> >
> >Sent from my iPhone
> >
> >On 4 Sep 2013, at 23:35, "Marcus Sorensen" <shadowsor@gmail.com> wrote:
> >
> >> What would work as a quick fix for this sort of situation would be 
> >> if the machine could be configured to power off rather than 
> >> rebooting on oom. Then the HA system would restart the VM, applying all configs.
> >>
> >> Anyone know how to do that? :-)
> >>
> >> On Wed, Sep 4, 2013 at 1:14 PM, Darren Shepherd 
> >> <darren.s.shepherd@gmail.com> wrote:
> >>> On 09/04/2013 11:37 AM, Roeland Kuipers wrote:
> >>>>
> >>>> Hi Darren,
> >>>>
> >>>> Thanks for your reply! Could you share a bit more on your plans/ideas?
> >>>>
> >>>> We have also been brainstorming other approaches to managing the
> >>>> system VMs, especially small customizations for specific tenants,
> >>>> and maybe even leveraging config management tools like Chef or
> >>>> Puppet, with the ability to integrate CS with them in some way.
> >>>
> >>> I'll have to send the full details later, but here's a rough idea.
> >>> The basic approach is this: logical changes to the VRs (or system
> >>> VMs in general) get mapped to configuration items. So adding an LB
> >>> rule maps to iptables config and haproxy config. When you change an
> >>> LB rule, we bump up the requested version of the configuration for
> >>> iptables/haproxy. So the requested version will be 4, maybe. The
> >>> applied version will be 3, as the VR still has the old configuration.
> >>> Since 4 != 3, the VR will be signaled to pull the latest
> >>> iptables/haproxy config. Say in the meantime somebody else adds four
> >>> other LB rules, so the requested version is now at 8. When the VR
> >>> pulls the config it will get version 8, and then reply back saying
> >>> it applied version 8. The applied version is now 8, which is greater
> >>> than 4 (the version the first LB rule change was waiting for), so
> >>> all async jobs waiting for the LB change will be done.
> >>>
> >>> To pull the configuration, the VR will hit a templating
> >>> configuration system. It pulls the full iptables and haproxy
> >>> config, not incremental changes.
> >>>
> >>> So if the VR ever reboots itself, it can easily just pull the 
> >>> latest config of everything and apply it.  So it will be consistent.
> >>>
> >>> I'd be interested to hear what type of customizations you would
> >>> like to add. It will definitely be an extensible system, but the
> >>> problem is if your extensions want to touch the same configuration
> >>> files that ACS wants to manage. That gets a bit tricky, as it's
> >>> really easy for each to break the other. But I can definitely add
> >>> some hooks that users can use to mess up things and "void the
> >>> warranty."
> >>>
> >>> I've thought about Chef and Puppet for this, but basically it
> >>> comes down to two things. I'm really interested in this being fast
> >>> and lightweight. Ruby is neither of those. So the core ACS stuff
> >>> will probably remain as very simple shell scripts. Simple in that
> >>> they really just need to download configuration and restart
> >>> services. They know nothing about the nature of the changes. If, as
> >>> an extension, you want to do something with Puppet or Chef, I'd be
> >>> open to that. That's your deal.
> >>>
> >>> This approach has many other benefits. For example, we can ensure
> >>> that as we deploy a new ACS release, existing system VMs can be
> >>> updated (without a reboot, unless the kernel changes).
> >>> Additionally, it's fast and updates happen in near constant time.
> >>> So most changes will take just a couple of seconds, even if you
> >>> have 4000 LB rules.
> >>>
> >>> Darren
> >>>
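
The requested/applied version scheme Darren describes in the quoted message can be sketched roughly as follows. This is only an illustration of the bookkeeping under stated assumptions, not the actual ACS design: the class and method names are hypothetical, and the pull is modeled as a simple assignment rather than a call to a templating system.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigItem:
    """Version bookkeeping for one configuration item (e.g. haproxy)."""
    requested: int = 0   # version the management side wants
    applied: int = 0     # version the VR last reported as applied
    waiting: list = field(default_factory=list)  # versions async jobs wait on

    def change(self) -> int:
        """A logical change (e.g. a new LB rule) bumps the requested version."""
        self.requested += 1
        self.waiting.append(self.requested)
        return self.requested

    def needs_pull(self) -> bool:
        """The VR is signaled whenever requested and applied differ."""
        return self.requested != self.applied

    def pull_and_apply(self) -> int:
        """The VR pulls the full config at the current version and reports it."""
        self.applied = self.requested
        return self.applied

    def completed_jobs(self) -> list:
        """Jobs are done once the applied version reaches what they wait for."""
        done = [v for v in self.waiting if v <= self.applied]
        self.waiting = [v for v in self.waiting if v > self.applied]
        return done


# Walk through the example from the message: applied is 3, one LB rule
# change requests 4, four more changes arrive before the VR pulls.
item = ConfigItem(requested=3, applied=3)
first = item.change()            # requested becomes 4
for _ in range(4):
    item.change()                # requested becomes 8
item.pull_and_apply()            # VR applies version 8 in one pull
finished = item.completed_jobs() # includes the job waiting on version 4
```

Because the VR always pulls the complete configuration at the latest version, a single pull satisfies every change queued before it, which is what keeps update time roughly constant in this scheme.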

