cloudstack-dev mailing list archives

From Koushik Das <koushik....@citrix.com>
Subject Re: [PROPOSAL] Service monitoring tool in virtual router
Date Tue, 01 Oct 2013 05:46:45 GMT
This is a very useful feature. Can this be extended to the other system VMs (SSVM and CPVM)?

Based on the discussion I see that there is an assumption that restarting services/rebooting
should fix the issues. Is that always true? What if the service fails to restart after repeated
attempts? What is the fallback?

-Koushik


On 01-Oct-2013, at 3:15 AM, Chiradeep Vittal <Chiradeep.Vittal@citrix.com> wrote:

> Good idea. If x and y and z are borked, initiate shutdown?
> 
> More generically, it seems we need some form of in-VM automation that can
> co-ordinate with top-level orchestration
> 
> On 9/28/13 4:14 AM, "Daan Hoogland" <daan.hoogland@gmail.com> wrote:
> 
>> Even when always restarting on every glitch we need to monitor the inside
>> of the vr to know when to restart/respin a new vr. There is much
>> functionality present on the vr and for us it is not possible to say for
>> sure what is important to a customer installation so the admin should be
>> able to define the minimal reqs that will stop us from spinning up a new
>> vr. And there must be tools present for monitoring these reqs.
>> 
>> makes sense?
>> 
>> 
>> On Thu, Sep 26, 2013 at 10:01 PM, David Nalley <david@gnsa.us> wrote:
>> 
>>> For what it's worth we created an ACS-specific MIB (beneath the
>>> org.apache MIB) so really this is just a matter of defining and
>>> publishing it.
>>> 
>>> But let's think about monit being used to restart services - with HA,
>>> Redundant VR, are we sure that we want to inject yet another point of
>>> control into things? Is it better to just respawn an instance since
>>> they are essentially stateless? I don't know, but management server,
>>> local daemons, and other SysVMs making decisions seems like we are
>>> increasing complexity.
>>> 
>>> --David
>>> 
>>> On Thu, Sep 26, 2013 at 10:31 AM, Chiradeep Vittal
>>> <Chiradeep.Vittal@citrix.com> wrote:
>>>> In this case you would have to invent another enterprise MIB. Not too
>>>> hard, but I'd argue that it needs to be proxied through some other service
>>>> anyway and it represents a different integration point with ACS. Depends
>>>> on whether you consider the system vm part of the ACS deployment, or an
>>>> entity like a host.
>>>> 
>>>> On 9/26/13 10:27 AM, "Alex Huang" <Alex.Huang@citrix.com> wrote:
>>>> 
>>>>> Using SNMP for alert notification is not a bad idea though.  I don't see
>>>>> why we can't do that instead of posting to the management server.  This
>>>>> is specifically referring to the second part of the proposal.  Why
>>>>> reinvent that part of it?
>>>>> 
>>>>> --Alex
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Chiradeep Vittal [mailto:Chiradeep.Vittal@citrix.com]
>>>>>> Sent: Wednesday, September 25, 2013 10:28 PM
>>>>>> To: dev@cloudstack.apache.org
>>>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>>>> 
>>>>>> SNMP wouldn't restart a failed process nor would it generate alerts. It is
>>>>>> simply too generic for the requirements outlined here. The proposal does
>>>>>> not talk about modifying monit, just using it. That wouldn't trigger
>>>>>> the AGPL.
>>>>>> I think the idea is to have a tight monitoring loop that scales, so
>>>>>> executing the monitoring loop in-situ makes sense.
>>>>>> 
>>>>>> 
>>>>>> On 9/25/13 9:53 PM, "David Nalley" <david@gnsa.us> wrote:
>>>>>> 
>>>>>>> On Wed, Sep 25, 2013 at 9:30 AM, Jayapal Reddy Uradi
>>>>>>> <jayapalreddy.uradi@citrix.com> wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Currently in the virtual router there is no way to recover and notify if
>>>>>>>> some service goes down unexpectedly.
>>>>>>>> 
>>>>>>>> This feature is about monitoring all the services rendered by the
>>>>>>>> virtual router and ensuring that the services are running through the
>>>>>>>> lifetime of the VR.
>>>>>>>> 
>>>>>>>> On service failure:
>>>>>>>> 1. Generate an alert and event indicating failure
>>>>>>>> 2. Restart the service
>>>>>>>> 
>>>>>>>> Services to be monitored:
>>>>>>>> DHCP, DNS, haproxy, password server etc.
>>>>>>>> 
>>>>>>>> As part of monitoring there are two activities:
>>>>>>>> 
>>>>>>>> 1. Monitoring the services in the VR and logging the events, using
>>>>>>>>    monit for monitoring services.
>>>>>>>> 2. Pushing alerts from the router to the MS server. Thinking of
>>>>>>>>    POSTing the logs to a web server in the MS.
>>>>>>>> 
>>>>>>>> I will be updating more details and FS in this thread.
>>>>>>>> 
>>>>>>>> I created an enhancement bug for this.
>>>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-4736
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Jayapal
>>>>>>> 
>>>>>>> So several things - why not make this via SNMP? Query processes, and
>>>>>>> many other things. This should be relatively simple, is well known, can
>>>>>>> be locked down (or could be monitored for many other things by external
>>>>>>> monitoring packages) and is the de facto standard for monitoring hosts.
>>>>>>> Second - monit is Affero GPL licensed - which is a cat-x license.
>>>>>>> While I expect that we would merely use this and not do any hacking on
>>>>>>> it - I think its inclusion might be a surprise (and forbidden in many
>>>>>>> environments) to our users.
>>>>>>> 
>>>>>>> --David
>>>>> 
>>>> 
>>> 
> 
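
The restart-with-fallback question raised at the top of this message can be made concrete with a small sketch. It is an illustration under stated assumptions, not the design from the proposal: the ServiceMonitor class, its three callbacks, and the max_restarts limit are hypothetical names, and inside a real VR the checks and restarts would be done by monit or a similar tool. The sketch only shows the control flow: alert and restart on failure, and stop restarting and escalate once a service has failed too many polls in a row.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServiceMonitor:
    """Hypothetical in-VM monitor: alert, restart, then escalate."""

    is_running: Callable[[str], bool]   # assumed health check per service
    restart: Callable[[str], None]      # assumed restart hook
    alert: Callable[[str, str], None]   # assumed alert sink (e.g. POST to MS)
    max_restarts: int = 3               # assumed admin-defined limit
    failures: Dict[str, int] = field(default_factory=dict)

    def poll(self, services: List[str]) -> List[str]:
        """Check each service once.

        Returns the services whose consecutive failures exceed
        max_restarts, i.e. those that need a fallback decision.
        """
        gave_up = []
        for name in services:
            if self.is_running(name):
                # A healthy poll resets the consecutive-failure counter.
                self.failures[name] = 0
                continue
            count = self.failures.get(name, 0) + 1
            self.failures[name] = count
            if count > self.max_restarts:
                # Restarting has not helped; hand the decision upwards.
                self.alert(name, "restart limit reached; escalate")
                gave_up.append(name)
            else:
                self.alert(name, f"down; restart attempt {count}")
                self.restart(name)
        return gave_up
```

A caller would invoke poll periodically. A non-empty return value is the point at which something above the VR has to decide what happens next (for example, respinning the VR), which is the part this thread leaves open.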

