bigtop-user mailing list archives

From "Leidle, Rob" <lei...@amazon.com>
Subject Re: Problem using puppet scripts to configure bigtop on AmazonLinux
Date Thu, 11 Dec 2014 17:37:41 GMT
Thanks Nate, this is exactly what I was looking for. One more question — 
does puppet have any mechanism for monitoring service daemons and 
restarting them in the case where they have a catastrophic failure/crash? 
How do others in the Bigtop world deal with high availability and ensuring 
that processes are restarted when they inappropriately terminate? Does 
anyone have this kind of need?




On 12/11/14, 12:26 AM, "Nate D'Amico" <nate@reactor8.com> wrote:

>Guess breaking into two items:
>
>-detecting a failed puppet run when triggered via script/external apply
>-how many times to retry
>
>For the former, you could try to use "--detailed-exitcodes", which should 
>force a non-zero exit code; your script could detect that and act 
>accordingly.  I remember seeing a bug a while back that mentioned you 
>needed to assert that param on apply to force puppet to return non-zero 
>on error.  Not sure if it still exists, or what version you are running, 
>but it is probably safe to try.
>
>As far as number of retries, all apps/services/etc could be different. 
>The only specific point of view I would offer is that, given the puppet 
>apply has all the data/attributes it needs to successfully converge, 
>after two failed attempts you can safely assume it has failed, and then 
>resort to a log check to see what the issue could be.
>
>One other aspect to consider is that the puppet converge could succeed 
>but something outside causes a failure right after.  Depending on the 
>resiliency you want, your process/other monitor could assert after a 
>successful run and restart the whole converge run again, or just 
>notify, etc.
>
>Does that help?
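
A minimal sketch of the detect-and-retry approach Nate describes, not a tested Bigtop recipe. It assumes the exit codes documented for `puppet apply --detailed-exitcodes` (0 = no changes, 2 = changes applied, 4 = failures, 6 = changes and failures, 1 = run failed) and caps the run at two attempts, following the suggestion above. The function names and the manifest path are hypothetical.

```python
import subprocess
from typing import Callable, List

# Exit codes documented for `puppet apply --detailed-exitcodes`:
#   0 = no changes, 2 = changes applied, 4 = failures, 6 = changes and failures.
# Anything else (for example 1) is treated as a failed run here.
SUCCESS_CODES = {0, 2}


def puppet_apply(manifest: str) -> int:
    """Run one `puppet apply` on the given manifest and return its exit code."""
    cmd = ["puppet", "apply", "--detailed-exitcodes", manifest]
    return subprocess.run(cmd).returncode


def converge(run_once: Callable[[], int], max_attempts: int = 2) -> List[int]:
    """Call run_once until it succeeds or max_attempts is reached.

    Returns the exit codes seen, in order.
    """
    codes: List[int] = []
    for _ in range(max_attempts):
        code = run_once()
        codes.append(code)
        if code in SUCCESS_CODES:
            break
    return codes


def converged(codes: List[int]) -> bool:
    """True if the last recorded attempt ended with a success code."""
    return bool(codes) and codes[-1] in SUCCESS_CODES
```

A wrapper script could call `converge(lambda: puppet_apply("site.pp"))`, where `site.pp` is a placeholder path, and go to the logs when `converged(...)` is false. Passing the runner in as a callable keeps the retry logic testable without Puppet installed.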
>
>
>-----Original Message-----
>From: Konstantin Boudnik [mailto:cos@apache.org] 
>Sent: Wednesday, December 10, 2014 4:08 PM
>To: user@bigtop.apache.org
>Cc: dev@bigtop.apache.org; Nate D'Amico; Rich
>Subject: Re: Problem using puppet scripts to configure bigtop on 
>AmazonLinux
>
>Rob,
>
>following on our IRC chat I will Cc here two guys from the community who 
>know Puppet the best. Nate and Rich are likely to have the answer. Guys, 
>if you can chime in on the topic - it'd be great!
>
>To reiterate: you are looking for a way to automatically tell if a 
>recipe has failed and repeat it, if required, right?
>
>On Sun, Nov 30, 2014 at 09:50PM, Leidle, Rob wrote:
>> Thanks Cos,
>> 
>> This would be something that I would want to automate as it would be 
>> running many times across many different clusters. Ideally I would fix 
>> any issues causing the puppet scripts to not complete properly, but I 
>> don't know how realistic that is in the short term so I would like to 
>> set up retry logic if that is the recommended way of doing things. 
>> That's why I was hoping for some direction on how often to run the 
>> retry.
>> 
>> On 11/29/14, 5:12 PM, "Konstantin Boudnik" <cos@apache.org> wrote:
>> 
>> >On Sun, Nov 30, 2014 at 12:50AM, Leidle, Rob wrote:
>> >> Thanks Roman,
>> >> 
>> >> I actually fixed the problem. I had an existing process monitoring 
>> >> the daemon and restarting it if it terminated. However, puppet 
>> >> encapsulates this so it is no longer needed. Also, this process was 
>> >> causing the namenode service to terminate once. I removed my 
>> >> existing monitoring process and everything is working fine.
>> >> 
>> >> That being said is there a recommended number of times we should 
>> >> retry the puppet scripts on failure?
>> >
>> >Good to see you're coming through! As for the retries: if something 
>> >doesn't work I usually check the logs immediately. Sometimes after a 
>> >second re-run.
>> >
>> >Cos
>> >
>> 
>