bigtop-user mailing list archives

From Konstantin Boudnik <...@apache.org>
Subject Re: Problem using puppet scripts to configure bigtop on AmazonLinux
Date Sun, 30 Nov 2014 01:12:16 GMT
On Sun, Nov 30, 2014 at 12:50AM, Leidle, Rob wrote:
> Thanks Roman,
> 
> I actually fixed the problem. I had an existing process monitoring the
> daemon and restarting it if it terminated. However, puppet encapsulates this
> so it is no longer needed. Also, this process was causing the namenode
> service to terminate once. I removed my existing monitoring process and
> everything is working fine. 
> 
> That being said is there a recommended number of times we should retry the
> puppet scripts on failure?

Good to see you're coming through! As for the retries: if something doesn't
work, I usually check the logs immediately, or sometimes after a second re-run.

Cos

> > On Nov 29, 2014, at 3:49 PM, Roman Shaposhnik <roman@shaposhnik.org> wrote:
> > 
> >> On Fri, Nov 28, 2014 at 7:08 PM, Konstantin Boudnik <cos@apache.org> wrote:
> >>> On Sat, Nov 29, 2014 at 01:43AM, Leidle, Rob wrote:
> >>> Yes, I ran into Bigtop-1522 and figured out I needed to add mapred-app.
> >>> Sorry, I wrote what I said in the previous email incorrectly, yes,
> >>> resource manager does not install because its dependency, namenode, does
> >>> not install correctly. I will look more closely at the service logs to see
> >>> if I can figure out why it isn't starting. According to the
> >>> /etc/init.d/hadoop-hdfs-namenode script, the error code "3" means it can't
> >>> find the running process 5 seconds after starting it.
> >> 
> >> Yes, please look into the logs - might be something obvious missed. We are
> >> running these recipes for a good 3+ years and they are fairly well tested.
> >> Would be good to fix last bugs if any ;)
> > 
> > What Cos said above, but also note that Puppet encourages this unfortunate
> > 'eventual convergence' pattern. IOW, even if a few services failed the
> > first time around, if everything goes OK on the next Puppet run, the
> > cluster comes up.
> > 
> > It would be very nice to debug the nitty-gritty details of synchronization
> > issues like the ones you seem to be seeing. Unfortunately, we haven't really
> > had much of a focus there, since, like I said, for internal Bigtop testing
> > purposes the 'eventual convergence' suffices.
> > 
> > Thanks,
> > Roman.
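
The 'eventual convergence' approach discussed in this thread amounts to a bounded retry loop: re-run the same command a small number of times and only go to the logs if it still fails. Below is a minimal sketch of that idea, not anything shipped with Bigtop. The function name run_until_converged and its parameters are hypothetical, and the default of 3 attempts is an assumption, not a recommendation from the thread.

```python
import subprocess
import sys
import time


def run_until_converged(cmd, max_attempts=3, delay_seconds=0.0, ok_codes=(0,)):
    """Run cmd repeatedly until its exit code is in ok_codes.

    Hypothetical helper: returns (attempts_used, last_exit_code) so the
    caller can decide whether to go and read the service logs.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    code = None
    for attempt in range(1, max_attempts + 1):
        # Each attempt is a full re-run of the same command.
        code = subprocess.run(cmd).returncode
        if code in ok_codes:
            return attempt, code
        if attempt < max_attempts:
            # Optional pause so slow-starting daemons can settle.
            time.sleep(delay_seconds)
    return max_attempts, code


if __name__ == "__main__":
    # Stand-in command that always succeeds; replace with the real
    # provisioning command for your setup.
    attempts, code = run_until_converged([sys.executable, "-c", "pass"])
    print(f"finished after {attempts} attempt(s) with exit code {code}")
```

If the command being retried is `puppet apply` with `--detailed-exitcodes`, an exit code of 2 also indicates a successful run (changes were applied), so passing ok_codes=(0, 2) would be a reasonable assumption there. If the last exit code is still a failure after the final attempt, checking the service logs is the next step, as suggested above.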
