Reply-To: user@bigtop.apache.org
Date: Sat, 29 Nov 2014 17:12:16 -0800
From: Konstantin Boudnik
To: user@bigtop.apache.org
Cc: "dev@bigtop.apache.org"
Subject: Re: Problem using puppet scripts to configure bigtop on AmazonLinux
Message-ID: <20141130011216.GP4791@tpx>
Mail-Followup-To: user@bigtop.apache.org, "dev@bigtop.apache.org"
References: <20141129001429.GK4791@tpx> <20141129030842.GM4791@tpx> <8DDB04F9-CF8F-4198-B907-C17D8FEECB88@amazon.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="AA9g+nFNFPYNJKiL"
Content-Disposition: inline
In-Reply-To: <8DDB04F9-CF8F-4198-B907-C17D8FEECB88@amazon.com>
X-Organization: It's something of 'Cos
X-PGP-Key: http://www.boudnik.org/~cos/pubkey.asc
User-Agent: Mutt/1.5.21 (2010-09-15)

--AA9g+nFNFPYNJKiL
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Nov 30, 2014 at 12:50AM, Leidle, Rob wrote:
> Thanks Roman,
>
> I actually fixed the problem. I had an existing process monitoring the
> daemon and restarting it if it terminated. However, puppet encapsulates this
> so it is no longer needed. Also, this process was causing the namenode
> service to terminate once. I removed my existing monitoring process and
> everything is working fine.
>
> That being said is there a recommended number of times we should retry the
> puppet scripts on failure?

Good to see you're coming through! As for the retries: if something doesn't
work I usually check the logs immediately. Sometimes after a second re-run.

Cos

> > On Nov 29, 2014, at 3:49 PM, Roman Shaposhnik wrote:
> >
> >> On Fri, Nov 28, 2014 at 7:08 PM, Konstantin Boudnik wrote:
> >>> On Sat, Nov 29, 2014 at 01:43AM, Leidle, Rob wrote:
> >>> Yes, I ran into Bigtop-1522 and figured out I needed to add mapred-app.
> >>> Sorry, I wrote what I said in the previous email incorrectly, yes,
> >>> resource manager does not install because the dependency namenode does
> >>> not install correctly. I will look more closely at the service logs to see
> >>> if I can figure out why it isn't starting. The error code of '3' indicates
> >>> from the /etc/init.d/hadoop-hdfs-namenode script that this means it can't
> >>> find the running process 5 seconds after starting it.
> >>
> >> Yes, please look into the logs - might be something obvious missed. We are
> >> running these recipes for a good 3+ years and they are fairly well tested.
> >> Would be good to fix last bugs if any ;)
> >
> > What Cos said above, but also note that Puppet encourages this unfortunate
> > 'eventual convergence' pattern. IOW, even if the first time around a few
> > services failed if everything goes OK on the next Puppet run -- the cluster
> > comes up.
> >
> > It would be very nice to debug the nitty gritty details of synchronization
> > issues like the ones you seem to be seeing. Unfortunately, we haven't really
> > had much of a focus there, since, like I said, for internal Bigtop testing
> > purposes the 'eventual convergence' suffices.
> >
> > Thanks,
> > Roman.

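The 'eventual convergence' pattern and the retry question above can be
illustrated with a small wrapper that re-runs a command until it succeeds or
an attempt limit is reached. This is only a sketch, not part of Bigtop: the
function name run_until_converged, the default limit of 3 attempts, and the
manifest path in the usage comment are all assumptions made for illustration.
The thread does not give a recommended retry count; the only guidance in it
is to read the logs after the first or second failed run.

```python
import subprocess
from typing import Callable, Optional, Sequence


def run_until_converged(
    command: Sequence[str],
    max_attempts: int = 3,  # assumed default, not a Bigtop recommendation
    ok_codes: Sequence[int] = (0,),
    runner: Optional[Callable[[Sequence[str]], int]] = None,
) -> int:
    """Re-run `command` until its exit code is in `ok_codes`.

    Returns the 1-based number of the attempt that succeeded and raises
    RuntimeError if every attempt fails. `runner` is injectable so the
    retry logic can be exercised without running a real command.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if runner is None:
        # Default: run the command and return its exit status.
        runner = lambda cmd: subprocess.call(list(cmd))

    last_code = None
    for attempt in range(1, max_attempts + 1):
        last_code = runner(command)
        if last_code in ok_codes:
            return attempt
    raise RuntimeError(
        f"no convergence after {max_attempts} attempts "
        f"(last exit code {last_code}); check the service logs"
    )


# Hypothetical usage (the manifest path is a placeholder):
#   run_until_converged(
#       ["puppet", "apply", "--detailed-exitcodes", "site.pp"],
#       ok_codes=(0, 2),
#   )
```

With puppet apply --detailed-exitcodes, exit code 0 means nothing changed and
2 means changes were applied, while 4 and 6 report failures, which is why the
usage comment accepts (0, 2). A low limit keeps the loop from hiding a real
problem such as the conflicting monitoring process described above.
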
--AA9g+nFNFPYNJKiL
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBAgAGBQJUem7wAAoJEKtmQW7Qw4JPbS8H/AnepevW8wGDtDJouT8XkQ6Z
wunpM+xwJ0WwRoULtsjwA5rCd8Igm+gbMlUWcPrIV8+mOKkFcS4dHPTFddPxerDN
/Vlyh1+PQwKcrDYAqCNAN9nhXKk+erHrNY5cRzhmUAi7/qZoWVQ6gHsYAx9slSTq
iahL45/1BbDGZDVKFviLsgNNoDvKf1zHaKmfvsRBQn7suua2FmB21dWwJEAKB2yc
hmFHnbc3WVSnNSvGUcb03nwH+AOMzufKiEocChW8uDDhsvp/jsC1JfAJhvn+dRgq
y+ozgJaAFH8Khv0WBh5WFPQw1tLsWbZMdoxTPzZJhg9Zw0AmuyJOMeKdPo997rM=
=wDvx
-----END PGP SIGNATURE-----

--AA9g+nFNFPYNJKiL--