Mailing-List: contact dev-help@ambari.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@ambari.apache.org
Content-Type: multipart/alternative; boundary="===============1549110807597681107=="
MIME-Version: 1.0
Subject: Re: Review Request 31002: RU - NodeManager failed to restart in Kerberized clusters
From: "Alejandro Fernandez"
To: "Nate Cole", "Dmitro Lisnichenko", "Robert Levas", "Jonathan Hurley"
Cc: "Alejandro Fernandez", "Ambari"
Date: Fri, 13 Feb 2015 17:19:20 -0000
Message-ID: <20150213171920.29076.9234@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org/
Auto-Submitted: auto-generated
Sender: "Alejandro Fernandez"
X-ReviewGroup: Ambari
X-ReviewRequest-URL: https://reviews.apache.org/r/31002/
X-Sender: "Alejandro Fernandez"
References: <20150213154234.11124.6855@reviews.apache.org>
In-Reply-To: <20150213154234.11124.6855@reviews.apache.org>
Reply-To: "Alejandro Fernandez"
X-ReviewRequest-Repository: ambari

--===============1549110807597681107==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31002/
-----------------------------------------------------------

(Updated Feb. 13, 2015, 5:19 p.m.)


Review request for Ambari, Dmitro Lisnichenko, Jonathan Hurley, Nate Cole, and Robert Levas.


Bugs: AMBARI-9627
    https://issues.apache.org/jira/browse/AMBARI-9627


Repository: ambari


Description
-------

NodeManager failed to restart in a Kerberized cluster while performing a Rolling Upgrade.

I deployed a 3-node cluster with all services from HDFS through ZK, then enabled NameNode HA, and kerberized the cluster.
When I performed an RU from 2.2.0.0 GA bits to 2.2.1.0-2260, I first had to comment out an error in ZK server, and when I got to the Slaves group, NodeManager failed. See attached log.

```
Fail: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/nm.service.keytab nm/_HOST@EXAMPLE.COM;' returned 1. kinit: Keytab contains no suitable keys for nm/_HOST@EXAMPLE.COM while getting initial credentials
```

```
[root@c6404 ~]# klist -kt /etc/security/keytabs/nm.service.keytab
Keytab name: FILE:/etc/security/keytabs/nm.service.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
```

The keytab holds keys only for the host-specific principal nm/c6404.ambari.apache.org@EXAMPLE.COM, whereas kinit was called with the literal _HOST placeholder. This means that params.py is probably not replacing _HOST with the actual hostname.

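For illustration only, the substitution described above could look roughly like the sketch below. It is not the patch under review: the function names resolve_principal and build_kinit_cmd are hypothetical, and lowercasing the hostname is an assumption about how the principal is expected to match the keytab entry.

```python
# Hypothetical sketch; these helpers are illustrative and are not taken
# from the actual params.py in the diff.

def resolve_principal(principal, hostname):
    """Replace the _HOST placeholder with the host's FQDN.

    Lowercasing the hostname is an assumption made for this sketch.
    """
    return principal.replace("_HOST", hostname.lower())


def build_kinit_cmd(kinit_path, keytab, principal, hostname):
    """Build the kinit command line using the resolved principal."""
    resolved = resolve_principal(principal, hostname)
    return "{0} -kt {1} {2};".format(kinit_path, keytab, resolved)


if __name__ == "__main__":
    # Values copied from the failure log and klist output above.
    print(build_kinit_cmd(
        "/usr/bin/kinit",
        "/etc/security/keytabs/nm.service.keytab",
        "nm/_HOST@EXAMPLE.COM",
        "c6404.ambari.apache.org",
    ))
```

With the values from the log, the resolved principal matches the entry that klist reports, which is the condition kinit needs in order to find a suitable key.
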

Diffs
-----

  ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/params.py 53beb96

Diff: https://reviews.apache.org/r/31002/diff/


Testing (updated)
-------

Deployed a 3-node cluster with HDFS, YARN, ..., ZK, then added NameNode HA, and kerberized the cluster.
After performing an RU, I verified that the fix for NodeManager worked.

Unit tests passed in ABO.

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:13 h
[INFO] Finished at: 2015-02-13T17:11:56+00:00
[INFO] Final Memory: 44M/475M
[INFO] ------------------------------------------------------------------------


Thanks,

Alejandro Fernandez

--===============1549110807597681107==--