From: Sumit Mohanty
Reply-To: ambari-user@incubator.apache.org
Date: Sat, 07 Sep 2013 06:38:04 -0700
Subject: Re: Ambari server claiming no heartbeats from agents

Hi Christian,

Heartbeat hostname not aligning with the registered hostname is the most likely reason.

Try these API calls to confirm:
curl -u user:passwd http://AmbariHost:8080/api/v1/hosts
This tells you how many hosts are registered and their hostnames (the FQDN is what is typically used for registration).

You can compare that with:
curl -u user:passwd http://AmbariHost:8080/api/v1/clusters/YourClusterName/hosts
This tells you the list of hosts that the cluster is associated with.
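
The comparison of the two host lists can also be scripted. The following is a minimal sketch, not an official Ambari client. It assumes each response is JSON with an "items" list whose entries carry the name under Hosts -> host_name, which the thread does not confirm. The function names (fetch_json, host_names, compare_hosts, report) are hypothetical helpers introduced for illustration.

```python
import base64
import json
import urllib.request


def fetch_json(url, user, password):
    """GET a URL with HTTP basic auth (the equivalent of `curl -u`) and parse the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def host_names(payload):
    """Collect host names from a response.

    Assumption: the body looks like {"items": [{"Hosts": {"host_name": ...}}]}.
    """
    return {item["Hosts"]["host_name"] for item in payload.get("items", [])}


def compare_hosts(registered, in_cluster):
    """Return (cluster hosts with no registered match, registered hosts not in the cluster)."""
    return sorted(in_cluster - registered), sorted(registered - in_cluster)


def report(base_url, cluster, user, password):
    """Fetch both host lists and print any names that do not line up."""
    registered = host_names(fetch_json(f"{base_url}/hosts", user, password))
    in_cluster = host_names(fetch_json(f"{base_url}/clusters/{cluster}/hosts", user, password))
    unmatched, unused = compare_hosts(registered, in_cluster)
    print("In cluster but not registered:", unmatched)
    print("Registered but not in cluster:", unused)
```

Calling report("http://AmbariHost:8080/api/v1", "YourClusterName", "user", "passwd") would run both queries. A short name in one list and an FQDN in the other would show up as one entry in each printed list.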

If there is indeed a hostname mismatch, you can modify the hostname on the host itself and restart the agent.
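
Before changing anything, it may help to see what name the host itself reports. The sketch below compares a name taken from the API output with the machine's FQDN. It assumes the agent derives its name from the system FQDN, which this thread does not state, and classify and check_local are hypothetical helpers.

```python
import socket


def classify(registered_name, local_name):
    """Compare a registered host name with the name this machine reports."""
    if registered_name == local_name:
        return "match"
    if registered_name.lower() == local_name.lower():
        return "differs only in case"
    return "mismatch"


def check_local(registered_name):
    """Classify the registered name against this machine's FQDN."""
    return classify(registered_name, socket.getfqdn())
```

Run on the agent host, check_local("node1.example.com") would report whether the placeholder name node1.example.com agrees with what socket.getfqdn() returns there.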

If you can't modify the hostname for some reason, let us know. There is a way for Ambari agents to override the host-supplied hostname as well. However, the prior solution is preferred.

-Sumit
From: Christian Smith <christian@greenbutton.com>
Reply-To: <ambari-user@incubator.apache.org>
Date: Saturday, September 7, 2013 2:56 AM
To: "ambari-user@incubator.apache.org" <ambari-user@incubator.apache.org>
Subject: Ambari server claiming no heartbeats from agents

Hi,

I've got a new cluster configured via the API with HDFS and MR. The configuration went fine and the HDFS service says it's running. However, on the hosts tab, all hosts are marked with a yellow circle and state that no heartbeat has been received for over 3 minutes.

I've checked the agent and server logs and heartbeats are being sent and received by the expected parties. So my question is what could be going wrong? And how does the server associate a received heartbeat with a host in the cluster config? Does the server do a reverse DNS lookup of the heartbeat's source IP? Or does the heartbeat contain the hostname of the agent?

It seems like something around the heartbeat hostname is not aligned with what the server is expecting...

Any ideas how to debug further?

Cheers,
Christian
