incubator-ambari-user mailing list archives

From Christian Smith <christ...@greenbutton.com>
Subject Re: Ambari server claiming no heartbeats from agents
Date Sun, 08 Sep 2013 04:36:11 GMT
Hi Sumit,

I got it sorted. The clocks were out of sync, and there was also an issue
with the config.

Thanks again...
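
For anyone checking for the same clock problem: the agent log line quoted below carries an epoch timestamp in milliseconds, so it can be converted and compared with the server's log time. This is only a rough sketch. The helper names are made up for illustration, and it assumes the server log time-of-day is in UTC on the same date, which the log excerpt does not state.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def heartbeat_time_utc(timestamp_ms):
    """Convert an agent heartbeat timestamp (ms since the epoch) to UTC."""
    return EPOCH + timedelta(milliseconds=timestamp_ms)

def clock_skew_seconds(agent_timestamp_ms, server_time):
    """Seconds between the server's receive time and the agent's send time.

    A large absolute value suggests the two clocks disagree.
    """
    return (server_time - heartbeat_time_utc(agent_timestamp_ms)).total_seconds()

# Timestamp taken from the agent log line in this thread.
agent_ts = 1378595468906

# Assumption: the server log time 23:11:09,795 is UTC on 2013-09-07.
server_seen = datetime(2013, 9, 7, 23, 11, 9, 795000, tzinfo=timezone.utc)

print(heartbeat_time_utc(agent_ts).isoformat())
print(clock_skew_seconds(agent_ts, server_seen))
```

Under those assumptions the two log lines are less than a second apart, which matches the clocks having been synced by the time those lines were logged.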


On Sun, Sep 8, 2013 at 11:21 AM, Christian Smith
<christian@greenbutton.com> wrote:

> Hi Sumit,
>
> It seems that still hasn't fixed the issue.  The clocks are synced and
> services restarted.  From the logs I see:
>
> Agent:
> INFO 2013-09-07 23:11:08,906 Heartbeat.py:61 - Sending heartbeat with
> response id: 4 and timestamp: 1378595468906
>
> Server:
> 23:11:09,795  INFO HeartBeatHandler:108 - Received heartbeat from host,
> hostname=hadoop-cluster-1-1-10613434.greenbutton.local,
> currentResponseId=4, receivedResponseId=4
>
> The UI still reports that it hasn't received a heartbeat from that agent
> in over 3 minutes.
>
> I've attached screen shots that show all hostnames are aligned.
>
> Thanks,
> Christian
> [image: Inline image 2] [image: Inline image 3] [image: Inline image 1]
>
>
>
> On Sun, Sep 8, 2013 at 9:57 AM, Christian Smith
> <christian@greenbutton.com> wrote:
>
>> Hi Sumit,
>>
>> It seems the clocks are off; I should have checked that earlier!  Thanks
>> for your help.
>>
>> -Christian
>>
>>
>>
>>
>> On Sun, Sep 8, 2013 at 1:38 AM, Sumit Mohanty <smohanty@hortonworks.com> wrote:
>>
>>> Hi Christian,
>>>
>>> Heartbeat hostname not aligning with the registered hostname is the most
>>> likely reason.
>>>
>>> Try these API calls to confirm:
>>> curl -u user:passwd http://AmbariHost:8080/api/v1/hosts
>>> This will tell you how many hosts are registered and their hostnames
>>> (FQDN is what is typically used for registration).
>>>
>>> You can compare that with
>>> curl -u user:passwd http://AmbariHost:8080/api/v1/clusters/YourClusterName/hosts
>>> which tells you the list of hosts that the cluster is associated with.
>>>
>>> If indeed there is a hostname mismatch, you can modify the hostname on
>>> the host itself and restart the agent.
>>>
>>> If you can't modify the hostname for some reason, let us know. There is
>>> a way for Ambari agents to override the host-supplied hostname as well.
>>> However, the prior solution is preferred.
>>>
>>> -Sumit
>>> From: Christian Smith <christian@greenbutton.com>
>>> Reply-To: <ambari-user@incubator.apache.org>
>>> Date: Saturday, September 7, 2013 2:56 AM
>>> To: "ambari-user@incubator.apache.org" <ambari-user@incubator.apache.org>
>>> Subject: Ambari server claiming no heartbeats from agents
>>>
>>> Hi,
>>>
>>> I've got a new cluster configured via the API with HDFS and MR.  The
>>> configuration went fine and the HDFS service says it's running.  However, on
>>> the hosts tab, all hosts are marked with a yellow circle and state that no
>>> heartbeat has been received for over 3 minutes.
>>>
>>> I've checked the agent and server logs and heartbeats are being sent and
>>> received by the expected parties.  So my question is what could be going
>>> wrong?  And how does the server associate a received heartbeat with a host
>>> in the cluster config?  Does the server do a reverse DNS lookup of the
>>> heartbeat's source IP?  Or does the heartbeat contain the hostname of the
>>> agent?
>>>
>>> It seems like something around the heartbeat hostname is not aligned
>>> with what the server is expecting...
>>>
>>> Any ideas how to debug further?
>>>
>>> Cheers,
>>> Christian
>>>
>>
>>
>>
>
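
The two API calls Sumit suggests can also be compared programmatically. The following is a minimal sketch, not an official Ambari client: the function names are hypothetical, and the response shape it parses (an "items" list whose entries carry a "Hosts" object with a "host_name" field) is an assumption about the API output that should be checked against what your server actually returns.

```python
import base64
import json
import urllib.request

def fetch_json(url, user, password):
    """GET a URL with HTTP basic auth, like `curl -u user:passwd URL`."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def host_names(payload):
    """Extract hostnames from a hosts listing.

    Assumed shape: {"items": [{"Hosts": {"host_name": "..."}}, ...]}
    """
    return {item["Hosts"]["host_name"] for item in payload.get("items", [])}

def compare_hosts(registered, in_cluster):
    """Report hostnames that appear in one listing but not the other."""
    return {
        "in_cluster_not_registered": sorted(in_cluster - registered),
        "registered_not_in_cluster": sorted(registered - in_cluster),
    }

def check(base_url, cluster, user, password):
    """Fetch both listings from the server and compare them."""
    registered = host_names(fetch_json(f"{base_url}/api/v1/hosts", user, password))
    in_cluster = host_names(
        fetch_json(f"{base_url}/api/v1/clusters/{cluster}/hosts", user, password)
    )
    return compare_hosts(registered, in_cluster)
```

Calling check("http://AmbariHost:8080", "YourClusterName", "user", "passwd") with real values would return two lists. If a short hostname shows up in one and its FQDN in the other, that points to the hostname mismatch described above.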
