incubator-ambari-user mailing list archives

From Sumit Mohanty <smoha...@hortonworks.com>
Subject Re: heartbeats being ignored
Date Mon, 15 Jul 2013 19:57:19 GMT
You can write a custom script that overrides the host name the agents use
when they register with the server.
The property is hostname_script, set in
/etc/ambari-agent/conf/ambari-agent.ini.

E.g.
[agent]
prefix=/var/lib/ambari-agent/data
...
hostname_script=/myScripts/customizeHostName.sh

Essentially, the script should convert names like "pc1.foo.net" to "pc1",
assuming all the agents have the same issue.
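
A minimal sketch of what such a script could look like. The path and file
name come from the example above; the short_name helper is a hypothetical
name, and the sketch assumes the agent uses whatever the script prints on
standard output as its host name, so check that against your agent version.

```shell
#!/bin/sh
# Hypothetical contents of /myScripts/customizeHostName.sh.
# Assumption: the agent registers with the name printed on stdout.

# Print the part of a host name before the first dot,
# e.g. "pc1.foo.net" becomes "pc1". A name without dots is unchanged.
short_name() {
    printf '%s\n' "${1%%.*}"
}

# uname -n prints this machine's node name (what it currently reports).
short_name "$(uname -n)"
```

The script has to be executable by the user the agent runs as
(chmod +x /myScripts/customizeHostName.sh).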

Once you have modified the .ini file, restart the agent so that it
registers with the modified name.

-Sumit



On 7/15/13 10:50 AM, "Brian Jeltema" <brian.jeltema@digitalenvoy.net>
wrote:

>I think that is the problem. The original hosts were on a private net and
>registered as:
>
>      "href" : "http://localhost:8080/api/v1/hosts/pc1",
>      "Hosts" : {
>        "host_name" : "pc1"
>      }
>
>but after the change, they identify themselves with the FQDN:
>
>      "href" : "http://localhost:8080/api/v1/hosts/pc1.foo.net",
>      "Hosts" : {
>        "host_name" : "pc1.foo.net"
>      }
>
>Is there some way to fix this?
>
>TIA
>
>Brian
>
>On Jul 15, 2013, at 12:13 PM, Sumit Mohanty wrote:
>
>> Is it possible that the FQDN/hostname of the agent hosts has changed?
>> E.g. agents initially registered themselves as host A (you can get that
>> using the API at server:8080/api/v1/clusters/<cluster name>/hosts), and
>> after the network reconfiguration the agents started sending their
>> heartbeats as host B (server:8080/api/v1/hosts will tell you about the
>> hosts that have registered).
>> 
>> -Sumit
>> 
>> On 7/15/13 8:47 AM, "Brian Jeltema" <brian.jeltema@digitalenvoy.net> wrote:
>> 
>>> I had to do some network reconfiguration on our cluster. After rebooting
>>> everything and restarting the ambari server and the ambari agents, the
>>> server reports (via the UI) that it is not receiving heartbeats.
>>> However, when I look at the server and agent logs, I see heartbeat
>>> activity:
>>> 
>>> agent:
>>> INFO 2013-07-15 11:40:12,169 Heartbeat.py:61 - Sending heartbeat with
>>> response id: 251 and timestamp: 1373902812168
>>> INFO 2013-07-15 11:40:12,214 Controller.py:176 - No commands sent from
>>> the Server.
>>> 
>>> server
>>> 11:41:44,760  INFO HeartBeatHandler:108 - Received heartbeat from host,
>>> hostname=foo.net, currentResponseId=260, receivedResponseId=260
>>> 11:41:44,761  INFO AgentResource:109 - Sending heartbeat response with
>>> response id 261
>>> 
>>> (response id's don't match because I didn't try to capture them in
>>> unison). I suspect there may be persisted state in the postgres database
>>> from the previous network configuration that is causing the problem.
>>> Any suggestions for a fix short of a complete redeploy?
>>> 
>>> TIA
>>> 
>>> Brian
>> 
>> 
>


