incubator-ambari-user mailing list archives

From Brian Jeltema <brian.jelt...@digitalenvoy.net>
Subject Re: heartbeats being ignored
Date Mon, 15 Jul 2013 17:50:48 GMT
I think that is the problem. The original hosts were on a private net and registered as:

      "href" : "http://localhost:8080/api/v1/hosts/pc1",
      "Hosts" : {
        "host_name" : "pc1"
      }

but after the change, they identify themselves with the FQDN:

      "href" : "http://localhost:8080/api/v1/hosts/pc1.foo.net",
      "Hosts" : {
        "host_name" : "pc1.foo.net"
      }
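
To see every host affected by the rename at once, the two host listings could be compared with a short script. This is only a sketch: it assumes each listing wraps its entries in an "items" array (the excerpts above show only the individual entries), and the function names host_names and renamed_hosts are made up for illustration, not part of Ambari.

```python
import json


def host_names(response_text):
    """Extract the host_name values from a hosts listing given as JSON text.

    Assumption: entries sit in a top-level "items" array, each shaped like
    the {"href": ..., "Hosts": {"host_name": ...}} excerpts quoted above.
    """
    doc = json.loads(response_text)
    return {item["Hosts"]["host_name"] for item in doc.get("items", [])}


def renamed_hosts(cluster_names, registered_names):
    """Map each cluster host name to a differently spelled registered name.

    Two names are treated as the same machine when their first DNS label
    matches, e.g. "pc1" and "pc1.foo.net". Registered names that the
    cluster already knows are skipped, so only real mismatches remain.
    """
    candidates = sorted(n for n in registered_names if n not in cluster_names)
    by_short = {name.split(".")[0]: name for name in candidates}
    pairs = {}
    for old in sorted(cluster_names):
        new = by_short.get(old.split(".")[0])
        if new is not None and new != old:
            pairs[old] = new
    return pairs
```

The text passed to host_names would be the saved output of the two queries Sumit mentions (the cluster's hosts and the registered hosts). A non-empty result from renamed_hosts lists the hosts whose cluster entry no longer matches the name the agent registers under.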

Is there some way to fix this?

TIA

Brian

On Jul 15, 2013, at 12:13 PM, Sumit Mohanty wrote:

> Is it possible that the FQDN/hostname of the agent hosts has changed?
> E.g. the agents initially registered themselves as host A (you can get that
> using the API server:8080/api/v1/clusters/<cluster name>/hosts) and after the
> network reconfiguration the agents started sending their heartbeats as B
> (server:8080/api/v1/hosts will tell you about the hosts that have
> registered).
> 
> -Sumit
> 
> On 7/15/13 8:47 AM, "Brian Jeltema" <brian.jeltema@digitalenvoy.net> wrote:
> 
>> I had to do some network reconfiguration on our cluster. After rebooting
>> everything and restarting
>> the ambari server and the ambari agents, the server reports (via the UI)
>> that it is not receiving heartbeats.
>> However, when I look at the server and agent logs, I see heartbeat
>> activity:
>> 
>> agent:
>> INFO 2013-07-15 11:40:12,169 Heartbeat.py:61 - Sending heartbeat with
>> response id: 251 and timestamp: 1373902812168
>> INFO 2013-07-15 11:40:12,214 Controller.py:176 - No commands sent from
>> the Server.
>> 
>> server
>> 11:41:44,760  INFO HeartBeatHandler:108 - Received heartbeat from host,
>> hostname=foo.net, currentResponseId=260, receivedResponseId=260
>> 11:41:44,761  INFO AgentResource:109 - Sending heartbeat response with
>> response id 261
>> 
>> (response IDs don't match because I didn't try to capture them in
>> unison). I suspect there may be persisted state in the postgres database
>> from the previous network configuration that is causing the problem. Any
>> suggestions for a fix short of a complete redeploy?
>> 
>> TIA
>> 
>> Brian
> 
> 

