hadoop-mapreduce-user mailing list archives

From Matt Narrell <matt.narr...@gmail.com>
Subject Re: YARN HA Active ResourceManager failover when machine is stopped
Date Fri, 24 Apr 2015 17:57:31 GMT
Another observation: when the VMs are halted, it seems the NodeManagers do not treat
this as a reason to round-robin among the configured ResourceManagers. Is there some
timeout I’ve missed that tells the NodeManagers to move on to the next ResourceManager
when the machine is not responding at all (as opposed to a network blip)?

mn
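
A note on the timeout question above: one possible explanation, not confirmed anywhere in this thread, is that a powered-off host makes each connection attempt time out rather than be refused, so the IPC client may exhaust its connect retries before the failover proxy moves to the next ResourceManager. The sketch below only does the arithmetic. The property names `ipc.client.connect.timeout` and `ipc.client.connect.max.retries.on.timeouts` and the values 20000 ms and 45 are assumed to be the stock core-default.xml settings and should be checked against the cluster's own configuration; the function name is hypothetical.

```python
def estimate_connect_delay_s(connect_timeout_ms: int, max_retries_on_timeouts: int) -> float:
    """Rough time spent retrying one unreachable host before giving up.

    Approximation: each retry waits for the full connect timeout.
    The exact attempt count inside the Hadoop IPC client may differ by one.
    """
    return connect_timeout_ms * max_retries_on_timeouts / 1000.0


# Assumed defaults (verify in core-default.xml / core-site.xml):
#   ipc.client.connect.timeout                 = 20000 (ms)
#   ipc.client.connect.max.retries.on.timeouts = 45
delay_s = estimate_connect_delay_s(20000, 45)
print(f"about {delay_s / 60:.0f} minutes before the client gives up on the dead host")
```

If those assumptions hold, the NodeManagers would keep retrying the halted machine for roughly a quarter of an hour per attempt cycle, which would look like "never fails over" in a short test. Lowering either value in core-site.xml shortens the window at the cost of being less tolerant of slow networks.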

> On Apr 24, 2015, at 1:50 AM, Drake민영근 <drake.min@nexr.com> wrote:
> 
> Hi, Matt
> 
> The second log file looks like a NodeManager log, not the standby ResourceManager's.
> 
> Thanks.
> 
> Drake 민영근 Ph.D
> kt NexR
> 
> On Fri, Apr 24, 2015 at 11:39 AM, Matt Narrell <matt.narrell@gmail.com> wrote:
> Active ResourceManager:  http://pastebin.com/hE0ppmnb
> Standby ResourceManager: http://pastebin.com/DB8VjHqA
> 
> Oppressively chatty and not much valuable info contained therein.
> 
> 
>> On Apr 23, 2015, at 4:25 PM, Vinod Kumar Vavilapalli <vinodkv@hortonworks.com> wrote:
>> 
>> I have run into this offline with someone else too but couldn't root-cause it.
>> 
>> Will you be able to share your active/standby ResourceManager logs via pastebin or
>> something?
>> 
>> +Vinod
>> 
>> On Apr 23, 2015, at 9:41 AM, Matt Narrell <matt.narrell@gmail.com> wrote:
>> 
>>> I’m using Hadoop 2.6.0 from HDP 2.2.4 installed via Ambari 2.0
>>> 
>>> I’m testing the YARN HA ResourceManager failover. If I STOP the active ResourceManager
>>> (shut the machine off), the standby ResourceManager is elected to active, but the
>>> NodeManagers do not register themselves with the newly elected active ResourceManager.
>>> If I restart the machine (but DO NOT resume the YARN services), the NodeManagers
>>> register with the newly elected ResourceManager and my jobs resume. I assume I have
>>> some bad configuration, as this produces a SPOF and is not HA in the sense I’m expecting.
>>> 
>>> Thanks,
>>> mn
>> 
> 
> 

