hadoop-hdfs-user mailing list archives

From Rohith Sharma K S <rohithsharm...@huawei.com>
Subject RE: YARN HA Active ResourceManager failover when machine is stopped
Date Mon, 27 Apr 2015 04:38:01 GMT

I have seen this issue in my cluster, without HA configured, when the process is halted.
I assume your scenario hits a similar issue when the active RM machine is shut down
abruptly. Maybe you can verify this by taking a thread dump of the NM and comparing it with the JIRAs below.

The open JIRAs in the community regarding this problem are:
https://issues.apache.org/jira/i#browse/YARN-1061 (Without HA)
https://issues.apache.org/jira/i#browse/YARN-2578 (With HA)

Thanks & Regards
Rohith Sharma K S

From: Matt Narrell [mailto:matt.narrell@gmail.com]
Sent: 24 April 2015 23:28
To: user@hadoop.apache.org
Subject: Re: YARN HA Active ResourceManager failover when machine is stopped

Also, another observation: when the VMs are halted, it seems the NodeManagers
do not treat this as a scenario in which to round-robin among the configured ResourceManagers.
Is there some timeout that I’ve missed that instructs the NodeManagers to do this round-robining
when the machine is not responding (to distinguish it from a network blip)?
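
To make the question concrete, here is a minimal sketch of round-robining over a list of endpoints with a connect timeout. It is not YARN code and does not reflect how the NodeManager is actually implemented; the function name `first_reachable` and the parameter `connect_timeout_s` are hypothetical, and the example only illustrates the general idea that an unresponsive host is skipped once a timeout expires, just as a host that refuses the connection is skipped immediately.

```python
import socket


def first_reachable(endpoints, connect_timeout_s=2.0):
    """Try each (host, port) in order and return the first one that
    accepts a TCP connection, or None if none of them do.

    Illustrative only: names and defaults are assumptions, not YARN settings.
    """
    for host, port in endpoints:
        try:
            # The timeout bounds how long we wait on a host that never answers
            # (for example, a machine that has been powered off).
            with socket.create_connection((host, port), timeout=connect_timeout_s):
                return (host, port)
        except OSError:
            # Covers both an immediate refusal (host up, service down)
            # and a timeout (host not responding at all).
            continue
    return None
```

Without a bounded timeout, a client talking to a powered-off host can wait far longer than one talking to a host that is up but has the service stopped, which would be consistent with the behaviour described in this thread, though that is only a guess at the cause.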


On Apr 24, 2015, at 1:50 AM, Drake 민영근 <drake.min@nexr.com> wrote:

Hi, Matt

The second log file looks like a NodeManager log, not the standby ResourceManager's.


Drake 민영근 Ph.D
kt NexR

On Fri, Apr 24, 2015 at 11:39 AM, Matt Narrell <matt.narrell@gmail.com> wrote:
Active ResourceManager:  http://pastebin.com/hE0ppmnb
Standby ResourceManager: http://pastebin.com/DB8VjHqA

Oppressively chatty and not much valuable info contained therein.

On Apr 23, 2015, at 4:25 PM, Vinod Kumar Vavilapalli <vinodkv@hortonworks.com> wrote:

I have run into this offline with someone else too but couldn't root-cause it.

Will you be able to share your active/standby ResourceManager logs via pastebin or something?


On Apr 23, 2015, at 9:41 AM, Matt Narrell <matt.narrell@gmail.com> wrote:

I’m using Hadoop 2.6.0 from HDP 2.2.4, installed via Ambari 2.0.

I’m testing the YARN HA ResourceManager failover. If I STOP the active ResourceManager (shut
the machine off), the standby ResourceManager is elected to active, but the NodeManagers do
not register themselves with the newly elected active ResourceManager. If I restart the machine
(but DO NOT resume the YARN services) the NodeManagers register with the newly elected ResourceManager
and my jobs resume. I assume I have some bad configuration, as this produces a SPOF, and is
not HA in the sense I’m expecting.

