hadoop-yarn-issues mailing list archives

From "Neill Lima (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3152) Missing hadoop exclude file fails RMs in HA
Date Wed, 11 Feb 2015 14:53:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316294#comment-14316294
] 

Neill Lima commented on YARN-3152:
----------------------------------

[~vinodkv] -- It does fail in both RMs. What was unexpected is that a single RM did not fail
when the exclude file was missing.

Is the exclude file really that essential to the RMs but not to the NNs? The two behave very
differently when it is missing. I lean towards the NN behavior.
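
To illustrate the lenient behavior, here is a minimal sketch in Python (for brevity, not Hadoop's Java). The helper name `read_exclude_hosts`, the `strict` flag, and the handling of `#` comments are assumptions made for this example and are not taken from the Hadoop code base. A missing file is treated as an empty exclude list unless strict mode is requested.

```python
from pathlib import Path


def read_exclude_hosts(path, strict=False):
    """Return the set of host names listed in an exclude file.

    Hypothetical helper, not Hadoop code. With strict=False a missing
    file is treated as an empty exclude list; with strict=True the
    FileNotFoundError is propagated to the caller.
    """
    try:
        text = Path(path).read_text()
    except FileNotFoundError:
        if strict:
            raise
        return set()

    hosts = set()
    for line in text.splitlines():
        # Assumed format: whitespace-separated host names, '#' starts a comment.
        entry = line.split("#", 1)[0]
        hosts.update(entry.split())
    return hosts
```

With `strict=False` a start-up or transition to active would proceed with no excluded nodes; with `strict=True` it would fail on the missing file in every deployment mode, single RM or HA, so either choice gives consistent behavior.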

> Missing hadoop exclude file fails RMs in HA
> -------------------------------------------
>
>                 Key: YARN-3152
>                 URL: https://issues.apache.org/jira/browse/YARN-3152
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>         Environment: Debian 7
>            Reporter: Neill Lima
>            Assignee: Naganarasimha G R
>
> I have two NNs in HA; they do not fail when the exclude file (hadoop-2.6.0/etc/hadoop/exclude)
is not present. I had one RM and wanted to run two in HA. I had not created the exclude file
at that point either. I applied the RM HA settings properly, and when I started both RMs I
began getting this exception:
> 2015-02-06 12:25:25,326 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger:
USER=root	OPERATION=transitionToActive	TARGET=RMHAProtocolService	RESULT=FAILURE	DESCRIPTION=Exception
transitioning to active	PERMISSIONS=All users are allowed
> 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling
the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> 	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active
mode
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
> 	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
> 	... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.FileNotFoundException:
/hadoop-2.6.0/etc/hadoop/exclude (No such file or directory)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
> 	... 5 more
> 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish
ZK session
> 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 0x44af32566180094
closed
> 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection,
connectString=x.x.x.x:2181,x.x.x.x:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c
> 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection
to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket connection established
to x.x.x.x/x.x.x.x:2181, initiating session
> The exception is descriptive enough to resolve the problem, and I fixed it by creating
the exclude file.
> I would just like to suggest an improvement:
> - Should RMs ignore the missing file, as the NNs do?
> - Should a single RM also fail when the file is not present?
> I am suggesting this only to keep the behavior consistent when working in HA (both NNs
and RMs).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
