hadoop-yarn-issues mailing list archives

From "Dustin Cote (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3924) Submitting an application to standby ResourceManager should respond better than Connection Refused
Date Tue, 11 Aug 2015 12:20:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681713#comment-14681713 ]

Dustin Cote commented on YARN-3924:

Yes, [~ajithshetty] that's the point I'm trying to get across.  The scenario that is problematic
is this one:

> Configuring wrong/invalid ha.rm-ids at client is user mistake, this can be rechecked by user.

Returning "Connection Refused" gives the user no information that this is what happened.
Generally, I see users looking for closed ports or firewall issues when they get this message
back, when really they've just forgotten to change their Oozie workflow to point to a logical
RM name after enabling HA.  This kind of error is doubly hard to debug when it works intermittently
(because when a failover occurs, suddenly their workflow starts working again!).  Yes, this
is the current RM HA design, so it's not as easy as changing the message or exception type.
That said, I still think it's a good supportability/usability improvement.
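
For illustration only, here is a toy Python model of the behavior requested in the issue description: a resource manager that knows its own HA state and rejects a submission with a descriptive message instead of leaving the port closed. This is not YARN code. The names HAState, StandbyRejection, ToyResourceManager and describe_submission are hypothetical, and only the rejection text is taken from the issue description.

```python
from enum import Enum


class HAState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class StandbyRejection(Exception):
    """Hypothetical error returned in place of a bare 'Connection Refused'."""


class ToyResourceManager:
    """Toy stand-in for an RM that is aware of its own HA state (assumption)."""

    def __init__(self, rm_id, state):
        self.rm_id = rm_id
        self.state = state
        self.accepted = []

    def submit_application(self, app_name):
        # A standby RM still answers, but explains why it will not accept work.
        if self.state is HAState.STANDBY:
            raise StandbyRejection(
                "Cannot process application submission from this standby "
                f"resource manager ({self.rm_id})"
            )
        self.accepted.append(app_name)
        return f"{app_name} accepted by {self.rm_id}"


def describe_submission(rm, app_name):
    """Return the text a client would see for this submission attempt."""
    try:
        return rm.submit_application(app_name)
    except StandbyRejection as err:
        return f"rejected: {err}"
```

With a message like this, a user who pointed a workflow at one specific RM would see that the target is a standby, rather than start looking for closed ports or firewall rules.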

> Submitting an application to standby ResourceManager should respond better than Connection Refused
> --------------------------------------------------------------------------------------------------
>                 Key: YARN-3924
>                 URL: https://issues.apache.org/jira/browse/YARN-3924
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Dustin Cote
>            Assignee: Ajith S
>            Priority: Minor
> When submitting an application directly to a standby resource manager, the resource manager
> responds with 'Connection Refused' rather than indicating that it is a standby resource manager.
> Because the resource manager is aware of its own state, I feel like we can have the 8032
> port open for standby resource managers and reject the request with something like 'Cannot
> process application submission from this standby resource manager'.
> This would be especially helpful for debugging oozie problems when users put in the wrong
> address for the 'jobtracker' (i.e. they don't put the logical RM address but rather point
> to a specific resource manager).

This message was sent by Atlassian JIRA
