hadoop-mapreduce-issues mailing list archives

From "Thomas Graves (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4074) Client continuously retries to RM When RM goes down before launching Application Master
Date Tue, 17 Apr 2012 20:55:13 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255929#comment-13255929 ]

Thomas Graves commented on MAPREDUCE-4074:
------------------------------------------

Still reviewing, but one thing I noticed is that your config setting "client-rm.hs.am.max-retries"
doesn't behave the way the other yarn max-retries settings (like yarn.resourcemanager.am.max-retries
and yarn.app.mapreduce.client-am.ipc.max-retries) do. In general, if they are set to 3, they
only try 3 times in total. I realize "retries" would imply 4 tries in total (the first and then
3 more), which is what yours does, but I think we should keep it consistent across yarn and
change yours to try only 3 times in total if it's set to 3.
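
A minimal Java sketch of the "set to 3 means 3 tries in total" semantics described above.
This is not the code from the patch: the class and method names (BoundedAttempts,
callWithTotalAttempts) are hypothetical and exist only to illustrate the counting. Under
the other reading ("retries" on top of the first try), the loop bound would be
maxAttempts + 1 instead.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, for illustration only; not part of Hadoop or the patch.
final class BoundedAttempts {

    // Invokes op at most maxAttempts times IN TOTAL (first try included).
    // Returns the first successful result, or rethrows the last failure.
    static <T> T callWithTotalAttempts(Callable<T> op, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again if attempts remain
            }
        }
        throw last; // all attempts failed: give up instead of looping forever
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        try {
            // Simulates a server that is down: every call fails.
            BoundedAttempts.<String>callWithTotalAttempts(() -> {
                calls[0]++;
                throw new IOException("connection refused");
            }, 3);
        } catch (Exception e) {
            System.out.println("gave up after " + calls[0] + " attempts: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only the loop bound: the configured value caps the total number
of calls, so the client stops after a finite number of tries rather than retrying indefinitely.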

I personally would prefer that the setting "yarn.app.mapreduce.client-rm.hs.am.max-retries" just
be called yarn.app.mapreduce.client.max-retries, or at least use - instead of . so that it reads
something like yarn.app.mapreduce.client-rm-hs-am.max-retries. I'm open to other ideas; I just
don't like the periods, since the name is saying client to rm or hs or am.

You should also add the default setting, with a description, to yarn-default.xml.
                
> Client continuously retries to RM When RM goes down before launching Application Master
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4074
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4074
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.23.1
>            Reporter: Devaraj K
>         Attachments: MAPREDUCE-4074-1.patch, MAPREDUCE-4074.patch
>
>
> Client continuously tries to connect to the RM and logs the messages below when the RM goes down before launching the App Master.
> I feel an exception should be thrown, or the loop should be broken, after a finite number of retries.
> {code:xml}
> 28/03/12 07:15:03 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 0 time(s).
> 28/03/12 07:15:04 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 1 time(s).
> 28/03/12 07:15:05 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 2 time(s).
> 28/03/12 07:15:06 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 3 time(s).
> 28/03/12 07:15:07 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 4 time(s).
> 28/03/12 07:15:08 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 5 time(s).
> 28/03/12 07:15:09 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 6 time(s).
> 28/03/12 07:15:10 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 7 time(s).
> 28/03/12 07:15:11 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 8 time(s).
> 28/03/12 07:15:12 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 9 time(s).
> 28/03/12 07:15:13 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 0 time(s).
> 28/03/12 07:15:14 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 1 time(s).
> 28/03/12 07:15:15 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 2 time(s).
> 28/03/12 07:15:16 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 3 time(s).
> 28/03/12 07:15:17 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 4 time(s).
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
