hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events
Date Thu, 16 May 2013 14:45:17 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659596#comment-13659596
] 

Robert Joseph Evans commented on MAPREDUCE-5124:
------------------------------------------------

I believe in most cases it is enough to restrict it at the server side and retry at the client
side, but some RPC calls are different and perhaps should be handled slightly
differently.  YARN-309 went in to try to throttle the heartbeats, instead of rejecting them
and asking them to retry.  I think this is preferable for heartbeats over an outright rejection,
simply because we know that the heartbeats are going to come regularly, and asking the next
one to wait does not reduce the total amount of work that we are going to need to do.

So I would throw a ToBusyRetryLater type of exception for one-time RPC calls when the AsyncDispatcher's
queue is over a high water mark, but for heartbeats I would want them to scale the frequency
based on how busy the AsyncDispatcher is.
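
A minimal sketch of the two policies described above, under stated assumptions. Every name here (ThrottledEventQueue, ToBusyRetryLaterException, the method names, and the linear scaling rule) is hypothetical and chosen for illustration; it is not the AsyncDispatcher API and not the approach taken in YARN-309 or in the attached prototype. One-time calls are rejected once the backlog is over the high water mark, while heartbeats are always accepted and are told how long to wait before the next one.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical exception; the name follows the comment's "ToBusyRetryLater".
class ToBusyRetryLaterException extends Exception {
    ToBusyRetryLaterException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for the AM's event queue, not the real AsyncDispatcher.
class ThrottledEventQueue {
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    private final int highWaterMark;
    private final long baseHeartbeatMs;
    private final long maxHeartbeatMs;

    ThrottledEventQueue(int highWaterMark, long baseHeartbeatMs, long maxHeartbeatMs) {
        this.highWaterMark = highWaterMark;
        this.baseHeartbeatMs = baseHeartbeatMs;
        this.maxHeartbeatMs = maxHeartbeatMs;
    }

    // One-time RPC: reject when the backlog is over the high water mark,
    // so that the client retries later.
    void submitOneTimeEvent(Object event) throws ToBusyRetryLaterException {
        if (queue.size() > highWaterMark) {
            throw new ToBusyRetryLaterException(
                "event queue over high water mark (" + highWaterMark + ")");
        }
        queue.add(event);
    }

    // Heartbeat: never rejected. The return value is the interval the
    // sender should wait before its next heartbeat.
    long submitHeartbeat(Object event) {
        queue.add(event);
        return nextHeartbeatIntervalMs();
    }

    // Assumed scaling rule: grow linearly from the base interval to the
    // maximum interval as the backlog approaches the high water mark.
    long nextHeartbeatIntervalMs() {
        double load = Math.min(1.0, (double) queue.size() / highWaterMark);
        return baseHeartbeatMs + Math.round(load * (maxHeartbeatMs - baseHeartbeatMs));
    }

    // Consumer side: take the next event, or null if the queue is empty.
    Object poll() {
        return queue.poll();
    }
}
```

In this sketch a task would sleep for the returned interval before sending its next heartbeat, so a busy AM sees fewer heartbeats per second without any of them being dropped.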
                
> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>         Attachments: MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events from tasks.
> If the AM is unable to keep pace with the rate of incoming events for a sufficient period
> of time then it will eventually exhaust the heap and crash.  MAPREDUCE-5043 addressed a major
> bottleneck for event processing, but the AM could still get behind if it's starved for CPU
> and/or handling a very large job with tens of thousands of active tasks.

