hadoop-mapreduce-issues mailing list archives

From "Miklos Szegedi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events
Date Fri, 01 Sep 2017 00:05:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149817#comment-16149817 ]

Miklos Szegedi commented on MAPREDUCE-5124:
-------------------------------------------

Thank you, [~ozawa], for the patch and the design, and [~jlowe], [~haibochen], [~revans2], and
[~tgraves] for the design suggestions. I think there is an approach that does not cause deadlocks
and is a little more robust. The AM is the usual bottleneck, so it should be the side that
drives the communication. Could the server (AM) send the heartbeat to the client (Task)?
{code}
  foreach task in tasks
      thread.run(() -> {
          while (…) { sendHeartBeat(task); metric = receiveHeartBeat(task); process(metric); sleep(3secs); }
      });
{code}
The processing of metrics, which is the bottleneck, blocks the loop above (it is not scheduled
into AsyncDispatcher), so the heartbeat frequency degrades gracefully as the number of
tasks increases. For example, the interval will be a little more than 3 seconds with 2 tasks. It
will be much longer, on the order of 40 seconds, with 100000 tasks, but all the participants
remain responsive and no exceptions or errors are thrown. The previously suggested approach would
unnecessarily create rejected heartbeat messages on the network, which may become the bottleneck
at scale. The actual code may use asynchronous calls to avoid creating a thread for each task.
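
A minimal Java sketch of the AM-driven polling idea, under stated assumptions. It is not the patch and not existing MapReduce code: the names Metric, TaskChannel, pollOnce and estimatedIntervalMicros are all hypothetical. For simplicity it uses a single polling loop instead of one thread per task, so each round costs roughly the pause plus the number of tasks times the per-metric processing cost. The 370 microsecond cost is an assumed value, picked only so that the estimate lines up with the 3 second and 40 second figures above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class AmDrivenHeartbeatSketch {

    /** Hypothetical stand-in for the metrics a task reports back. */
    static final class Metric {
        final int taskId;
        final long progress;

        Metric(int taskId, long progress) {
            this.taskId = taskId;
            this.progress = progress;
        }
    }

    /** Hypothetical transport: the AM sends a heartbeat and waits for the task's reply. */
    interface TaskChannel {
        Metric exchangeHeartbeat(int taskId);
    }

    /**
     * One polling round. Each reply is processed inline before the next task
     * is contacted, so slow processing delays later heartbeats instead of
     * piling events up in a queue.
     */
    static int pollOnce(List<Integer> taskIds, TaskChannel channel, Consumer<Metric> processor) {
        int processed = 0;
        for (int taskId : taskIds) {
            Metric metric = channel.exchangeHeartbeat(taskId);
            processor.accept(metric);
            processed++;
        }
        return processed;
    }

    /** Rough per-task heartbeat interval for a single poller: pause + tasks * cost. */
    static long estimatedIntervalMicros(long tasks, long perMetricMicros, long pauseMicros) {
        return pauseMicros + tasks * perMetricMicros;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> taskIds = Arrays.asList(1, 2, 3);
        List<Metric> seen = new ArrayList<>();
        // Fake channel that answers immediately; a real one would do an RPC.
        TaskChannel fakeChannel = taskId -> new Metric(taskId, 50L);

        for (int round = 0; round < 2; round++) {
            pollOnce(taskIds, fakeChannel, seen::add);
            Thread.sleep(10); // stands in for the 3 second pause
        }
        System.out.println("processed " + seen.size() + " metrics"); // processed 6 metrics

        // Assumed cost of 370 microseconds per metric, 3 second pause.
        System.out.println(estimatedIntervalMicros(2, 370, 3_000_000));       // 3000740
        System.out.println(estimatedIntervalMicros(100_000, 370, 3_000_000)); // 40000000
    }
}
```

Because the AM only asks for the next metric once it has finished with the previous one, there is no unbounded backlog to exhaust the heap. The trade-off is that a single slow or unreachable task would stall the round, so a real implementation would presumably need per-call timeouts or the asynchronous calls mentioned above.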

> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Haibo Chen
>         Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events from tasks.
> If the AM is unable to keep pace with the rate of incoming events for a sufficient period
> of time then it will eventually exhaust the heap and crash. MAPREDUCE-5043 addressed a major
> bottleneck for event processing, but the AM could still get behind if it's starved for CPU
> and/or handling a very large job with tens of thousands of active tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


