tez-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-348) Improve how completion events are fetched by the ShuffleHandler
Date Mon, 26 Aug 2013 18:55:53 GMT

    [ https://issues.apache.org/jira/browse/TEZ-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750404#comment-13750404 ]

Siddharth Seth commented on TEZ-348:

With pull-based requests, the number of handlers is restricted by the RPC configuration (not
by the number of connections, afaik).

For an AM push, the child will likely have to start a Hadoop RPC server for security purposes.
(Netty HTTP may be an option as well, similar to the ShuffleHandler.) The AM ends up making
a lot of outgoing connections in that case, though. Not sure how well this will work. If it
works well, it would be ideal, since the fetch could start as soon as more information
is available.

> Improve how completion events are fetched by the ShuffleHandler
> ---------------------------------------------------------------
>                 Key: TEZ-348
>                 URL: https://issues.apache.org/jira/browse/TEZ-348
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Hitesh Shah
> The ShuffleHandler currently has a thread which checks for new completion events every
> second. This can add unnecessary delay to the reduce getting started.
> An async RPC implementation would work well to fix this. That, however, is currently
> not available in Hadoop.
> Options
> - Poll with a smaller interval. This can overload the AM if there's a large number of
> reduce tasks. The poll interval could be set based on the # of tasks.
> - Have the AM push completion events to the Task. AM ends up creating way too many connections,
> and the child has to run an RPC server.
> - Rely on an external service like ZK with monitors.
> Thoughts / suggestions on how this can be improved?
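
One way to read the first option (a poll interval set from the number of tasks) is sketched below. This is only an illustration, not Tez or Hadoop code: the class name, the method name, and the per-AM request budget are all hypothetical. It assumes the AM can absorb a fixed number of poll requests per second in total, spreads that budget evenly across the tasks, and clamps the result between a floor and the current one-second interval.

```java
public final class PollIntervalSketch {
    private PollIntervalSketch() {}

    /**
     * Hypothetical helper: picks a per-task poll interval so that all tasks
     * together send at most amRequestsPerSecond requests to the AM.
     * With N tasks polling every T ms the AM sees N * 1000 / T requests per
     * second, so T = N * 1000 / budget, rounded up and clamped to
     * [minMillis, maxMillis].
     */
    public static long pollIntervalMillis(int numTasks, int amRequestsPerSecond,
                                          long minMillis, long maxMillis) {
        if (numTasks <= 0 || amRequestsPerSecond <= 0
                || minMillis <= 0 || minMillis > maxMillis) {
            throw new IllegalArgumentException("invalid poll interval parameters");
        }
        // Ceiling division so the request budget is never exceeded.
        long ideal = (numTasks * 1000L + amRequestsPerSecond - 1) / amRequestsPerSecond;
        return Math.max(minMillis, Math.min(maxMillis, ideal));
    }

    public static void main(String[] args) {
        // Assumed numbers: a budget of 1000 requests/s, a 50 ms floor, and a
        // 1000 ms ceiling (the one-second interval described in the issue).
        for (int tasks : new int[] {10, 200, 5000}) {
            System.out.println(tasks + " tasks -> poll every "
                    + pollIntervalMillis(tasks, 1000, 50, 1000) + " ms");
        }
    }
}
```

With these assumed numbers, a small job polls at the floor while a very large job falls back to the one-second interval, so the added load on the AM stays bounded. The delay for large jobs is unchanged, which is the limitation the push-based option is meant to address.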

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
