tez-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-338) Determine reduce task parallelism
Date Mon, 19 Aug 2013 20:39:48 GMT

    [ https://issues.apache.org/jira/browse/TEZ-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744185#comment-13744185 ]

Siddharth Seth commented on TEZ-338:
------------------------------------

Mostly looks good.
I don't like using Configuration as the user payload, but this can be addressed later,
when we work out a better way to configure tez-engine components.

The partition range is only sent along with one dependency completion event. This seems like
it can lead to obscure bugs. I'm guessing this is for efficiency?
One of them: Shuffle.setReduceRange can end up being called multiple times, since the
DependencyCompletionEvents may not all be fetched in a single call. The first element of each
getDependencyCompletionEvent call will have the shuffle range set and will try to set it in
"Shuffle", causing an exception.
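
To illustrate the failure mode above, here is a minimal sketch of one possible guard: accept
the first range, treat an identical repeat as a no-op, and fail only on a conflicting value.
CompletionEvent, RangeHolder and ReduceRangeDemo are hypothetical stand-ins written for this
example; they are not the classes in the patch, and this is not necessarily how the patch
should fix it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a dependency completion event; not the Tez class.
final class CompletionEvent {
    // {start, end} when a partition range is attached, otherwise null.
    final int[] range;

    CompletionEvent(int[] range) {
        this.range = range;
    }
}

// Hypothetical stand-in for the consumer that the comment calls "Shuffle".
final class RangeHolder {
    private int[] range; // null until the first range arrives

    // Accepts the first range, ignores identical repeats, rejects conflicts.
    void setReduceRange(int start, int end) {
        if (range == null) {
            range = new int[] {start, end};
            return;
        }
        if (range[0] != start || range[1] != end) {
            throw new IllegalStateException(
                "Conflicting reduce range: have [" + range[0] + ", " + range[1]
                    + "], got [" + start + ", " + end + "]");
        }
    }

    boolean isSet() { return range != null; }
    int start() { return range[0]; }
    int end() { return range[1]; }
}

final class ReduceRangeDemo {
    // Processes one fetched batch; any event may or may not carry a range.
    static void consume(RangeHolder holder, List<CompletionEvent> batch) {
        for (CompletionEvent event : batch) {
            if (event.range != null) {
                holder.setReduceRange(event.range[0], event.range[1]);
            }
        }
    }

    public static void main(String[] args) {
        RangeHolder holder = new RangeHolder();
        // Two separate fetches; the first event of each carries the same range.
        consume(holder, Arrays.asList(
            new CompletionEvent(new int[] {0, 3}), new CompletionEvent(null)));
        consume(holder, Arrays.asList(new CompletionEvent(new int[] {0, 3})));
        System.out.println(holder.start() + ".." + holder.end());
    }
}
```

With a set-once setter that throws on any second call, the second consume() above would fail
even though both fetches agree on the range. Whether a repeat should be tolerated or the range
should be delivered some other way is a design choice for the patch.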

- TEZ_AM_SHUFFLE_VERTEX_MANAGER_TASK_PARALLELISM_DEFAULT is missing MIN in the name
- Is ShuffleVertexManager supposed to use MRHelpers or TezUtils for 'createUserPayloadFromConf'?
A comment in the code indicating the future user-code location would be useful.
- // TODO 2 second sleep!!!! - TEZ-375 ?

                
> Determine reduce task parallelism
> ---------------------------------
>
>                 Key: TEZ-338
>                 URL: https://issues.apache.org/jira/browse/TEZ-338
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>              Labels: TEZ-0.2.0
>         Attachments: TEZ-338.1.patch, TEZ-338.2.patch, TEZ-338.3.patch, TEZ-338.4.patch,
>                      TEZ-338.5.patch
>
>
> Determine the parallelism of reduce tasks at runtime. This is important because it's difficult
> to determine this accurately before the job actually runs, due to unknown data reduction ratios
> in the intermediate stages.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
