tez-issues mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-393) Handle custom partitioners
Date Tue, 27 Aug 2013 16:31:53 GMT

    [ https://issues.apache.org/jira/browse/TEZ-393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751403#comment-13751403 ]

Rohini Palaniswamy commented on TEZ-393:

For example, consider range partitioning used to achieve a distributed ORDER BY.
Suppose there are keys k1 to k100 and 10 reducers, and the range partitioner sends
k1-k10 to reducer 1, k11-k20 to reducer 2, ..., and k91-k100 to reducer 10. Then
part-r-00000 through part-r-00009 hold globally sorted data when the files are read in order.

If you reduce this to 5 reducers, then k1-k20 has to go to reducer 1, ..., and k81-k100 has to
go to reducer 5. If that does not happen, the data within each file might still be sorted, but
the data will not be sorted when you read the files in order. So how multiple partitions can be
sent to one reduce task depends on the custom partitioner's logic.
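
The point above can be illustrated with a small standalone sketch. It does not use
the Hadoop or Tez APIs; the class and method names (RangePartitionSketch,
rangePartition, contiguousReducer, moduloReducer) are hypothetical, and integer keys
1..100 stand in for k1..k100. It assumes 10 range partitions collapsed onto 5
reducers, once by grouping adjacent partitions and once by a naive modulo grouping,
and checks whether the concatenated reducer outputs are still globally sorted.

```java
import java.util.ArrayList;
import java.util.List;

public class RangePartitionSketch {

    /** Maps key (1..numKeys) to a partition in [0, numPartitions) by contiguous ranges. */
    static int rangePartition(int key, int numKeys, int numPartitions) {
        int rangeSize = (numKeys + numPartitions - 1) / numPartitions; // ceiling division
        return (key - 1) / rangeSize;
    }

    /** Order-preserving grouping: adjacent partitions share a reducer (0,1 -> 0; 2,3 -> 1; ...). */
    static int contiguousReducer(int partition, int numPartitions, int numReducers) {
        int partitionsPerReducer = (numPartitions + numReducers - 1) / numReducers;
        return partition / partitionsPerReducer;
    }

    /** Naive grouping that ignores the partitioner's ordering (0,5 -> 0; 1,6 -> 1; ...). */
    static int moduloReducer(int partition, int numReducers) {
        return partition % numReducers;
    }

    /** Simulates the reducer output files and concatenates them in reducer order. */
    static List<Integer> concatenatedOutput(int numKeys, int numPartitions,
                                            int numReducers, boolean contiguous) {
        List<List<Integer>> files = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            files.add(new ArrayList<>());
        }
        // Keys are visited in ascending order, so each individual file is sorted.
        for (int key = 1; key <= numKeys; key++) {
            int p = rangePartition(key, numKeys, numPartitions);
            int r = contiguous
                    ? contiguousReducer(p, numPartitions, numReducers)
                    : moduloReducer(p, numReducers);
            files.get(r).add(key);
        }
        List<Integer> all = new ArrayList<>();
        for (List<Integer> file : files) {
            all.addAll(file);
        }
        return all;
    }

    static boolean isSorted(List<Integer> xs) {
        for (int i = 1; i < xs.size(); i++) {
            if (xs.get(i - 1) > xs.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Adjacent partitions merged: files read in order are globally sorted.
        System.out.println(isSorted(concatenatedOutput(100, 10, 5, true)));  // prints "true"
        // Modulo grouping: each file is sorted, but the concatenation is not.
        System.out.println(isSorted(concatenatedOutput(100, 10, 5, false))); // prints "false"
    }
}
```

With the modulo grouping, the first file would hold keys 1-10 and 51-60 and the second
would start at 11, so reading the files in order breaks the sort even though each file
is sorted on its own. Which grouping is safe is a property of the partitioner, which is
why the framework cannot pick one blindly.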
> Handle custom partitioners
> --------------------------
>                 Key: TEZ-393
>                 URL: https://issues.apache.org/jira/browse/TEZ-393
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>   With dynamic allocation of reducers, cases like range partitioning (for ORDER BY) need
to be handled properly. Also, many users have other custom partitioners. So it might be good to
have an option to turn off dynamic reducer allocation in case they don't work well with dynamic

