hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4143) Support for a "raw" Partitioner that partitions based on the serialized key and not record objects
Date Wed, 10 Sep 2008 18:47:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629908#action_12629908 ]

Chris Douglas commented on HADOOP-4143:
---------------------------------------

The performance reasons are largely limited to "memcmp" types like Text and BytesWritable.
Since the partitioner is called from collect, when we still have the cooked records, the only
motivation would be in support of partitioners like the one used in the terasort example.
I talked offline with Owen about this, and he makes the case that a "MemComparable" interface
on the aforementioned types would probably be more than sufficient for practical uses, more
readable than a partitioner handling different/layered length encodings, and a more general
abstraction than this one.
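
To make the MemComparable idea concrete, here is a minimal sketch under stated assumptions. `MemComparable`, `SimpleText`, and `MemComparableRangePartitioner` are hypothetical names used only for illustration, not Hadoop APIs, and the partitioner is a simplified binary search over sampled split points rather than the terasort example's own (more elaborate) partitioner. `getBytes()` here returns the payload without any length prefix, which is what lets the partitioner classify keys with a plain unsigned byte compare and no knowledge of the type's length encoding.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical marker interface: the bytes of an implementing type sort
 * correctly under an unsigned, lexicographic ("memcmp") comparison.
 */
interface MemComparable {
    /** Bytes whose memcmp order matches the object order (no length prefix). */
    byte[] getBytes();

    /** Number of valid bytes in getBytes(). */
    int getLength();
}

/** Hypothetical stand-in for a Text-like key holding UTF-8 bytes. */
final class SimpleText implements MemComparable {
    private final byte[] utf8;

    SimpleText(String s) {
        this.utf8 = s.getBytes(StandardCharsets.UTF_8);
    }

    public byte[] getBytes() { return utf8; }

    public int getLength() { return utf8.length; }
}

/** Hypothetical range partitioner over sorted split points. */
final class MemComparableRangePartitioner {
    private final byte[][] splitPoints; // sorted; numPartitions - 1 entries

    MemComparableRangePartitioner(byte[][] splitPoints) {
        this.splitPoints = splitPoints;
    }

    /** Unsigned lexicographic comparison with a length tiebreak, like memcmp. */
    static int compareBytes(byte[] a, int aLen, byte[] b, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) return x - y;
        }
        return aLen - bLen;
    }

    /** Returns the number of split points less than or equal to the key. */
    int getPartition(MemComparable key) {
        int lo = 0;
        int hi = splitPoints.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareBytes(splitPoints[mid], splitPoints[mid].length,
                             key.getBytes(), key.getLength()) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

With split points "g" and "p", "apple" lands in partition 0, "grape" in partition 1, and "zebra" in partition 2; the partitioner never needs to know how the key type is serialized.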

The only remaining reason would be the aforementioned space/time tradeoff, saving an int per
record while adding a call to the partitioner for each compare in the sort. If this effected
any improvement in running time, it would probably be noise at best and likely inferior to
a better configuration.
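
The tradeoff can be illustrated with a toy sort: either spend one int per record to remember each record's partition, or save that int and call the partitioner inside every comparison. `PartitionSortTradeoff`, its string keys, and its first-character partitioner are hypothetical and only stand in for the map-side sort; they do not reflect how the real sort buffer is laid out. The call counter shows that both strategies produce the same order, but recomputing invokes the partitioner at least twice per compare.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration of the space/time tradeoff, not MapTask's sort buffer. */
final class PartitionSortTradeoff {
    /** Counts partitioner invocations so the two strategies can be compared. */
    static int partitionerCalls = 0;

    /** Hypothetical partitioner: buckets keys by their first character. */
    static int partition(String key, int numPartitions) {
        partitionerCalls++;
        int c = key.isEmpty() ? 0 : key.charAt(0);
        return c % numPartitions;
    }

    private static Integer[] identityOrder(int n) {
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        return order;
    }

    /** Spend one int per record: partition once, then sort by (partition, key). */
    static Integer[] sortWithStoredPartitions(String[] keys, int numPartitions) {
        int[] parts = new int[keys.length];
        for (int i = 0; i < keys.length; i++) parts[i] = partition(keys[i], numPartitions);
        Integer[] order = identityOrder(keys.length);
        Arrays.sort(order, Comparator.<Integer>comparingInt(i -> parts[i])
                .thenComparing(i -> keys[i]));
        return order;
    }

    /** Save the int, but call the partitioner inside every comparison. */
    static Integer[] sortRecomputingPartitions(String[] keys, int numPartitions) {
        Integer[] order = identityOrder(keys.length);
        Arrays.sort(order, Comparator.<Integer>comparingInt(i -> partition(keys[i], numPartitions))
                .thenComparing(i -> keys[i]));
        return order;
    }
}
```

Whether the saved int is worth the extra partitioner calls depends on how cheap the partitioner is relative to the comparison itself, which is why any gain would likely be lost in the noise.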

I don't usually like "tagging" types, but the MemComparable interface will not only resolve
any case this would, but could also help with RawComparator implementations, table stores, etc.
This was conceived as a way to avoid that, but it's clearly not an improvement on it and should
probably be closed as "Won't fix".

> Support for a "raw" Partitioner that partitions based on the serialized key and not record objects
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4143
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Chris Douglas
>         Attachments: 4143-0.patch
>
>
> For some partitioners (particularly those using comparators to classify keys), it would
> be helpful if one could specify a "raw" partitioner that would receive the serialized version
> of the key rather than the object emitted from the map.
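
A raw partitioner of the kind the issue describes might look roughly like the sketch below. The `RawPartitioner` interface, its `(buf, keyStart, keyLength, numPartitions)` signature, and the `RawHashPartitioner` hash are hypothetical and not part of Hadoop; the sketch only shows that such a partitioner works on a slice of a serialized buffer rather than on the key object emitted from the map.

```java
/**
 * Hypothetical interface (not part of Hadoop): the partitioner sees the
 * serialized key as a slice of a buffer rather than as the key object.
 */
interface RawPartitioner {
    int getPartition(byte[] buf, int keyStart, int keyLength, int numPartitions);
}

/** Hypothetical example: hashes only the bytes inside the key's slice. */
final class RawHashPartitioner implements RawPartitioner {
    @Override
    public int getPartition(byte[] buf, int keyStart, int keyLength, int numPartitions) {
        int hash = 1;
        for (int i = keyStart; i < keyStart + keyLength; i++) {
            hash = 31 * hash + buf[i];
        }
        // Mask the sign bit so the result is always in [0, numPartitions).
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Because only the bytes in [keyStart, keyStart + keyLength) are hashed, a key embedded in a larger buffer routes to the same partition as the same bytes on their own.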

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

