hadoop-common-dev mailing list archives

From "Klaas Bosteels (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5528) Binary partitioner
Date Fri, 27 Mar 2009 07:42:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689842#action_12689842 ]

Klaas Bosteels commented on HADOOP-5528:


* {{3:-1}} indeed does not refer to the last two bytes, but
** that's how it works in Python as well:
>>> l = [1,2,3,4,5]
>>> l[1:-2], l[1:3], l[3:-1]
([2, 3], [2, 3], [4])
** you can specify "the last n" bytes by setting only the left offset, to -n (because {{LastIndexer}}
is the default right indexer), which is also how you do it in Python:
>>> l[-2:]
[4, 5]
** because of the {{min}} in the {{PosOffsetIndexer}}, you can also just set the right offset
to a large enough number to get "the last n" bytes.
* I don't think that -1 should be the default right offset, since that would mean that the
last byte is ignored by default.
* It might indeed be possible to use {{(index + key.getLength()) % key.getLength()}} for both
negative and positive offsets, but we need a separate indexer to implement the default right
index anyway, and I think it makes sense to minimize the required computations by using more
specialized indexers.
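The offset semantics argued for above can be sketched as follows. This is a hypothetical, self-contained Java illustration, not the code in the attached patch or the Hadoop API. Only the names {{PosOffsetIndexer}} and {{LastIndexer}} come from the discussion. {{NegOffsetIndexer}}, {{forOffset}} and {{slice}} are invented for the example, and the right offset is assumed to be exclusive, as in Python.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the Python-style offset semantics discussed above.
 * Not the Hadoop BinaryPartitioner API: Indexer, NegOffsetIndexer, forOffset
 * and slice are illustrative names; the right index is exclusive.
 */
public class OffsetSketch {

  /** Maps a configured offset to an index in [0, length]. */
  interface Indexer {
    int index(int length);
  }

  /** Non-negative offset: clamped with min so it never runs past the end. */
  static final class PosOffsetIndexer implements Indexer {
    private final int offset;
    PosOffsetIndexer(int offset) { this.offset = offset; }
    public int index(int length) { return Math.min(offset, length); }
  }

  /** Negative offset: counted from the end, clamped with max at 0. */
  static final class NegOffsetIndexer implements Indexer {
    private final int offset;
    NegOffsetIndexer(int offset) { this.offset = offset; }
    public int index(int length) { return Math.max(length + offset, 0); }
  }

  /** Default right indexer: always the (exclusive) end of the array. */
  static final class LastIndexer implements Indexer {
    public int index(int length) { return length; }
  }

  /** Picks the specialized indexer for a configured offset. */
  static Indexer forOffset(int offset) {
    return offset >= 0 ? new PosOffsetIndexer(offset) : new NegOffsetIndexer(offset);
  }

  /** Returns bytes[left:right] like a Python slice (empty if left >= right). */
  static byte[] slice(byte[] bytes, Indexer left, Indexer right) {
    int from = left.index(bytes.length);
    int to = right.index(bytes.length);
    return from < to ? Arrays.copyOfRange(bytes, from, to) : new byte[0];
  }

  public static void main(String[] args) {
    byte[] b = {1, 2, 3, 4, 5};
    System.out.println(Arrays.toString(slice(b, forOffset(1), forOffset(-2))));      // [2, 3]
    System.out.println(Arrays.toString(slice(b, forOffset(3), forOffset(-1))));      // [4]
    System.out.println(Arrays.toString(slice(b, forOffset(-2), new LastIndexer()))); // [4, 5]
    System.out.println(Arrays.toString(slice(b, forOffset(3), forOffset(1000))));    // [4, 5]
    System.out.println(Arrays.toString(slice(b, forOffset(0), forOffset(-1))));      // [1, 2, 3, 4]
  }
}
```

The last call shows why -1 would be a poor default right offset under these semantics: it silently drops the final byte. A single {{(index + length) % length}} indexer would also wrap an out-of-range offset such as 1000 on a 5-byte key back to 0 instead of clamping it. It would map an offset equal to the length to 0 as well, so it cannot express the "end of array" default on its own.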

So, personally, I think that:

* we need the indexer classes (and cannot use -1 as default right index),
* the max/min games are useful (and not merely a way of preventing exceptions),
* the semantics are correct,

which leaves me with nothing to change in the latest patch :) Do you agree with this,
or is there still something you would like me to change?

> Binary partitioner
> ------------------
>                 Key: HADOOP-5528
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5528
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Klaas Bosteels
>            Assignee: Klaas Bosteels
>         Attachments: HADOOP-5528.patch, HADOOP-5528.patch, HADOOP-5528.patch, HADOOP-5528.patch
>
> It would be useful to have a {{BinaryPartitioner}} that partitions {{BinaryComparable}}
> keys by hashing a configurable part of the byte array corresponding to each key.
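The feature described in the issue can be sketched as follows. This is a hypothetical, Hadoop-independent Java illustration rather than the patch itself. It makes three assumptions: the offsets have already been resolved to a half-open index range, {{java.util.Arrays.hashCode}} serves as the hash function, and the hash is mapped to a partition with the usual {{(hash & Integer.MAX_VALUE) % numPartitions}} step. The class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch: partition keys by hashing only part of their bytes.
 * Not the Hadoop BinaryPartitioner API; names are illustrative.
 */
public class ByteRangePartitionSketch {

  /**
   * Hashes keyBytes[from, to) and maps the hash to [0, numPartitions).
   * Assumes 0 <= from <= to <= keyBytes.length and numPartitions > 0.
   */
  static int getPartition(byte[] keyBytes, int from, int to, int numPartitions) {
    int hash = Arrays.hashCode(Arrays.copyOfRange(keyBytes, from, to));
    // Clear the sign bit so the remainder is never negative.
    return (hash & Integer.MAX_VALUE) % numPartitions;
  }

  public static void main(String[] args) {
    byte[] a = "user42\tclick".getBytes(StandardCharsets.UTF_8);
    byte[] b = "user42\tview".getBytes(StandardCharsets.UTF_8);
    // Only the first 6 bytes ("user42") are hashed, so both keys share a partition.
    System.out.println(getPartition(a, 0, 6, 10) == getPartition(b, 0, 6, 10)); // true
  }
}
```

Hashing only a key prefix like this sends all keys that share the prefix to the same reducer, regardless of the remaining bytes.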

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
