hadoop-common-dev mailing list archives

From "Klaas Bosteels (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-5528) Binary partitioner
Date Mon, 23 Mar 2009 11:19:50 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Klaas Bosteels updated HADOOP-5528:

    Attachment: HADOOP-5528.patch

The revised patch allows the subarray to be defined by means of Python-style offsets:

* {{mapred.binary.partitioner.left.offset}}: left Python-style offset in array
* {{mapred.binary.partitioner.right.offset}}: right Python-style offset in array

As indicated by Owen, the best way to remember how these offsets work is by thinking of them
as indices pointing between the array elements, with the left edge of the first element numbered
0, e.g.:

  +---+---+---+---+---+
  | B | B | B | B | B |
  0   1   2   3   4   5
 -5  -4  -3  -2  -1

The first row of numbers gives the positions of the offsets 0...5 in the array; the second
row gives the corresponding negative offsets. When _i_ and _j_ are specified as the left and
right offsets, respectively, all bytes between the edges labeled _i_ and _j_ are taken into
account for the partitioning.

More generally, the indexing logic can now be customized by specifying the {{BinaryPartitioner.Indexer}}
classes to be used via the following properties:

* {{mapred.binary.partitioner.left.indexer.class}}
* {{mapred.binary.partitioner.right.indexer.class}}

By default, {{FirstIndexer}} and {{LastIndexer}} are used (i.e. the whole byte array is taken
into account for the hashing), and the offset properties trigger the usage of {{PosOffsetIndexer}}
and/or {{NegOffsetIndexer}}, which implement the indexing by means of Python-style offsets.
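To make the offset semantics concrete, here is a minimal Java sketch of how a left/right pair of Python-style offsets could be turned into a byte range and then into a partition number. It is not the code from the attached patch: the class name {{OffsetPartitionSketch}}, the method names, the clamping of out-of-range offsets (borrowed from Python slicing), and the simple 31-based hash are all assumptions made for illustration.

```java
// Illustrative sketch only; none of these names come from HADOOP-5528.patch.
class OffsetPartitionSketch {

    // Map a Python-style offset onto an edge index in [0, length].
    // Negative offsets count back from the right edge of the array.
    // Clamping out-of-range values is an assumption borrowed from Python slicing.
    static int resolveEdge(int offset, int length) {
        int edge = offset < 0 ? length + offset : offset;
        return Math.max(0, Math.min(length, edge));
    }

    // Hash the bytes lying between the left and right edges.
    // The hash function itself is a stand-in, not the one used by the patch.
    static int hashRange(byte[] bytes, int leftOffset, int rightOffset) {
        int left = resolveEdge(leftOffset, bytes.length);
        int right = resolveEdge(rightOffset, bytes.length);
        int hash = 1;
        for (int i = left; i < right; i++) {
            hash = 31 * hash + bytes[i];
        }
        return hash;
    }

    // Clear the sign bit before taking the modulus so the result is never negative.
    static int partition(byte[] key, int leftOffset, int rightOffset, int numPartitions) {
        return (hashRange(key, leftOffset, rightOffset) & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 5};
        byte[] b = {9, 2, 3, 4, 7};
        // Offsets 1 and -1 select the middle bytes {2, 3, 4} of both keys,
        // so both keys land in the same partition.
        System.out.println(partition(a, 1, -1, 4) == partition(b, 1, -1, 4)); // prints "true"
    }
}
```

In this sketch the whole array corresponds to a left offset of 0 and a right edge equal to the array length. No negative offset can name that rightmost edge, which is one reason to treat "whole array" as a separate default rather than as a pair of offsets.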

> Binary partitioner
> ------------------
>                 Key: HADOOP-5528
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5528
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Klaas Bosteels
>            Assignee: Klaas Bosteels
>         Attachments: HADOOP-5528.patch, HADOOP-5528.patch
> It would be useful to have a {{BinaryPartitioner}} that partitions {{BinaryComparable}}
> keys by hashing a configurable part of the byte array corresponding to each key.

