hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2302) Streaming should provide an option for numerical sort of keys
Date Tue, 22 Jul 2008 19:03:31 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-2302:
--------------------------------

    Attachment: 2302.1.patch

A reasonably well-tested patch. It does the following:
1) The supported options are -n (numeric comparison) and -r (reverse the result of the comparison).
For example, one could pass "-k1.1,1.2 -k2.1,2.3n -k2.4,3.1nr" as the value of the
mapred.text.key.comparator.job option (which the comparator understands).
2) Some refactoring: I needed access to the findBytes method defined in Streaming.UTF8ByteArrayUtils.
Since this comparator implementation need not depend on the Streaming package, I made a new
class, org.apache.hadoop.util.UTF8ByteArrayUtils, and filled it with the "real" byte-array
utility methods. A few Streaming-specific methods such as findTab also exist in
Streaming.UTF8ByteArrayUtils; I moved them to a new class called Streaming.StreamKeyValUtil.
All in all, I introduced two new classes and deprecated Streaming.UTF8ByteArrayUtils.
3) A partitioner is defined that hashes only the portions of the keys that the user is
interested in (using the same spec format as the one defined for the comparator).
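
To make the key-spec idea in items 1 and 3 concrete, here is a minimal sketch in Python (not the patch's Java code). It assumes the spec follows the -kPOS1[,POS2] form of Unix sort with 1-based, inclusive field.char positions, that key fields are tab-separated, and that CRC32 is an acceptable stand-in for whatever hash the partitioner really uses. The names KeySpec, parse_specs, extract and partition are hypothetical.

```python
import re
import zlib
from typing import List, NamedTuple


class KeySpec(NamedTuple):
    """One -k entry. An end_field or end_char of 0 means 'not given'."""
    start_field: int
    start_char: int
    end_field: int
    end_char: int
    numeric: bool
    reverse: bool


# Assumed grammar: -kF[.C][,F[.C]][n][r], modelled on sort(1).
_SPEC = re.compile(r"^-k(\d+)(?:\.(\d+))?(?:,(\d+)(?:\.(\d+))?)?([nr]*)$")


def parse_specs(option: str) -> List[KeySpec]:
    """Parse a string such as '-k1.1,1.2 -k2.1,2.3n' into KeySpec tuples."""
    specs = []
    for token in option.split():
        m = _SPEC.match(token)
        if m is None:
            raise ValueError("bad key spec: %r" % token)
        sf, sc, ef, ec, flags = m.groups()
        specs.append(KeySpec(int(sf), int(sc or 1), int(ef or 0), int(ec or 0),
                             "n" in flags, "r" in flags))
    return specs


def extract(key: str, spec: KeySpec, sep: str = "\t") -> str:
    """Return the portion of the key selected by one spec."""
    fields = key.split(sep)
    if spec.start_field > len(fields):
        return ""
    # Absolute offset at which each field starts within the key.
    starts, pos = [], 0
    for f in fields:
        starts.append(pos)
        pos += len(f) + len(sep)
    begin = starts[spec.start_field - 1] + spec.start_char - 1
    if spec.end_field == 0 or spec.end_field > len(fields):
        end = len(key)
    else:
        field_end = starts[spec.end_field - 1] + len(fields[spec.end_field - 1])
        if spec.end_char == 0:
            end = field_end
        else:
            end = min(starts[spec.end_field - 1] + spec.end_char, field_end)
    return key[begin:end]


def partition(key: str, specs: List[KeySpec], num_partitions: int,
              sep: str = "\t") -> int:
    """Hash only the selected key portions (CRC32 is an assumption)."""
    h = 0
    for spec in specs:
        h = zlib.crc32(extract(key, spec, sep).encode("utf-8"), h)
    return h % num_partitions
```

Under these assumptions, two keys that agree on the selected portion land in the same partition regardless of the rest of the key, which is the property item 3 describes.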

A note: the numCompare method in the comparator may be slightly verbose, but that should help
the readability of the code.
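
For readers unfamiliar with why a numeric comparison of raw bytes tends to be verbose, here is a rough Python sketch of one way to do it. It is not the patch's numCompare. It assumes the inputs are well-formed decimal numbers (optional leading minus sign, digits, optional fractional part), and the function names are hypothetical.

```python
from functools import cmp_to_key


def _split(num: bytes):
    """Split b'-0012.50' into (negative, integer digits, fraction digits)."""
    s = num.strip()
    negative = s.startswith(b"-")
    if negative:
        s = s[1:]
    int_part, _, frac_part = s.partition(b".")
    int_part = int_part.lstrip(b"0")    # leading zeros carry no value
    frac_part = frac_part.rstrip(b"0")  # neither do trailing fraction zeros
    if not int_part and not frac_part:
        negative = False                # treat -0 the same as 0
    return negative, int_part, frac_part


def num_compare(a: bytes, b: bytes) -> int:
    """Compare two decimal byte strings numerically; returns -1, 0 or 1."""
    neg_a, int_a, frac_a = _split(a)
    neg_b, int_b, frac_b = _split(b)
    if neg_a != neg_b:
        return -1 if neg_a else 1
    # Same sign: compare magnitudes digit-wise, longer integer part wins.
    if len(int_a) != len(int_b):
        mag = -1 if len(int_a) < len(int_b) else 1
    elif int_a != int_b:
        mag = -1 if int_a < int_b else 1
    elif frac_a != frac_b:
        mag = -1 if frac_a < frac_b else 1
    else:
        mag = 0
    return -mag if neg_a else mag


def compare(a: bytes, b: bytes, numeric: bool = False,
            reverse: bool = False) -> int:
    """Apply the -n and -r semantics described above to one key portion."""
    result = num_compare(a, b) if numeric else (a > b) - (a < b)
    return -result if reverse else result
```

With this sketch, compare(b"10", b"9") is negative because the bytes are compared lexicographically, while compare(b"10", b"9", numeric=True) is positive. A list can be sorted with sorted(values, key=cmp_to_key(num_compare)).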

>  Streaming should provide an option for numerical sort of keys
> --------------------------------------------------------------
>
>                 Key: HADOOP-2302
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2302
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Lohit Vijayarenu
>            Assignee: Devaraj Das
>             Fix For: 0.19.0
>
>         Attachments: 2302.1.patch
>
>
> It would be good to have an option for numerical sort of keys for streaming. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

