hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2302) Streaming should provide an option for numerical sort of keys
Date Tue, 24 Jun 2008 18:46:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607721#action_12607721 ]

Devaraj Das commented on HADOOP-2302:
-------------------------------------

Would it work to support the basic Unix/GNU sort options in the comparator?
-f (ignore case)
-n (sort numerically)
-r (reverse the result of the comparison)
-k _pos1[,pos2]_, where each pos has the form _f[.c][opts]_: _f_ is the number of the field
to use, and _c_ is the character position within that field, counted from the beginning of
the field. Fields and character positions are numbered starting with 1; a character position
of zero in pos2 indicates the field's last character. If '.c' is omitted from pos1, it
defaults to 1 (the beginning of the field); if omitted from pos2, it defaults to 0 (the end
of the field). opts are ordering options (any of _fnr_ as described above).
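
To make the _pos1[,pos2]_ rules concrete, here is a rough sketch of how such a spec might be parsed and applied to a key. The class name KeySpec and its extract method are hypothetical, not existing Hadoop code, and it works on Strings rather than raw bytes. It also assumes that when pos2 is omitted the key runs to the end of the line, as in GNU sort; the proposal above does not say.

```java
import java.util.regex.Pattern;

/** Hypothetical parser for a sort-style key spec: pos1[,pos2], pos = f[.c][opts]. */
class KeySpec {
    public final int startField, startChar; // 1-based; startChar defaults to 1
    public final int endField, endChar;     // endField 0 = no pos2; endChar 0 = end of field
    public final String opts;               // letters (f, n, r) found on either position

    public KeySpec(String spec) {
        String[] parts = spec.split(",", 2);
        int[] p1 = parsePos(parts[0], 1);   // '.c' omitted from pos1 defaults to 1
        if (p1[0] < 1 || p1[1] < 1) {
            throw new IllegalArgumentException("pos1 is numbered from 1: " + spec);
        }
        startField = p1[0];
        startChar = p1[1];
        String o = optsOf(parts[0]);
        if (parts.length == 2) {
            int[] p2 = parsePos(parts[1], 0); // '.c' omitted from pos2 defaults to 0
            endField = p2[0];
            endChar = p2[1];
            o += optsOf(parts[1]);
        } else {
            endField = 0;                   // assumption: key runs to the end of the line
            endChar = 0;
        }
        opts = o;
    }

    private static String optsOf(String pos) {
        return pos.replaceAll("[0-9.]", "");
    }

    private static int[] parsePos(String pos, int defaultChar) {
        String[] fc = pos.replaceAll("[fnr]+$", "").split("\\.", 2);
        int f = Integer.parseInt(fc[0]);
        int c = (fc.length == 2) ? Integer.parseInt(fc[1]) : defaultChar;
        return new int[] {f, c};
    }

    /** Returns the part of the key selected by this spec. */
    public String extract(String key, String separator) {
        String[] fields = key.split(Pattern.quote(separator), -1);
        if (startField > fields.length) {
            return "";
        }
        int last = (endField == 0) ? fields.length : Math.min(endField, fields.length);
        // Only clip the last field when pos2 names an existing field and a non-zero character.
        boolean clipEnd = endField != 0 && endField <= fields.length && endChar != 0;
        StringBuilder sb = new StringBuilder();
        for (int i = startField; i <= last; i++) {
            String f = fields[i - 1];
            int from = (i == startField) ? Math.min(startChar - 1, f.length()) : 0;
            int to = (i == last && clipEnd) ? Math.min(endChar, f.length()) : f.length();
            if (i > startField) {
                sb.append(separator);
            }
            sb.append(f, from, Math.max(from, to));
        }
        return sb.toString();
    }
}
```

For example, new KeySpec("2,2n") selects the whole second field and records the n option, while new KeySpec("1.2,1.3") selects the second and third characters of the first field.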

We assume that the fields in the key are separated by map.output.key.field.separator (already
exists).

Do we need anything else?

Also, this could be done in a Java comparator implementation that the user specifies to the
framework via mapred.output.key.comparator.class. This comparator would be used by both sort
and merge.
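
As a rough illustration of what such a comparator could look like, here is a sketch that honours -f, -n and -r on a single field of the key. It is written against plain java.util.Comparator so that it runs on its own; a real implementation would have to implement whatever comparator interface the framework expects and would most likely compare serialized bytes. The class name FieldComparator and its constructor arguments are hypothetical. Treating a non-numeric field as 0 under -n is an assumption, not something settled above.

```java
import java.util.Comparator;
import java.util.regex.Pattern;

/** Hypothetical comparator applying the f, n and r options to one field of the key. */
class FieldComparator implements Comparator<String> {
    private final int field;        // 1-based field number
    private final String separator; // stands in for map.output.key.field.separator
    private final boolean ignoreCase, numeric, reverse;

    public FieldComparator(int field, String separator, String opts) {
        this.field = field;
        this.separator = separator;
        this.ignoreCase = opts.indexOf('f') >= 0;
        this.numeric = opts.indexOf('n') >= 0;
        this.reverse = opts.indexOf('r') >= 0;
    }

    private String fieldOf(String key) {
        String[] fields = key.split(Pattern.quote(separator), -1);
        return (field <= fields.length) ? fields[field - 1] : "";
    }

    // Assumption: a field that does not parse as a number compares as 0.
    private static double toNumber(String s) {
        try {
            return Double.parseDouble(s.trim());
        } catch (NumberFormatException e) {
            return 0.0;
        }
    }

    @Override
    public int compare(String a, String b) {
        String x = fieldOf(a);
        String y = fieldOf(b);
        int c;
        if (numeric) {
            c = Double.compare(toNumber(x), toNumber(y));
        } else if (ignoreCase) {
            c = x.compareToIgnoreCase(y);
        } else {
            c = x.compareTo(y);
        }
        return reverse ? -c : c;
    }
}
```

Because the same object defines the ordering everywhere it is used, sorting with new FieldComparator(2, "\t", "n") and merging already-sorted runs with it would agree on the order, which is the property the sort and merge phases need.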

>  Streaming should provide an option for numerical sort of keys
> --------------------------------------------------------------
>
>                 Key: HADOOP-2302
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2302
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Lohit Vijayarenu
>
> It would be good to have an option for numerical sort of keys for streaming. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

