crunch-dev mailing list archives

From "Chao Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-280) Specify Comparator for total order sort
Date Thu, 17 Oct 2013 08:42:44 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797716#comment-13797716 ]

Chao Shi commented on CRUNCH-280:
---------------------------------

The difficulty is that MR requires a RawComparator, which compares two buffers of serialized
records, and that is not easy to use. It would be nice to support:
1) RawComparator: this is the most efficient way, but users must keep the serialization format
in mind
2) a normal Comparator class (with extra record serialization overhead)
3) a serializable Comparator object, whose in-memory state is serialized to MR workers (with
serialization overhead)

I find 2) and 3) hard to implement, as I don't know how to deserialize the data at runtime. Is it
possible, [~jwills]?
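
A minimal sketch of the difference between options 1) and 2), written in plain Java without
any Hadoop or Crunch dependency. Everything here is an assumption made for illustration:
RawBytesComparator is a hypothetical stand-in whose compare method follows the same
(buffer, start, length) shape as Hadoop's RawComparator, records are assumed to be 4-byte
big-endian ints, and DeserializingComparator is a hypothetical adapter, not an existing
Crunch class.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.function.Function;

// Hypothetical stand-in for a raw comparator: compares two serialized records in place.
interface RawBytesComparator {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

// Option 1: works directly on the bytes, so it must know the serialization format
// (assumed here: one 4-byte big-endian int per record).
final class RawIntComparator implements RawBytesComparator {
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int left = ByteBuffer.wrap(b1, s1, l1).getInt();
        int right = ByteBuffer.wrap(b2, s2, l2).getInt();
        return Integer.compare(left, right);
    }
}

// Option 2: adapts an ordinary Comparator<T> by deserializing both records on
// every comparison, which is where the extra overhead comes from.
final class DeserializingComparator<T> implements RawBytesComparator {
    private final Function<ByteBuffer, T> decoder;   // hypothetical decoding hook
    private final Comparator<? super T> delegate;

    DeserializingComparator(Function<ByteBuffer, T> decoder, Comparator<? super T> delegate) {
        this.decoder = decoder;
        this.delegate = delegate;
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        T left = decoder.apply(ByteBuffer.wrap(b1, s1, l1));
        T right = decoder.apply(ByteBuffer.wrap(b2, s2, l2));
        return delegate.compare(left, right);
    }
}

final class ComparatorSketch {
    // Serializes an int in the assumed record format.
    static byte[] encode(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Builds an option-2 comparator that sorts ints in descending order.
    static RawBytesComparator descendingInts() {
        Function<ByteBuffer, Integer> decoder = buf -> buf.getInt();
        Comparator<Integer> descending = Comparator.reverseOrder();
        return new DeserializingComparator<Integer>(decoder, descending);
    }

    public static void main(String[] args) {
        byte[] a = encode(3);
        byte[] b = encode(7);
        System.out.println(new RawIntComparator().compare(a, 0, 4, b, 0, 4));
        System.out.println(descendingInts().compare(a, 0, 4, b, 0, 4));
    }
}
```

The adapter only works if the framework can supply a decoder for the key type at runtime,
which is the open question raised above. Option 3) would additionally require shipping the
comparator's own state to the workers and is not covered by this sketch.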

> Specify Comparator for total order sort
> ---------------------------------------
>
>                 Key: CRUNCH-280
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-280
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>
> It seems that Sort#sort can only use the default comparator. It would be nice to let
> clients specify one.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
