crunch-dev mailing list archives

From "Chao Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-368) TupleWritable.Comparator
Date Wed, 02 Apr 2014 14:43:15 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957697#comment-13957697 ]

Chao Shi commented on CRUNCH-368:
---------------------------------

Yes, I agree with you that this would rarely happen. So I think it is reasonable to compare
on the type code first. With this, we can simply skip calling the real comparator, which would
likely fail anyway.

Another reason for this concerns the implementation of the new comparator. In compareField(), it
tries to get the comparator of the inner writable type, which is registered per type. If comparison
across different writable types were allowed, we would have to fall back to the old comparator.
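
To illustrate the idea, here is a minimal sketch of comparing on the type code first and only then dispatching to a per-type raw comparator. It is not the code from crunch-368.patch: the class name, the type codes, the RawComparator interface, and the one-byte length prefix are all assumptions made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and byte layout are illustrative, not Crunch's actual code.
class TypeCodeFirstSketch {
  /** Compares the payloads of two fields that share the same type code. */
  interface RawComparator {
    int compare(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen);
  }

  static final int TYPE_INT = 1;   // hypothetical type code: 4-byte big-endian int
  static final int TYPE_BYTES = 2; // hypothetical type code: raw byte string

  // One comparator per type code, mimicking per-type registration.
  static final Map<Integer, RawComparator> COMPARATORS = new HashMap<>();

  static {
    COMPARATORS.put(TYPE_INT,
        (a, ao, al, b, bo, bl) -> Integer.compare(readInt(a, ao), readInt(b, bo)));
    COMPARATORS.put(TYPE_BYTES, (a, ao, al, b, bo, bl) -> {
      // Unsigned lexicographic comparison; the shorter input sorts first on a tie.
      int n = Math.min(al, bl);
      for (int i = 0; i < n; i++) {
        int c = Integer.compare(a[ao + i] & 0xff, b[bo + i] & 0xff);
        if (c != 0) {
          return c;
        }
      }
      return Integer.compare(al, bl);
    });
  }

  static int readInt(byte[] buf, int off) {
    return ((buf[off] & 0xff) << 24) | ((buf[off + 1] & 0xff) << 16)
        | ((buf[off + 2] & 0xff) << 8) | (buf[off + 3] & 0xff);
  }

  /** Assumed field layout: [type code: 1 byte][length: 1 byte][payload]. */
  static int compareField(byte[] a, int aOff, byte[] b, int bOff) {
    int aType = a[aOff] & 0xff;
    int bType = b[bOff] & 0xff;
    if (aType != bType) {
      // Different types: order by type code and never call a per-type comparator.
      return Integer.compare(aType, bType);
    }
    RawComparator cmp = COMPARATORS.get(aType);
    if (cmp == null) {
      throw new IllegalStateException("no comparator registered for type code " + aType);
    }
    int aLen = a[aOff + 1] & 0xff;
    int bLen = b[bOff + 1] & 0xff;
    return cmp.compare(a, aOff + 2, aLen, b, bOff + 2, bLen);
  }
}
```

Because the type codes are compared before the lookup, the per-type comparator only ever sees two payloads of the same type, so no fallback path is needed in this sketch.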

> TupleWritable.Comparator
> ------------------------
>
>                 Key: CRUNCH-368
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-368
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.10.0, 0.8.3
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>         Attachments: crunch-368 benchmark.pdf, crunch-368.patch, gen_data.py
>
>
> This patch should improve comparison performance on TupleWritables. It saves the deserialization
> overhead. It is particularly useful when the input tuples are large, e.g. contain long strings.
> Please note that this changes the binary format of TupleWritable. It adds a var-int indicating
> the size of the field after each type code. This is needed because of a limitation of the
> writable system: we do not know the size of each field until it is fully deserialized.
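
A rough sketch of why the size prefix helps: with a length stored after each type code, a reader can step over a field by reading only its header. The layout, class name, and helper names below are assumptions for illustration, and the var-int here is a simple 7-bits-per-byte encoding, not necessarily the encoding the patch or Hadoop's WritableUtils uses.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch; not the format defined by crunch-368.patch.
class SizePrefixedFieldSketch {
  /** Writes a non-negative int, 7 bits per byte, high bit set on all but the last byte. */
  static void writeVarInt(ByteArrayOutputStream out, int value) {
    while ((value & ~0x7f) != 0) {
      out.write((value & 0x7f) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  /** Assumed field layout: [type code: 1 byte][size: var-int][payload]. */
  static void writeField(ByteArrayOutputStream out, int typeCode, byte[] payload) {
    out.write(typeCode);
    writeVarInt(out, payload.length);
    out.write(payload, 0, payload.length);
  }

  /** Returns the offset of the next field without touching the payload bytes. */
  static int skipField(byte[] buf, int off) {
    int pos = off + 1; // step over the type code
    int size = 0;
    int shift = 0;
    int b;
    do {
      b = buf[pos++] & 0xff;
      size |= (b & 0x7f) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return pos + size;
  }
}
```

In this sketch a 300-byte payload costs two extra bytes for the size, and skipping it reads three header bytes instead of deserializing the whole field.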



--
This message was sent by Atlassian JIRA
(v6.2#6252)
