datafu-dev mailing list archives

From "jian wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DATAFU-31) bags.DistinctBy works incorrectly on string containing minuses
Date Sun, 16 Feb 2014 13:03:21 GMT

    [ https://issues.apache.org/jira/browse/DATAFU-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902702#comment-13902702 ]

jian wang commented on DATAFU-31:
---------------------------------

Although the simple fix is to change the delimiter to ',' or some other character (such as
an invisible character like 0x01), any delimiter can still appear in the data, so I propose
changing the getDelimitedDistinctString() method instead. Rather than returning a "-"-separated
string of the distinct fields, the proposed method, getDistinctFieldTuple, would return a new
tuple containing only the specified fields, and that tuple would be used as the key.
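
A minimal sketch of the tuple-as-key idea follows. It is not the actual patch: a plain java.util.List stands in for a Pig Tuple, and the class name, the distinctBy helper, and the method signatures are hypothetical. It relies on List defining element-wise equals() and hashCode(), and assumes Pig tuples can serve as map keys in the same way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DistinctFieldTupleSketch {

    // Hypothetical stand-in for the proposed getDistinctFieldTuple: copies only
    // the selected field positions into a new list, which serves as the key.
    public static List<Object> getDistinctFieldTuple(List<Object> tuple, List<Integer> positions) {
        List<Object> key = new ArrayList<>();
        for (int position : positions) {
            key.add(tuple.get(position));
        }
        return key;
    }

    // Keeps the first tuple seen for each distinct key, preserving input order.
    public static List<List<Object>> distinctBy(List<List<Object>> bag, List<Integer> positions) {
        Map<List<Object>, List<Object>> seen = new LinkedHashMap<>();
        for (List<Object> tuple : bag) {
            seen.putIfAbsent(getDistinctFieldTuple(tuple, positions), tuple);
        }
        return new ArrayList<>(seen.values());
    }

    public static void main(String[] args) {
        List<List<Object>> bag = Arrays.asList(
            Arrays.<Object>asList("a-b", "c"),
            Arrays.<Object>asList("a-b", "d"));
        // Field 1 differs ("c" vs "d"), so both tuples are kept.
        System.out.println(distinctBy(bag, Arrays.asList(1)));
    }
}
```

Because the key is compared field by field, no field value is ever parsed back out of a string, so the content of a field cannot change which position it is read from.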









> bags.DistinctBy works incorrectly on string containing minuses
> --------------------------------------------------------------
>
>                 Key: DATAFU-31
>                 URL: https://issues.apache.org/jira/browse/DATAFU-31
>             Project: DataFu
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: Roman Borisov
>         Attachments: 0001-fix-issue-bags.DistinctBy-works-incorrectly.patch
>
>
> How to reproduce:
> Input:
> {(a-b,c), (a-b,d)}
> define distinct as DistinctBy('1')
> input = load 'input' as vs:bag{(v0:chararray,v1:chararray)};
> output = foreach input generate distinct(vs);
> dump output;
> expected: {(a-b,c), (a-b,d)}
> actual: {(a-b,c)}
> The bug is caused by the implementation based on splitting the tuple string by '-' to get tuple parts.
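
The collision described in the report can be reproduced with a small sketch. This is an assumed reconstruction based only on the description above, not the DataFu source: the class and method names are hypothetical, and a list of strings stands in for a Pig tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class DelimiterCollisionSketch {

    // Assumed shape of the string-key approach: join all fields with the
    // delimiter, split the result again, and keep the tokens at the selected
    // positions.
    public static String delimitedKey(List<String> tuple, List<Integer> positions, String delimiter) {
        String joined = String.join(delimiter, tuple);
        String[] tokens = joined.split(Pattern.quote(delimiter));
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (positions.contains(i)) {
                parts.add(tokens[i]);
            }
        }
        return String.join(delimiter, parts);
    }

    public static void main(String[] args) {
        List<Integer> positions = Arrays.asList(1);
        // "a-b-c" and "a-b-d" both split so that token 1 is "b", not "c" or "d".
        System.out.println(delimitedKey(Arrays.asList("a-b", "c"), positions, "-"));
        System.out.println(delimitedKey(Arrays.asList("a-b", "d"), positions, "-"));
    }
}
```

Under this assumption both tuples yield the key "b", which would explain why only one of them survives.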



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
