crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-167) Sort.sortTuples and related methods write out duplicate values
Date Sun, 24 Feb 2013 21:06:14 GMT


Josh Wills updated CRUNCH-167:

    Attachment: CRUNCH-167-t2.patch

Thanks Gabriel-- updated the patch based on your feedback, and I added an integration test
that fails under the old Sort lib but works under the new one. This is the one I will commit.
> Sort.sortTuples and related methods write out duplicate values
> --------------------------------------------------------------
>                 Key: CRUNCH-167
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.4.0, 0.5.0
>            Reporter: Josh Wills
>             Fix For: 0.6.0
>         Attachments: CRUNCH-167.patch, CRUNCH-167-t2.patch
> While debugging CRUNCH-166, I noticed that the strategy used by Sort.sortPairs,
> sortTrips, etc. can write out duplicate values when we sort/group on only a subset
> of the fields. All of the records that share the same values for those sub-fields are
> handled in the same reduce() call, where only one of those records is used as the key;
> the other records that had the same values for those sub-fields are thrown away.
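
The failure mode described above can be illustrated with a small standalone sketch. It does not use Crunch or Hadoop: the class SubsetSortDemo and its methods are hypothetical names made up for this illustration, and the grouping is only a simulation of what a reduce() call sees. The second method shows one possible way to avoid the duplicates (emitting the grouped records themselves), which is an assumption and not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; not part of Crunch.
public class SubsetSortDemo {

    // Simulates the shuffle: records are sorted and grouped on the first
    // field only, so records that differ in the second field share a group.
    static Map<String, List<Map.Entry<String, Integer>>> groupByFirst(
            List<Map.Entry<String, Integer>> input) {
        Map<String, List<Map.Entry<String, Integer>>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> rec : input) {
            groups.computeIfAbsent(rec.getKey(), k -> new ArrayList<>()).add(rec);
        }
        return groups;
    }

    // Problematic strategy: one record stands in as the key for the whole
    // group and is written once per grouped record.
    static List<Map.Entry<String, Integer>> emitKeyPerValue(
            List<Map.Entry<String, Integer>> input) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> group : groupByFirst(input).values()) {
            Map.Entry<String, Integer> representative = group.get(0);
            for (int i = 0; i < group.size(); i++) {
                out.add(representative); // the other records in the group are lost
            }
        }
        return out;
    }

    // One possible alternative (an assumption): write out the grouped
    // records themselves instead of the representative key.
    static List<Map.Entry<String, Integer>> emitValues(
            List<Map.Entry<String, Integer>> input) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> group : groupByFirst(input).values()) {
            out.addAll(group);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input =
                List.of(Map.entry("a", 1), Map.entry("b", 3), Map.entry("a", 2));
        System.out.println(emitKeyPerValue(input)); // prints [a=1, a=1, b=3]
        System.out.println(emitValues(input));      // prints [a=1, a=2, b=3]
    }
}
```

In the first output the record (a, 2) is gone and (a, 1) appears twice, which matches the duplicate-value symptom reported in this issue.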

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators