crunch-dev mailing list archives

From "Brandon Vargo (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CRUNCH-528) Pair: Integer overflow during comparison causes inconsistent sort.
Date Wed, 27 May 2015 17:13:17 GMT
Brandon Vargo created CRUNCH-528:
------------------------------------

             Summary: Pair: Integer overflow during comparison causes inconsistent sort.
                 Key: CRUNCH-528
                 URL: https://issues.apache.org/jira/browse/CRUNCH-528
             Project: Crunch
          Issue Type: Bug
          Components: Core
            Reporter: Brandon Vargo
            Assignee: Josh Wills
            Priority: Minor


Pair uses the hash code of each value for comparison if the values are not themselves Comparable.
If the hash codes are far enough apart, their difference exceeds the int range and the subtraction
wraps around. The result is a comparison function that is not transitive.
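
The wraparound can be reproduced in isolation. The following is a minimal sketch, not the actual Crunch source: the class and method names are hypothetical, and it assumes only that the comparison subtracts one int hash code from the other, as described above. It also shows Integer.compare as one overflow-safe alternative.

```java
public class HashCompareSketch {
    // Subtraction-based comparison (hypothetical stand-in for the pattern
    // described in this report). The int result wraps when the true
    // difference does not fit in 32 bits.
    public static int subtractCompare(int h1, int h2) {
        return h1 - h2;
    }

    // Overflow-safe alternative: Integer.compare never subtracts.
    public static int safeCompare(int h1, int h2) {
        return Integer.compare(h1, h2);
    }

    public static void main(String[] args) {
        // Three illustrative hash codes, chosen so that a - c overflows.
        int a = -2_000_000_000;
        int b = 0;
        int c = 2_000_000_000;

        // a < b and b < c, but a - c wraps to 294967296, so a appears > c.
        System.out.println(Integer.signum(subtractCompare(a, b))); // -1
        System.out.println(Integer.signum(subtractCompare(b, c))); // -1
        System.out.println(Integer.signum(subtractCompare(a, c))); // 1

        // The safe version orders all three consistently.
        System.out.println(Integer.signum(safeCompare(a, b))); // -1
        System.out.println(Integer.signum(safeCompare(b, c))); // -1
        System.out.println(Integer.signum(safeCompare(a, c))); // -1
    }
}
```

With the subtraction-based version, a sorts before b and b sorts before c, yet a sorts after c, which violates the transitivity that sorted collections rely on.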

Among other things, this breaks joins in the in-memory pipeline, since the in-memory
shuffler uses a TreeMap if the key type is Comparable. Since the key in a join is a Pair of
the original key and a join tag, the key is always Comparable. With a non-transitive comparison
function, the two join tags of the same original key can sort to different places, so that
they are not adjacent in the TreeMap. This results either in the cross product erroneously
producing no values in the case of an inner join, or in null values appearing when they
should not in the case of an outer join.

As a workaround, ensure that the key used in a Join is comparable.
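
As an illustration of what a comparable key might look like, here is a sketch in plain Java. RecordKey and its id field are hypothetical and do not come from Crunch, and the sketch does not show how such a type would be wired into a Crunch pipeline. The point is only that compareTo uses Long.compare rather than subtraction, so the ordering stays consistent even for extreme values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ComparableKeySketch {
    // Hypothetical key type; the name and field are illustrative only.
    public static final class RecordKey implements Comparable<RecordKey> {
        final long id;

        public RecordKey(long id) {
            this.id = id;
        }

        @Override
        public int compareTo(RecordKey other) {
            // Long.compare avoids subtraction, so it cannot wrap around.
            return Long.compare(id, other.id);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof RecordKey && ((RecordKey) o).id == id;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(id);
        }
    }

    // Inserts the ids into a TreeMap and returns them in iteration order.
    public static List<Long> sortedIds(long... ids) {
        TreeMap<RecordKey, Boolean> map = new TreeMap<>();
        for (long id : ids) {
            map.put(new RecordKey(id), Boolean.TRUE);
        }
        List<Long> out = new ArrayList<>();
        for (RecordKey key : map.keySet()) {
            out.add(key.id);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [-9223372036854775808, 0, 9223372036854775807]
        System.out.println(sortedIds(Long.MAX_VALUE, 0L, Long.MIN_VALUE));
    }
}
```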



