incubator-s4-dev mailing list archives

From "Matthieu Morel (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (S4-30) DefaultHasher hashes keys to negative number
Date Thu, 22 Dec 2011 17:53:31 GMT

    [ https://issues.apache.org/jira/browse/S4-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174929#comment-13174929 ]

Matthieu Morel commented on S4-30:
----------------------------------

Thanks a lot, Quoc, that's very useful!

What happens is that the calculated hash value is truncated to a _signed_ 32-bit number (contrary
to what I initially assumed).

I'm not exactly sure about the rationale for truncating to 32 bits, and I don't see an optimized
way to make sure we get a positive value when casting to int. Maybe somebody has one?

In the meantime, we could simply use Math.abs (slower, but correct!) and probably replace:
{code}return rv & 0xffffffffL;{code}

with 

{code}return Math.abs((int)(rv & 0xffffffffL));{code}

...so that we make sure we have a positive value when we cast to an integer.
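
One edge case worth checking in the Math.abs suggestion: in Java, Math.abs(Integer.MIN_VALUE) returns Integer.MIN_VALUE, so a hash whose low 32 bits are exactly 0x80000000 would still come out negative. A possible alternative, offered here as a sketch and not as the fix adopted in S4, is to mask with 0x7fffffffL so that the sign bit is cleared before the cast. The class and method names below are hypothetical, and the int return type is an assumption.

```java
// Hypothetical sketch; names are illustrative and not part of S4.
class PositiveHashSketch {

    // The proposal from the comment: truncate to 32 bits, cast, then Math.abs.
    public static int absOfTruncated(long rv) {
        return Math.abs((int) (rv & 0xffffffffL));
    }

    // Alternative: keep only the low 31 bits, so the cast can never
    // produce a negative int. Result is in [0, Integer.MAX_VALUE].
    public static int maskSignBit(long rv) {
        return (int) (rv & 0x7fffffffL);
    }

    public static void main(String[] args) {
        // Low 32 bits equal 0x80000000, which casts to Integer.MIN_VALUE.
        long rv = 0x80000000L;
        System.out.println(absOfTruncated(rv)); // Integer.MIN_VALUE, still negative
        System.out.println(maskSignBit(rv));    // 0
    }
}
```

The mask maps keys to different values than Math.abs would, and it gives up one bit of hash range, but it is a single AND and has no negative edge case.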

We might also add regression tests such as those from Twitter's utility library: https://github.com/twitter/util/blob/master/util-hashing/src/test/scala/com/twitter/hashing/KeyHasherSpec.scala
                
> DefaultHasher hashes keys to negative number
> --------------------------------------------
>
>                 Key: S4-30
>                 URL: https://issues.apache.org/jira/browse/S4-30
>             Project: Apache S4
>          Issue Type: Bug
>    Affects Versions: 0.4
>         Environment: All - Windows and Linux
>            Reporter: Quoc Nguyen
>            Priority: Blocker
>
> DefaultHasher uses HashAlgorithm hashAlgorithm = HashAlgorithm.FNV1_64_HASH;, which hashes
> key strings such as 118+18233, 118+17360, 118+17258, 118+18147 and 118+18121 (and many more)
> to negative values. As a result, the DefaultPartitioner (int partitionId = (int) (hasher.hash(stringValue)
> % partitionCount);) tries to assign those keys to an incorrect partition.
> Workaround:
> None. If the stream contains those keys, they are dropped because the partitioner cannot
> partition them correctly.
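
The effect of a negative hash on the partitioner expression quoted above can be illustrated with a small sketch. Java's % operator takes the sign of the dividend, so a negative hash yields a negative partitionId. The class name, the method names, and the sample hash value -7 are hypothetical (it is not real FNV1_64_HASH output), and Math.floorMod assumes Java 8 or later.

```java
// Hypothetical sketch; not the actual DefaultPartitioner code.
class PartitionSketch {

    // Mirrors the expression quoted in the issue: the result has the
    // same sign as the hash, so a negative hash gives a negative id.
    public static int partitionWithRemainder(long hash, int partitionCount) {
        return (int) (hash % partitionCount);
    }

    // Floor-based modulo: for a positive partitionCount the result is
    // always in [0, partitionCount), whatever the sign of the hash.
    public static int partitionWithFloorMod(long hash, int partitionCount) {
        return (int) Math.floorMod(hash, (long) partitionCount);
    }

    public static void main(String[] args) {
        long hash = -7L;        // made-up negative hash value
        int partitionCount = 4; // made-up partition count
        System.out.println(partitionWithRemainder(hash, partitionCount)); // -3
        System.out.println(partitionWithFloorMod(hash, partitionCount));  // 1
    }
}
```

Making the hash non-negative in the hasher and normalizing the modulo in the partitioner are two separate places to guard against the same symptom; either one alone keeps partitionId in range.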
