incubator-s4-dev mailing list archives

From "Matthieu Morel (Commented) (JIRA)" <>
Subject [jira] [Commented] (S4-30) DefaultHasher hashes keys to negative number
Date Thu, 22 Dec 2011 17:53:31 GMT


Matthieu Morel commented on S4-30:

thanks a lot Quoc, that's very useful!

What happens is that the calculated hash value is truncated to a _signed_ 32-bit number (contrary
to what I initially assumed).
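
To illustrate the effect, here is a minimal sketch. It assumes the masked value ends up in an int somewhere along the way; the class name, the helper and the sample value are made up for the example and are not taken from the S4 sources.

{code}
```java
class SignedTruncationDemo {
    // Hypothetical helper: keeps the low 32 bits of rv, then narrows to int.
    static int truncate(long rv) {
        return (int) (rv & 0xffffffffL);
    }

    public static void main(String[] args) {
        long rv = 0x80000001L; // made-up hash value with bit 31 set

        // As a long, the masked value is still non-negative.
        System.out.println(rv & 0xffffffffL); // 2147483649

        // Narrowing to int turns bit 31 into the sign bit.
        System.out.println(truncate(rv)); // -2147483647

        // Java's % keeps the sign of the dividend, so a partition id
        // computed from a negative hash is negative as well.
        System.out.println(truncate(rv) % 4); // -3
    }
}
```
{code}

Any value with bit 31 set behaves this way, which would explain why only some keys are affected.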

I'm not exactly sure about the rationale for truncating to 32 bits, and I don't see an optimized
way to make sure we get a positive value when casting to int. Maybe somebody has one?

In the meantime, we could simply use Math.abs (slower, but correct!) and probably replace:
{code}return rv & 0xffffffffL;{code}
with
{code}return Math.abs((int)(rv & 0xffffffffL));{code}
so that we make sure we have a positive value when we cast to an integer.
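
One caveat worth checking: Math.abs(Integer.MIN_VALUE) returns Integer.MIN_VALUE, so the Math.abs variant can still produce a negative number when the low 32 bits are exactly 0x80000000. The sketch below compares it with clearing the sign bit via a 0x7fffffff mask. The class and method names are hypothetical, and the mask is only one possible alternative, not something already agreed on for S4.

{code}
```java
class NonNegativeHashDemo {
    // The Math.abs variant suggested above.
    static int absVariant(long rv) {
        return Math.abs((int) (rv & 0xffffffffL));
    }

    // Alternative sketch: keep only the low 31 bits, so the sign bit is always 0.
    static int maskVariant(long rv) {
        return (int) (rv & 0x7fffffffL);
    }

    public static void main(String[] args) {
        long edge = 0x80000000L; // low 32 bits equal Integer.MIN_VALUE

        System.out.println(absVariant(edge));  // -2147483648 (still negative)
        System.out.println(maskVariant(edge)); // 0

        // The two variants agree on values whose bit 31 is clear.
        System.out.println(absVariant(42L));   // 42
        System.out.println(maskVariant(42L));  // 42
    }
}
```
{code}

Note that the two variants map values with bit 31 set to different results, so they would not assign the same keys to the same partitions. Whichever one is chosen has to be used consistently.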

We might also add regression tests such as those from Twitter's utility library.
> DefaultHasher hashes keys to negative number
> --------------------------------------------
>                 Key: S4-30
>                 URL:
>             Project: Apache S4
>          Issue Type: Bug
>    Affects Versions: 0.4
>         Environment: All - Windows and Linux
>            Reporter: Quoc Nguyen
>            Priority: Blocker
> DefaultHasher uses HashAlgorithm hashAlgorithm = HashAlgorithm.FNV1_64_HASH; which hashes
> key strings such as 118+18233, 118+17360, 118+17258, 118+18147 and 118+18121 (and many more)
> to negative values. As a result, the DefaultPartitioner (int partitionId = (int) (hasher.hash(stringValue)
> % partitionCount);) assigns those keys to an incorrect partition.
> Workaround:
> None - stream has those keys, they will get dropped because the partitioner cannot correctly

This message is automatically generated by JIRA.