flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefan Richter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3623) Adjust MurmurHash algorithm
Date Tue, 09 Aug 2016 19:44:20 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414110#comment-15414110
] 

Stefan Richter commented on FLINK-3623:
---------------------------------------

In my opinion, our implementation of Murmur might be too complicated and computationally expensive
for simply hashing an int.  Most parts of the used algorithm are only required for arbitrary
byte[] keys. What matters for mixing the bits of an int is the finalizer part of Murmur3,
which is:
  
  h ^= h >> 16;
  h *= 0x85ebca6b;
  h ^= h >> 13;
  h *= 0xc2b2ae35;
  h ^= h >> 16;

If I remember correctly, the purpose of the remaining code is to generate the intermediate
hash from blocks over variable size byte[], whereas the purpose of the finalizer is to actually
increase entropy and reduce collisions of the intermediate hash. Restricting the code to these
lines could speed up hashing significantly, which might be even more relevant in case hash
calculations are also used to determine keygroups.

> Adjust MurmurHash algorithm
> ---------------------------
>
>                 Key: FLINK-3623
>                 URL: https://issues.apache.org/jira/browse/FLINK-3623
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 1.1.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>            Priority: Trivial
>             Fix For: 1.1.0
>
>
> Flink's MurmurHash implementation differs from the published algorithm.
> From Flink's MathUtils.java:
> {code}
> code *= 0xe6546b64;
> {code}
> The Murmur3_32 algorithm as described by [Wikipedia|https://en.wikipedia.org/wiki/MurmurHash]:
> {code}
> m ← 5
> n ← 0xe6546b64
> hash ← hash × m + n
> {code}
> and in Guava's Murmur3_32HashFunction.java:
> {code}
> h1 = h1 * 5 + 0xe6546b64;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message