cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "David Allsopp (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (CASSANDRA-2975) Upgrade MurmurHash to version 3
Date Wed, 16 Nov 2011 10:20:52 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150931#comment-13150931
] 

David Allsopp edited comment on CASSANDRA-2975 at 11/16/11 10:20 AM:
---------------------------------------------------------------------

Does anyone have any data on what the typical key size is (i.e. the average input size for
the hash)?

I have a couple of optimisations for the MurmurHash3 implementation that I think give another
10-40% speedup, particularly for smaller values (e.g. 30% speedup for buffer lengths under
256 bytes) and no worse for large values (tens of KB). These results were on a AMD Phenom
II X6 1055T @ 2.80 GHz, under 64-bit Windows 7, Java 1.6.0_27.

Firstly, inline the {{rotl64}} calls , e.g. so that
{noformat}
k1 = rotl64(k1, 31);
{noformat}
becomes
{noformat}
k1 = (k1 << 31) | (k1 >>> 33);
{noformat}

-Secondly, rather than a large {{switch-case}} to handle the 'tail', use nested {{if-else}}
to form a simple binary search. Particularly for relatively small inputs, handling the 'tail'
is a significant part of the computation. E.g:-

{noformat}
int ln = length & 15;
if (ln > 8)
  {
     if (ln > 12)
       {
          // etc for cases 13 - 15
       }
     else
       {
          // cases 11 and 12
       }

  }
else
  {
     // etc for cases 1-7
  }
{noformat}

Will try to post a proper benchmark when I've tidied it up (run out of time today!) so anyone
interested can try it on other hardware...

-This latter optimisation is pretty verbose and ugly to look at - it _might_ be just as fast,
and much more concise, to lookup the offsets and shifts from an array, and deal with the special
cases 1 and 9 as, well, special cases - but haven't benchmarked this alternative yet...-
                
      was (Author: dallsopp):
    Does anyone have any data on what the typical key size is (i.e. the average input size
for the hash)?

I have a couple of optimisations for the MurmurHash3 implementation that I think give another
10-40% speedup, particularly for smaller values (e.g. 30% speedup for buffer lengths under
256 bytes) and no worse for large values (tens of KB). These results were on a AMD Phenom
II X6 1055T @ 2.80 GHz, under 64-bit Windows 7, Java 1.6.0_27.

Firstly, inline the {{rotl64}} calls , e.g. so that
{noformat}
k1 = rotl64(k1, 31);
{noformat}
becomes
{noformat}
k1 = (k1 << 31) | (k1 >>> 33);
{noformat}

Secondly, rather than a large {{switch-case}} to handle the 'tail', use nested {{if-else}}
to form a simple binary search. Particularly for relatively small inputs, handling the 'tail'
is a significant part of the computation. E.g:

{noformat}
int ln = length & 15;
if (ln > 8)
  {
     if (ln > 12)
       {
          // etc for cases 13 - 15
       }
     else
       {
          // cases 11 and 12
       }

  }
else
  {
     // etc for cases 1-7
  }
{noformat}

Will try to post a proper benchmark when I've tidied it up (run out of time today!) so anyone
interested can try it on other hardware...

This latter optimisation is pretty verbose and ugly to look at - it _might_ be just as fast,
and much more concise, to lookup the offsets and shifts from an array, and deal with the special
cases 1 and 9 as, well, special cases - but haven't benchmarked this alternative yet...
                  
> Upgrade MurmurHash to version 3
> -------------------------------
>
>                 Key: CASSANDRA-2975
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Brian Lindauer
>            Assignee: Brian Lindauer
>            Priority: Trivial
>              Labels: lhf
>             Fix For: 1.1
>
>         Attachments: 0001-Convert-BloomFilter-to-use-MurmurHash-v3-instead-of-.patch,
0002-Backwards-compatibility-with-files-using-Murmur2-blo.patch
>
>
> MurmurHash version 3 was finalized on June 3. It provides an enormous speedup and increased
robustness over version 2, which is implemented in Cassandra. Information here:
> http://code.google.com/p/smhasher/
> The reference implementation is here:
> http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136&r=136
> I have already done the work to port the (public domain) reference implementation to
Java in the MurmurHash class and updated the BloomFilter class to use the new implementation:
> https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93
> Apart from the faster hash time, the new version only requires one call to hash() rather
than 2, since it returns 128 bits of hash instead of 64.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message