cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-13291) Replace usages of MessageDigest with Guava's Hasher
Date Wed, 27 Sep 2017 23:20:01 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183400#comment-16183400 ]

Jason Brown edited comment on CASSANDRA-13291 at 9/27/17 11:19 PM:
-------------------------------------------------------------------

I knocked out a quick [JMH bench|https://github.com/jasobrown/cassandra/commit/cd35b1a771a74c2bf1d3bf2c0916967e74821385]
to see what the difference between {{MessageDigest}} and {{Hasher}} would be. I selected Guava's
MD5 and murmur3_128 hashers for comparison. Here's what I found:

{noformat}
     [java] Benchmark                            (bufferSize)  Mode  Cnt     Score      Error  Units
     [java] HashingBench.benchHasherMD5                    31  avgt    5   336.613 ±   18.826  ns/op
     [java] HashingBench.benchHasherMD5                   131  avgt    5   709.226 ±   19.489  ns/op
     [java] HashingBench.benchHasherMD5                   517  avgt    5  1800.091 ±   37.748  ns/op
     [java] HashingBench.benchHasherMD5                  2041  avgt    5  6275.607 ±  623.008  ns/op
     [java] HashingBench.benchHasherMurmur3_128            31  avgt    5   260.859 ±   39.229  ns/op
     [java] HashingBench.benchHasherMurmur3_128           131  avgt    5   421.268 ±   68.287  ns/op
     [java] HashingBench.benchHasherMurmur3_128           517  avgt    5   861.577 ±   68.423  ns/op
     [java] HashingBench.benchHasherMurmur3_128          2041  avgt    5  2863.952 ±  314.205  ns/op
     [java] HashingBench.benchMessageDigestMD5             31  avgt    5   787.373 ±   69.869  ns/op
     [java] HashingBench.benchMessageDigestMD5            131  avgt    5  1264.677 ±  117.790  ns/op
     [java] HashingBench.benchMessageDigestMD5            517  avgt    5  2822.846 ±  178.416  ns/op
     [java] HashingBench.benchMessageDigestMD5           2041  avgt    5  9611.875 ± 1760.809  ns/op
{noformat}

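Naively, I used byte arrays of four different sizes, updated the hasher/digest with each, and
retrieved the hashed result. I selected buffer sizes that are close to, but intentionally not,
powers of 2. It looks like the Guava {{Hasher}}s do indeed perform better than {{MessageDigest}}:
in this run the Guava MD5 hasher is roughly 1.5x to 2.3x faster than {{MessageDigest}} MD5,
and murmur3_128 is roughly 3x faster.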
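
For readers who cannot open the linked commit, here is a minimal sketch of what the {{MessageDigest}} arm of such a measurement might look like. It is not the linked {{HashingBench}}: the class name {{DigestTimingSketch}} and its methods are hypothetical, it uses a plain timing loop instead of JMH, and it sticks to the JDK so it runs without extra dependencies. The Guava arm would presumably do the equivalent with {{Hashing.md5().newHasher().putBytes(buffer).hash()}} (or {{Hashing.murmur3_128()}}), which is an assumption about the bench rather than a quote from it.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

// Hypothetical stand-in for the MessageDigest arm of the benchmark; not the linked HashingBench.
final class DigestTimingSketch {
    // Buffer sizes taken from the (bufferSize) column of the results table.
    static final int[] BUFFER_SIZES = {31, 131, 517, 2041};

    // Hash one buffer: create a digest, feed it the bytes, and return the 16-byte MD5 result.
    static byte[] md5(byte[] buffer) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            digest.update(buffer);
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is provided by the default JDK security providers, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Crude average time per call in nanoseconds. Unlike JMH, this does no forking
    // or statistical analysis, so treat the numbers as rough indications only.
    static double averageNanos(byte[] buffer, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += md5(buffer)[0]; // consume the result so the call is not optimized away
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // practically unreachable; keeps 'sink' live
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        for (int size : BUFFER_SIZES) {
            byte[] buffer = new byte[size];
            random.nextBytes(buffer);
            averageNanos(buffer, 20_000); // warm-up pass, result discarded
            System.out.printf("MD5 %5d bytes: %.1f ns/op%n", size, averageNanos(buffer, 200_000));
        }
    }
}
```

Numbers from a loop like this are not comparable to the JMH scores in the table; the sketch only illustrates the update-then-digest pattern being measured.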


> Replace usages of MessageDigest with Guava's Hasher
> ---------------------------------------------------
>
>                 Key: CASSANDRA-13291
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13291
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Michael Kjellman
>            Assignee: Michael Kjellman
>         Attachments: CASSANDRA-13291-trunk.diff
>
>
> During my profiling of C* I frequently see lots of aggregate time across threads being
> spent inside the MD5 MessageDigest implementation. Given that there are tons of modern alternative
> hashing functions better than MD5 available -- both in terms of providing better collision
> resistance and actual computational speed -- I wanted to switch out our usage of MD5 for alternatives
> (like adler128 or murmur3_128) and test for performance improvements.
> Unfortunately, I found that, because we use MessageDigest everywhere, switching the
> hashing function to something like adler128 or murmur3_128 (for example), neither of which
> ships with the JDK, wasn't straightforward.
> The goal of this ticket is to propose replacing direct usages of MessageDigest
> with Hasher from Guava. This means that going forward we can change a single line of code
> to switch the hashing algorithm being used (assuming there is an implementation in Guava).
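
To illustrate why algorithms that do not ship with the JDK are awkward to reach through {{MessageDigest}}, here is a small sketch. {{MessageDigest.getInstance}} resolves algorithms by name from the installed security providers and throws {{NoSuchAlgorithmException}} for names it does not know. The class {{DigestAvailabilitySketch}} and the helper {{isDigestAvailable}} are hypothetical names used only for this example, and the result for any given name depends on the providers installed in the running JVM.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not part of Cassandra or of the attached patch.
final class DigestAvailabilitySketch {
    // Returns true if some installed security provider offers the named digest.
    static boolean isDigestAvailable(String algorithm) {
        try {
            MessageDigest.getInstance(algorithm);
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "murmur3_128" is not a digest name offered by the stock JDK providers.
        for (String name : new String[] {"MD5", "SHA-256", "murmur3_128"}) {
            System.out.println(name + " available: " + isDigestAvailable(name));
        }
    }
}
```

With Guava, by contrast, the algorithm is chosen by picking a {{HashFunction}} such as {{Hashing.md5()}} or {{Hashing.murmur3_128()}}, which is what makes the one-line switch described in the ticket plausible.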



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

