incubator-couchdb-dev mailing list archives

From "Adam Kocoloski (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-757) crypto:md5 vs erlang:md5
Date Tue, 04 May 2010 19:19:55 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863955#action_12863955 ]

Adam Kocoloski commented on COUCHDB-757:
----------------------------------------

Cool, nice find, Filipe.  In the interest of not relying too heavily on crypto, we could add
couch_util:md5*, e.g.

md5(Data) ->
    try crypto:md5(Data) catch error:_ -> erlang:md5(Data) end.

I didn't notice any performance hit from the extra function call and try..catch wrapper. 
Of course CouchDB still depends on crypto in other places, but at least this patch wouldn't
tie us any more closely to it.
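As a rough, self-contained sketch of the fallback idea above (not the actual couch_util code): the module name md5_sketch is hypothetical, and the sketch assumes an OTP release that provides crypto:hash/2, whereas the snippet above uses crypto:md5/1 as it existed at the time of this thread.

```erlang
-module(md5_sketch).
-export([md5/1]).

%% Hypothetical helper illustrating the "prefer crypto, fall back to the BIF" idea.
%% Assumption: crypto:hash/2 is available; the thread itself uses crypto:md5/1.
md5(Data) ->
    try
        %% Use the crypto application's MD5 when it can be called.
        crypto:hash(md5, Data)
    catch
        %% Any error (e.g. undef when crypto is not available) falls back
        %% to the built-in erlang:md5/1, which needs no extra application.
        error:_ -> erlang:md5(Data)
    end.
```

Both branches return the same 16-byte digest, so callers should not need to know which implementation ran.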

> crypto:md5 vs erlang:md5
> ------------------------
>
>                 Key: COUCHDB-757
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-757
>             Project: CouchDB
>          Issue Type: Improvement
>         Environment: GNU/Linux
>            Reporter: Filipe Manana
>         Attachments: crypto_md5.patch
>
>
> Just noticed that crypto:md5 is several times faster than erlang:md5 (roughly 3.5x to 6x in the
> single runs below) when hashing just 4 KB or 8 KB of data.
> Basically we use md5 hashing when writing and reading documents and attachments through
> couch_file and couch_stream.
> Eshell V5.8  (abort with ^G)
> 1> crypto:start().
> ok
> 2> Bin1 = crypto:rand_bytes(4 * 1024).
> <<92,239,233,29,1,237,96,193,188,97,4,72,51,90,96,91,187,
>   112,112,198,7,173,105,99,205,65,105,94,144,...>>
> 3>        
> 3> {T1, _} = timer:tc(erlang, md5, [Bin1]).
> {211,
>  <<20,235,111,74,212,254,194,144,49,70,205,105,124,106,
>    131,230>>}
> 4> 
> 4> {T2, _} = timer:tc(crypto, md5, [Bin1]).
> {60,
>  <<20,235,111,74,212,254,194,144,49,70,205,105,124,106,
>    131,230>>}
> 5> 
> 5> Bin2 = crypto:rand_bytes(8 * 1024).     
> <<246,66,158,227,62,127,62,239,202,232,133,244,191,9,136,
>   6,164,179,109,166,253,41,144,185,177,39,177,88,142,...>>
> 6> 
> 6> {T3, _} = timer:tc(erlang, md5, [Bin2]).
> {446,
>  <<7,55,252,42,249,30,58,22,245,12,111,82,131,58,199,51>>}
> 7> 
> 7> {T4, _} = timer:tc(crypto, md5, [Bin2]).
> {77,
>  <<7,55,252,42,249,30,58,22,245,12,111,82,131,58,199,51>>}
> 8> 
> I know there's a ticket around that aims to make it possible to remove the dependency
> on the crypto module, but for environments where this is not a problem it would be a plus.
> Made a test that wrote 400 attachments of about 60 KB and noticed an average response
> time of 0.16s (crypto:md5) versus 0.18s (erlang:md5).
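
The shell session quoted above times a single call per function, which can be noisy. A minimal sketch of averaging over many iterations might look like the following; the module name md5_bench and its functions are hypothetical, and it assumes an OTP release with crypto:hash/2 and crypto:strong_rand_bytes/1 rather than the crypto:md5/1 and crypto:rand_bytes/1 calls used in the session.

```erlang
-module(md5_bench).
-export([run/2, avg_micros/3]).

%% Average wall-clock time in microseconds of Fun(Bin) over N calls (N > 0).
avg_micros(Fun, Bin, N) when is_integer(N), N > 0 ->
    {Total, ok} = timer:tc(fun() -> repeat(Fun, Bin, N) end),
    Total / N.

%% Call Fun(Bin) N times, discarding the result.
repeat(_Fun, _Bin, 0) ->
    ok;
repeat(Fun, Bin, N) ->
    _ = Fun(Bin),
    repeat(Fun, Bin, N - 1).

%% Compare the two MD5 implementations on Size random bytes.
run(Size, N) ->
    Bin = crypto:strong_rand_bytes(Size),
    %% Sanity check: both implementations must agree on the digest.
    true = (erlang:md5(Bin) =:= crypto:hash(md5, Bin)),
    [{erlang_md5, avg_micros(fun erlang:md5/1, Bin, N)},
     {crypto_md5, avg_micros(fun(B) -> crypto:hash(md5, B) end, Bin, N)}].
```

For example, md5_bench:run(4 * 1024, 10000) would return the average time per call for each implementation; the actual numbers depend on the machine and OTP version.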

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

