hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki ...@getopt.org>
Subject Re: HBase Bloom filters hash
Date Mon, 22 Oct 2007 20:28:23 GMT
Jim Kellerman wrote:
>> -----Original Message-----
>> From: Andrzej Bialecki [mailto:ab@getopt.org]
>> Sent: Monday, October 22, 2007 11:45 AM
>> To: hadoop-dev@lucene.apache.org
>> Subject: HBase Bloom filters hash
>>
>> Hi,
>>
>> I'm curious why the hashing function that these filters use
>> is based on SHA-1 (which is relatively slow to compute)
>> instead of a bunch of fast and simple non-cryptographic
>> functions such as Jenkins' hash (see
>> http://bretm.home.comcast.net/hash/7.html
>> for the evaluation of Jenkins hash).
> 
> The reason for SHA-1 is that it was what came with the open
> source bloom filter implementation we used. We've been focused
> on just getting things to work and not on performance, yet.
> 
> If you'd like to open a Jira, it will be on the list of things
> to do - sometime. If this is really important to you, how about
> submitting a patch? There's mostly just the two of us, +
> contributors, so we have to put our priorities on bug fixing
> and making HBase robust before we get around to adding more
> features or doing performance analysis.
> 
> We'd really like to get more contributions... the project would
> mature much more rapidly.

That's perfectly understandable. I'm still in the discovery phase, and 
it was something that caught my eye. I agree, it's a relatively low 
priotity - I happen to have a Jenkins hash impl. in Java, if I find some 
free time I'll prepare a patch.

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Mime
View raw message