From: Jim Kellerman
To: "hadoop-dev@lucene.apache.org"
Date: Mon, 22 Oct 2007 13:18:19 -0700
Subject: RE: HBase Bloom filters hash

> -----Original Message-----
> From: Andrzej Bialecki [mailto:ab@getopt.org]
> Sent: Monday, October 22, 2007 11:45 AM
> To: hadoop-dev@lucene.apache.org
> Subject: HBase Bloom filters hash
>
> Hi,
>
> I'm curious why the hashing function that these filters use
> is based on SHA-1 (which is relatively slow to compute)
> instead of a bunch of fast and simple non-cryptographic
> functions such as Jenkins' hash (see
> http://bretm.home.comcast.net/hash/7.html
> for the evaluation of Jenkins hash).

The reason for SHA-1 is that it was what came with the open source
bloom filter implementation we used. We've been focused on just
getting things to work and not on performance, yet.

If you'd like to open a Jira, it will be on the list of things to
do - sometime. If this is really important to you, how about
submitting a patch?

There's mostly just the two of us, + contributors, so we have to
put our priorities on bug fixing and making HBase robust before
we get around to adding more features or doing performance analysis.

We'd really like to get more contributions... the project would
mature much more rapidly.

Hope this helps.

---
Jim Kellerman, Senior Engineer; Powerset
jim@powerset.com
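
For anyone who wants to pick this up, here is a rough sketch of what the suggestion in the quoted message could look like: a Bloom filter whose bit positions come from Bob Jenkins' one-at-a-time hash instead of SHA-1. This is not the HBase code. The class name JenkinsBloomSketch, the seeded variant of the hash, and the way each hash output is fed back in as the seed for the next one are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical illustration only; not the HBase Bloom filter implementation.
final class JenkinsBloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    JenkinsBloomSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Jenkins' one-at-a-time hash. Starting from a caller-supplied seed
    // instead of 0 is an assumption made here to get several hash values.
    static int oneAtATime(byte[] key, int seed) {
        int h = seed;
        for (byte b : key) {
            h += (b & 0xff);
            h += (h << 10);
            h ^= (h >>> 6);
        }
        h += (h << 3);
        h ^= (h >>> 11);
        h += (h << 15);
        return h;
    }

    // Set one bit per hash; each hash output seeds the next one.
    void add(byte[] key) {
        int h = 0;
        for (int i = 0; i < numHashes; i++) {
            h = oneAtATime(key, h);
            bits.set(Math.floorMod(h, numBits));
        }
    }

    // False means definitely absent; true means possibly present.
    boolean mightContain(byte[] key) {
        int h = 0;
        for (int i = 0; i < numHashes; i++) {
            h = oneAtATime(key, h);
            if (!bits.get(Math.floorMod(h, numBits))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        JenkinsBloomSketch filter = new JenkinsBloomSketch(1024, 3);
        byte[] row = "row-1".getBytes(StandardCharsets.US_ASCII);
        filter.add(row);
        System.out.println(filter.mightContain(row)); // prints "true"
    }
}
```

The hash here is only a handful of shifts and adds per input byte, which is the cost argument made in the quoted message. Whether the false-positive rate holds up compared to SHA-1 would need to be measured before a patch like this went in.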