From: Ted Dunning
Date: Tue, 25 Jan 2011 09:31:00 -0800
Subject: Re: Overhead of Bloomfilters
To: dev@hbase.apache.org

See
http://en.wikipedia.org/wiki/Double_hashing for information on double hashing.

On Tue, Jan 25, 2011 at 8:11 AM, Nicolas Spiegelberg wrote:

> A great article for Bloom Filter rules of thumb:
>
> http://corte.si/posts/code/bloom-filter-rules-of-thumb/
>
> Note that only rules #1 & #2 apply for our use case. Rule #3, while true,
> isn't as big a worry because we use combinatorial generation for hashes, so
> the number of 'expensive' hash calculations is 2, no matter how many hash
> functions need to be generated. Note that this drastically (400%+) sped up
> our BloomFilter.add() speed.
>
> Sent from my iPhone
>
> On Jan 25, 2011, at 6:22 AM, "Lars George" wrote:
>
> > Hi,
> >
> > (Probably aimed at Nicolas)
> >
> > Do we have a (rough) formula of overhead, i.e. the size of the
> > bloomfilters for row and col granularity as for example depending on
> > the KV count and average sizes (as reported by the HFile main()
> > helper)?
> >
> > Thanks,
> > Lars
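
For readers following the double hashing pointer, here is a minimal sketch of how the "combinatorial generation" Nicolas describes can work: two base hashes are computed once per key, and the i-th bit position is derived as (h1 + i * h2) mod m. This is not HBase's actual code. The thread does not say which hash functions HBase uses, so the choice of BLAKE2b with two salts, and the names base_hashes, bit_positions, and might_contain, are assumptions made purely for illustration.

```python
import hashlib


def base_hashes(key: bytes) -> tuple[int, int]:
    """Compute the two 'expensive' base hashes for a key.

    Assumption: BLAKE2b with two different salts stands in for whatever
    pair of hash functions a real implementation would use.
    """
    h1 = int.from_bytes(
        hashlib.blake2b(key, digest_size=8, salt=b"h1").digest(), "big"
    )
    h2 = int.from_bytes(
        hashlib.blake2b(key, digest_size=8, salt=b"h2").digest(), "big"
    )
    return h1, h2


def bit_positions(key: bytes, k: int, m: int) -> list[int]:
    """Derive k bit positions in [0, m) from only two hash computations."""
    h1, h2 = base_hashes(key)
    # Double hashing: the i-th derived hash is h1 + i * h2, reduced mod m.
    return [(h1 + i * h2) % m for i in range(k)]


class BloomFilter:
    """Toy Bloom filter with m bits and k derived hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def add(self, key: bytes) -> None:
        # Set every derived bit position for this key.
        for pos in bit_positions(key, self.k, self.m):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in bit_positions(key, self.k, self.m)
        )
```

In this sketch, raising k only adds cheap integer arithmetic per key; the number of digest computations stays at two, which is the property the quoted message credits for the faster add().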