hbase-dev mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: Overhead of Bloomfilters
Date Tue, 25 Jan 2011 17:31:00 GMT
See http://en.wikipedia.org/wiki/Double_hashing for information on double
hashing.
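
For illustration, here is a minimal sketch of the idea behind the "combinatorial generation" mentioned in the quoted message below: compute two base hashes once, then derive the i-th bit position as (h1 + i*h2) mod m. The class name DoubleHashBloom, the seeded FNV-1a hash, and the seed constant are all assumptions made for this example; this is not HBase's actual BloomFilter code, which may differ in its hash function and bit layout.

```java
import java.util.BitSet;

// Hypothetical sketch of double hashing in a Bloom filter.
// Not HBase's implementation; names and the hash function are assumptions.
public class DoubleHashBloom {
    private final BitSet bits;
    private final int numBits;   // m: size of the bit array
    private final int numHashes; // k: number of derived hash functions

    public DoubleHashBloom(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Seeded 32-bit FNV-1a; stands in for whatever 'expensive' hash is used.
    public static int fnv1a(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Only two real hash computations, regardless of k.
    public int[] positions(byte[] key) {
        int h1 = fnv1a(key, 0);
        int h2 = fnv1a(key, 0x9E3779B9) | 1; // force odd so the stride is never zero
        int[] out = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            // g_i(key) = (h1 + i * h2) mod m; floorMod keeps the result non-negative
            out[i] = Math.floorMod(h1 + i * h2, numBits);
        }
        return out;
    }

    public void add(byte[] key) {
        for (int p : positions(key)) {
            bits.set(p);
        }
    }

    public boolean mightContain(byte[] key) {
        for (int p : positions(key)) {
            if (!bits.get(p)) {
                return false; // definitely never added
            }
        }
        return true; // possibly added (false positives are allowed)
    }
}
```

The cost of add() is then dominated by the two fnv1a calls; raising k only adds a multiply, an add, and a modulo per extra position.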

On Tue, Jan 25, 2011 at 8:11 AM, Nicolas Spiegelberg <nspiegelberg@fb.com> wrote:

> A great article for Bloom Filter rules of thumb:
>
> http://corte.si/posts/code/bloom-filter-rules-of-thumb/
>
> Note that only rules #1 & #2 apply for our use case. Rule #3, while true,
> isn't as big a worry because we use combinatorial generation for hashes, so
> the number of 'expensive' hash calculations is 2, no matter how many hash
> functions need to be generated. Note that this drastically (400%+) sped up
> our BloomFilter.add() speed.
>
> On Jan 25, 2011, at 6:22 AM, "Lars George" <lars.george@gmail.com> wrote:
>
> > Hi,
> >
> > (Probably aimed at Nicolas)
> >
> > Do we have a (rough) formula of overhead, i.e. the size of the
> > bloomfilters for row and col granularity as for example depending on
> > the KV count and average sizes (as reported by the HFile main()
> > helper)?
> >
> > Thanks,
> > Lars
>
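
On the rough-formula question quoted above: the textbook sizing for a Bloom filter holding n keys at a target false-positive rate p is m = -n * ln(p) / (ln 2)^2 bits, with k = (m / n) * ln 2 hash functions. The sketch below just evaluates those two formulas. The class and method names are hypothetical, and this is generic Bloom filter math rather than a statement about HBase's actual on-disk overhead. Note that the size depends on the number of keys (distinct rows for row granularity, distinct row+column pairs for col granularity), not on the average KV size.

```java
// Hypothetical helper evaluating the standard Bloom filter sizing formulas.
// It does not reflect any HBase-specific rounding, chunking, or metadata.
public class BloomSizing {
    // m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
    public static long optimalBits(long n, double p) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    // k = (m / n) * ln 2, rounded to the nearest integer, at least 1
    public static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // assumed key count, for illustration only
        double p = 0.01;     // assumed target false-positive rate
        long m = optimalBits(n, p);
        System.out.println("bits=" + m
            + " bytes=" + (m + 7) / 8
            + " hashes=" + optimalHashes(n, m));
    }
}
```

With those assumed inputs this works out to about 9.6 bits per key, so roughly 1.2 MB for one million keys at a 1% error rate.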
