hbase-dev mailing list archives

From Lars George <lars.geo...@gmail.com>
Subject Re: Overhead of Bloomfilters
Date Tue, 25 Jan 2011 18:31:12 GMT
Thanks Nicolas,

I was after the actual size of it though. I assume you have no
disclosable numbers? Just curious. If not, I guess the best option is
to run YCSB or even PE to load a BF-enabled table and then check the
HFile output? Does that print the BF sizes too?
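
For a rough estimate before measuring, the textbook Bloom filter sizing formula gives a starting point. The sketch below is not taken from the HBase code: the function names are hypothetical, the 1% error rate is only an illustrative value, and it assumes an optimally sized filter. For row granularity n would be the number of distinct rows in the file; for row+col granularity, the number of distinct row/column pairs.

```python
import math


def bloom_bits(n_keys: int, fp_rate: float) -> int:
    """Bits needed for n_keys entries at a target false-positive rate.

    Standard result for an optimally sized filter:
        m = -n * ln(p) / (ln 2)^2
    """
    return math.ceil(-n_keys * math.log(fp_rate) / (math.log(2) ** 2))


def bloom_hashes(n_keys: int, n_bits: int) -> int:
    """Number of hash functions that minimises the error rate: k = (m/n) * ln 2."""
    return max(1, round(n_bits / n_keys * math.log(2)))


def bloom_bytes(n_keys: int, fp_rate: float) -> int:
    """Same estimate as bloom_bits, rounded up to whole bytes."""
    return math.ceil(bloom_bits(n_keys, fp_rate) / 8)


if __name__ == "__main__":
    n = 1_000_000   # illustrative key count, not a measured value
    p = 0.01        # illustrative target error rate
    m = bloom_bits(n, p)
    print(f"{m / n:.2f} bits/key, {bloom_bytes(n, p)} bytes total, "
          f"k = {bloom_hashes(n, m)}")
```

Note that the estimate depends only on the entry count and the error rate, not on the average KV size, so the overhead relative to the data shrinks as KVs get larger. Actual on-disk numbers may differ from this estimate.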


On Tue, Jan 25, 2011 at 4:11 PM, Nicolas Spiegelberg
<nspiegelberg@fb.com> wrote:
> A great article for Bloom Filter rules of thumb:
> http://corte.si/posts/code/bloom-filter-rules-of-thumb/
> Note that only rules #1 & #2 apply for our use case. Rule #3, while true, isn't as
> big a worry because we use combinatorial generation for hashes, so the number of 'expensive'
> hash calculations is 2, no matter how many hash functions need to be generated. Note that
> this drastically (400%+) sped up our BloomFilter.add() speed.
> Sent from my iPhone
> On Jan 25, 2011, at 6:22 AM, "Lars George" <lars.george@gmail.com> wrote:
>> Hi,
>> (Probably aimed at Nicolas)
>> Do we have a (rough) formula of overhead, i.e. the size of the
>> bloomfilters for row and col granularity as for example depending on
>> the KV count and average sizes (as reported by the HFile main()
>> helper)?
>> Thanks,
>> Lars
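
The combinatorial hash generation mentioned in the thread can be illustrated with a small sketch. This is not the HBase implementation; it assumes the common double-hashing scheme in which two base hashes h1 and h2 are computed once per key and the i-th bit position is derived as (h1 + i * h2) mod m. The class name, the helper, and the use of SHA-256 are choices made only to keep the example self-contained.

```python
import hashlib


def _base_hash(key: bytes, seed: int) -> int:
    """One 'expensive' hash; the seed byte yields two distinct functions."""
    digest = hashlib.sha256(bytes([seed]) + key).digest()
    return int.from_bytes(digest[:8], "big")


class SketchBloomFilter:
    """Hypothetical Bloom filter using double hashing for its k positions."""

    def __init__(self, n_bits: int, n_hashes: int) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Only two real hash computations, regardless of n_hashes.
        h1 = _base_hash(key, 0)
        h2 = _base_hash(key, 1)
        for i in range(self.n_hashes):
            # Cheap arithmetic combination stands in for the i-th hash function.
            yield (h1 + i * h2) % self.n_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(key))


if __name__ == "__main__":
    bf = SketchBloomFilter(n_bits=9586, n_hashes=7)
    for i in range(1000):
        bf.add(f"row-{i}".encode())
    # Every added key must be reported as possibly present.
    print(all(bf.might_contain(f"row-{i}".encode()) for i in range(1000)))
```

The cost of add() grows with the number of positions only through cheap integer arithmetic, which is consistent with the point above that rule #3 matters less when hashes are generated this way.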
