hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/UsingBloomFilters" by izaakrubin
Date Tue, 22 Jul 2008 18:05:57 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by izaakrubin:
http://wiki.apache.org/hadoop/Hbase/UsingBloomFilters

------------------------------------------------------------------------------
  Bloom filters can be enabled on a per-column-family basis in HBase.
- There are three bloom filter variants supported:
+ There are four bloom filter variants supported:
   1. A [http://portal.acm.org/citation.cfm?id=362692&dl=ACM&coll=portal bloom filter] as defined by Bloom in 1970.
   1. A [http://portal.acm.org/citation.cfm?id=343571.343572 counting bloom filter] as defined by Fan et al. in a ToN 2000 paper.
   1. A [http://www-rp.lip6.fr/site_npa/site_rp/_publications/740-rbf_cameraready.pdf retouched bloom filter] as described in the CoNEXT 2006 paper.
+  1. A [http://www.cse.fau.edu/~jie/research/publications/Publication_files/infocom2006.pdf dynamic bloom filter] as defined in the INFOCOM 2006 paper.
  
+ Bloom filters can be instantiated by specifying the vector size and the number of hash functions. Dynamic bloom filters require an additional argument: a threshold for the maximum number of keys to record in a row (one of the standard filters that make up the dynamic filter) before a new row is added.
- There are two ways in which a bloom filter can be instantiated:
-  1. by supplying the estimated number of values, in which case HBase selects the number of hash functions to be 4 and computes the vector size from the formula
-   {{{size = number-of-values * number-of-hashfunctions / ln(2) }}}
  
+ JUnit tests for these four bloom filter variants can be found in hbase.regionserver.!TestBloomFilters.
-  This formula was presented in [http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/BloomFilterSurvey.pdf Network Applications of Bloom Filters: A Survey, by Broder and Mitzenmacher]
-  1.#2 by specifying the vector size and the number of hash functions explicitly.
- 
- Both of these techniques are demonstrated in the JUnit test hbase.!TestBloomFilters.
  
  '''Additional Resources:'''
  
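The diff describes two ways to set up a filter. One is to give a vector size (m) and a number of hash functions (k). The other, now removed, computes the size as number-of-values * number-of-hashfunctions / ln(2). A minimal Java sketch of a standard Bloom filter can show how those two parameters interact. The class and method names (SimpleBloomFilter, vectorSizeFor, membershipTest) are hypothetical and are not the HBase API. Rounding the computed size up is a choice made here. Deriving the k bit positions by double hashing from two ordinary hashes is also an assumption for illustration, not necessarily what HBase does.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical sketch of a standard (Bloom 1970) filter; not the HBase class.
 */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int vectorSize; // m: number of bits in the vector
    private final int nbHash;     // k: number of hash functions

    SimpleBloomFilter(int vectorSize, int nbHash) {
        if (vectorSize <= 0 || nbHash <= 0) {
            throw new IllegalArgumentException("vectorSize and nbHash must be positive");
        }
        this.vectorSize = vectorSize;
        this.nbHash = nbHash;
        this.bits = new BitSet(vectorSize);
    }

    /** Sizing rule from the removed wiki text, m = n * k / ln(2), rounded up. */
    static int vectorSizeFor(int expectedValues, int nbHash) {
        return (int) Math.ceil(expectedValues * (double) nbHash / Math.log(2));
    }

    /** Sets the k bit positions derived from the key. */
    void add(byte[] key) {
        for (int i = 0; i < nbHash; i++) {
            bits.set(position(key, i));
        }
    }

    /** false means "definitely absent"; true means "possibly present". */
    boolean membershipTest(byte[] key) {
        for (int i = 0; i < nbHash; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing (h1 + i * h2) stands in for k independent hash functions (assumption).
    private int position(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key) | 1; // force an odd step
        return Math.floorMod(h1 + i * h2, vectorSize);
    }

    // 32-bit FNV-1a hash, used as the second hash.
    private static int fnv1a(byte[] key) {
        int h = 0x811C9DC5;
        for (byte b : key) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }
}
```

For example, SimpleBloomFilter.vectorSizeFor(1000, 4) gives 5771 bits for 1000 expected values with 4 hash functions. A membership test may return false positives, but never a false negative for a key that was added.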

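The threshold argument of the dynamic variant is easiest to see in code. The Java sketch below assumes a dynamic bloom filter is kept as a list of standard filter rows, each with the same vector size and number of hash functions. Once the active row has recorded nr keys, a new empty row is appended. A key is reported as possibly present if any row contains it. DynamicBloomSketch, nr and rowCount are hypothetical names chosen for this example, not HBase identifiers, and the double-hashing scheme is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical dynamic bloom filter sketch (not the HBase class): a list of
 * standard filter rows that grows when the active row reaches nr keys.
 */
final class DynamicBloomSketch {
    private final int vectorSize; // bits per row (m)
    private final int nbHash;     // hash functions per row (k)
    private final int nr;         // max keys recorded in one row before a new row is added
    private final List<BitSet> rows = new ArrayList<>();
    private int keysInActiveRow = 0;

    DynamicBloomSketch(int vectorSize, int nbHash, int nr) {
        if (vectorSize <= 0 || nbHash <= 0 || nr <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        this.vectorSize = vectorSize;
        this.nbHash = nbHash;
        this.nr = nr;
        rows.add(new BitSet(vectorSize));
    }

    /** Records the key in the active row, first appending a new row if it is full. */
    void add(byte[] key) {
        if (keysInActiveRow >= nr) {
            rows.add(new BitSet(vectorSize));
            keysInActiveRow = 0;
        }
        BitSet active = rows.get(rows.size() - 1);
        for (int i = 0; i < nbHash; i++) {
            active.set(position(key, i));
        }
        keysInActiveRow++;
    }

    /** Possibly present if any single row has all k of the key's bits set. */
    boolean membershipTest(byte[] key) {
        for (BitSet row : rows) {
            boolean allSet = true;
            for (int i = 0; i < nbHash && allSet; i++) {
                allSet = row.get(position(key, i));
            }
            if (allSet) {
                return true;
            }
        }
        return false;
    }

    int rowCount() {
        return rows.size();
    }

    // Double hashing (h1 + i * h2) stands in for k independent hash functions (assumption).
    private int position(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key) | 1; // force an odd step
        return Math.floorMod(h1 + i * h2, vectorSize);
    }

    // 32-bit FNV-1a hash, used as the second hash.
    private static int fnv1a(byte[] key) {
        int h = 0x811C9DC5;
        for (byte b : key) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }
}
```

With nr = 10, adding 25 keys leaves three rows. Growing by rows keeps each row's load, and hence its false-positive rate, bounded. The overall false-positive rate still rises with the number of rows, and every lookup must test each row.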