hbase-user mailing list archives

From "David G. Boney" <ch...@austin-acm-sigkdd.org>
Subject Bloom filter based scanner/filter
Date Tue, 15 Jan 2013 23:24:52 GMT
I am building a data cube on top of HBase. All access to the data is by map/reduce jobs. I
want to build a scanner whose first matching criterion is based on the set intersection
of bloom filters, followed by additional matching criteria specified in the current filter
architecture. First, I run a map/reduce job on table A. For every row I match in table A,
I add the row key to a bloom filter. I then do a map/reduce job on table B, where the row
keys are over the same domain as table A. I want to build a scanner that can use the built-in
Bloom filters in HBase. When the scanner goes to get the block of data to which a row-key-based
Bloom filter is attached, it does a set intersection with the table A Bloom filter to
see if any of the keys from table A are in the block. If so, the block is read in and the
scanner does additional matching on the rows according to the filter.
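
The intersection test described above can be sketched in a few lines. This is only a toy illustration, not HBase code: the class `BloomFilter`, its method `may_intersect`, and the salted SHA-256 hashing are all hypothetical names and choices made for the example. It assumes both filters use the same bit-array size and the same hash functions, which the real per-block filters and a filter built by a separate map/reduce job would have to be made to share.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical, not the HBase implementation).

    The bit array is stored in a Python int used as a bitmask.
    """

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

    def may_intersect(self, other):
        # The bitwise AND is only meaningful when both filters map a key
        # to the same positions, so require identical parameters.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share size and hash functions")
        # A key added to both filters sets the same bits in both, so those
        # bits survive the AND. An all-zero AND therefore proves that the
        # two key sets are disjoint; a non-zero AND proves nothing.
        return (self.bits & other.bits) != 0


# Filter built from the rows matched in table A.
table_a = BloomFilter()
table_a.add(b"row-001")

# Stand-in for the row-key filter attached to one block of table B.
block = BloomFilter()
block.add(b"row-001")
block.add(b"row-002")

if table_a.may_intersect(block):
    print("read the block, then apply the remaining filter criteria")
else:
    print("skip the block")
```

In this sketch a block is skipped only when the AND of the two bit arrays is zero. That test is safe but coarse: unrelated keys can set overlapping bits, so as either filter fills up, almost every block will appear to intersect and will still be read.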

This is a simplification of my problem. I am trying to find out what the complexity of implementing
such a feature would be in HBase.
David G. Boney
Chair, Austin ACM SIGKDD
