hbase-user mailing list archives

From John <johnnyenglish...@gmail.com>
Subject storing custom bloomfilter/BitSet
Date Thu, 19 Sep 2013 22:36:35 GMT

Is there a way to store a custom BitSet for every row and add new bits
to it while importing? I can't use the built-in bloomfilter because every
column name contains two elements.

Here is my scenario:
My table looks like this:
rowKey1 -> cf:<data1,data2>,  cf:<data3,data4>, ...
rowKey2 -> cf:<data234,data5>, ...

Each column name includes both data1 and data2.

This setup works for me now, but I am trying to improve it. I'm using the
BulkLoad feature. First I import a CSV file that looks like this:
rowKey1       cf                            <data1,data2>     5
rowKey1       cf                            <data3,data4>     8

For every hash in HASH_INDEX_1/2 I create a new column with the index as
its name, in the column family "bloomfilter1" or "bloomfilter2". I store the
column name as a 4-byte Integer String. For the example above I would store
this: bloomfilter1:5 and bloomfilter2:12. This method works fine, but the
export and the transformation back to a BitSet become very slow if the
bloomfilter is too big (> 1 million). So a better solution would be to store
only the BitSet instead of a 4-byte Integer for every index.
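
To make the idea concrete, here is a minimal sketch of what "store only the
BitSet" could look like, using nothing but java.util.BitSet. The class
BitSetCodec and its methods are hypothetical names made up for illustration
and are not part of HBase. The sketch assumes the whole filter would be kept
as one byte[] cell value per row, and that adding bits is a
read-modify-write on that value. It does not show how to hook this into the
bulk load.

```java
import java.util.BitSet;

// Hypothetical helper (not an HBase class): packs filter bit positions
// into a single byte[] that could be stored as one cell value per row.
final class BitSetCodec {

    // Build a BitSet from the bit positions that are currently
    // stored as one column each (e.g. 5 and 12).
    static BitSet fromIndexes(int... indexes) {
        BitSet bits = new BitSet();
        for (int index : indexes) {
            bits.set(index);
        }
        return bits;
    }

    // Serialize to a compact little-endian byte[]; trailing zero bytes are dropped.
    static byte[] encode(BitSet bits) {
        return bits.toByteArray();
    }

    // Restore the BitSet from a stored cell value.
    static BitSet decode(byte[] value) {
        return BitSet.valueOf(value);
    }

    // Add new bits to an already stored value (assumed read-modify-write).
    static byte[] merge(byte[] stored, BitSet newBits) {
        BitSet merged = decode(stored);
        merged.or(newBits);
        return encode(merged);
    }

    public static void main(String[] args) {
        byte[] stored = encode(fromIndexes(5));
        byte[] updated = merge(stored, fromIndexes(12));
        System.out.println(decode(updated));
    }
}
```

With this layout, reading the filter back is a single BitSet.valueOf call on
the cell value instead of a scan over one column per set bit.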

Does anyone know if it is possible to create this filter while importing the

