hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13510) Purge ByteBloomFilter
Date Sat, 16 May 2015 07:20:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546613#comment-14546613
] 

ramkrishna.s.vasudevan commented on HBASE-13510:
------------------------------------------------

bq.(Anoop said use EMPTY_BYTES from HConstants)
Sorry I have not updated the diff here.  See RB for the latest diff.  It has that change.
bq.but Anoop does above so that is enough for me (for now – smile)
I don't follow this point. Do you mean you want to create the hash keys based on Cells? Yes, we
could try that, maybe by rewriting the hash function to pick the individual bytes from the Cell's
components such as row, family, etc.
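
To illustrate the idea of hashing directly over a Cell's components, here is a minimal sketch. It assumes a byte-at-a-time hash (FNV-1a is swapped in for brevity; it is not claimed to be the hash HBase's Bloom code uses), and the class and method names are hypothetical. The point is only that a streaming hash can consume row, family and qualifier as separate segments and still agree with a hash over one contiguous key, so no copy into a temporary key array is needed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not HBase code: hash several byte segments as if
// they were one contiguous key, without concatenating them first.
class SegmentedHashSketch {
    private static final int FNV_OFFSET = 0x811C9DC5; // 32-bit FNV offset basis
    private static final int FNV_PRIME = 0x01000193;  // 32-bit FNV prime

    // Feed one segment into a running FNV-1a state and return the new state.
    static int update(int state, byte[] buf, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            state ^= (buf[i] & 0xFF);
            state *= FNV_PRIME;
        }
        return state;
    }

    // Hash the segments in order; equivalent to hashing their concatenation.
    static int hashSegments(byte[]... segments) {
        int state = FNV_OFFSET;
        for (byte[] segment : segments) {
            state = update(state, segment, 0, segment.length);
        }
        return state;
    }

    public static void main(String[] args) {
        // Stand-ins for a Cell's row, family and qualifier bytes.
        byte[] row = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] family = "cf".getBytes(StandardCharsets.UTF_8);
        byte[] qualifier = "q".getBytes(StandardCharsets.UTF_8);
        byte[] contiguous = "row1cfq".getBytes(StandardCharsets.UTF_8);

        int fromParts = hashSegments(row, family, qualifier);
        int fromWhole = hashSegments(contiguous);
        System.out.println(fromParts == fromWhole);
    }
}
```

A block-oriented hash would need extra buffering across segment boundaries, so whether this approach fits depends on the hash function actually in use.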
bq.The javadoc on BloomFilterChunk is about BloomFilters. Is BFC a BF or utility a BF could
use to make chunks? In javadoc, we don't say what a BFC is. If it is a BF, then why not call
it so? We have a BF in our code base already and it has javadoc on the class that is similar
to what is here. How does a BFC relate to a BF.
I have thought through the possibilities.  ByteBloomFilter is actually a building block for the
CompoundBloomFilter.  But in another comment it was decided not to keep ByteBF as a type of BF,
so I decided to remove it.  Once it is removed as a BF type, there is no need for a type called
ByteBF, hence I thought of renaming it.
I can change the javadoc.  In fact I thought about it, but later decided not to because it
basically describes how the bloom works.

bq.Man, BloomFilterBase is an Interface? That'll throw folks off.
I think it was already there; we just removed some APIs from it.  Since ByteBF (now
BloomFilterChunk) is the building block for the CompoundBF, we need some common APIs such as
getKeys, getHashCount, etc.
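
As a rough illustration of why a chunk and a compound filter would share a small common interface, here is a minimal sketch. All names (`BloomBaseSketch`, `BloomChunkSketch`, `add`, `contains`) are hypothetical and are not the real HBase classes or signatures; the hash and the double-hashing scheme are assumptions chosen to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical common surface shared by a chunk and a compound filter.
interface BloomBaseSketch {
    long getKeyCount();
    int getHashCount();
}

// Hypothetical single-chunk Bloom filter used as a building block.
class BloomChunkSketch implements BloomBaseSketch {
    private final BitSet bits;
    private final int bitSize;
    private final int hashCount;
    private long keyCount;

    BloomChunkSketch(int bitSize, int hashCount) {
        this.bits = new BitSet(bitSize);
        this.bitSize = bitSize;
        this.hashCount = hashCount;
    }

    // Simple FNV-1a style hash with a caller-supplied seed (an assumption).
    private static int hash(byte[] key, int seed) {
        int h = seed;
        for (byte b : key) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: derive the i-th bit position from two base hashes.
    private int bitIndex(byte[] key, int i) {
        int h1 = hash(key, 0x811C9DC5);
        int h2 = hash(key, h1);
        return Math.floorMod(h1 + i * h2, bitSize);
    }

    void add(byte[] key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(bitIndex(key, i));
        }
        keyCount++;
    }

    // May return false positives, never false negatives.
    boolean contains(byte[] key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(bitIndex(key, i))) {
                return false;
            }
        }
        return true;
    }

    @Override
    public long getKeyCount() {
        return keyCount;
    }

    @Override
    public int getHashCount() {
        return hashCount;
    }

    public static void main(String[] args) {
        BloomChunkSketch chunk = new BloomChunkSketch(1024, 3);
        byte[] key = "row1".getBytes(StandardCharsets.UTF_8);
        chunk.add(key);
        System.out.println(chunk.contains(key));
        System.out.println(chunk.getKeyCount());
    }
}
```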
bq.Having a bit of a hard time navigating the hierarchy here with BloomFilter and BloomFilterBase
and BloomFilterChunk. 
When we started this offheap work and came to the Bloom code, I found it difficult to make this
hierarchy any simpler.  There are a lot of dependencies between these classes.
bq.ByteBloomFilter seems like a better name than BFC yet we are removing it and putting in
place a new class named BFC that has a good bit of BBF. You don't want to just purge the unused
bits from BBF?
As I said in my earlier comment, I can leave the name as is; see my older patch, where I did not
want to rename it.
I would say that what this patch mainly does is allow the Blooms to work with Cells, and in the
course of that we did not want to have two bloom types, since CompoundBloomFilter is currently
the only one we use.  ByteBloomFilter is only a building block.





> Purge ByteBloomFilter
> ---------------------
>
>                 Key: HBASE-13510
>                 URL: https://issues.apache.org/jira/browse/HBASE-13510
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: HBASE-13510_1.patch, HBASE-13510_2.patch, HBASE-13510_3.patch
>
>
> In order to address the comments over in HBASE-10800 related to comparing Cell with a
> serialized KV's key we had some need for that in Bloom filters.  After discussing with Anoop,
> we found that it may be possible to remove/modify some of the APIs in the BloomFilter interfaces
> and for doing that we can purge ByteBloomFilter.
> I read the code and found that ByteBloomFilter was getting used in V1 version only.
> Now as it is obsolete we can remove this code and move some of the static APIs in ByteBloomFilter
> to some other util class or bloom related classes which will help us in refactoring the code
> too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
