hbase-dev mailing list archives

From "Erik Holstad (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1200) Add bloomfilters to hfile; use dynamicbloomfilter instead of base bloomfilter; depend on hadoop 0.20
Date Tue, 17 Feb 2009 19:11:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674308#action_12674308 ]

Erik Holstad commented on HBASE-1200:
-------------------------------------

I think the user should have the option to turn bloom filters off, even though I can't
really see why you wouldn't want them. I also think we should move towards row+column
filters, like BT. Using the dynamic bloom filter seems like a reasonable way to go. The
only issue I can see is that we will still have some overhead, even though it will be
smaller than it is now. So, if possible, we should wait until we know the exact number and
then create the filter. I'm not sure how much time this would add to the flush, but that
could be tested.
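
To make the overhead concrete, here is a minimal sketch of the standard bloom filter sizing
formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The class name,
the 1% false-positive target, and the entry and row counts are all hypothetical and are not
taken from HBase; the point is only that filter size grows linearly with the count it is
sized for, so sizing by entries instead of distinct rows inflates it by that same ratio.

```java
// Illustrative only: not HBase or Hadoop code. All names and numbers are assumptions.
final class BloomSizing {
    private BloomSizing() {}

    /** Bits needed for n keys at false-positive rate p: m = -n ln(p) / (ln 2)^2. */
    public static long optimalBits(long n, double p) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** Hash functions for m bits and n keys: k = (m / n) ln 2, at least 1. */
    public static int optimalHashes(long bits, long n) {
        return Math.max(1, (int) Math.round((double) bits / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long entries = 1_000_000; // hypothetical key-values in one flush
        long rows = 50_000;       // hypothetical distinct rows among them
        double p = 0.01;          // hypothetical false-positive target

        long bitsByEntries = optimalBits(entries, p);
        long bitsByRows = optimalBits(rows, p);

        System.out.println("sized by entries: " + bitsByEntries / 8 + " bytes");
        System.out.println("sized by rows:    " + bitsByRows / 8 + " bytes");
        System.out.println("hash functions:   " + optimalHashes(bitsByRows, rows));
    }
}
```

Counting rows first and then building the filter gives the smaller size, at the cost of
whatever extra time the count adds to the flush.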

> Add bloomfilters to hfile; use dynamicbloomfilter instead of base bloomfilter; depend on hadoop 0.20
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1200
>                 URL: https://issues.apache.org/jira/browse/HBASE-1200
>             Project: Hadoop HBase
>          Issue Type: Task
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.20.0
>
>
> Add bloomfiltering to hfile.  Should it be optional or always on?  Currently we bloom
> filter rows only, not the column + ts component, which seems a good place to start.  But
> we size the bloomfilter with the number of entries we are about to flush, which usually
> makes the filter too big.  How do we figure out how many rows are in the flush?  We should
> use the DynamicBloomFilter, as Andrzej does up in hadoop BloomFilterMapFile: start small
> and let it resize as entries are added.
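
The "start small and let it resize" idea can be sketched as a list of fixed-size bloom
filters, where a new one is appended once the current one has taken its quota of keys. This
is a self-contained illustration under that assumption, not Hadoop's DynamicBloomFilter and
not the HBASE-1200 patch; the class name, parameters, and hash functions are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Illustrative only: a growable bloom filter built from fixed-size rows.
final class GrowingBloomSketch {
    private final int bitsPerRow;
    private final int hashCount;
    private final int keysPerRow;
    private final List<BitSet> rows = new ArrayList<>();
    private int keysInCurrentRow;

    public GrowingBloomSketch(int bitsPerRow, int hashCount, int keysPerRow) {
        this.bitsPerRow = bitsPerRow;
        this.hashCount = hashCount;
        this.keysPerRow = keysPerRow;
        rows.add(new BitSet(bitsPerRow)); // start with a single small filter
    }

    /** Adds a key, appending a fresh row first if the current row is full. */
    public void add(byte[] key) {
        if (keysInCurrentRow >= keysPerRow) {
            rows.add(new BitSet(bitsPerRow));
            keysInCurrentRow = 0;
        }
        BitSet current = rows.get(rows.size() - 1);
        for (int i = 0; i < hashCount; i++) {
            current.set(position(key, i));
        }
        keysInCurrentRow++;
    }

    /** True if any row reports the key; false means the key was never added. */
    public boolean mightContain(byte[] key) {
        for (BitSet row : rows) {
            boolean allSet = true;
            for (int i = 0; i < hashCount && allSet; i++) {
                allSet = row.get(position(key, i));
            }
            if (allSet) {
                return true;
            }
        }
        return false;
    }

    public int rowCount() {
        return rows.size();
    }

    /** Double hashing: the i-th bit position is (h1 + i * h2) mod bitsPerRow. */
    private int position(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key) | 1; // keep the step odd so it is never zero
        return Math.floorMod(h1 + i * h2, bitsPerRow);
    }

    /** 32-bit FNV-1a, used here only as a second, independent-looking hash. */
    private static int fnv1a(byte[] key) {
        int h = 0x811c9dc5;
        for (byte b : key) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}
```

The trade-off in this design is that a lookup has to probe every row, and the overall
false-positive rate rises as rows are added, so the per-row quota still has to be chosen
with some idea of the expected flush size.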

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

