hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get
Date Mon, 12 Jul 2010 17:55:51 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887443#action_12887443 ]

HBase Review Board commented on HBASE-2794:
-------------------------------------------

Message from: "Kris Jirapinyo" <kjirapinyo@attensity.com>


bq.  On 2010-07-12 10:17:25, Nicolas wrote:
bq.  > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 860
bq.  > <http://review.hbase.org/r/296/diff/1/?file=2378#file2378line860>
bq.  >
bq.  >     probably should pre-allocate the ArrayList() size so we only deal with one heap element.

Good idea.
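
To illustrate the pre-allocation suggestion, here is a minimal sketch. It assumes the list holds one row+column key per requested column; the patch itself is not shown in this message, so the class name, method name, and element type are hypothetical. Passing the known size to the ArrayList constructor means the backing array is allocated once rather than grown as keys are added.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not code from the HBASE-2794 patch.
final class PreallocatedKeys {
    /** Builds one row+column key per requested column, sizing the list up front. */
    static List<byte[]> buildKeys(byte[] row, List<byte[]> columns) {
        // The number of keys is known in advance, so request that capacity once.
        List<byte[]> keys = new ArrayList<byte[]>(columns.size());
        for (byte[] col : columns) {
            // Concatenate row and column qualifier into a single key.
            byte[] key = new byte[row.length + col.length];
            System.arraycopy(row, 0, key, 0, row.length);
            System.arraycopy(col, 0, key, row.length, col.length);
            keys.add(key);
        }
        return keys;
    }
}
```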


bq.  On 2010-07-12 10:17:25, Nicolas wrote:
bq.  > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 857
bq.  > <http://review.hbase.org/r/296/diff/1/?file=2378#file2378line857>
bq.  >
bq.  >     have you done any tests to see when the number of bloom checks takes significant time compared to just getting the block?  For example, if you have 100 columns to lookup, do bloom filters really buy you anything, or shouldn't you just switch to a Row-level bloom anyways?  Also, with a default 1% error rate, you're looking at ~100% false positive with 100 columns.  Maybe max.columns = sqrt(1/error.rate)

I have not. Would running against just the test data be sufficient to show the true savings,
given that the tests only run on mock data?  I don't have a dev cluster with real data that
I can test this on, so perhaps you or someone else could help out in that regard.
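
As a rough check on the false-positive estimate quoted above, the sketch below computes the chance that at least one of n bloom checks reports a false positive. It assumes the checks are independent, that each has the same error rate, and that none of the requested columns is actually in the file; the class and method names are made up for illustration. Under those assumptions the figure for 100 columns at a 1% error rate is about 63% (1 - 0.99^100) rather than ~100%, which is what the simpler n * p upper bound gives. For the suggested cap of sqrt(1/error.rate) = 10 columns it is just under 10%.

```java
// Hypothetical helper for illustration; not part of HBase.
final class BloomFalsePositive {
    /** Probability that at least one of `checks` independent bloom lookups is a false positive. */
    static double anyFalsePositive(double errorRate, int checks) {
        // P(no false positive in any check) = (1 - p)^n, so take the complement.
        return 1.0 - Math.pow(1.0 - errorRate, checks);
    }

    public static void main(String[] args) {
        double errorRate = 0.01; // the 1% default mentioned in the review
        for (int n : new int[] {1, 10, 100}) {
            System.out.printf("columns=%d  P(any false positive)=%.3f%n",
                    n, anyFalsePositive(errorRate, n));
        }
    }
}
```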


- Kris


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/296/#review350
-----------------------------------------------------------





> ROWCOL bloom filter not used if multiple columns within same family are requested in a Get
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2794
>                 URL: https://issues.apache.org/jira/browse/HBASE-2794
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>         Attachments: 2794_multi_column_check.txt
>
>
> Noticed the following snippet in StoreFile.java:Scanner:shouldSeek():
> {code}
>         switch(bloomFilterType) {
>           case ROW:
>             key = row;
>             break;
>           case ROWCOL:
>             if (columns.size() == 1) {
>               byte[] col = columns.first();
>               key = Bytes.add(row, col);
>               break;
>             }
>             //$FALL-THROUGH$
>           default:
>             return true;
>         }
> {code}
> If columns.size() > 1, we currently don't take advantage of the bloom filter.
> We should optimize this to check the bloom for each of the columns and, if none of the
> columns is present in the bloom, avoid opening the file.
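
The proposed optimization could look roughly like the sketch below. It is not the attached patch: MembershipFilter is a hypothetical stand-in for the bloom filter rather than an HBase interface, and the keys are built by plain concatenation in the same spirit as Bytes.add(row, col) in the snippet above. The file is skipped only when every requested column is rejected by the bloom; an empty column set is assumed to mean the bloom cannot rule anything out.

```java
import java.util.Collection;

// Hypothetical sketch of a multi-column ROWCOL check; names are illustrative only.
final class MultiColumnBloomCheck {
    /** Stand-in for a bloom filter: may return false positives, never false negatives. */
    interface MembershipFilter {
        boolean mightContain(byte[] key);
    }

    /** Returns false only if the bloom rules out every requested row+column key. */
    static boolean shouldSeek(MembershipFilter bloom, byte[] row, Collection<byte[]> columns) {
        if (columns.isEmpty()) {
            // No explicit columns: the ROWCOL bloom cannot exclude the file.
            return true;
        }
        for (byte[] col : columns) {
            byte[] key = new byte[row.length + col.length];
            System.arraycopy(row, 0, key, 0, row.length);
            System.arraycopy(col, 0, key, row.length, col.length);
            if (bloom.mightContain(key)) {
                // One possible hit is enough to require opening the file.
                return true;
            }
        }
        return false;
    }
}
```

The loop exits on the first possible hit, so the worst case (one bloom check per column) applies only when the file can be skipped.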

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

