hbase-issues mailing list archives

From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get
Date Wed, 28 Sep 2011 16:16:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116578#comment-13116578 ]

jiraposter@reviews.apache.org commented on HBASE-2794:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2084/#review2130
-----------------------------------------------------------


nice work mikhail!  i will let someone else give the +1 though


src/main/java/org/apache/hadoop/hbase/KeyValue.java
<https://reviews.apache.org/r/2084/#comment4946>

    method doesn't actually take a KeyValue... is this to create the last KV on the row and
column of the KeyValue this is called on?
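To illustrate the "last KV on the row and column" idea the comment asks about, here is a simplified sketch. It assumes a toy `Key` of (row, column, timestamp) sorted the way HBase sorts keys (row and column ascending, newest timestamp first) and a hypothetical `createLastOnRowCol()` helper that sets the timestamp to `Long.MIN_VALUE`. HBase's real `KeyValue` also compares the column family and key type, which this omits. Such a fake key sorts after every real version of that row/column and before the next column, so one plausible use is for a scanner to report it as its position when a Bloom filter rules the row/column out, instead of actually seeking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch of a "last on row/column" fake key. Key is a toy stand-in
 * for HBase's KeyValue: it ignores the column family and the key type.
 */
public class LastOnRowColSketch {

    public static final class Key {
        final String row;
        final String col;
        final long ts;

        public Key(String row, String col, long ts) {
            this.row = row;
            this.col = col;
            this.ts = ts;
        }

        /**
         * Fake key that sorts after every real version of this row/column:
         * timestamps sort newest-first, so the smallest timestamp sorts last.
         * Assumes real timestamps are greater than Long.MIN_VALUE.
         */
        public Key createLastOnRowCol() {
            return new Key(row, col, Long.MIN_VALUE);
        }

        @Override
        public String toString() {
            return row + "/" + col + "/" + ts;
        }
    }

    /** Row ascending, then column ascending, then timestamp descending (newest first). */
    public static final Comparator<Key> ORDER =
            Comparator.comparing((Key k) -> k.row)
                    .thenComparing((Key k) -> k.col)
                    .thenComparing(Comparator.comparingLong((Key k) -> k.ts).reversed());

    public static void main(String[] args) {
        Key newer = new Key("r1", "c1", 100L);
        Key older = new Key("r1", "c1", 0L);
        Key nextCol = new Key("r1", "c2", 500L);
        Key last = newer.createLastOnRowCol();

        List<Key> keys = new ArrayList<>(Arrays.asList(nextCol, last, older, newer));
        keys.sort(ORDER);
        // The fake key lands after both real c1 versions and before r1/c2.
        System.out.println(keys);
    }
}
```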



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
<https://reviews.apache.org/r/2084/#comment4947>

    got it.  maybe add a comment on this method to explain this usage



src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java
<https://reviews.apache.org/r/2084/#comment4948>

    license (the new test file is missing the Apache license header)


- Jonathan


On 2011-09-28 16:03:52, Mikhail Bautin wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2084/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-28 16:03:52)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Previously we used row-column Bloom filters only for scans that requested a single column.
bq.  We have seen production queries that request up to 200 columns, and with, say, ~6 store files
bq.  per store (region / column family combination) this could result in 1200 block read
bq.  operations in the worst case. With this diff we avoid seeks on store files that
bq.  we know don't contain the row/column of interest when using an ExplicitColumnTracker. The
bq.  performance should remain the same for column range queries.
bq.  
bq.  
bq.  This addresses bug HBASE-2794.
bq.      https://issues.apache.org/jira/browse/HBASE-2794
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 08d3ba4 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java ac2348e 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 4aa72de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 68cdac5 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 9d9895c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 6cdada7 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 7cbdb98 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java f5173c4
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java a3d778e
bq.    src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 32f88fb
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java a5d13f7
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java baee696
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java PRE-CREATION
bq.  
bq.  Diff: https://reviews.apache.org/r/2084/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Existing unit tests. A new unit test (TestScanWithBloomError). Load testing using HBaseTest.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Mikhail
bq.  
bq.
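The worst-case arithmetic in the summary (200 columns × 6 store files = 1200 seeks) can be reproduced with a small simulation. This is a sketch under stated assumptions: `SimpleBloom` is a toy Bloom filter (double hashing over `Arrays.hashCode`), not HBase's Bloom filter implementation, and each column of the row is assumed to live in exactly one store file. A Bloom filter has no false negatives, so every file that really holds a column is still seeked; false positives can only add seeks.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical simulation of per-column seek skipping with a ROWCOL Bloom filter. */
public class RowColBloomSeekCount {

    /** Toy Bloom filter over byte[] keys using double hashing (illustrative only). */
    public static final class SimpleBloom {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        public SimpleBloom(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        private int probe(byte[] key, int i) {
            int h1 = Arrays.hashCode(key);
            int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16) | 1; // derived second hash, forced odd
            return Math.floorMod(h1 + i * h2, numBits);
        }

        public void add(byte[] key) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(probe(key, i));
            }
        }

        /** false = definitely absent; true = possibly present (false positives allowed). */
        public boolean mightContain(byte[] key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(probe(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Row-column Bloom key: row bytes followed by column bytes. */
    static byte[] rowCol(String row, String col) {
        return (row + col).getBytes(StandardCharsets.UTF_8);
    }

    /** Counts (store file, column) seeks for one Get with explicitly requested columns. */
    public static int countSeeks(List<SimpleBloom> files, String row,
                                 List<String> columns, boolean useBloom) {
        int seeks = 0;
        for (String col : columns) {
            byte[] key = rowCol(row, col);
            for (SimpleBloom file : files) {
                // Without the Bloom check, every file is seeked for every column.
                if (!useBloom || file.mightContain(key)) {
                    seeks++;
                }
            }
        }
        return seeks;
    }

    /** Each column of "row1" is written to exactly one of numFiles store files. */
    public static int[] runScenario(int numFiles, int numColumns) {
        List<SimpleBloom> files = new ArrayList<>();
        for (int f = 0; f < numFiles; f++) {
            files.add(new SimpleBloom(1 << 16, 3));
        }
        List<String> columns = new ArrayList<>();
        for (int c = 0; c < numColumns; c++) {
            String col = "col" + c;
            columns.add(col);
            files.get(c % numFiles).add(rowCol("row1", col));
        }
        return new int[] {
            countSeeks(files, "row1", columns, false),
            countSeeks(files, "row1", columns, true)
        };
    }

    public static void main(String[] args) {
        int[] r = runScenario(6, 200);
        System.out.println("seeks without ROWCOL Bloom check: " + r[0]);
        System.out.println("seeks with ROWCOL Bloom check:    " + r[1]);
    }
}
```

Without the check the count is always columns × files; with it, the count drops toward the number of (file, column) pairs that actually exist, plus any false positives.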


                
> ROWCOL bloom filter not used if multiple columns within same family are requested in a Get
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2794
>                 URL: https://issues.apache.org/jira/browse/HBASE-2794
>             Project: HBase
>          Issue Type: Improvement
>          Components: performance
>            Reporter: Kannan Muthukkaruppan
>
> Noticed the following snippet in StoreFile.java, Scanner.shouldSeek():
> {code}
>         switch(bloomFilterType) {
>           case ROW:
>             key = row;
>             break;
>           case ROWCOL:
>             if (columns.size() == 1) {
>               byte[] col = columns.first();
>               key = Bytes.add(row, col);
>               break;
>             }
>             //$FALL-THROUGH$
>           default:
>             return true;
>         }
> {code}
> If columns.size() > 1, we currently don't take advantage of the bloom filter.
> We should optimize this to check the bloom filter for each of the columns and, if none of the
> columns is present in the bloom filter, avoid opening the file.
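A minimal sketch of the optimization proposed above, written as a generalization of the quoted `switch`. `BloomType`, `shouldSeek`, and the `Predicate<byte[]>` standing in for the store file's Bloom lookup are hypothetical stand-ins rather than the HBase API, and the sketch covers only the file-level decision (skip the file when no requested column can be present).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: use a ROWCOL Bloom filter even when several columns are
 * requested. BloomType and the Predicate-based lookup are stand-ins, not HBase APIs.
 */
public class MultiColumnShouldSeek {

    public enum BloomType { NONE, ROW, ROWCOL }

    /** Row bytes followed by column bytes, like Bytes.add(row, col) in the snippet. */
    static byte[] concat(byte[] row, byte[] col) {
        byte[] key = Arrays.copyOf(row, row.length + col.length);
        System.arraycopy(col, 0, key, row.length, col.length);
        return key;
    }

    /** Sorted column set; byte[] has no natural order, so pass a comparator (Java 9+). */
    public static NavigableSet<byte[]> columns(String... names) {
        NavigableSet<byte[]> set = new TreeSet<byte[]>(Arrays::compare);
        for (String name : names) {
            set.add(name.getBytes(StandardCharsets.UTF_8));
        }
        return set;
    }

    /**
     * Returns false only when the Bloom filter rules out every requested column
     * of this row, so the file need not be opened.
     */
    public static boolean shouldSeek(BloomType type, byte[] row,
                                     NavigableSet<byte[]> columns,
                                     Predicate<byte[]> bloomMightContain) {
        switch (type) {
            case ROW:
                return bloomMightContain.test(row);
            case ROWCOL:
                if (columns == null || columns.isEmpty()) {
                    return true; // no explicit columns: no row+column key to probe
                }
                for (byte[] col : columns) {
                    if (bloomMightContain.test(concat(row, col))) {
                        return true; // at least one requested column may be present
                    }
                }
                return false; // every requested column ruled out: skip this file
            default:
                return true; // no Bloom filter: must seek
        }
    }

    public static void main(String[] args) {
        byte[] row = "row1".getBytes(StandardCharsets.UTF_8);
        // Toy lookup that only "contains" row1+colB (exact, so no false positives).
        Predicate<byte[]> bloom = key -> new String(key, StandardCharsets.UTF_8).equals("row1colB");
        System.out.println(shouldSeek(BloomType.ROWCOL, row, columns("colA", "colB"), bloom)); // true
        System.out.println(shouldSeek(BloomType.ROWCOL, row, columns("colA", "colC"), bloom)); // false
    }
}
```

Returning `true` when no explicit columns are requested keeps the old behaviour for whole-family reads, where there is no row+column key to probe.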
