hbase-issues mailing list archives

From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
Date Thu, 20 Oct 2011 00:09:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131191#comment-13131191
] 

jiraposter@reviews.apache.org commented on HBASE-4532:
------------------------------------------------------



bq.  On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 1058
bq.  > <https://reviews.apache.org/r/2393/diff/2/?file=50558#file50558line1058>
bq.  >
bq.  >     not clear where you are using this "-1" state

Even if there is no delete family bloom filter, the store file still counts the delete family
key values it contains and appends this count to the HFile's file info.
So when reading the file, we know how many delete family KVs it holds.

However, if the file info has no delete family count field, deleteFamilyCnt is set to -1,
and the function passesDeleteFamilyBloomFilter does not take the count into account.
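
A minimal sketch of the -1 sentinel described above. It assumes the file info can be modeled as a plain map; the key name DELETE_FAMILY_COUNT and the class and method names are hypothetical and are not taken from the patch.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Sketch only: the map-based file info and the key name are assumptions,
// not the actual StoreFile/HFile code under review.
class DeleteFamilyCountSketch {
    // Hypothetical file info key for the delete family KV count.
    static final String DELETE_FAMILY_COUNT_KEY = "DELETE_FAMILY_COUNT";
    // Sentinel meaning "this file does not record a count".
    static final long UNKNOWN = -1L;

    // Writer side: always record the count, even without a bloom filter.
    static void writeDeleteFamilyCount(Map<String, byte[]> fileInfo, long count) {
        byte[] raw = ByteBuffer.allocate(Long.BYTES).putLong(count).array();
        fileInfo.put(DELETE_FAMILY_COUNT_KEY, raw);
    }

    // Reader side: fall back to -1 when the field is absent (older files).
    static long readDeleteFamilyCount(Map<String, byte[]> fileInfo) {
        byte[] raw = fileInfo.get(DELETE_FAMILY_COUNT_KEY);
        if (raw == null) {
            return UNKNOWN;
        }
        return ByteBuffer.wrap(raw).getLong();
    }

    public static void main(String[] args) {
        Map<String, byte[]> oldFile = new HashMap<>();
        Map<String, byte[]> newFile = new HashMap<>();
        writeDeleteFamilyCount(newFile, 0L);
        System.out.println("old file count: " + readDeleteFamilyCount(oldFile));
        System.out.println("new file count: " + readDeleteFamilyCount(newFile));
    }
}
```

The point of the sentinel is that a reader can tell "no delete family KVs" (0) apart from "count not recorded" (-1).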


bq.  On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 1217
bq.  > <https://reviews.apache.org/r/2393/diff/2/?file=50558#file50558line1217>
bq.  >
bq.  >     To make sure I understand this...
bq.  >     
bq.  >     for "HFileV1" case or for "HFileV2 + but without this fix", I am guessing deleteFamilyCnt will be equal to -1, and the fact that it doesn't have a bloomFilter will cause it to return true. That looks fine. Just not obvious.

Yes :) If deleteFamilyCnt is present and equals 0, there is no need to check the Bloom filter,
and passesDeleteFamilyBloomFilter() returns false. That means there is no need to seek this
store file for a delete family on the row.

If deleteFamilyCnt was not initialized properly for some reason, i.e. it is set to -1, then
the delete family bloom filter has to be checked.
In that case, if there is no delete family bloom filter, the function returns true, meaning
it is possible that there is a delete family for this row.
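
The decision order above can be sketched roughly as follows. This is a simplified illustration, not the method from the patch: the signature is hypothetical, and the bloom filter is modeled as a Predicate that may return false positives but never false negatives.

```java
import java.util.function.Predicate;

// Sketch only: simplified, hypothetical signature. The real method in
// StoreFile works on the reader's own state rather than on arguments.
class DeleteFamilyBloomCheckSketch {
    /**
     * Returns false only when the file certainly holds no delete family
     * marker for the row; true means "possibly present, must seek".
     *
     * @param deleteFamilyCnt   count from file info, or -1 if not recorded
     * @param deleteFamilyBloom bloom filter stand-in, or null if the file has none
     * @param row               row key being looked up
     */
    static boolean passesDeleteFamilyBloomFilter(long deleteFamilyCnt,
                                                 Predicate<String> deleteFamilyBloom,
                                                 String row) {
        // A recorded count of 0 settles it without touching the bloom filter.
        if (deleteFamilyCnt == 0) {
            return false;
        }
        // Count unknown (-1) or positive, but no bloom filter: cannot rule it out.
        if (deleteFamilyBloom == null) {
            return true;
        }
        // Otherwise let the bloom filter decide.
        return deleteFamilyBloom.test(row);
    }

    public static void main(String[] args) {
        Predicate<String> bloom = "row1"::equals;
        System.out.println(passesDeleteFamilyBloomFilter(0, bloom, "row1"));
        System.out.println(passesDeleteFamilyBloomFilter(-1, null, "row1"));
        System.out.println(passesDeleteFamilyBloomFilter(-1, bloom, "row2"));
    }
}
```

This also covers the HFileV1 case raised in the question: count -1 and no bloom filter falls through to the conservative answer, true.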


bq.  On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java, line 238
bq.  > <https://reviews.apache.org/r/2393/diff/2/?file=50559#file50559line238>
bq.  >
bq.  >     In case there is a deleteFamily kv, there are two sub-cases here...
bq.  >     
bq.  >     a) we have ROWCOL bloom (in which case there is no DeleteFamilyBloomFilter)
and we want to use the ROWCOL bloom filter itself.
bq.  >     
bq.  >     b) we have a DeleteFamilyBloomFilter.
bq.  >     
bq.  >     I don't see us taking advantage of (a) like we used to earlier. Isn't this a
regression for the ROWCOL bloom case? And if so, TestBlocksRead should have caught it, no?

1) Yes, it should use the ROWCOL Bloom filter. That can also help warm up the ROWCOL bloom filter
in the cache, or benefit from the block cache.
     I will update the code.
2) There is no regression for the ROWCOL bloom case, because we only count the number of data
block seeks.
    Whichever bloom filter is consulted (delete family or ROWCOL), it returns the same result,
so it does not affect the decision whether to seek into the store file or not.
    Please correct me if I am wrong :)


bq.  On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java, line 111
bq.  > <https://reviews.apache.org/r/2393/diff/2/?file=50560#file50560line111>
bq.  >
bq.  >     isSeekToEmptyColumn and useBloom should be separate flags I think.
bq.  >     
bq.  >     For example, if the CF had ROWCOL bloom, and the query for looking for "row/0-length
column", then with this change, we won't use the ROWCOL bloom filter even when it exists.
bq.  >     
bq.  >     Isn't it the case that we want to avoid using only the deleteFamilyBloom filter
when isSeekToEmptyColumn is true?

Agreed :) I will update the code to pass the scan query matcher to each store file scanner.
This will also help with further optimization:
when the store file scanner has more information about the matcher's status, it may be able
to avoid more unnecessary seeks.
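
One way to read the separation of the two flags is sketched below. It is only an interpretation of this thread, not the updated patch: the class, the method name, and the boolean parameters are all hypothetical.

```java
// Sketch only: a hypothetical decision helper, not code from the patch.
class TopRowSeekDecisionSketch {
    /**
     * Decides whether the top row seek into a store file is still required.
     *
     * @param hasRowColBloom        the column family uses a ROWCOL bloom filter
     * @param rowColBloomMayContain ROWCOL bloom answer for row/empty column
     * @param isSeekToEmptyColumn   the query targets the row's empty column
     * @param deleteFamilyMayExist  result of the delete family check for the row
     */
    static boolean mustSeekTopRow(boolean hasRowColBloom,
                                  boolean rowColBloomMayContain,
                                  boolean isSeekToEmptyColumn,
                                  boolean deleteFamilyMayExist) {
        // A ROWCOL bloom stays usable even when the query targets the
        // empty column, so it is consulted first.
        if (hasRowColBloom) {
            return rowColBloomMayContain;
        }
        // Without a ROWCOL bloom, skipping the seek could miss a real KV in
        // the empty column, so the delete family shortcut is disabled here.
        if (isSeekToEmptyColumn) {
            return true;
        }
        // Otherwise the delete family check alone decides.
        return deleteFamilyMayExist;
    }

    public static void main(String[] args) {
        System.out.println(mustSeekTopRow(true, false, true, true));
        System.out.println(mustSeekTopRow(false, false, true, false));
        System.out.println(mustSeekTopRow(false, false, false, false));
    }
}
```

Keeping isSeekToEmptyColumn as its own input, rather than folding it into a single useBloom flag, is what lets the ROWCOL branch remain active for empty-column queries.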


- Liyin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2615
-----------------------------------------------------------


On 2011-10-20 00:08:14, Liyin Tang wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2393/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-20 00:08:14)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania,
Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan,
and Nicolas Spiegelberg.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. 
bq.  This jira tries to avoid top row seek for all the cases by creating a dedicated bloom
filter only for delete family
bq.  
bq.  The only subtle use case is when we are interested in the top row with empty column.
bq.  
bq.  For example, 
bq.  we are interested in row1/cf1:/1/put.
bq.  So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter
will say there is NO delete family.
bq.  Then it will avoid the top row seek and return a fake kv, which is the last kv for this
row (createLastOnRowCol).
bq.  In this way, we have already missed the real kv we are interested in.
bq.  
bq.  The solution for the above problem is to disable this optimization if we are trying to
GET/SCAN a row with empty column.
bq.  
bq.  This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well.
I will submit the patch for apache-trunk later.
bq.  
bq.  
bq.  This addresses bug HBASE-4532.
bq.      https://issues.apache.org/jira/browse/HBASE-4532
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 
bq.    src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 
bq.  
bq.  Diff: https://reviews.apache.org/r/2393/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Running all the unit tests now
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Liyin
bq.  
bq.


                
> Avoid top row seek by dedicated bloom filter for delete family bloom filter
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-4532
>                 URL: https://issues.apache.org/jira/browse/HBASE-4532
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>         Attachments: D27.1.patch, D27.1.patch
>
>
> HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. 
> This jira tries to avoid top row seek for all the cases by creating a dedicated bloom
filter only for delete family
> The only subtle use case is when we are interested in the top row with empty column.
> For example, 
> we are interested in row1/cf1:/1/put.
> So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter
will say there is NO delete family.
> Then it will avoid the top row seek and return a fake kv, which is the last kv for this
row (createLastOnRowCol).
> In this way, we have already missed the real kv we are interested in.
> The solution for the above problem is to disable this optimization if we are trying to
GET/SCAN a row with empty column.


        
