hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
Date Thu, 20 Oct 2011 17:36:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131821#comment-13131821
] 

jiraposter@reviews.apache.org commented on HBASE-4532:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
-----------------------------------------------------------

(Updated 2011-10-20 17:35:38.480668)


Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania,
Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan,
and Nicolas Spiegelberg.


Changes
-------

Thanks Jonathan and Ted's review. Address their comments.


Summary
-------

HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter
only for delete family

The only subtle use case is when we are interested in the top row with empty column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will
say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv for this row
(createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are trying to GET/SCAN
a row with empty column.

This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will
submit the patch for apache-trunk later.


This addresses bug HBASE-4532.
    https://issues.apache.org/jira/browse/HBASE-4532


Diffs (updated)
-----

  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163

  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce 
  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 

Diff: https://reviews.apache.org/r/2393/diff


Testing
-------

Passed all the unit tests


Thanks,

Liyin


                
> Avoid top row seek by dedicated bloom filter for delete family bloom filter
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-4532
>                 URL: https://issues.apache.org/jira/browse/HBASE-4532
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>         Attachments: D27.1.patch, D27.1.patch
>
>
> HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. 
> This jira tries to avoid top row seek for all the cases by creating a dedicated bloom
filter only for delete family
> The only subtle use case is when we are interested in the top row with empty column.
> For example, 
> we are interested in row1/cf1:/1/put.
> So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter
will say there is NO delete family.
> Then it will avoid the top row seek and return a fake kv, which is the last kv for this
row (createLastOnRowCol).
> In this way, we have already missed the real kv we are interested in.
> The solution for the above problem is to disable this optimization if we are trying to
GET/SCAN a row with empty column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message