hbase-issues mailing list archives

From "Kannan Muthukkaruppan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row
Date Thu, 28 Feb 2013 19:59:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589767#comment-13589767
] 

Kannan Muthukkaruppan edited comment on HBASE-4433 at 2/28/13 7:58 PM:
-----------------------------------------------------------------------

The relevant JIRA that addresses this issue is: HBASE-5987.

Basically, whenever we go down an index, we also look ahead and keep the start key of the
next block in the HFileScanner state. When we need to reseek to a key, we do a quick
check to see whether the key is in the same block (i.e., it is less than the start key of the
next block). If it is, the reseek doesn't need to consult the index again and can simply march
along in the same block to find the key; otherwise, it uses the index to find the block it needs
to go to.
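
To make the check concrete, here is a rough sketch of the idea in Java. It is not the HBASE-5987 code: the class BlockScannerSketch, its fields, and the use of String keys and in-memory blocks are all made up for illustration, and the "index" is just a linear scan over the first key of each block.

```java
import java.util.List;

/** Hypothetical sketch of a block-aware reseek; not HBase's HFileScanner. */
final class BlockScannerSketch {
    private final List<String[]> blocks;   // each block holds sorted keys; blocks are sorted too
    private int currentBlock = -1;         // -1 means no block has been loaded yet
    private int posInBlock = 0;
    private String nextBlockFirstKey;      // remembered when we go down the index; null if last block
    int indexLookups = 0;                  // counts how often the index was consulted

    BlockScannerSketch(List<String[]> blocks) {
        this.blocks = blocks;
    }

    /** Goes through the index: picks the last block whose first key is <= key. */
    private void seekViaIndex(String key) {
        indexLookups++;
        int found = 0;
        for (int i = 0; i < blocks.size(); i++) {
            if (blocks.get(i)[0].compareTo(key) <= 0) {
                found = i;
            }
        }
        loadBlock(found);
    }

    private void loadBlock(int blockIndex) {
        currentBlock = blockIndex;
        posInBlock = 0;
        // Look ahead: keep the start key of the next block in the scanner state.
        nextBlockFirstKey = (blockIndex + 1 < blocks.size()) ? blocks.get(blockIndex + 1)[0] : null;
    }

    /**
     * Forward-only reseek: returns the first key >= target, or null if there is none.
     * Assumes target is not before the scanner's current position.
     */
    String reseek(String target) {
        boolean sameBlock = currentBlock >= 0
                && (nextBlockFirstKey == null || target.compareTo(nextBlockFirstKey) < 0);
        if (!sameBlock) {
            seekViaIndex(target);          // target lies in a later block (or nothing is loaded)
        }
        // March along in the current block.
        String[] block = blocks.get(currentBlock);
        while (posInBlock < block.length && block[posInBlock].compareTo(target) < 0) {
            posInBlock++;
        }
        if (posInBlock < block.length) {
            return block[posInBlock];
        }
        // Target falls in the gap after this block: the answer is the next block's first key.
        if (currentBlock + 1 < blocks.size()) {
            loadBlock(currentBlock + 1);
            return blocks.get(currentBlock)[0];
        }
        return null;
    }
}
```

With blocks {a, c}, {f, h}, {m, p}, a reseek to "a" followed by a reseek to "c" consults the index only once, because "c" is less than the remembered start key "f" of the next block.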

Looks like this was fixed in 0.95. Raymond: Which version are you trying this with?
---
                
                  
> avoid extra next (potentially a seek) if done with column/row
> -------------------------------------------------------------
>
>                 Key: HBASE-4433
>                 URL: https://issues.apache.org/jira/browse/HBASE-4433
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>             Fix For: 0.92.0
>
>
> [Noticed this in 89, but quite likely true of trunk as well.]
> When we are done with the requested column(s), the code still makes an extra next() call
before it realizes that it is actually done. This extra next() call could result
in an unnecessary block load. This is likely to be especially bad for CFs where the
KVs are large blobs, each of which may occupy a block of its own, so the next() can often
load a new, unrelated block unnecessarily.
> --
> For the simple case of reading, say, the top-most column in a row in a single file, where
each column (KV) is, say, a block of its own, it seems that we are reading 3 blocks instead
of 1 block!
> I am working on a simple patch, and with that the number of seeks is down to 2.
> [There is still an extra seek left. I think there were two levels of extra/unnecessary
next() calls that we were making without actually confirming that the next was needed. One is at the
StoreScanner/ScanQueryMatcher level, which this diff avoids. I think the other is at hfs.next() (at the
storefile scanner level), which happens whenever an HFile scanner serves out data, and perhaps that's the
additional seek that we need to avoid. But I want to tackle this optimization first, as the
two issues seem unrelated.]
> -- 
> The basic idea of the patch I am working on/testing is as follows. The ExplicitColumnTracker
currently returns "INCLUDE" to the ScanQueryMatcher if the KV needs to be included, and then,
if done, only in the next call does it return the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW
hint. For the cases where ExplicitColumnTracker knows it is done with a particular column/row,
the patch attempts to combine the INCLUDE code and the done hint into a single match code:
INCLUDE_AND_SEEK_NEXT_COL or INCLUDE_AND_SEEK_NEXT_ROW.
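
The combined match codes described above can be sketched roughly as follows. This is a simplified illustration, not the actual patch: ColumnTrackerSketch, its fields, and the String qualifiers are assumptions made for the example, and only the match code names come from the description.

```java
import java.util.List;

/** Match codes named in the issue description; the enum itself is illustrative. */
enum MatchCode {
    INCLUDE,
    SEEK_NEXT_COL,
    SEEK_NEXT_ROW,
    INCLUDE_AND_SEEK_NEXT_COL,
    INCLUDE_AND_SEEK_NEXT_ROW
}

/** Hypothetical, simplified tracker for an explicit, sorted list of requested columns. */
final class ColumnTrackerSketch {
    private final List<String> wanted;  // requested qualifiers, sorted ascending
    private final int maxVersions;      // versions to return per column
    private int index = 0;              // position of the column currently being matched
    private int versionsSeen = 0;       // versions already included for that column

    ColumnTrackerSketch(List<String> wanted, int maxVersions) {
        this.wanted = wanted;
        this.maxVersions = maxVersions;
    }

    /** Decides what to do with a KV whose qualifier is given; KVs arrive in sorted order. */
    MatchCode checkColumn(String qualifier) {
        while (true) {
            if (index >= wanted.size()) {
                return MatchCode.SEEK_NEXT_ROW;      // every requested column is finished
            }
            int cmp = qualifier.compareTo(wanted.get(index));
            if (cmp < 0) {
                return MatchCode.SEEK_NEXT_COL;      // not requested; move on to a later column
            }
            if (cmp > 0) {
                index++;                             // the wanted column is absent in this row
                versionsSeen = 0;
                continue;
            }
            versionsSeen++;
            if (versionsSeen < maxVersions) {
                return MatchCode.INCLUDE;            // more versions of this column may follow
            }
            // Done with this column: include the KV and hand back the hint in the same call,
            // so the caller does not need an extra next() just to learn that it is done.
            index++;
            versionsSeen = 0;
            return index >= wanted.size()
                    ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW
                    : MatchCode.INCLUDE_AND_SEEK_NEXT_COL;
        }
    }
}
```

For a single requested column with one version, the first matching KV already yields INCLUDE_AND_SEEK_NEXT_ROW, so no further KV has to be examined to find out that the row is finished.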

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
