hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6577) RegionScannerImpl.nextRow() should seek to next row
Date Tue, 16 Oct 2012 21:51:03 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477379#comment-13477379 ]

Lars Hofhansl commented on HBASE-6577:
--------------------------------------

This just came up on the mailing list again:
{code}
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:1027)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:461)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
- locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
- locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRow(HRegion.java:3507)
at ...
{code}

Zahoor mentioned there that his KVs have a very large number of versions (1500+).
Presumably, because of the many versions, each new column starts on a new (HBase) block,
which is why we see a lot of seeking.

I wonder whether a solution like the following would work:
In RegionScannerImpl.nextRow(...) we try the current "naive" iteration for N KVs (let's say
100). If by then we have not reached the next row, we'll issue a direct seek.
That way, if there are few versions we avoid unnecessary seeks, but with many versions we can
seek past a lot of KVs (and thus also avoid unnecessary seeks).
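
To make the idea concrete, here is a rough sketch as a toy model. It is not HBase code: ToyScanner, scannerOver, seekPastRow and the step/seek counters are all hypothetical names made up for illustration, and the "seek" is just a binary search over an in-memory list of row keys. The limit of 100 is the N suggested above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the "iterate N KVs, then seek" heuristic. Not HBase code. */
final class NextRowSketch {

    /** Hypothetical stand-in for a scanner over KVs sorted by row key. */
    static final class ToyScanner {
        private final List<String> rowOfKv; // row key of each KV, in sorted order
        int pos = 0;   // index of the current KV
        int steps = 0; // KVs skipped one at a time
        int seeks = 0; // direct seeks issued

        ToyScanner(List<String> rowOfKv) {
            this.rowOfKv = rowOfKv;
        }

        /** True while the current KV still belongs to the given row. */
        boolean onRow(String row) {
            return pos < rowOfKv.size() && rowOfKv.get(pos).equals(row);
        }

        /** Naive iteration: move to the next KV. */
        void step() {
            pos++;
            steps++;
        }

        /** Direct seek: binary search for the first KV whose row sorts after row. */
        void seekPastRow(String row) {
            int lo = pos;
            int hi = rowOfKv.size();
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (rowOfKv.get(mid).compareTo(row) <= 0) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            pos = lo;
            seeks++;
        }
    }

    /** Iterate for at most maxSteps KVs; seek only if still on the same row. */
    static void nextRow(ToyScanner scanner, String currentRow, int maxSteps) {
        for (int i = 0; i < maxSteps && scanner.onRow(currentRow); i++) {
            scanner.step();
        }
        if (scanner.onRow(currentRow)) {
            scanner.seekPastRow(currentRow);
        }
    }

    /** Builds a scanner over kvsInFirstRow KVs in "row1" followed by one KV in "row2". */
    static ToyScanner scannerOver(int kvsInFirstRow) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < kvsInFirstRow; i++) {
            rows.add("row1");
        }
        rows.add("row2");
        return new ToyScanner(rows);
    }

    public static void main(String[] args) {
        for (int kvs : new int[] {5, 1500}) {
            ToyScanner s = scannerOver(kvs);
            nextRow(s, "row1", 100);
            System.out.println(kvs + " KVs: steps=" + s.steps + ", seeks=" + s.seeks);
        }
    }
}
```

In this model a row with only a handful of KVs is skipped by plain iteration with no seek at all, while a row with 1500 KVs costs 100 steps plus a single seek. Whether 100 is a good cutoff for real HFiles is an open question that would need measuring.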

I can make a patch for that.

[~jdcryans] Would you be able to recreate the issue you saw with the initial version of this
patch in production?
                
> RegionScannerImpl.nextRow() should seek to next row
> ---------------------------------------------------
>
>                 Key: HBASE-6577
>                 URL: https://issues.apache.org/jira/browse/HBASE-6577
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: 6577-0.94.txt, 6577.txt, 6577-v2.txt
>
>
> RegionScannerImpl.nextRow() is called when a filter filters out the entire row. In that case
> we should seek to the next row rather than iterating over all versions of all columns to get
> there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
