hbase-issues mailing list archives

From "Vladimir Rodionov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12031) Parallel Scanners inside Region
Date Tue, 23 Sep 2014 20:32:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145386#comment-14145386 ]

Vladimir Rodionov commented on HBASE-12031:

You are adding a HFileReaderContext. We have a HFileContext already. Can you not reuse/amend
this class for your purposes?

HFileContext carries some of the metadata about the HFile - this is per
HFile. HFileReaderContext is per Reader (Scanner).
I would not expect a Context to do the actual reading as in + public boolean read(long offset,
byte[] buffer, int bufOffset, int len). I could imagine passing in a context when you read.
Probably, yes, but I need a place (class) to move the actual read to.
Regarding read-ahead and keeping the buffer local to the scanner: is it not enough just to have
the scanner do a read-ahead that ensures the blockcache is populated? Do you have to bring the data
local to the scanner? With multiple concurrent scans, will we have duplicate data buffered?

Not sure how to implement this. In the general case, not all data from the read-ahead (RA) buffer
should be cached. Sharing data in the RA buffer between scanners is not a common case.

Hard to see what you did in readAtOffset.

If scanner is in *pread* mode - execute OLD code for *pread*

if read-ahead disabled (context == null) - execute OLD code

else if streaming lock enabled - execute OLD code

else (read-ahead enabled && streaming lock not enabled) - execute NEW code:

Check if we can serve the block from the RA buffer - if yes, read & return the block; otherwise
read ahead the next buffer, read the block and return.
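
The branching above can be sketched in Java roughly as follows. This is only an illustration of the described decision order, not the code from the patch: the names ReadAheadSketch, ReaderContext, choosePath and readBlock are hypothetical, the "file" is an in-memory byte array standing in for an HFile, and checking *pread* before the other conditions is an assumption based on the order of the description.

```java
import java.util.Arrays;

/** Hypothetical sketch of the read-path decision described above; not the patch's code. */
final class ReadAheadSketch {
    enum Path { OLD_PREAD, OLD_STREAM, NEW_READ_AHEAD }

    /** Per-scanner read-ahead (RA) buffer; a stand-in for the per-reader context. */
    static final class ReaderContext {
        final byte[] buffer;
        long bufferStart = -1; // file offset of buffer[0]; -1 means the buffer is empty
        int bufferLen = 0;     // number of valid bytes currently in the buffer

        ReaderContext(int size) { buffer = new byte[size]; }

        /** True if [offset, offset + len) lies entirely inside the buffered range. */
        boolean contains(long offset, int len) {
            return bufferStart >= 0
                && offset >= bufferStart
                && offset + len <= bufferStart + bufferLen;
        }
    }

    /** Picks the code path in the order given in the comment. */
    static Path choosePath(boolean pread, ReaderContext ctx, boolean streamLockEnabled) {
        if (pread) return Path.OLD_PREAD;              // pread mode: old pread code
        if (ctx == null) return Path.OLD_STREAM;       // read-ahead disabled: old code
        if (streamLockEnabled) return Path.OLD_STREAM; // streaming lock enabled: old code
        return Path.NEW_READ_AHEAD;                    // read-ahead on, no streaming lock
    }

    /** New path: serve the block from the RA buffer, refilling the buffer on a miss. */
    static byte[] readBlock(byte[] file, ReaderContext ctx, long offset, int len) {
        if (offset < 0 || len < 0 || offset + len > file.length) {
            throw new IllegalArgumentException("read past end of file");
        }
        if (len > ctx.buffer.length) {
            // Block larger than the RA buffer: bypass the buffer entirely.
            return Arrays.copyOfRange(file, (int) offset, (int) offset + len);
        }
        if (!ctx.contains(offset, len)) {
            // Miss: read ahead the next buffer, starting at the requested offset.
            int n = (int) Math.min(ctx.buffer.length, file.length - offset);
            System.arraycopy(file, (int) offset, ctx.buffer, 0, n);
            ctx.bufferStart = offset;
            ctx.bufferLen = n;
        }
        int from = (int) (offset - ctx.bufferStart);
        return Arrays.copyOfRange(ctx.buffer, from, from + len);
    }
}
```

In this sketch each scanner owns its own ReaderContext, so two scanners over the same file would each hold their own copy of the buffered bytes, which is the duplication the question about concurrent scans refers to.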

> Parallel Scanners inside Region
> -------------------------------
>                 Key: HBASE-12031
>                 URL: https://issues.apache.org/jira/browse/HBASE-12031
>             Project: HBase
>          Issue Type: New Feature
>          Components: Performance, Scanners
>    Affects Versions: 0.98.6
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 1.0.0, 2.0.0, 0.98.7, 0.99.1
>         Attachments: HBASE-12031.2.patch, HBASE-12031.3.patch, HBASE-12031.patch, ParallelScannerDesign.pdf,
> This JIRA is to improve the performance of multiple scanners running on the same region in parallel.
> The scenarios where we will get the performance benefits:
> * New TableInputFormat with input splits smaller than HBase Region.
> * Scanning during compaction (Compaction scanner and application scanner over the same
> Some JIRAs related to this one:
> https://issues.apache.org/jira/browse/HBASE-7336
> https://issues.apache.org/jira/browse/HBASE-5979 

This message was sent by Atlassian JIRA
