hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12411) Avoid seek + read completely?
Date Sun, 09 Nov 2014 08:17:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203846#comment-14203846 ]

Lars Hofhansl commented on HBASE-12411:

Small scans do involve more work if they are in fact not small (i.e. they involve multiple RPCs,
as each RPC needs to set up the scanner again, including all the seeking needed).
The more I think about it, the more I think that p-read only is the right choice right now, as
long as we have only one reader per HFile. When more than one scanner happens to read from an
HFile, the prefetching is likely not going to help, and the right scanner would need to be
the lucky one again and again. Only when we can guarantee that a single scanner will be scanning
an HFile (a DFSInputStream, to be specific), and that scanner will be scanning enough to benefit
from the prefetching, does seek + read make sense.
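
To make the distinction concrete, here is a minimal sketch of the two read styles. It is an analogy only: it uses plain java.nio on a local file, not HDFS or HBase code, and the class and method names (PreadSketch, seekAndRead, pread) are made up for illustration. In Hadoop the corresponding calls would roughly be seek() followed by read() on an FSDataInputStream versus the positional read(position, buffer, offset, length).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PreadSketch {

    // Stateful "seek + read": moves the channel's shared position, so callers
    // sharing one channel have to take turns (here: synchronized on the channel).
    static byte[] seekAndRead(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        synchronized (ch) {
            ch.position(offset);
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or EOF is reached
            }
        }
        return toArray(buf);
    }

    // Positional read ("pread"): the offset is passed with every call and the
    // channel's position is left untouched, so no lock around the stream is needed.
    static byte[] pread(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);
            if (n == -1) {
                break; // EOF
            }
            pos += n;
        }
        return toArray(buf);
    }

    private static byte[] toArray(ByteBuffer buf) {
        buf.flip();
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("pread-sketch", ".bin");
        try {
            Files.write(file, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                byte[] a = pread(ch, 10, 4);
                System.out.println(new String(a, StandardCharsets.US_ASCII)
                        + " position=" + ch.position());
                byte[] b = seekAndRead(ch, 4, 3);
                System.out.println(new String(b, StandardCharsets.US_ASCII)
                        + " position=" + ch.position());
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The positional variant carries no state between calls, which is why several scanners can share one stream without locking each other out; the price is that a stateless read gives the stream no hint to read ahead, which is the trade-off discussed above.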

I'll do some perf testing and then post a patch.

> Avoid seek + read completely?
> -----------------------------
>                 Key: HBASE-12411
>                 URL: https://issues.apache.org/jira/browse/HBASE-12411
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: Performance
>            Reporter: Lars Hofhansl
> In light of HDFS-6735 we might want to consider refraining from seek + read completely
> and only perform preads.
> For example, currently a compaction can lock out every other scanner on the file that
> the compaction is currently reading.
> At the very least we can introduce an option to avoid seek + read, so we can allow testing
> this in various scenarios.
> This will definitely be of great importance for projects like Phoenix which parallelize
> queries intra region (and hence readers will be used concurrently by multiple scanners with high

This message was sent by Atlassian JIRA
