hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12411) Avoid seek + read completely?
Date Sun, 16 Nov 2014 23:35:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214133#comment-14214133 ]

stack commented on HBASE-12411:

Is that all it takes?

What about your trick to flip to pread if we are seek + reading already? That will still work,
right? Because compactions have their own file, they'll seek + read?

So, seek + read makes (slight) sense (10 or 20% better throughput?) for a long scan when only
one scan is going on at a time. Otherwise, if there are lots of small gets and scans, pread makes more
sense.

seek + read blocks out preads while it's running, until HDFS-6735 is fixed.
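
To make the two read styles concrete, here is a minimal sketch using only the plain Java standard library. It is an analogy, not the HBase or HDFS code path: RandomAccessFile stands in for a stateful seek + read stream and FileChannel's positional read stands in for pread. The class and method names (ReadModes, seekRead, pread) are made up for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReadModes {
    // seek + read: moves the handle's shared file pointer, so every caller
    // sharing this handle has to serialize on it (hypothetical helper).
    static byte[] seekRead(RandomAccessFile file, long offset, int len) throws IOException {
        byte[] buf = new byte[len];
        synchronized (file) {
            file.seek(offset);
            file.readFully(buf);
        }
        return buf;
    }

    // pread: the offset travels with the call and the channel's own position
    // is left untouched, so no lock is taken here (hypothetical helper).
    static byte[] pread(FileChannel channel, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = channel.read(buf, offset + buf.position());
            if (n < 0) {
                throw new EOFException("hit end of file before reading " + len + " bytes");
            }
        }
        return buf.array();
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("readmodes", ".bin");
        try {
            Files.write(path, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
            try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r");
                 FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
                System.out.println(new String(seekRead(raf, 4, 4), StandardCharsets.US_ASCII)); // 4567
                System.out.println(new String(pread(ch, 10, 3), StandardCharsets.US_ASCII));    // abc
                System.out.println(ch.position()); // 0: the positional read did not move it
            }
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

The lock in seekRead is the point of the comparison: a long reader holding it keeps everyone else on that handle waiting, which is the same shape of problem as a seek + read scanner blocking preads on a shared stream.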

Compactions are long reads.  Makes sense to do seek + read for these.  Giving them their own
file means they won't interfere with ongoing gets/scans.  I suppose we'll open a lot of
files with the NN when doing a compaction.  Could take a while if there are a bunch of files to open. Do we
open in parallel?  So, this could make compactions take a bit longer... but compactions are a background
task, so ok?

Add a toggle so it's easy to flip it on, and then let's try and get some numbers?
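
A rough sketch of what such a toggle could look like, in plain Java with no HBase classes. Everything named here is an assumption for illustration: the property key "example.scan.always.pread" is not a real HBase setting, and the ReadKind and ReadMode enums are hypothetical. The default policy just encodes the reasoning above: gets and short scans pread, long scans and compactions seek + read, and the toggle forces pread everywhere.

```java
import java.util.Properties;

public class ReadPathToggle {
    // Hypothetical property key, not an actual HBase configuration name.
    static final String ALWAYS_PREAD_KEY = "example.scan.always.pread";

    enum ReadKind { GET, SHORT_SCAN, LONG_SCAN, COMPACTION }

    enum ReadMode { PREAD, SEEK_READ }

    // Picks the read path for one reader. With the toggle on, seek + read is
    // never used; with it off, only long sequential readers use seek + read.
    static ReadMode choose(Properties conf, ReadKind kind) {
        boolean alwaysPread =
            Boolean.parseBoolean(conf.getProperty(ALWAYS_PREAD_KEY, "false"));
        if (alwaysPread) {
            return ReadMode.PREAD;
        }
        switch (kind) {
            case LONG_SCAN:
            case COMPACTION:
                return ReadMode.SEEK_READ;
            default:
                return ReadMode.PREAD;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(choose(conf, ReadKind.COMPACTION)); // SEEK_READ
        System.out.println(choose(conf, ReadKind.GET));        // PREAD

        conf.setProperty(ALWAYS_PREAD_KEY, "true");
        System.out.println(choose(conf, ReadKind.COMPACTION)); // PREAD
    }
}
```

Keeping the decision in one function means the toggle can be flipped per test run without touching the readers themselves, which is what you want when the goal is just to collect numbers.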

Good stuff [~lhofhansl]

> Avoid seek + read completely?
> -----------------------------
>                 Key: HBASE-12411
>                 URL: https://issues.apache.org/jira/browse/HBASE-12411
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: Performance
>            Reporter: Lars Hofhansl
>         Attachments: 12411.txt
> In light of HDFS-6735 we might want to consider refraining from seek + read completely
> and only perform preads.
> For example, currently a compaction can lock out every other scanner over the file which
> the compaction is currently reading.
> At the very least we can introduce an option to avoid seek + read, so we can allow testing
> this in various scenarios.
> This will definitely be of great importance for projects like Phoenix which parallelize
> queries intra region (and hence readers will be used concurrently by multiple scanners with high

