hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17910) Use separated StoreFileReader for streaming read
Date Mon, 17 Apr 2017 01:59:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970591#comment-15970591 ]

Duo Zhang commented on HBASE-17910:

A long time ago, single-threaded streaming read was ~15% better than preading it. But then
streaming read blocked out other concurrent reads so in multithreaded case, pread had more
This result is intuitive. But I think in most cases we will use multiple threads to access
a region? Especially since, even if you have only one thread doing a scan, others can still slow
you down by calling get on that region. So maybe we could change the default to pread, and
users can still use streaming read by setting ReadType manually when scanning. This will be an incompatible

And I think AsyncPrefetchScanner can give better performance when using pread, and it will
be the default implementation for the async client. Let me give it a try.


> Use separated StoreFileReader for streaming read
> ------------------------------------------------
>                 Key: HBASE-17910
>                 URL: https://issues.apache.org/jira/browse/HBASE-17910
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
> For now we already support using private readers for compaction, by creating
a new StoreFile copy. I think a better way is to allow creating multiple readers from a single
StoreFile instance, so that we can avoid the ugly cloning, and the reader can also be used for
streaming scan, not only for compaction.
> The reason we want to do this is that we found read amplification when using short-circuit
read. {{BlockReaderLocal}} will use an internal buffer to read data first; the buffer
size is based on the configured buffer size and the readahead option in CachingStrategy. For
a normal pread request, we should just bypass the buffer, which can be achieved by setting readahead
to 0. But for streaming read I think the buffer is still useful? So we need to use
different FSDataInputStream instances for pread and streaming read.
> And one more thing is that we can also remove the streamLock if streaming read always
uses its own reader.
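
To illustrate the streamLock point, here is a minimal sketch using plain java.nio rather than HBase or HDFS code. All class and method names below (ReadModeSketch, SharedReader, PrivateStreamReader, demo) are hypothetical and exist only for this example. It assumes the general behaviour the description relies on: a positional read does not touch the shared stream position, a seek-then-read on a shared stream does and so needs a lock, and a reader that owns its stream needs no lock at all.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: none of these names are HBase or HDFS classes.
final class ReadModeSketch {

    static FileChannel open(Path file) {
        try {
            return FileChannel.open(file, StandardOpenOption.READ);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads up to 'length' bytes from the channel's current position.
    static String readSequential(FileChannel ch, int length) {
        ByteBuffer buf = ByteBuffer.allocate(length);
        try {
            while (buf.hasRemaining() && ch.read(buf) >= 0) { /* keep filling */ }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf.array(), 0, buf.position(), StandardCharsets.US_ASCII);
    }

    // One channel shared by many callers.
    static final class SharedReader implements AutoCloseable {
        private final FileChannel channel;
        private final ReentrantLock streamLock = new ReentrantLock();

        SharedReader(Path file) { this.channel = open(file); }

        // Positional read: never moves the channel position, so no lock is taken.
        String pread(long offset, int length) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            try {
                long pos = offset;
                while (buf.hasRemaining()) {
                    int n = channel.read(buf, pos);
                    if (n < 0) break;
                    pos += n;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return new String(buf.array(), 0, buf.position(), StandardCharsets.US_ASCII);
        }

        // Seek-then-read mutates the shared position, so it must hold the lock.
        String streamRead(long offset, int length) {
            streamLock.lock();
            try {
                channel.position(offset);
                return readSequential(channel, length);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                streamLock.unlock();
            }
        }

        @Override public void close() {
            try { channel.close(); } catch (IOException e) { throw new UncheckedIOException(e); }
        }
    }

    // A channel owned by a single streaming scan: sequential reads need no lock.
    static final class PrivateStreamReader implements AutoCloseable {
        private final FileChannel channel;

        PrivateStreamReader(Path file) { this.channel = open(file); }

        String next(int length) { return readSequential(channel, length); }

        @Override public void close() {
            try { channel.close(); } catch (IOException e) { throw new UncheckedIOException(e); }
        }
    }

    // Writes a small temp file and reads it back through all three paths.
    static String[] demo() {
        try {
            Path file = Files.createTempFile("readmode", ".dat");
            try {
                Files.write(file, "hello world, streaming".getBytes(StandardCharsets.US_ASCII));
                try (SharedReader shared = new SharedReader(file);
                     PrivateStreamReader own = new PrivateStreamReader(file)) {
                    return new String[] {
                        shared.pread(6, 5), shared.streamRead(0, 5), own.next(5), own.next(6)
                    };
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", demo()));
    }
}
```

In this sketch the lock exists only because streamRead shares a position with other callers. Once each streaming scan holds its own PrivateStreamReader, the locked path has no remaining users, which is the same argument the description makes for dropping streamLock.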
