hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17910) Use separated StoreFileReader for streaming read
Date Fri, 14 Apr 2017 09:22:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968823#comment-15968823 ]

Duo Zhang commented on HBASE-17910:

I found a problem: opening an HFileReader reads a lot of data, such as the trailer, the index,
and so on. This may hurt performance, so for now I think it is only safe to use this for
compaction. I have opened HBASE-17914 to land the current code first.

Also, I will run some tests to see whether we can use pread for all user scans. If it turns
out that pread is not slower than streaming read in most cases, then we could use pread for
all user scans by default, unless the user sets the ReadType to STREAM manually. If so, I think
it is OK to open new readers, as it is requested by the user directly and the user knows the possible

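To make the pread vs. streaming read distinction concrete, here is a minimal sketch using only the JDK. It is not HBase or HDFS code: the class name PreadVsStreamSketch and both helper methods are made up for illustration. A positional read fetches a range at an explicit offset without touching any shared file position, while a streaming read walks the file sequentially through its own private buffered stream.

```java
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration; none of these names come from HBase.
class PreadVsStreamSketch {

    /** Positional read: fetch len bytes at offset; the channel position is not moved. */
    public static byte[] pread(FileChannel channel, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos);
            if (n < 0) {
                throw new EOFException("reached end of file at position " + pos);
            }
            pos += n;
        }
        return buf.array();
    }

    /** Streaming read: scan the whole file through a private buffered stream. */
    public static long streamSum(Path file, int bufferSize) throws IOException {
        long sum = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file), bufferSize)) {
            int b;
            while ((b = in.read()) != -1) {
                sum += b;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("pread-vs-stream", ".bin");
        try {
            Files.write(file, new byte[] {10, 20, 30, 40, 50});
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // Random access: only the requested range is read.
                System.out.println(Arrays.toString(pread(ch, 1, 3)));
            }
            // Sequential access: the buffer amortizes many small reads.
            System.out.println(streamSum(file, 1024));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The point of the sketch is only that the two access patterns want different plumbing: the positional read needs no buffer and no position state, while the streaming read benefits from a buffer that belongs to that reader alone.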
Of course, this data (trailer, index, etc.) can be shared between different readers. I will
open other issues to address that.
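One possible shape for sharing the open-time data between readers, as a rough sketch only. The names StoreFileHandle, FileMetadata and Reader are hypothetical and are not the classes in the patch; the "load" is simulated with constants instead of real file I/O. The idea is that the file handle loads the trailer and index once, and every reader created from it reuses the same immutable object.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; none of these names come from HBase.
class SharedMetadataSketch {

    /** Immutable stand-in for the data read at open time (trailer, block index). */
    static final class FileMetadata {
        final long trailerOffset;
        final long[] blockOffsets;

        FileMetadata(long trailerOffset, long[] blockOffsets) {
            this.trailerOffset = trailerOffset;
            this.blockOffsets = blockOffsets.clone();
        }
    }

    /** A reader only holds a reference to the shared metadata plus its own read mode. */
    static final class Reader {
        final FileMetadata meta;
        final boolean stream;

        Reader(FileMetadata meta, boolean stream) {
            this.meta = meta;
            this.stream = stream;
        }
    }

    /** One handle per file: loads the metadata lazily, once, and hands it to every reader. */
    static final class StoreFileHandle {
        final AtomicInteger metadataLoads = new AtomicInteger();
        private FileMetadata metadata;

        private synchronized FileMetadata metadata() {
            if (metadata == null) {
                // Simulated expensive load; a real reader would parse the file here.
                metadataLoads.incrementAndGet();
                metadata = new FileMetadata(4096L, new long[] {0L, 1024L, 2048L});
            }
            return metadata;
        }

        Reader newReader(boolean stream) {
            return new Reader(metadata(), stream);
        }
    }

    public static void main(String[] args) {
        StoreFileHandle file = new StoreFileHandle();
        Reader preadReader = file.newReader(false);
        Reader streamReader = file.newReader(true);
        System.out.println("metadata loads: " + file.metadataLoads.get());
        System.out.println("same metadata: " + (preadReader.meta == streamReader.meta));
    }
}
```

With something like this, opening an extra reader for a streaming scan would cost an object allocation rather than another pass over the trailer and index.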


> Use separated StoreFileReader for streaming read
> ------------------------------------------------
>                 Key: HBASE-17910
>                 URL: https://issues.apache.org/jira/browse/HBASE-17910
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
> For now we already support using private readers for compaction, by creating
a new StoreFile copy. I think a better way is to allow creating multiple readers from a single
StoreFile instance, so we can avoid the ugly cloning, and the reader can also be used for
streaming scans, not only for compaction.
> The reason we want to do this is that we found read amplification when using short-circuit
reads. {{BlockReaderLocal}} uses an internal buffer to read data first; the buffer
size is based on the configured buffer size and the readahead option in CachingStrategy. For
a normal pread request we should just bypass the buffer, which can be achieved by setting readahead
to 0. But for streaming reads I think the buffer is somehow still useful? So we need to use
different FSDataInputStream instances for pread and streaming read.
> One more thing: we can also remove the streamLock if a streaming read always
uses its own reader.
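The read amplification described in the issue can be illustrated with a deliberately simplified model. This is an assumption for illustration, not how {{BlockReaderLocal}} is actually implemented: it supposes that every random read fills the internal buffer in whole-buffer units, and that a buffer size of 0 means the buffer is bypassed. The class name and the sizes in main are hypothetical.

```java
// Hypothetical, simplified model; not the actual BlockReaderLocal logic.
class ReadAmplificationSketch {

    /**
     * Bytes fetched from disk for one random read of `requested` bytes.
     * Assumption: with a buffer, data is fetched in whole-buffer units;
     * with bufferSize <= 0 the buffer is bypassed and only the request is read.
     */
    public static long bytesFetched(long requested, long bufferSize) {
        if (bufferSize <= 0) {
            return requested;
        }
        long fills = (requested + bufferSize - 1) / bufferSize; // ceiling division
        return fills * bufferSize;
    }

    /** Ratio of bytes fetched to bytes the caller actually asked for. */
    public static double amplification(long requested, long bufferSize) {
        return (double) bytesFetched(requested, bufferSize) / requested;
    }

    public static void main(String[] args) {
        long request = 64L * 1024;   // hypothetical 64 KB random read
        long buffer = 1024L * 1024;  // hypothetical 1 MB internal buffer
        System.out.println("buffered amplification: " + amplification(request, buffer));
        System.out.println("bypassed amplification: " + amplification(request, 0));
    }
}
```

Under this model a random read pays for the whole buffer even when it needs a small part of it, while a sequential scan would consume the rest of the buffer on its following reads. That is the motivation for giving pread and streaming read separate input streams with different readahead settings.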
