hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17917) Use pread by default for all user scan
Date Thu, 20 Apr 2017 08:04:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976277#comment-15976277 ]

Duo Zhang commented on HBASE-17917:

OK, the problem is the new code in PE...

I put 10M rows (10G in size) into a single region, flushed, then compacted it into one file.
The test commands are:

./bin/hbase pe --rows=10000000 --cacheBlocks=false --caching=30 --scanReadType=pread/stream --nomapred scan 1
./bin/hbase pe --rows=1000000 --cacheBlocks=false --caching=30 --scanReadType=pread/stream --nomapred scan 10

The results match what [~stack] reported.

For the one-thread test, stream takes about 180s and pread about 210s.
For the 10-thread test, stream takes about 68s and pread about 28s.
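
To put those timings side by side, here is a small sketch of the arithmetic. It assumes that --rows is the row count per client, so both runs scan 10M rows in total; the class and method names are made up for illustration and are not part of PE.

```java
// Hypothetical helper, not part of HBase or PE: turns the timings above
// into rows/second and relative slowdown figures.
public class ScanThroughputSketch {

    // Rows scanned per second for a run that took the given wall-clock time.
    static double rowsPerSecond(long totalRows, double seconds) {
        return totalRows / seconds;
    }

    // How many times longer the slower run took compared to the faster one.
    static double slowdown(double slowerSeconds, double fasterSeconds) {
        return slowerSeconds / fasterSeconds;
    }

    public static void main(String[] args) {
        // Assumption: 1 client x 10M rows and 10 clients x 1M rows are both 10M rows total.
        long totalRows = 10_000_000L;

        System.out.printf("1 thread,   stream: %.0f rows/s%n", rowsPerSecond(totalRows, 180));
        System.out.printf("1 thread,   pread:  %.0f rows/s%n", rowsPerSecond(totalRows, 210));
        System.out.printf("10 threads, stream: %.0f rows/s%n", rowsPerSecond(totalRows, 68));
        System.out.printf("10 threads, pread:  %.0f rows/s%n", rowsPerSecond(totalRows, 28));

        // pread vs. stream with one thread, and stream vs. pread with ten threads.
        System.out.printf("1 thread:   pread takes %.2fx as long as stream%n", slowdown(210, 180));
        System.out.printf("10 threads: stream takes %.2fx as long as pread%n", slowdown(68, 28));
    }
}
```

Under that assumption, pread costs a little with a single scanner but is well ahead once ten scanners run concurrently.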

Setting readahead to 0 or not does not have much impact on the results. But a strange thing
is that pread + asyncPrefetch is much slower than plain pread, at about 360s.

So here I want to revive an old idea: use pread by default, and switch to stream (by opening
a new reader) if we read from the scanner multiple times. After HBASE-17914 we already
have the ability to open multiple readers on the same StoreFile, so I think this logic is
much easier to implement now.
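
A rough sketch of the switching idea, to make it concrete. This is not HBase code: the Reader interface, SwitchingScanner class and the call-count threshold are all hypothetical stand-ins, and a real implementation would sit in the store file scanner and might well trigger on something other than call count.

```java
import java.util.function.Supplier;

// Hypothetical sketch, not HBase code: start with a pread reader and switch
// to a freshly opened stream reader once the scanner has been read
// a configurable number of times.
public class ReadTypeSwitchSketch {

    enum ReadType { PREAD, STREAM }

    // Stand-in for a store file reader; the real HBase reader API differs.
    interface Reader {
        ReadType type();
    }

    static final class SwitchingScanner {
        private final Supplier<Reader> streamFactory; // opens a new stream reader on demand
        private final int switchAfterCalls;           // assumed threshold, in next() calls
        private Reader reader;
        private int calls;

        SwitchingScanner(Reader preadReader, Supplier<Reader> streamFactory, int switchAfterCalls) {
            this.reader = preadReader;
            this.streamFactory = streamFactory;
            this.switchAfterCalls = switchAfterCalls;
        }

        // Performs one read and returns the read type that served it.
        ReadType next() {
            ReadType used = reader.type();
            calls++;
            // A short scan never reaches the threshold and stays on pread;
            // a long scan pays for opening one extra reader and then streams.
            if (used == ReadType.PREAD && calls >= switchAfterCalls) {
                reader = streamFactory.get();
            }
            return used;
        }

        ReadType currentType() {
            return reader.type();
        }
    }

    public static void main(String[] args) {
        Reader pread = () -> ReadType.PREAD;
        Supplier<Reader> openStream = () -> () -> ReadType.STREAM;
        SwitchingScanner scanner = new SwitchingScanner(pread, openStream, 3);
        for (int i = 1; i <= 5; i++) {
            System.out.println("call " + i + " served by " + scanner.next());
        }
    }
}
```

The point of the threshold is that small gets and short scans keep the cheaper pread path, while only scans that keep going pay the cost of opening a second reader.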

We can also do some refactoring to reduce the work done when opening an HFileReader.

> Use pread by default for all user scan
> --------------------------------------
>                 Key: HBASE-17917
>                 URL: https://issues.apache.org/jira/browse/HBASE-17917
>             Project: HBase
>          Issue Type: Sub-task
>          Components: scan
>            Reporter: Duo Zhang
> As said in the parent issue. We need some benchmark here first.
