hbase-issues mailing list archives

From "zhaojianbo (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher performance of scan
Date Wed, 05 Mar 2014 03:43:42 GMT

     [ https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaojianbo updated HBASE-10676:
-------------------------------

    Attachment: HBASE-10676-0.98-branch.patch

> Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher performance
of scan
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10676
>                 URL: https://issues.apache.org/jira/browse/HBASE-10676
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.98.0
>            Reporter: zhaojianbo
>         Attachments: HBASE-10676-0.98-branch.patch
>
>
> The PrefetchedHeader variable in HFileBlock.FSReaderV2 is meant to avoid a backward seek
operation, as the code comment says:
> {quote}
> we will not incur a backward seek operation if we have already read this block's header
as part of the previous read's look-ahead. And we also want to skip reading the header again
if it has already been read.
> {quote}
> But that is not what happens. In the 0.98 code, prefetchedHeader is a ThreadLocal for each
storefile reader, and during the lifecycle of a RegionScanner, different RPC handlers serve the
scan requests of the same scanner. Even if the handler of the previous scan call prefetched the
next block header, the handler of the current scan call still triggers a backward seek
operation. The process is as follows:
> # rs handler1 serves the scan call, reads block1 and prefetches the header of block2
> # rs handler2 serves the same scanner's next scan call. Because rs handler2 does not know
that rs handler1 already prefetched the header of block2, it triggers a backward seek, reads
block2, and prefetches the header of block3.
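
The two steps above can be modeled with a small, self-contained sketch. This is not HBase code: the class `PrefetchDemo`, the method `countHeaderMisses`, and the use of plain block offsets are hypothetical names and simplifications chosen for illustration. Two single-thread executors stand in for two RPC handlers that alternate on consecutive scan calls, and a "miss" stands for the case where the serving handler finds no prefetched header for the block it is asked to read.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model, not HBase code: a block is identified by its index,
// and reading block N also prefetches the header of block N + 1.
class PrefetchDemo {
    static final long NO_HEADER = -1L;

    /** Returns how many reads found no prefetched header for the requested block. */
    static int countHeaderMisses(boolean useThreadLocal, int blocks) {
        ThreadLocal<Long> perThread = ThreadLocal.withInitial(() -> NO_HEADER);
        AtomicLong shared = new AtomicLong(NO_HEADER);
        // Two single-thread pools stand in for two RPC handlers.
        ExecutorService[] handlers = {
            Executors.newSingleThreadExecutor(), Executors.newSingleThreadExecutor()
        };
        int misses = 0;
        try {
            for (int block = 0; block < blocks; block++) {
                final long offset = block;
                // Consecutive scan calls alternate between the two handlers.
                ExecutorService handler = handlers[block % handlers.length];
                boolean hit = handler.submit(() -> {
                    long cached = useThreadLocal ? perThread.get() : shared.get();
                    boolean found = cached == offset;
                    // Reading this block prefetches the next block's header.
                    if (useThreadLocal) {
                        perThread.set(offset + 1);
                    } else {
                        shared.set(offset + 1);
                    }
                    return found;
                }).get(); // wait, so scan calls are served one after another
                if (!hit) {
                    misses++;
                }
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            for (ExecutorService h : handlers) {
                h.shutdown();
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        System.out.println("thread-local misses: " + countHeaderMisses(true, 6));
        System.out.println("shared misses: " + countHeaderMisses(false, 6));
    }
}
```

In this model, six blocks read with the per-thread cache miss on every read, because each handler only ever sees the header it prefetched itself. With the shared cache only the very first read misses. The model ignores concurrent scanners on the same reader, which a real change would have to handle.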
> So the reads are not sequential. I therefore think the ThreadLocal is useless and should
be removed. I made the change and measured the performance of one, two, and four clients
scanning the same region with one storefile. The test environment is:
> # An HDFS cluster with a namenode, a secondary namenode, and a datanode on one machine
> # An HBase cluster with a ZooKeeper node, a master, and a regionserver on the same machine
> # The clients also run on the same machine.
> So all the data is local. The storefile is about 22.7GB and holds 18995949 KVs. Caching is
set to 1000.
> With the improvement, the total client scan time decreases by 21% in the one-client case
and by 11% in the two-client case. The four-client case is almost unchanged. The detailed
test data follows:
> ||case||clients||time (ms)||
> | original | 1 | 306222 |
> | new | 1 | 241313 |
> | original | 2 | 416390 |
> | new | 2 | 369064 |
> | original | 4 | 555986 |
> | new | 4 | 562152 |
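
The percentages quoted in the description can be rechecked from the table. The sketch below is only an arithmetic aid; the class `ScanTimeChange` and the method `reductionPercent` are hypothetical names, and the numbers are copied from the table above.

```java
// Hypothetical helper for rechecking the reported percentages; not part of the patch.
class ScanTimeChange {
    /** Percentage reduction of newMs relative to originalMs; negative means slower. */
    static double reductionPercent(long originalMs, long newMs) {
        return 100.0 * (originalMs - newMs) / originalMs;
    }

    public static void main(String[] args) {
        // Each row: {clients, original time (ms), new time (ms)}, taken from the table.
        long[][] runs = {
            {1, 306222, 241313},
            {2, 416390, 369064},
            {4, 555986, 562152},
        };
        for (long[] r : runs) {
            System.out.printf("%d client(s): %.1f%%%n", r[0], reductionPercent(r[1], r[2]));
        }
    }
}
```

The result is about 21.2% for one client and 11.4% for two clients, matching the figures in the description. For four clients the new time is about 1.1% higher.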



--
This message was sent by Atlassian JIRA
(v6.2#6252)
