Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Date: Wed, 5 Mar 2014 03:43:42 +0000 (UTC)
From: "zhaojianbo (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

     [ https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaojianbo updated HBASE-10676:
-------------------------------
    Attachment:     (was: HBASE-10676-0.98-trunk.patch)

> Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10676
>                 URL: https://issues.apache.org/jira/browse/HBASE-10676
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.98.0
>            Reporter: zhaojianbo
>         Attachments: HBASE-10676-0.98-branch.patch
>
>
> PrefetchedHeader
variable in HFileBlock.FSReaderV2 is meant to avoid a backward seek, as the code comment says:
> {quote}
> we will not incur a backward seek operation if we have already read this block's header as part of the previous read's look-ahead. And we also want to skip reading the header again if it has already been read.
> {quote}
> But that is not what happens. In the 0.98 code, prefetchedHeader is thread-local for each storefile reader, and over the lifetime of a RegionScanner, different RPC handlers serve scan requests for the same scanner. Even if the handler that served the previous scan call prefetched the next block's header, the handler serving the current scan call cannot see it and still triggers a backward seek. The process looks like this:
> # RS handler1 serves the scan call, reads block1, and prefetches the header of block2.
> # RS handler2 serves the same scanner's next scan call. Because handler2 does not know that handler1 already prefetched the header of block2, it triggers a backward seek, reads block2, and prefetches the header of block3.
> So the read is not sequential. I therefore think the thread-local is useless and should be removed. I made the change and measured the performance of one, two, and four clients scanning the same region, which has a single storefile. The test environment is:
> # An HDFS cluster with a namenode, a secondary namenode, and a datanode on one machine
> # An HBase cluster with a ZooKeeper node, a master, and a regionserver on the same machine
> # The clients also run on the same machine.
> So all the data is local. The storefile is about 22.7 GB and holds 18995949 KVs. Scanner caching is set to 1000.
> With the improvement, the total client scan time decreases by 21% in the one-client case and by 11% in the two-client case. The four-client case is almost unchanged.
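The handler hand-off described in the two numbered steps above can be reproduced with a toy model. This is only a sketch, not the attached patch and not HBase code: `PrefetchSketch`, `Reader`, `PerThreadCache`, `SharedCache`, and the block and header sizes are all hypothetical names and values chosen for illustration, and using an `AtomicReference` for the shared variant is an assumption about one possible replacement for the thread-local.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

/** Toy model of look-ahead header caching; every name here is hypothetical, not HBase's. */
class PrefetchSketch {
    static final int BLOCK_SIZE = 100;  // illustrative on-disk block size, header included
    static final int HEADER_SIZE = 33;  // illustrative header size

    /** Records which block's header was read ahead; the offset stands in for the bytes. */
    static final class CachedHeader {
        final long offset;
        CachedHeader(long offset) { this.offset = offset; }
    }

    interface HeaderCache {
        CachedHeader get();
        void set(CachedHeader header);
    }

    /** One slot per thread: a header stored by handler1 is invisible to handler2. */
    static final class PerThreadCache implements HeaderCache {
        private final ThreadLocal<CachedHeader> slot = new ThreadLocal<>();
        public CachedHeader get() { return slot.get(); }
        public void set(CachedHeader header) { slot.set(header); }
    }

    /** One slot per reader, visible to every thread that uses the reader. */
    static final class SharedCache implements HeaderCache {
        private final AtomicReference<CachedHeader> slot = new AtomicReference<>();
        public CachedHeader get() { return slot.get(); }
        public void set(CachedHeader header) { slot.set(header); }
    }

    static final class Reader {
        private final HeaderCache cache;
        private long streamPos = 0;  // position after the last read, look-ahead included
        int backwardSeeks = 0;

        Reader(HeaderCache cache) { this.cache = cache; }

        /** Reads the block at offset, then reads ahead the next block's header. */
        void readBlock(long offset) {
            CachedHeader header = cache.get();
            boolean haveHeader = header != null && header.offset == offset;
            if (!haveHeader && offset < streamPos) {
                backwardSeeks++;  // the stream is already past this header: seek back
            }
            long next = offset + BLOCK_SIZE;
            streamPos = next + HEADER_SIZE;
            cache.set(new CachedHeader(next));
        }
    }

    /** Reads blocks in order, alternating between two single-threaded "handlers". */
    static int countBackwardSeeks(HeaderCache cache, int blocks) {
        Reader reader = new Reader(cache);
        ExecutorService[] handlers = {
            Executors.newSingleThreadExecutor(), Executors.newSingleThreadExecutor()
        };
        try {
            for (int i = 0; i < blocks; i++) {
                final long offset = (long) i * BLOCK_SIZE;
                // Wait for each call to finish, as consecutive scan calls would.
                handlers[i % 2].submit(() -> reader.readBlock(offset)).get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            for (ExecutorService handler : handlers) {
                handler.shutdown();
            }
        }
        return reader.backwardSeeks;
    }

    public static void main(String[] args) {
        System.out.println("thread-local: " + countBackwardSeeks(new PerThreadCache(), 4));
        System.out.println("shared:       " + countBackwardSeeks(new SharedCache(), 4));
    }
}
```

With four blocks alternating between two handlers, the thread-local variant counts three backward seeks (every block after the first), while the shared variant counts none. The model ignores concurrent scanners on the same reader, which a real change would have to handle.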
> The detailed test data is as follows:
> ||case||clients||total scan time (ms)||
> | original | 1 | 306222 |
> | new | 1 | 241313 |
> | original | 2 | 416390 |
> | new | 2 | 369064 |
> | original | 4 | 555986 |
> | new | 4 | 562152 |



--
This message was sent by Atlassian JIRA
(v6.2#6252)
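The percentages quoted in the description can be rechecked from the table. The snippet below is a small sketch for that arithmetic only; the class and method names are hypothetical, and the timings are copied from the table above.

```java
/** Recomputes the scan-time reductions from the reported timings (names are hypothetical). */
class ScanTimeCheck {

    /** Percent reduction from the original to the patched time; negative means slower. */
    static double percentReduction(long originalMs, long newMs) {
        return 100.0 * (originalMs - newMs) / originalMs;
    }

    public static void main(String[] args) {
        // Each row: clients, original time (ms), new time (ms), as reported in the table.
        long[][] rows = {
            {1, 306222, 241313},
            {2, 416390, 369064},
            {4, 555986, 562152},
        };
        for (long[] row : rows) {
            System.out.printf("%d client(s): %.1f%%%n", row[0], percentReduction(row[1], row[2]));
        }
    }
}
```

This gives roughly 21.2% and 11.4% for one and two clients, matching the stated 21% and 11%, and about -1.1% for four clients, that is, marginally slower rather than faster.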