Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Message-ID: <165159.404711285475432967.JavaMail.jira@thor>
Date: Sun, 26 Sep 2010 00:30:32 -0400 (EDT)
From: "andychen (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Subject: [jira] Commented: (HBASE-3040) BlockIndex readIndex too slowly in
 heavy write scenario
In-Reply-To: <18986315.403741285471892752.JavaMail.jira@thor>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914918#action_12914918 ] 

andychen commented on HBASE-3040:
---------------------------------

In HFile.loadFileInfo, we can calculate the total size of all indices (both the data block index and the meta block index) from the trailer's information:
int allIndexSize = (int)(this.fileSize - this.trailer.dataIndexOffset - FixedFileTrailer.trailerSize());

Then we add a new function, readAllIndex, which loads the data and meta block indices in a single DFS read:
byte[] dataAndMetaIndex = readAllIndex(this.istream, this.trailer.dataIndexOffset, allIndexSize);

Now we can extract all index data from local memory instead of from a remote datanode.
Previously, the region server used readIndex to load index data from the datanode; with that function, a storefile with 10000 blocks may need 10000 network round trips.
So we add another function, readIndexEx, which reads from the in-memory buffer returned by readAllIndex above.
In our test case, the region server consistently takes several microseconds to load about 1000 block indices.
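
To make the idea concrete, here is a minimal, self-contained sketch of the "one bulk read, then parse from memory" approach. It is not the HBase code: the class IndexReadSketch, the toy file layout, the Entry type, and the 12-byte trailer are all assumptions made for illustration, and a plain byte array stands in for the DFS input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; this is NOT the HBase HFile implementation or its on-disk format.
class IndexReadSketch {
    // Assumed trailer size for the toy layout (stands in for FixedFileTrailer.trailerSize()).
    static final int TRAILER_SIZE = 12; // one long (index offset) + one int (entry count)

    /** One index entry: block offset, block size, and the block's first key. */
    static final class Entry {
        final long offset;
        final int size;
        final String firstKey;

        Entry(long offset, int size, String firstKey) {
            this.offset = offset;
            this.size = size;
            this.firstKey = firstKey;
        }
    }

    /** Builds a toy file laid out as [data blocks][index entries][trailer]. */
    static byte[] buildSampleFile(int blocks, int blockSize) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.write(new byte[blocks * blockSize]); // placeholder data blocks
        long dataIndexOffset = out.size();       // the index starts right after the data
        for (int i = 0; i < blocks; i++) {
            out.writeLong((long) i * blockSize);
            out.writeInt(blockSize);
            out.writeUTF("key-" + i);
        }
        out.writeLong(dataIndexOffset);          // trailer: where the index begins
        out.writeInt(blocks);                    // trailer: number of index entries
        out.flush();
        return bytes.toByteArray();
    }

    /** Copies the whole index region once, then decodes every entry from memory. */
    static List<Entry> readAllEntries(byte[] file) throws IOException {
        DataInputStream trailer = new DataInputStream(
                new ByteArrayInputStream(file, file.length - TRAILER_SIZE, TRAILER_SIZE));
        long dataIndexOffset = trailer.readLong();
        int count = trailer.readInt();

        // Same arithmetic as above: everything between the index offset and the trailer.
        int allIndexSize = (int) (file.length - dataIndexOffset - TRAILER_SIZE);

        // A single bulk copy stands in for the single DFS read.
        byte[] allIndex = new byte[allIndexSize];
        System.arraycopy(file, (int) dataIndexOffset, allIndex, 0, allIndexSize);

        // Decode from the local buffer; the "file" is not touched again.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(allIndex));
        List<Entry> entries = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            entries.add(new Entry(in.readLong(), in.readInt(), in.readUTF()));
        }
        return entries;
    }
}
```

Against a real DFS stream, the bulk copy would be replaced by one positional read of allIndexSize bytes at dataIndexOffset; the per-entry decoding would stay in memory as shown.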

> BlockIndex readIndex too slowly in heavy write scenario
> -------------------------------------------------------
>
>                 Key: HBASE-3040
>                 URL: https://issues.apache.org/jira/browse/HBASE-3040
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.20.6
>         Environment: 1 master, 7 region servers, 4 * 7 clients (all clients run on region server hosts), sequential put
>            Reporter: andychen
>
> Region size is configured as 128M, block size is 64K, and the table has 5 column families.
> At the beginning, when a region splits, the master assigns the daughters to new region servers. When a new region server opens a region, readIndex on that region's storefile (about 1000 blocks) takes 30~50ms. As the data import continues, the region server spends more and more time (sometimes up to several seconds) loading 1000 block indices.
> For now, we resolve this issue by fetching all indices of one hfile in one DFS read instead of 1000 reads.
> Is there a better solution?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.