From: William Kang <weliam.cloud@gmail.com>
Date: Mon, 18 Oct 2010 23:21:21 -0400
Message-ID: <AANLkTin7xSjFRQbh1xQKXce=2-1mkzHaMwGjFW-8aXBD@mail.gmail.com>
Subject: Re: HBase random access in HDFS and block indices
To: user@hbase.apache.org

Hi JG and Ryan,
Thanks for the excellent answers.

So, I am going to push everything to the extreme, ignoring memory
for now. In theory, if every cell in HBase were the same size as an
HBase block, there would be no in-block traversal. And if every
HBase block were the same size as an HDFS block, no in-file random
access would be necessary. Would this give the best performance?
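
To check my understanding of the lookup Ryan describes below, here is
a minimal sketch in Java. It is a toy model, not HBase's HFile code:
the class name, the String keys, and the offsets are all made up for
illustration. The index maps the first key of each block to the
block's offset, and the scan inside a loaded block is linear.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy block index for illustration only; not HBase's actual HFile reader. */
final class BlockIndexSketch {
    // First key of each block -> offset of that block in the file (hypothetical layout).
    private final TreeMap<String, Long> firstKeyToOffset = new TreeMap<>();

    void addBlock(String firstKey, long offset) {
        firstKeyToOffset.put(firstKey, offset);
    }

    /** Offset of the block whose [firstKey, nextFirstKey) range covers key, or -1 if none. */
    long findBlockOffset(String key) {
        Map.Entry<String, Long> entry = firstKeyToOffset.floorEntry(key);
        return entry == null ? -1L : entry.getValue();
    }

    /** Linear scan of one in-memory block; returns the index of the first key >= target. */
    static int scanBlock(String[] sortedKeysInBlock, String target) {
        for (int i = 0; i < sortedKeysInBlock.length; i++) {
            if (sortedKeysInBlock[i].compareTo(target) >= 0) {
                return i; // exact match or closest following key
            }
        }
        return sortedKeysInBlock.length; // target is past the end of this block
    }

    public static void main(String[] args) {
        BlockIndexSketch index = new BlockIndexSketch();
        index.addBlock("apple", 0L);
        index.addBlock("mango", 65536L);
        System.out.println(index.findBlockOffset("banana")); // prints 0
        System.out.println(index.findBlockOffset("mango"));  // prints 65536
    }
}
```

In the extreme case above, each block would hold a single cell, so
scanBlock would return on its first iteration.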

But the problem is that if the HBase block is too large, memory will
run out, since HBase loads whole blocks into memory; and if the HDFS
block is too small, the DN will run out of memory, since each HDFS
file takes some memory. So it is a trade-off between memory and
performance. Is that right?
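
To put rough numbers on this trade-off, here is a small calculation
using the 64 KB and 64 MB block sizes from my question below. The
1 GiB file size is only an assumption for illustration, and I am
assuming one index entry per HBase block, as Ryan describes.

```java
/** Back-of-the-envelope block counts; the sizes are illustrative assumptions. */
final class BlockSizeArithmetic {
    /** Number of blocks needed to hold totalBytes, rounding up. */
    static long blocksFor(long totalBytes, long blockBytes) {
        return (totalBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long hbaseBlock = 64L * 1024;          // 64 KB HBase block
        long hdfsBlock = 64L * 1024 * 1024;    // 64 MB HDFS block
        long fileSize = 1L << 30;              // hypothetical 1 GiB store file

        // HBase blocks that fit in one HDFS block.
        System.out.println(blocksFor(hdfsBlock, hbaseBlock)); // prints 1024
        // Block index entries for the file (one per HBase block).
        System.out.println(blocksFor(fileSize, hbaseBlock));  // prints 16384
        // HDFS blocks for the same file.
        System.out.println(blocksFor(fileSize, hdfsBlock));   // prints 16
    }
}
```

Growing the HBase block shrinks the index but makes each read load
more data; shrinking the HDFS block multiplies the number of blocks
HDFS has to track.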

And would it make any difference whether a file portion of the same
size is randomly read from a small HDFS block or from a large one?
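
By "random reading a file portion" I mean a positional read like the
in(p, buf, 0, x) call in my question below. The sketch here is only a
local-file analogue using java.nio, not the HDFS client; the class
and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Local-file analogue of a positional read; not the HDFS client API. */
final class PositionalReadSketch {
    /** Reads up to length bytes starting at position; returns fewer bytes at end of file. */
    static byte[] pread(Path file, long position, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            long pos = position;
            while (buffer.hasRemaining()) {
                // Reads at an absolute position without moving the channel's own position.
                int n = channel.read(buffer, pos);
                if (n < 0) {
                    break; // reached end of file
                }
                pos += n;
            }
            byte[] result = new byte[buffer.position()];
            buffer.flip();
            buffer.get(result);
            return result;
        }
    }
}
```

As far as I know, the HDFS counterpart is the positional
read(position, buffer, offset, length) method on FSDataInputStream.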

Thanks.


William

On Mon, Oct 18, 2010 at 10:58 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> On Mon, Oct 18, 2010 at 7:49 PM, William Kang <weliam.cloud@gmail.com> wrote:
>> Hi,
>> Recently I have spent some effort trying to understand the mechanisms
>> of HBase in order to explore possible performance tuning options. Many
>> thanks to the folks in this community who helped with my questions; I
>> have sent a report. But there are still a few questions left.
>>
>> 1. If an HFile block contains more than one KeyValue pair, does the
>> block index in the HFile record the offset of every KeyValue pair in
>> that block? Or does the block index only record the key range of
>> that block, so that you have to traverse the block until you reach
>> the key you are looking for?
>
> The block index contains the first key for every block. It therefore
> defines in an [a,b) manner the range of each block. Once a block has
> been selected to read from, it is read into memory then iterated over
> until the key in question has been found (or the closest match has
> been found).
>
>> 2. When HBase reads a block to fetch data or traverse it, is the
>> block read into memory?
>
> Yes, the entire block is read at once, in a single read operation.
>
>>
>> 3. HBase blocks (64 KB, configurable) live inside HDFS blocks (64 MB,
>> configurable), so to read an HBase block we have to do a random
>> access into an HDFS block. Even if HBase can use in(p, buf, 0, x) to
>> read a small portion of the larger HDFS block, it is still a random
>> access. Would this be slow?
>
> Random access reads are not necessarily slow; they require several
> things:
> - disk seeks to the data in question
> - disk seeks to the checksum files in question
> - checksum computation and verification
>
> While not particularly slow, this could probably be optimized a bit.
>
> Most of the issues with random reads in HDFS are about parallelizing
> the reads and doing as much IO pushdown/scheduling as possible without
> consuming an excess of sockets and threads. The actual speed can be
> excellent, or not, depending on how busy the IO subsystem is.
>
>
>>
>> Many thanks. I would be grateful for your answers.
>>
>>
>> William
>>
>