From: William Kang
Date: Mon, 18 Oct 2010 23:21:21 -0400
Subject: Re: HBase random access in HDFS and block indices
To: user@hbase.apache.org

Hi JG and Ryan,

Thanks for the excellent answers.

So, I am going to push everything to the extremes, ignoring memory at
first. In theory, if every cell in HBase were the same size as an HBase
block, there would be no traversal within a block. And if every HBase
block were the same size as an HDFS block, no random access within a
file would be necessary. Would this provide the best performance?

But the problem is that if the HBase block is too large, memory will run
out, since HBase loads whole blocks into memory; if the HDFS block is too
small, the DN will run out of memory, since each HDFS file takes some
memory. So, it is a trade-off between memory and performance. Is that
right?

And would it make any difference whether a file portion of the same size
is read at random from a small HDFS block or from a large HDFS block?

Thanks.

William

On Mon, Oct 18, 2010 at 10:58 PM, Ryan Rawson wrote:
> On Mon, Oct 18, 2010 at 7:49 PM, William Kang wrote:
>> Hi,
>> Recently I have spent some effort trying to understand the mechanisms
>> of HBase in order to exploit possible performance tuning options. Many
>> thanks to the folks in this community who helped with my questions; I
>> have sent a report. But there are still a few questions left.
>>
>> 1. If an HFile block contains more than one KeyValue pair, will the
>> block index in the HFile point to the offset of every KeyValue pair in
>> that block? Or will the block index just point out the key ranges
>> inside that block, so that you have to traverse inside the block until
>> you meet the key you are looking for?
>
> The block index contains the first key for every block. It therefore
> defines in an [a,b) manner the range of each block. Once a block has
> been selected to read from, it is read into memory then iterated over
> until the key in question has been found (or the closest match has
> been found).
>
>> 2. When HBase reads a block to fetch data or traverse it, is
>> this block read into memory?
>
> Yes, the entire block at a time is read in a single read operation.
>
>>
>> 3. HBase blocks (64 KB, configurable) are inside HDFS blocks (64 MB,
>> configurable), so to read the HBase blocks we have to random access the
>> HDFS blocks. Even if HBase can use in(p, buf, 0, x) to read a small
>> portion of the larger HDFS blocks, it is still a random access. Would
>> this be slow?
>
> Random access reads are not necessarily slow; they require several things:
> - disk seeks to the data in question
> - disk seeks to the checksum files in question
> - checksum computation and verification
>
> While not particularly slow, this could probably be optimized a bit.
>
> Most of the issues with random reads in HDFS are parallelizing the
> reads and doing as much io-pushdown/scheduling as possible without
> consuming an excess of sockets and threads. The actual speed can be
> excellent, or not, depending on how busy the IO subsystem is.
>
>
>>
>> Many thanks. I would be grateful for your answers.
>>
>>
>> William
>>
>
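
For readers following the thread, the lookup Ryan describes (a block index
holding the first key of every block, [a,b) ranges, then a scan inside the
selected block) can be sketched roughly as below. This is only an
illustrative Python sketch, not HBase's actual HFile code: the names
find_block and lookup, and the in-memory list-of-lists layout, are
assumptions made for the example.

```python
from bisect import bisect_right

def find_block(first_keys, key):
    """Pick the block whose [first_key, next_first_key) range covers key.

    first_keys is the (hypothetical) block index: the sorted first key of
    each block. Returns -1 if key sorts before the first block.
    """
    return bisect_right(first_keys, key) - 1

def lookup(blocks, first_keys, key):
    """Select one block via the index, then scan it linearly for key."""
    i = find_block(first_keys, key)
    if i < 0:
        return None
    # The whole block is assumed to be in memory at this point; entries
    # inside a block are sorted, so the scan can stop early.
    for k, v in blocks[i]:
        if k == key:
            return v
        if k > key:
            break
    return None

# Toy data: three blocks of sorted (key, value) pairs.
blocks = [
    [("a", 1), ("c", 2)],
    [("f", 3), ("h", 4)],
    [("m", 5), ("q", 6)],
]
first_keys = [block[0][0] for block in blocks]
```

The index only narrows the search to one block; the cost inside the block
is a linear scan, which is why the block size matters for random reads.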