From: J Mohamed Zahoor
Subject: Re: Using HBase serving to replace memcached
Date: Thu, 23 Aug 2012 09:34:32 +0530
To: "Pamecha, Abhishek"
Cc: "user@hbase.apache.org"

If you need to search on both row and column qualifier, you can pick the row+col bloom to help you skip blocks.

./Zahoor@iPad

On 22-Aug-2012, at 10:58 PM, "Pamecha, Abhishek" wrote:

> Great explanation. This may be diverging from the thread's original question, but could you also explain the difference, if any, between searching for a rowkey [that you mentioned below] and searching for a specific column qualifier? Are there any optimizations for column qualifier search too, or does one just need to load all blocks that match the rowkey criteria and then scan each of them from start to end?
>
> Thanks,
> Abhishek
>
>
> -----Original Message-----
> From: Anoop Sam John [mailto:anoopsj@huawei.com]
> Sent: Wednesday, August 22, 2012 5:35 AM
> To: user@hbase.apache.org; J Mohamed Zahoor
> Subject: RE: Using HBase serving to replace memcached
>
>> I could be wrong. I think HFile index block (which is located at the end
>> of HFile) is a binary search tree containing all row-key values (of the
>> HFile) in the binary search tree. Searching a specific row-key in the
>> binary search tree could easily find whether a row-key exists (some
>> node in the tree has the same row-key value) or not. Why we need load
>> every block to find if the row exists?
>
> I think there is some confusion with you people regarding the blooms and the block index. I will try to clarify this point.
>
> A block index is present in every HFile. Within an HFile the data is written as multiple blocks, and HBase reads data from the HDFS layer only block by block. The block index contains information about the blocks within that HFile: the start and end rowkeys that reside in each block, plus block information such as the offset of the block and its length. When a request comes in to get rowkey 'x', all the HFiles within that region need to be checked [the KV can be present in any of the HFiles]. To find out which block within an HFile may hold this row, the block index is used. This block index is always in memory. The lookup only tells you the possible block in which the row may be present. HBase will load that block and read through it to get the row we are interested in.
>
> A bloom, in contrast, has information about each and every row added into that HFile [the block index won't have info about each and every row]. This bloom information is always in memory. So when a read request to get row 'x' in an HFile comes in, the bloom is checked first to see whether this row is in this file or not. If it is not there, as per the bloom, no block at all will be fetched. But if bloom is not enabled, we might find one block whose row range is such that 'x' falls within it, and HBase will load that block. So using blooms can avoid this IO. Hope this is clear for you now.
>
> -Anoop-
> ________________________________________
> From: Lin Ma [linlma@gmail.com]
> Sent: Wednesday, August 22, 2012 5:41 PM
> To: J Mohamed Zahoor; user@hbase.apache.org
> Subject: Re: Using HBase serving to replace memcached
>
> Thanks Zahoor,
>
> I read through the document you referred to. I am confused about what leaf-level index, intermediate-level index and root-level index mean. I would appreciate it if you could give more details on what they are, or point me to the related documents.
>
> BTW: the document you pointed me to is very good; however, I am missing some basic background on the 3 terms I mentioned above. :-)
>
> regards,
> Lin
>
> On Wed, Aug 22, 2012 at 12:51 PM, J Mohamed Zahoor wrote:
>
>>> I could be wrong. I think HFile index block (which is located at the end
>>> of HFile) is a binary search tree containing all row-key values (of the
>>> HFile) in the binary search tree. Searching a specific row-key in the
>>> binary search tree could easily find whether a row-key exists (some
>>> node in the tree has the same row-key value) or not. Why we need load
>>> every block to find if the row exists?
>>>
>>>
>> Hmm...
>> It is a multilevel index. Only the root indexes (Data, Meta etc.) are
>> loaded when a region is opened. The rest of the tree (intermediate and
>> leaf indexes) is present at each block level.
>> I am assuming an HFile v2 here for the discussion.
>> Read this for more clarity: http://hbase.apache.org/book/apes03.html
>>
>> Nice discussion. You made me read a lot of things. :-) Now I will dig
>> into the code and check this out.
>>
>> ./Zahoor
>>
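
To make the block index vs. bloom distinction in this thread a bit more concrete, here is a minimal Python sketch of the read path as described above. It is a toy model, not HBase code: `ToyBloom`, `ToyHFile`, `blocks_loaded` and the `"ROW"` / `"ROWCOL"` strings are hypothetical names made up for illustration, the index here stores only the first (row, qualifier) key of each block, and real HFiles use a multilevel index and a different bloom implementation.

```python
import bisect
import hashlib


class ToyBloom:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class ToyHFile:
    """Sorted (row, qualifier, value) cells split into fixed-size blocks."""

    def __init__(self, cells, cells_per_block=2, bloom_type="ROW"):
        cells = sorted(cells)
        self.blocks = [cells[i:i + cells_per_block]
                       for i in range(0, len(cells), cells_per_block)]
        # In-memory block index: first (row, qualifier) key of each block.
        self.index = [(blk[0][0], blk[0][1]) for blk in self.blocks]
        self.bloom_type = bloom_type
        self.bloom = ToyBloom()
        for row, qual, _ in cells:
            self.bloom.add(row if bloom_type == "ROW" else (row, qual))
        self.blocks_loaded = 0  # counts simulated block reads

    def get(self, row, qual):
        bloom_key = row if self.bloom_type == "ROW" else (row, qual)
        if not self.bloom.might_contain(bloom_key):
            return None  # bloom says "definitely absent": no block is read
        # The index only names the one block that could hold the key.
        i = bisect.bisect_right(self.index, (row, qual)) - 1
        if i < 0:
            return None  # key sorts before the first block
        self.blocks_loaded += 1  # simulated read of one block
        for r, q, value in self.blocks[i]:
            if (r, q) == (row, qual):
                return value
        return None


cells = [("r1", "a", "1"), ("r1", "b", "2"), ("r3", "a", "3"), ("r5", "a", "4")]
row_file = ToyHFile(cells, bloom_type="ROW")
rowcol_file = ToyHFile(cells, bloom_type="ROWCOL")

# Existing row, missing qualifier: compare how many blocks each file reads.
print(row_file.get("r1", "z"), row_file.blocks_loaded)
print(rowcol_file.get("r1", "z"), rowcol_file.blocks_loaded)
```

In this model a row-only bloom cannot rule out a lookup for an existing row with a missing qualifier, so the candidate block is still read and scanned, while a row+col bloom will normally skip the read entirely (barring a false positive). That is the trade-off behind the row+col suggestion at the top of this message.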