From: J Mohamed Zahoor <jmozah@gmail.com>
Subject: Re: Using HBase serving to replace memcached
Date: Thu, 23 Aug 2012 09:34:32 +0530
To: "Pamecha, Abhishek" <apamecha@x.com>

If you need to search by both row and column qualifier, you can pick a row+col bloom (the ROWCOL bloom type) to help you skip blocks.
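
To make the row+col bloom idea concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not HBase code: ToyHFile, BloomFilter, and the bloom key layout (row, a zero byte, qualifier) are hypothetical names and choices made up for illustration, and the block index here is simplified to the first key of each block.

```python
import bisect
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class ToyHFile:
    """Hypothetical stand-in for one store file: sorted cells cut into blocks."""

    def __init__(self, cells, block_size=2):
        # cells: dict mapping (row, qualifier) -> value, all bytes.
        items = sorted(cells.items())
        self.blocks = [items[i:i + block_size]
                       for i in range(0, len(items), block_size)]
        # Simplified "block index": the first key of every block.
        self.first_keys = [block[0][0] for block in self.blocks]
        self.bloom = BloomFilter()
        for (row, qualifier), _ in items:
            self.bloom.add(row + b"\x00" + qualifier)  # row+col bloom key
        self.blocks_loaded = 0  # counts simulated block reads

    def get(self, row, qualifier):
        # 1. Bloom check: a "no" is definitive, so no block is read at all.
        if not self.bloom.might_contain(row + b"\x00" + qualifier):
            return None
        # 2. Block index: binary search for the only block that could match.
        idx = bisect.bisect_right(self.first_keys, (row, qualifier)) - 1
        if idx < 0:
            return None
        # 3. Load that block (simulated) and scan it for the exact cell.
        self.blocks_loaded += 1
        for key, value in self.blocks[idx]:
            if key == (row, qualifier):
                return value
        return None
```

With a ToyHFile built from a few cells, a get for a (row, qualifier) pair that was never written normally returns None with blocks_loaded unchanged. A Bloom filter can give a false positive, in which case one block is read and scanned before None comes back.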

./Zahoor@iPad

On 22-Aug-2012, at 10:58 PM, "Pamecha, Abhishek" <apamecha@x.com> wrote:

> Great explanation. This may be diverging from the thread's original question, but could you also explain the difference, if any, between searching for a rowkey [that you mentioned below] and searching for a specific column qualifier? Are there any optimizations for column qualifier search too, or does one just need to load all blocks that match the rowkey criteria and then scan each of them from start to end?
>
> Thanks,
> Abhishek
>
>
> -----Original Message-----
> From: Anoop Sam John [mailto:anoopsj@huawei.com]
> Sent: Wednesday, August 22, 2012 5:35 AM
> To: user@hbase.apache.org; J Mohamed Zahoor
> Subject: RE: Using HBase serving to replace memcached
>
>> I could be wrong. I think HFile index block (which is located at the end
>> of HFile) is a binary search tree containing all row-key values (of the
>> HFile) in the binary search tree. Searching a specific row-key in the
>> binary search tree could easily find whether a row-key exists (some
>> node in the tree has the same row-key value) or not. Why we need load
>> every block to find if the row exists?
>
> I think there is some confusion here about the blooms and the block index, so I will try to clarify this point.
> Every HFile has a block index. Within an HFile the data is written as multiple blocks, and HBase reads data from the HDFS layer only block by block. The block index holds information about the blocks within that HFile: the start and end rowkeys that reside in each block, plus block details such as its offset and length. When a request comes in to get rowkey 'x', all the HFiles within that region need to be checked [the KV can be present in any of the HFiles]. To find out which block within an HFile may hold this row, the block index is used. This block index is always in memory. The lookup tells you only the one block in which the row could be present. HBase then loads that block and reads through it to find the row we are interested in.
> A bloom, by contrast, has information about each and every row added to that HFile [the block index won't have info about each and every row]. This bloom information is also always in memory. So when a read request for row 'x' in an HFile comes in, the bloom is checked first to see whether this row is in the file at all. If it is not there, as per the bloom, no block at all is fetched. But if the bloom is not enabled, we might find one block whose row range covers 'x', and HBase will load that block even if the row is not actually in it. So using blooms can avoid this IO. Hope this is clear for you now.
>
> -Anoop-
> ________________________________________
> From: Lin Ma [linlma@gmail.com]
> Sent: Wednesday, August 22, 2012 5:41 PM
> To: J Mohamed Zahoor; user@hbase.apache.org
> Subject: Re: Using HBase serving to replace memcached
>
> Thanks Zahoor,
>
> I read through the document you referred to, but I am confused about what leaf-level index, intermediate-level index and root-level index mean. I would appreciate it if you could give more details on what they are, or point me to the related documents.
>
> BTW: the document you pointed me to is very good; I am just missing some basic background on the 3 terms I mentioned above. :-)
>
> regards,
> Lin
>
> On Wed, Aug 22, 2012 at 12:51 PM, J Mohamed Zahoor <jmozah@gmail.com> wrote:
>
>>> I could be wrong. I think HFile index block (which is located at the end
>>> of HFile) is a binary search tree containing all row-key values (of the
>>> HFile) in the binary search tree. Searching a specific row-key in the
>>> binary search tree could easily find whether a row-key exists (some
>>> node in the tree has the same row-key value) or not. Why we need load
>>> every block to find if the row exists?
>>>
>>>
>> Hmm...
>> It is a multilevel index. Only the root indexes (data, meta, etc.) are
>> loaded when a region is opened. The rest of the tree (the intermediate and
>> leaf indexes) is stored at the block level.
>> I am assuming HFile v2 here for the discussion.
>> Read this for more clarity: http://hbase.apache.org/book/apes03.html
>>
>> Nice discussion. You made me read a lot of things. :-) Now I will dig
>> into the code and check this out.
>>
>> ./Zahoor
>>