hbase-user mailing list archives

From Michael Stack <st...@duboce.net>
Subject Re: hbase performance period
Date Tue, 04 Nov 2008 07:57:42 GMT
CaiSijie wrote:
> Thank you for your reply.
> My version of HBase is 0.18.0.  Yes, I read data in series.
>
> But what I see is that reading the 1st data item costs the least time and
> reading the 128th costs the most: the time increases from the 1st item up
> to the 128th. Then, when reading the 129th item, the time drops back to
> about the same as for the 1st item. So the period is 128: the 128th, 256th,
> 384th... items need the most time, and the 1st, 129th, 257th, 385th...
> items need the least.
>
> I have tested this many times and the phenomenon is always there. I am
> sure that the mapfile index interval is 32.
> And I should say that each data item is 100 kilobytes.
>
> I am confused...
Thanks for spending more time on this.

So, what I think is happening is that when you do a get on a key that is 
in the MapFile index, the seek goes directly to the correct offset.  
Otherwise, we seek to the index key that sorts before the asked-for key 
and then call SequenceFile.next until we hit the requested key (see 
around line 432 in this file: http://tinyurl.com/63ejru.  The core of 
the hbase get is the getClosest method in Hadoop MapFile, which calls 
this internalSeek method).
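
To make the pattern concrete, here is a small, self-contained sketch of 
that seek-then-scan idea (my own toy illustration in plain Java, not the 
actual Hadoop MapFile code): a sparse index holds every Nth key, a lookup 
jumps to the closest preceding index entry, then walks forward one record 
at a time.

  import java.util.TreeMap;

  public class IndexedScanSketch {
    static final int INDEX_INTERVAL = 128;           // every 128th key is indexed
    static final String[] DATA = new String[1024];   // stands in for the sorted data file

    public static void main(String[] args) {
      for (int i = 0; i < DATA.length; i++) {
        DATA[i] = String.format("row%06d", i);       // sorted keys, like a MapFile
      }
      // Build the sparse index: key -> position, one entry per INDEX_INTERVAL keys.
      TreeMap<String, Integer> index = new TreeMap<String, Integer>();
      for (int i = 0; i < DATA.length; i += INDEX_INTERVAL) {
        index.put(DATA[i], i);
      }
      // The 1st item sits on an index entry; the 128th is farthest from one.
      System.out.println("scan steps for 1st item:   " + stepsNeeded(index, DATA[0]));
      System.out.println("scan steps for 128th item: " + stepsNeeded(index, DATA[127]));
      System.out.println("scan steps for 129th item: " + stepsNeeded(index, DATA[128]));
    }

    // How many forward steps a lookup needs after seeking to the closest
    // preceding index entry -- this is the part that grows with period 128.
    static int stepsNeeded(TreeMap<String, Integer> index, String wanted) {
      int pos = index.floorEntry(wanted).getValue(); // "seek" to index key <= wanted
      int steps = 0;
      while (!DATA[pos].equals(wanted)) {            // like calling SequenceFile.next
        pos++;
        steps++;
      }
      return steps;
    }
  }

Running it prints 0 steps for the 1st and 129th items and 127 steps for 
the 128th, which is exactly the sawtooth with period 128 you are measuring.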

If the asked-for key is in the index, the get is fastest; it gets 
steadily slower the further we have to search forward by next'ing 
through the data file, until we hit the next index entry.

Because your values are 100k, the progression is noticeable.
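
Rough arithmetic on that: with an interval of 128 and ~100 KB values, a 
key sitting on an index entry costs about one value's worth of reading, 
while the key just before the next index entry means next'ing over up to 
127 intervening entries, i.e. on the order of 127 * 100 KB, roughly 
12.7 MB, of extra sequential reading. That is easily enough to show up 
as the sawtooth you measured.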

I didn't understand why the interval was 128 rather than 32, but I just 
added logging and see that our attempt at setting it to 32, at least in 
the mapreduce context I tested in, is broken.  I opened an issue, 
HBASE-981 (thanks for finding this one).
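
If you want to double-check what your own client is picking up while 
HBASE-981 is open, something like the below should do it (a sketch only; 
it assumes the 0.18-era HBaseConfiguration class and the 
hbase.io.index.interval key mentioned earlier in this thread, with 
Hadoop's 128 as the fallback default):

  import org.apache.hadoop.hbase.HBaseConfiguration;

  public class CheckIndexInterval {
    public static void main(String[] args) {
      // Loads hbase-default.xml and hbase-site.xml from the classpath.
      HBaseConfiguration conf = new HBaseConfiguration();
      // Prints 32 if the hbase-site.xml override is being picked up,
      // otherwise falls back to the stock 128 interval.
      System.out.println("hbase.io.index.interval = "
          + conf.getInt("hbase.io.index.interval", 128));
    }
  }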

St.Ack



> Sijie Cai 
>
>
>
>
> stack-3 wrote:
>   
>> The only thing that comes to mind is that by default in hadoop, the
>> mapfile index interval is 128; every 128th entry in the mapfile gets an
>> entry in the mapfile index. In hbase, though, we change the default
>> interval to 32. Check to make sure you are picking up an
>> hbase.io.index.interval of 32.
>>
>> Otherwise, I'm not sure why you would see the below. Are you saying
>> that there is a step every 128 reads? That the 129th read takes longer
>> than the read at position 1 and that the read at position 257 takes
>> longer than the read at position 129?
>>
>> The fact that it takes increasingly longer as you read from position 0
>> up to 128 makes sense -- if the index interval is every 128 -- because
>> we do serial search forward from the closest index position.
>>
>> What version of hbase are you using?
>>
>> You are doing your reads in series?
>>
>> This is really interesting stuff. Can you dig in some more and try and
>> figure whats going on?
>>
>> Thanks Cai.
>>
>> St.Ack
>>
>>
>>     
>
>   

