hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Will all HFiles managed by a regionserver kept open
Date Tue, 18 Jan 2011 19:08:55 GMT
There should be as many seeks as there are store files in the region
that's serving the data. There's also the family dimension, e.g. if you
read from only 1 family then only that family's store files are read.

So on average, I'd say you'll do 3 seeks since you do a minor
compaction once you reach 4 store files in a family.
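
To make the counting concrete, here is a rough sketch of that estimate. It
assumes one seek per store file in each family the Get reads, with nothing
served from cache; the function name, the family names, and the file counts
are made up for illustration and are not HBase APIs or measured values.

```python
def seeks_for_get(store_files_by_family, families_read=None):
    """Estimate seeks for one uncached Get.

    Assumption: one seek per store file in each column family that the
    Get reads. Caching and any file-skipping optimizations are ignored.
    """
    if families_read is None:
        # A Get that names no family reads from every family.
        families_read = list(store_files_by_family)
    return sum(store_files_by_family[family] for family in families_read)


# Hypothetical region: two families with 3 and 2 store files.
region = {"info": 3, "stats": 2}

print(seeks_for_get(region, ["info"]))  # only the "info" store files
print(seeks_for_get(region))            # store files of every family
```

Reading a single family only pays for that family's store files, which is
why narrowing a Get to the families you need cuts the seek count.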

What he meant by memory copying is just that the data has to be copied
from the socket when you read from HDFS and then into the outbound
socket for the client after the region server does whatever processing
it needs to do. I guess the more data you read, the longer it takes to
copy in RAM?

J-D

On Fri, Jan 14, 2011 at 12:43 AM, Tao Xie <xietao.mailbox@gmail.com> wrote:
> Is the HDFS seek the dominant cost in retrieving data? If records are small
> (~1k) and most requests are random Gets, how many seeks will happen on
> average during a Get? Btw, what do you mean by memory copying? When will it
> cause large overhead? Thanks.
>
> 2011/1/13 Ryan Rawson <ryanobjc@gmail.com>
>
>> retrieving data from disk is the most dominant element, until you are
>> fully cached in which case other factors inside the regionserver
>> become dominant. at this point copying memory, gc, algorithmic
>> complexity, etc become important.
>>
