hbase-user mailing list archives

From "Sharma, Avani" <agsha...@ebay.com>
Subject RE: regionserver skew
Date Tue, 07 Sep 2010 18:09:33 GMT
Stack,

I don't think that is my case. I am doing random reads across the whole namespace, and given the way the
table is designed, they should be distributed across the region servers. As I understand it, rows
are sorted by key, and we should design the table so that fetches are spread across regions;
I have tried to do exactly that. If there is something else you want me to read, please
point me to it. I have read the HBase Architecture doc and also the one Lars George has posted.
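
A minimal sketch of what "rows are sorted by key" means for lookups, assuming each region covers a contiguous key range starting at its start key and ending just before the next region's start key. Finding the region for a row is then a binary search over the sorted start keys. The start keys, row-key format and server names below are hypothetical; in a real cluster the client learns the actual boundaries and hosting servers from the .META table.

```python
import bisect

# Hypothetical layout: region i holds every row key k with
# REGION_START_KEYS[i] <= k < REGION_START_KEYS[i + 1].
# The empty string marks the first region, which starts at the lowest key.
REGION_START_KEYS = ["", "row-250000", "row-500000", "row-750000"]
# Hypothetical assignment of each region to a region server.
REGION_SERVERS = ["server1", "server2", "server3", "server1"]


def region_for(row_key):
    """Return the index of the region whose key range contains row_key."""
    # bisect_right gives the position of the first start key greater than
    # row_key; the containing region is the one just before that position.
    return bisect.bisect_right(REGION_START_KEYS, row_key) - 1


def server_for(row_key):
    """Return the (hypothetical) server hosting the region for row_key."""
    return REGION_SERVERS[region_for(row_key)]


if __name__ == "__main__":
    for key in ["row-000001", "row-499999", "row-500000", "row-999999"]:
        print(key, "-> region", region_for(key), "on", server_for(key))
```

Because every key maps to exactly one contiguous range, reads whose keys are spread over the whole key space touch every region, and therefore every server that hosts one.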

I have one 2G file and some smaller ones on the cluster, but currently I am fetching data
only from this 2G lookup data.
The number of regions is as follows:
Server1: regions=41, 2G heap; also runs the HBase master, regionserver, namenode, tasktracker,
jobtracker and datanode
Server2: regions=36, 4G heap; runs datanode, tasktracker and regionserver
Server3: regions=37, 4G heap; runs datanode, tasktracker and regionserver. This server gets
0 requests (and shows a 0 hitRatio).
Total: 114
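
As a rough check of the expectation that Server3 should be busy, here is a small sketch that computes each server's expected share of uniformly random reads from the region counts above. It assumes every region holds the same amount of data and that all 114 regions belong to the table being read; neither is guaranteed, since smaller tables share the cluster.

```python
# Region counts reported in the message above.
REGIONS_PER_SERVER = {"Server1": 41, "Server2": 36, "Server3": 37}


def expected_share(regions):
    """Fraction of uniformly random reads each server should receive.

    Assumes (hypothetically) equal-sized regions that all belong to the
    table being read, so a server's share is proportional to its region count.
    """
    total = sum(regions.values())
    return {server: count / total for server, count in regions.items()}


if __name__ == "__main__":
    for server, share in expected_share(REGIONS_PER_SERVER).items():
        print(f"{server}: {share:.1%} of reads expected")
```

Under those assumptions Server3 should see about 32.5% of the reads (37 of 114 regions), so observing 0 requests means its regions are not the ones being read.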

The thread at the link below mentions that some servers show a 0 hitRatio and says that is acceptable(?), but
that was for inserts; I am not sure the same applies to reads.
http://search-hadoop.com/m/ESeeZ1B082l
How do I confirm where .META is hosted? Currently, I look at the master log and check which
machine it contacts for the .META table.

My main concern is that before the upgrade to 0.20.6, 0.5M rows took 520 seconds (which you
thought was slow) on this 3-node cluster, and now, after the upgrade and whatever other changes
HBase/HDFS went through, it takes nearly an hour to do the same (same data, same rows fetched).
Something is really wrong with HDFS/HBase here.
I need help diagnosing this. Let me know if you need any logs from me. I did
send some logs last time; did you get a chance to look at those?
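
To put the regression in numbers, here is a small arithmetic sketch. It assumes "nearly an hour" means 3600 seconds (an upper bound, so the real slowdown is slightly smaller), and the per-row figure is only meaningful as a latency if rows are fetched one at a time by a single client, which is also an assumption.

```python
ROWS = 500_000          # ".5M rows" from the message
BEFORE_SECONDS = 520    # time before the upgrade to 0.20.6
AFTER_SECONDS = 3_600   # assumption: "nearly an hour" rounded up to 3600 s


def rows_per_second(rows, seconds):
    """Average throughput over the whole run."""
    return rows / seconds


def ms_per_row(rows, seconds):
    """Average milliseconds per row; a per-get latency only if rows are
    fetched one at a time by a single client (an assumption)."""
    return seconds * 1000 / rows


if __name__ == "__main__":
    for label, seconds in [("before", BEFORE_SECONDS), ("after", AFTER_SECONDS)]:
        print(f"{label}: {rows_per_second(ROWS, seconds):.1f} rows/s, "
              f"{ms_per_row(ROWS, seconds):.2f} ms/row")
    print(f"slowdown: {AFTER_SECONDS / BEFORE_SECONDS:.1f}x")
```

That is roughly 960 rows/s (1.04 ms per row) before versus about 140 rows/s (7.2 ms per row) after, a slowdown of up to about 6.9x.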

Thanks.

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Monday, September 06, 2010 12:04 PM
To: user@hbase.apache.org
Subject: Re: regionserver skew

On Fri, Sep 3, 2010 at 6:22 PM, Sharma, Avani <agsharma@ebay.com> wrote:
> I read on the mailing list that the region server that hosts the .META table handles more requests.
> That sounds okay, but in my case the 3rd regionserver has 0 requests! And I feel that's what's
> slowing down the read performance. Also, the hit ratio on the other regionserver is 87% or
> so. Only the one that hosts .META has a 95+% hit ratio.
>
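
A block cache hit ratio is hits divided by total block cache lookups, and each miss means the block has to be read from HDFS instead of memory. This sketch turns the two quoted ratios into misses per 100 lookups. Returning None for a server with no lookups is a convention chosen for this sketch, not a statement about how HBase reports the metric.

```python
def hit_ratio(hits, misses):
    """Fraction of block cache lookups served from memory.

    Returns None when there were no lookups, since a ratio over zero
    requests is undefined (a convention chosen for this sketch only).
    """
    lookups = hits + misses
    if lookups == 0:
        return None
    return hits / lookups


def misses_per(lookups, ratio):
    """Expected number of lookups that miss the cache and go to HDFS."""
    return (1.0 - ratio) * lookups


if __name__ == "__main__":
    for label, ratio in [("regionserver at ~87%", 0.87), (".META host at ~95%", 0.95)]:
        print(f"{label}: about {misses_per(100, ratio):.0f} misses per 100 lookups")
    # A server that receives no requests has no meaningful hit ratio.
    print(hit_ratio(0, 0))
```

At 87% about 13 of every 100 lookups go to HDFS versus about 5 at 95%, roughly 2.6x as many disk reads per lookup on the less-cached server.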

Are your reads distributed across the whole namespace or are they only
fetching some subset? If a subset, it can be the case that the subset
is hosted entirely by a single regionserver, and while your test is
running, it's only pulling from this single server.  Is that your case?
 (You do understand how rows are distributed on an HBase cluster?)
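
The subset scenario can be simulated with a small sketch. It assumes a hypothetical table of one million zero-padded row keys split into 8 equal regions assigned round-robin to three servers, then compares reads drawn from the whole key space with reads drawn only from the lowest 100,000 keys. Key format, region count and assignment are all made up for illustration.

```python
import bisect
import random
from collections import Counter

# Hypothetical table: keys row-0000000 .. row-0999999 in 8 equal regions,
# assigned round-robin to three region servers.
NUM_ROWS = 1_000_000
REGION_SIZE = NUM_ROWS // 8
START_KEYS = [f"row-{i:07d}" for i in range(0, NUM_ROWS, REGION_SIZE)]
SERVERS = [f"server{(i % 3) + 1}" for i in range(len(START_KEYS))]


def server_for(key):
    """Binary-search the sorted start keys for the region holding key."""
    return SERVERS[bisect.bisect_right(START_KEYS, key) - 1]


def requests_per_server(keys):
    """Count how many of the given reads land on each server."""
    return Counter(server_for(k) for k in keys)


rng = random.Random(42)  # fixed seed so runs are repeatable
# Random reads over the whole namespace.
whole = [f"row-{rng.randrange(NUM_ROWS):07d}" for _ in range(10_000)]
# Random reads over a narrow subset that fits inside the first region.
subset = [f"row-{rng.randrange(100_000):07d}" for _ in range(10_000)]

if __name__ == "__main__":
    print("whole namespace:", dict(requests_per_server(whole)))
    print("subset only:    ", dict(requests_per_server(subset)))
```

With whole-namespace reads all three servers receive traffic; with the subset every request lands on server1, which is the skew described above.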

Also, how many regions do you have?  You said you have 2G of data
total at one stage.  That likely does not make for many regions.  If
so, it could also be the case that the server that is not fielding
requests is not actually carrying any data, or carries very little.  Is this
your case?
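
A back-of-the-envelope sketch of why 2G of data "does not make for many regions". It assumes a maximum region size of 256 MB, which we believe was the default for hbase.hregion.max.filesize in the 0.20 line (check hbase-site.xml for the value actually in use), and assumes a region splits into two roughly equal halves once it exceeds that size. It also treats the 2G file size as the stored size, ignoring compression and per-cell key overhead.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def region_count_range(data_bytes, max_region_bytes):
    """Rough lower and upper bounds on the number of regions.

    Assumes each settled region holds between about half the maximum
    (just after a split) and the full maximum (just before a split).
    """
    low = math.ceil(data_bytes / max_region_bytes)
    high = math.ceil(data_bytes / (max_region_bytes / 2))
    return low, high


if __name__ == "__main__":
    # Assumption: 256 MB maximum region size.
    print(region_count_range(2 * GIB, 256 * MIB))
```

Under these assumptions 2G works out to roughly 8 to 16 regions, few enough that an uneven assignment could leave one server with little or none of the table.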

St.Ack
