hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Initial region loads in hbase..
Date Mon, 30 Aug 2010 17:44:45 GMT
On Mon, Aug 30, 2010 at 1:40 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> Vidhya,
>
> Thanks for keeping it up :)
>
> If you trace all the references to 0000003404803994 in the region
> server, what do you see? I think that most of the time is spent
> opening the region sequentially, it'd be nice to confirm.
>
> J-D
>
> On Mon, Aug 30, 2010 at 10:20 AM, Vidhyashankar Venkataraman
> <vidhyash@yahoo-inc.com> wrote:
>> Thank you for the responses guys.. I will be looking at the problem later this week,
I guess. The initial eyeballing suggested that the RS's respond to the master's open very
quickly.. And I saw our rrd stats (which is similar to ganglia) and I couldn't see any obvious
IO/CPU/nw bottlenecks: which means I haven't explored enough :) I will be coming back to the
problem soon and will let you know about my findings..
>>
>>  I have been running scans and planning to run incremental loads on the db and get
some useful numbers.. Scans are blazing fast with just one storefile per region: 75-95 MBps
per node! (this is if locality between RS and data is kind of preserved: i.e., after a major
compact is issued)..
>>
>> I will update on my status soon..
>> Thank you again
>> Vidhya
>>
>> On 8/27/10 5:42 PM, "Jonathan Gray" <jgray@facebook.com> wrote:
>>
>> Vidhya,
>>
>> Could you post a snippet of an RS log during this time?  You should be able to see
what's happening between when the OPEN message gets there and the OPEN completes.
>>
>> Like Stack said, it's probably that its single-threaded in the version you're using
and with all the file opening, your NN and DNs are under heavy load.  Do you see io-wait
or anything else jump up across the cluster at this time?  You have ganglia setup on this
bad boy?
>>
>> JG
>>
>>> -----Original Message-----
>>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
>>> Stack
>>> Sent: Friday, August 27, 2010 5:36 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: Initial region loads in hbase..
>>>
>>> In 0.20, open on a regionserver is single-threaded.  Could that be it?
>>>  The server has lots of regions to open and its taking time?  Is the
>>> meta table being beat up?  Could this be holding up region opens?
>>>
>>> Good luck V,
>>> St.Ack
>>>
>>>
>>> On Fri, Aug 27, 2010 at 5:01 PM, Vidhyashankar Venkataraman
>>> <vidhyash@yahoo-inc.com> wrote:
>>> > Hi guys,
>>> >  A couple of days back, I had posted a problem on regions taking too
>>> much time to load when I restart Hbase.. I have a table that has around
>>> 80 K regions on 650 nodes (!) ..
>>> >  I was checking the logs in the master and I notice that the time it
>>> takes from assigning a region to a region server to the point when it
>>> recognizes that the region is open in that server takes around 20-30
>>> minutes!
>>> >   Apart from master being the bottleneck here, can you guys let me
>>> know what the other possible cases are as to why this may happen?
>>> >
>>> > Thank you
>>> > Vidhya
>>> >
>>> > Below is an example for region with start key 0000003404803994 where
>>> the assignment takes place at 22:59 while the confirmation that it got
>>> open came at 23:19...
>>> >
>>> > 2010-08-27 22:59:02,642 DEBUG
>>> org.apache.hadoop.hbase.master.BaseScanner: Current assignment of
>>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>>> is not valid;  serverAddress=, startCode=0 unknown.
>>> > 2010-08-27 22:59:02,643 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED
>>> region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK_REGION_OFFLINE
>>> > 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.HMaster:
>>> Event NodeCreated with state SyncConnected with path
>>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>>> > 2010-08-27 22:59:02,645 DEBUG
>>> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
>>> NodeCreated with path
>>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>>> > 2010-08-27 22:59:02,645 DEBUG
>>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
>>> Got zkEvent NodeCreated state:SyncConnected
>>> path:/hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>>> > 2010-08-27 22:59:02,645 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper:
>>> <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b
>>> 3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,org.apache.hadoop.h
>>> base.master.HMaster>Created ZNode
>>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper
>>> > 2010-08-27 22:59:02,646 DEBUG
>>> org.apache.hadoop.hbase.master.RegionManager: Created/updated
>>> UNASSIGNED zNode
>>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>>> in state M2ZK_REGION_OFFLINE
>>> > 2010-08-27 22:59:02,646 DEBUG
>>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [
>>> M2ZK_REGION_OFFLINE ] for region 73c0f8fdb8ffbc20b9a239d325932ff1
>>> > 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.HMaster:
>>> Event NodeChildrenChanged with state SyncConnected with path
>>> /hbase/UNASSIGNED
>>> > 2010-08-27 22:59:02,646 DEBUG
>>> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
>>> NodeChildrenChanged with path /hbase/UNASSIGNED
>>> > 2010-08-27 22:59:02,646 DEBUG
>>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
>>> Got zkEvent NodeChildrenChanged state:SyncConnected
>>> path:/hbase/UNASSIGNED
>>> > 2010-08-27 22:59:02,647 DEBUG
>>> org.apache.hadoop.hbase.master.RegionManager: Assigning for
>>> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
>>> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993): total
>>> nregions to assign=1, regions to give other servers than this=0,
>>> isMetaAssign=false
>>> > 2010-08-27 22:59:02,647 DEBUG
>>> org.apache.hadoop.hbase.master.RegionManager: Assigning
>>> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
>>> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993) 1 regions
>>> > 2010-08-27 22:59:02,647 INFO
>>> org.apache.hadoop.hbase.master.RegionManager: Assigning region
>>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>>> to b3130020.yst.yahoo.net,60020,1282940735627
>>> > 2010-08-27 22:59:02,653 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: No need to update
>>> UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 as it already exists
>>> in state = M2ZK_REGION_OFFLINE
>>> > 2010-08-27 22:59:02,653 DEBUG
>>> org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode
>>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>>> in state M2ZK_REGION_OFFLINE
>>> >
>>> >
>>> >
>>> >
>>> > AND THEN,
>>> >
>>> > 2010-08-27 23:18:53,591 INFO
>>> org.apache.hadoop.hbase.master.RegionServerOperation:
>>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>>> open on b3130020.yst.yahoo.net,60020,1282940735627
>>> > 2010-08-27 23:18:53,591 INFO
>>> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
>>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>>> in region .META.,,1 with startcode=1282940735627,
>>> server=b3130020.yst.yahoo.net:60020
>>> > 2010-08-27 23:18:53,677 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Deleting ZNode
>>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper as
>>> region is open...
>>> >
>>> >
>>> >
>>
>>
>

Just an FYI. hadoop-cacti-jtg
(http://www.jointhegrid.com/hadoop/index.jsp) had excellent graph
coverage for all the jmx attributes in hbase-20.3.

http://www.jointhegrid.com/hadoop-cacti-jtg-walk/hbase_region_blockcache.jsp
http://www.jointhegrid.com/hadoop-cacti-jtg-walk/hbase_master.jsp

Has their been any major changes/additions to hbase jmx recently?

Edward

Mime
View raw message