hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vidhyashankar Venkataraman <vidhy...@yahoo-inc.com>
Subject Re: Initial region loads in hbase..
Date Mon, 30 Aug 2010 17:20:52 GMT
Thank you for the responses guys.. I will be looking at the problem later this week, I guess.
The initial eyeballing suggested that the RS's respond to the master's open very quickly..
And I saw our rrd stats (which is similar to ganglia) and I couldn't see any obvious IO/CPU/nw
bottlenecks: which means I haven't explored enough :) I will be coming back to the problem
soon and will let you know about my findings..

 I have been running scans and planning to run incremental loads on the db and get some useful
numbers.. Scans are blazing fast with just one storefile per region: 75-95 MBps per node!
(this is if locality between RS and data is kind of preserved: i.e., after a major compact
is issued)..

I will update on my status soon..
Thank you again
Vidhya

On 8/27/10 5:42 PM, "Jonathan Gray" <jgray@facebook.com> wrote:

Vidhya,

Could you post a snippet of an RS log during this time?  You should be able to see what's
happening between when the OPEN message gets there and the OPEN completes.

Like Stack said, it's probably that its single-threaded in the version you're using and with
all the file opening, your NN and DNs are under heavy load.  Do you see io-wait or anything
else jump up across the cluster at this time?  You have ganglia setup on this bad boy?

JG

> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> Stack
> Sent: Friday, August 27, 2010 5:36 PM
> To: user@hbase.apache.org
> Subject: Re: Initial region loads in hbase..
>
> In 0.20, open on a regionserver is single-threaded.  Could that be it?
>  The server has lots of regions to open and its taking time?  Is the
> meta table being beat up?  Could this be holding up region opens?
>
> Good luck V,
> St.Ack
>
>
> On Fri, Aug 27, 2010 at 5:01 PM, Vidhyashankar Venkataraman
> <vidhyash@yahoo-inc.com> wrote:
> > Hi guys,
> >  A couple of days back, I had posted a problem on regions taking too
> much time to load when I restart Hbase.. I have a table that has around
> 80 K regions on 650 nodes (!) ..
> >  I was checking the logs in the master and I notice that the time it
> takes from assigning a region to a region server to the point when it
> recognizes that the region is open in that server takes around 20-30
> minutes!
> >   Apart from master being the bottleneck here, can you guys let me
> know what the other possible cases are as to why this may happen?
> >
> > Thank you
> > Vidhya
> >
> > Below is an example for region with start key 0000003404803994 where
> the assignment takes place at 22:59 while the confirmation that it got
> open came at 23:19...
> >
> > 2010-08-27 22:59:02,642 DEBUG
> org.apache.hadoop.hbase.master.BaseScanner: Current assignment of
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> is not valid;  serverAddress=, startCode=0 unknown.
> > 2010-08-27 22:59:02,643 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED
> region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK_REGION_OFFLINE
> > 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Event NodeCreated with state SyncConnected with path
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,645 DEBUG
> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
> NodeCreated with path
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,645 DEBUG
> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
> Got zkEvent NodeCreated state:SyncConnected
> path:/hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,645 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper:
> <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b
> 3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,org.apache.hadoop.h
> base.master.HMaster>Created ZNode
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Created/updated
> UNASSIGNED zNode
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> in state M2ZK_REGION_OFFLINE
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [
> M2ZK_REGION_OFFLINE ] for region 73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Event NodeChildrenChanged with state SyncConnected with path
> /hbase/UNASSIGNED
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
> NodeChildrenChanged with path /hbase/UNASSIGNED
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
> Got zkEvent NodeChildrenChanged state:SyncConnected
> path:/hbase/UNASSIGNED
> > 2010-08-27 22:59:02,647 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Assigning for
> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993): total
> nregions to assign=1, regions to give other servers than this=0,
> isMetaAssign=false
> > 2010-08-27 22:59:02,647 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Assigning
> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993) 1 regions
> > 2010-08-27 22:59:02,647 INFO
> org.apache.hadoop.hbase.master.RegionManager: Assigning region
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> to b3130020.yst.yahoo.net,60020,1282940735627
> > 2010-08-27 22:59:02,653 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: No need to update
> UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 as it already exists
> in state = M2ZK_REGION_OFFLINE
> > 2010-08-27 22:59:02,653 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> in state M2ZK_REGION_OFFLINE
> >
> >
> >
> >
> > AND THEN,
> >
> > 2010-08-27 23:18:53,591 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation:
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> open on b3130020.yst.yahoo.net,60020,1282940735627
> > 2010-08-27 23:18:53,591 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> in region .META.,,1 with startcode=1282940735627,
> server=b3130020.yst.yahoo.net:60020
> > 2010-08-27 23:18:53,677 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Deleting ZNode
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper as
> region is open...
> >
> >
> >


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message