hbase-issues mailing list archives

From "chunhui shen (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
Date Fri, 17 Feb 2012 03:34:00 GMT
StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions
(timeout = 3 mins)
------------------------------------------------------------------------------------------------------------------

                 Key: HBASE-5422
                 URL: https://issues.apache.org/jira/browse/HBASE-5422
             Project: HBase
          Issue Type: Bug
          Components: master
            Reporter: chunhui shen


In our production environment, we see many timeouts on regions in transition (RIT) when the
cluster starts up. The cluster has about 70,000 regions on 25 regionservers.

First, consider the following logs, which trace region 33cf229845b1009aa8a3f7b0f85c9bd0.
Master's log
2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x348f4a94723da5
Async create of unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state 
2012-02-13 18:07:42,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback:
rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409,
server=r03f11025.yh.aliyun.com,60020,1329127549907 
2012-02-13 18:07:42,996 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback:
rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409

2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in
transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.
state=PENDING_OPEN, ts=1329127662996
2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has
been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.

2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED,
server=r03f11025.yh.aliyun.com,60020,1329127549907, region=33cf229845b1009aa8a3f7b0f85c9bd0

2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler:
Handling OPENED event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node 
2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x348f4a94723da5
Deleting existing unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected
state RS_ZK_REGION_OPENED 
2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x348f4a94723da5
Successfully deleted unassigned node for region 33cf229845b1009aa8a3f7b0f85c9bd0 in expected
state RS_ZK_REGION_OPENED 
2012-02-13 18:38:07,573 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler:
Opened region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on r03f11025.yh.aliyun.com,60020,1329127549907

2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous
transition plan was found (or we are ignoring an existing plan) for item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.
so generated a random one; hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.,
src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, exclude=null) available
servers 
2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning
region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. to r01b05043.yh.aliyun.com,60020,1329127549041

2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in
transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.
state=PENDING_OPEN, ts=1329132528086 
2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has
been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.




Regionserver's log
2012-02-13 18:07:43,537 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received
request to open region: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.

2012-02-13 18:11:16,560 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler:
Processing open of item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 





The regionserver's log shows that more than 3 minutes passed between receiving the openRegion
request (18:07:43) and starting to process it (18:11:16). The master's RIT timeout (3 minutes)
expired for the region in the meantime, at 18:10:48.
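
As a quick check of that gap, the sketch below computes the time between the two regionserver
log timestamps quoted above and compares it with the 3-minute timeout from the issue title.
The class and method names (RitDelayCheck, elapsedMillis) are hypothetical helpers for this
illustration and are not part of HBase.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not HBase code: measures the gap between two log lines.
class RitDelayCheck {
    // Timestamp layout used by the log lines quoted in this report.
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // RIT timeout as stated in the issue title (3 minutes).
    static final long RIT_TIMEOUT_MS = 3 * 60 * 1000L;

    // Milliseconds elapsed between two log timestamps.
    static long elapsedMillis(String from, String to) {
        return Duration.between(
            LocalDateTime.parse(from, LOG_TS),
            LocalDateTime.parse(to, LOG_TS)).toMillis();
    }

    public static void main(String[] args) {
        // "Received request to open region" vs. "Processing open of" for the same region.
        long waited = elapsedMillis("2012-02-13 18:07:43,537", "2012-02-13 18:11:16,560");
        System.out.println("waited ms = " + waited
            + ", exceeds timeout = " + (waited > RIT_TIMEOUT_MS));
    }
}
```

The open request waited in the regionserver's queue for longer than the timeout, so the master
reassigned the region even though the original open was still pending.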

Looking at the code of StartupBulkAssigner, we find that region plans are not added when it
assigns regions. Therefore, when one region is opened, the master does not update the timers
(updateTimers) of the other regions that have the same destination server.
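
The effect can be illustrated with a small model. This is a simplified sketch and not the
actual AssignmentManager code: the class RitTimerModel, its methods, and the region and server
names are all hypothetical. It assumes only the behavior described above, namely that timers
are refreshed for regions that have a recorded plan to the same destination server.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the timer refresh; not the HBase implementation.
class RitTimerModel {
    // region -> destination server (stands in for the master's region plans)
    private final Map<String, String> regionPlans = new HashMap<>();
    // region -> timestamp of the last update to its in-transition state
    private final Map<String, Long> ritTimestamps = new HashMap<>();

    void assign(String region, String server, long now, boolean recordPlan) {
        ritTimestamps.put(region, now);
        if (recordPlan) {
            regionPlans.put(region, server); // the step the report says is skipped
        }
    }

    // When a region opens, refresh timers of regions planned for the same server.
    void regionOpened(String region, String server, long now) {
        ritTimestamps.remove(region);
        regionPlans.remove(region);
        for (Map.Entry<String, String> plan : regionPlans.entrySet()) {
            if (plan.getValue().equals(server) && ritTimestamps.containsKey(plan.getKey())) {
                ritTimestamps.put(plan.getKey(), now);
            }
        }
    }

    boolean timedOut(String region, long now, long timeoutMs) {
        Long ts = ritTimestamps.get(region);
        return ts != null && now - ts > timeoutMs;
    }

    // Two regions go to one server; the first opens after 170 s, and the
    // second is checked at 200 s against a 180 s timeout.
    static boolean secondRegionTimesOut(boolean recordPlan) {
        RitTimerModel model = new RitTimerModel();
        model.assign("regionA", "rs1", 0L, recordPlan);
        model.assign("regionB", "rs1", 0L, recordPlan);
        model.regionOpened("regionA", "rs1", 170_000L);
        return model.timedOut("regionB", 200_000L, 180_000L);
    }

    public static void main(String[] args) {
        System.out.println("without plans: " + secondRegionTimesOut(false));
        System.out.println("with plans:    " + secondRegionTimesOut(true));
    }
}
```

In this model, the second region times out only when no plan was recorded for it, which matches
the behavior seen in the logs when a regionserver is still working through a long open queue.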
