hbase-user mailing list archives

From bijieshan <bijies...@huawei.com>
Subject Re: HRegion.openHRegion IOException caused an endless loop of opening—opening failed
Date Sat, 28 May 2011 03:08:43 GMT
During that time, too many regions were being assigned.
I have read the related code, but I am still scratching my head over the problem. The fact is that the
region could not be opened because the ZK state was not the expected one.

2011-05-20 16:02:58,993 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x1300c11b4f30051
Attempt to transition the unassigned node for d7555a12586e6c788ca55017224b5a51 from M_ZK_REGION_OFFLINE
to RS_ZK_REGION_OPENING failed, the node existed but was in the state RS_ZK_REGION_OPENING
set by the server 157-5-111-11,20020,1305875930161
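
To make the warning above concrete, here is a minimal sketch of a check-then-transition step that fails in the same way. It is a toy model, not the ZKAssign code: the UnassignedNode class and the transition function are hypothetical names, and the real implementation works against ZooKeeper znodes rather than an in-memory object.

```python
from dataclasses import dataclass


@dataclass
class UnassignedNode:
    """Toy stand-in for a region's unassigned znode (hypothetical, not an HBase class)."""
    state: str
    server: str


def transition(node, expected, target, server):
    """Move the node to `target` only if it is currently in `expected`."""
    if node.state != expected:
        # Same shape as the WARN line: the node exists, but in an unexpected state.
        return False, (
            f"transition from {expected} to {target} failed, "
            f"the node existed but was in the state {node.state} "
            f"set by the server {node.server}"
        )
    node.state = target
    node.server = server
    return True, f"transitioned from {expected} to {target}"


rs = "157-5-111-11,20020,1305875930161"

# The node is already in OPENING, so a transition that expects OFFLINE is rejected.
node = UnassignedNode(state="RS_ZK_REGION_OPENING", server=rs)
ok, message = transition(node, "M_ZK_REGION_OFFLINE", "RS_ZK_REGION_OPENING", rs)
print(ok, message)

# A node that really is OFFLINE makes the same call succeed.
fresh = UnassignedNode(state="M_ZK_REGION_OFFLINE", server="master")
ok_fresh, _ = transition(fresh, "M_ZK_REGION_OFFLINE", "RS_ZK_REGION_OPENING", rs)
```

The sketch only shows why the transition is refused once the node is no longer OFFLINE; it does not explain how the node got into OPENING in the first place.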

So the question is: under what conditions could such inconsistent states arise?

This is a segment of the HMaster logs around that time (there are many log entries like this):

15:49:47,864 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region ufdr,051410,1305873959469.14cfc2222fff69c0b44bf2cdc9e20dd1.
to 157-5-111-13,20020,1305877624933
2011-05-20 15:49:47,867 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED,
server=157-5-111-14,20020,1305877627727, region=5910a81f573f8e9e255db473e9407ab4
2011-05-20 15:49:47,867 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
was=ufdr,051998,1305873973067.193c64299a34361f21e637ad203c8abb. state=PENDING_OPEN, ts=1305877600490
2011-05-20 15:49:47,867 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler:
Handling OPENED event for 5910a81f573f8e9e255db473e9407ab4; deleting unassigned node
2011-05-20 15:49:47,867 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous
transition plan was found (or we are ignoring an existing plan) for ufdr,051998,1305873973067.193c64299a34361f21e637ad203c8abb.
so generated a random one; hri=ufdr,051998,1305873973067.193c64299a34361f21e637ad203c8abb.,
src=, dest=157-5-111-12,20020,1305877626108; 4 (online=4, exclude=null) available servers
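
Since the master log interleaves entries for many regions, it can help to group the lines by encoded region name before reading them. The following is only a rough sketch: the group_by_region helper is a hypothetical name, it assumes each log entry sits on a single line (the excerpts above are wrapped by the mail client), and it treats any 32-character hex string as an encoded region name.

```python
import re

# Encoded region names appear as 32 lowercase hex characters in the log lines.
ENCODED_NAME = re.compile(r"\b[0-9a-f]{32}\b")


def group_by_region(lines):
    """Map each encoded region name to the log lines that mention it, in order."""
    regions = {}
    for line in lines:
        # set() avoids adding the same line twice if a name occurs twice in it.
        for name in set(ENCODED_NAME.findall(line)):
            regions.setdefault(name, []).append(line)
    return regions


# Sample entries taken from the excerpt above, joined back onto single lines.
sample = [
    "2011-05-20 15:49:47,867 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: "
    "Handling transition=RS_ZK_REGION_OPENED, server=157-5-111-14,20020,1305877627727, "
    "region=5910a81f573f8e9e255db473e9407ab4",
    "2011-05-20 15:49:47,867 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: "
    "Forcing OFFLINE; was=ufdr,051998,1305873973067.193c64299a34361f21e637ad203c8abb. "
    "state=PENDING_OPEN, ts=1305877600490",
    "2011-05-20 15:49:47,867 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: "
    "Handling OPENED event for 5910a81f573f8e9e255db473e9407ab4; deleting unassigned node",
]

groups = group_by_region(sample)
for name, entries in groups.items():
    print(name, len(entries))
```

Running the same grouping over the full master log and looking up d7555a12586e6c788ca55017224b5a51 would show every transition the master logged for the region that failed to open.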

Jieshan Bean


I was asking about what was going on in the master during that time; I
would really like to see it. It should be some time after that

2011-05-20 15:49:48,122 ERROR
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed
open of region=ufdr,010142,1305873720296.46a1a44714226105c11f82a2f7c6d8fa.

About resetting the znode, as you can see in TimeoutMonitor we don't
really care if it was reset or not as it should take care of doing it.
The issue here is getting at the root of the problem.

