hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From bijieshan <bijies...@huawei.com>
Subject Re: ServerShutdownHandler process finished, but the region was assigned to the shutted RS again
Date Thu, 23 Jun 2011 01:59:51 GMT
Yes, it seems pretty strange.
You're right, there's so many logs, so I just picked off part of the logs.
I feel a hiding bug here. And I'm still working on it.

I feel the function of AssignmentManager#regionOnline has a problem:
1. ServerShutdownHandler will remove the server from the AssignmentManager.servers.
2. regionOnline will add the server into AssignmentManager.servers again.

What about a competition happened here?

  public void regionOnline(HRegionInfo regionInfo, HServerInfo serverInfo) {  
    synchronized (this.regionsInTransition) {
        ---------------
    }
    synchronized (this.regions) {
      // Add check
      ---------------
      addToServers(hsiWithoutLoad, regionInfo);
      this.regions.notifyAll();
    }
    // Remove plan if one.
    clearRegionPlan(regionInfo);
    // Update timers for all regions in transition going against this server.
    updateTimers(serverInfo);
  }
Thanks J-D.

Jieshan Bean



------------------------------------------------

As far as I can tell your server is alive because it opened the region
just fine:

Handling transition=M_ZK_REGION_OFFLINE, server=167-6-1-11:20000,
region=612342de1fe4733f72299d70addb6d11
Handling transition=RS_ZK_REGION_OPENING,
server=167-6-1-11,20020,1308106004710,
region=612342de1fe4733f72299d70addb6d11
Handling transition=RS_ZK_REGION_OPENING,
server=167-6-1-11,20020,1308106004710,
region=612342de1fe4733f72299d70addb6d11
Handling transition=RS_ZK_REGION_OPENED,
server=167-6-1-11,20020,1308106004710,
region=612342de1fe4733f72299d70addb6d11

It has a different start code from when it was assigned (1308106004710
vs 1308105402003) which is pretty strange, and on top of that the line
with M_ZK_REGION_OFFLINE shows port 20000 instead of 20020.

I believe that this is only a small snippet of a bigger log, right? I
bet there is more we could learn in there, also it'd be nice if you
told us what was happening on your cluster at the same time,
understanding the context would help.

J-D

On Wed, Jun 22, 2011 at 2:31 AM, bijieshan <bijieshan@huawei.com> wrote:
> 1.The Rs 167-6-1-11,20020,1308105402003 finished shutdown at "10:46:37,774":
>   10:46:37,774 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished
processing of shutdown of 167-6-1-11,20020,1308105402003
>
> 2.Overwriting happened, it seemed the RS was still exist in the set of AssignmentManager#regions:
> 10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 612342de1fe4733f72299d70addb6d11
on serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
>
> 3.Region was assigned to this dead RS at "10:50:20,671":
> 10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-11,20020,1308105402003
>
> I'm still working on this issue. I hope anyone can give out any suggestions.
>
> Thanks!
>
> Jieshan Bean
>
>
> Here's the logs tracked the region of "612342de1fe4733f72299d70addb6d11":
>
> 10:39:51,467 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000
Async create of unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
> 10:39:51,994 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback:
rs=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=OFFLINE,
ts=1308105591467, server=167-6-1-11,20020,1308105402003
> 10:39:52,182 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback:
rs=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=OFFLINE,
ts=1308105591467
> 10:41:43,316 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED,
server=167-6-1-11,20020,1308105402003, region=612342de1fe4733f72299d70addb6d11
> 10:43:22,938 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
> 10:43:22,938 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000
Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected
state RS_ZK_REGION_OPENED
> 10:43:23,153 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000
Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected
state RS_ZK_REGION_OPENED
> 10:43:23,157 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened
region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-11,20020,1308105402003
> 10:43:52,300 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning
2994 region(s) that 167-6-1-11,20020,1308105402003 was carrying (skipping 0 regions(s) that
are already in transition)
> 10:45:54,914 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000
Creating (or updating) unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
> 10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition
plan was found (or we are ignoring an existing plan) for Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.
so generated a random one; hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.,
src=, dest=167-6-1-12,20020,1308105397174; 2 (online=2, exclude=null) available servers
> 10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition
plan was found (or we are ignoring an existing plan) for Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.
so generated a random one; hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.,
src=, dest=167-6-1-12,20020,1308105397174; 2 (online=2, exclude=null) available servers
> 10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-12,20020,1308105397174
> 10:45:55,029 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING,
server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
> 10:45:55,063 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED,
server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
> 10:45:55,063 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
> 10:45:55,063 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000
Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected
state RS_ZK_REGION_OPENED
> 10:45:55,072 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000
Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected
state RS_ZK_REGION_OPENED
> 10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 612342de1fe4733f72299d70addb6d11
on serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
> 10:45:55,081 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened
region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-12,20020,1308105397174
> 10:46:37,774 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished
processing of shutdown of 167-6-1-11,20020,1308105402003
> 10:48:45,067 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.,
src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
> 10:48:45,067 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment
of region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. (offlining)
> 10:48:45,068 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=167-6-1-12,20020,1308105397174,
load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.
> 10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned
node: /hbase/unassigned/612342de1fe4733f72299d70addb6d11 (region=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.,
server=167-6-1-12,20020,1308105397174, state=RS_ZK_REGION_CLOSED)
> 10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned
node: /hbase/unassigned/612342de1fe4733f72299d70addb6d11 (region=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.,
server=167-6-1-12,20020,1308105397174, state=RS_ZK_REGION_CLOSED)
> 10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED,
server=167-6-1-12,20020,1308105397174, region=612342de1fe4733f72299d70addb6d11
> 10:50:20,527 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling
CLOSED event for 612342de1fe4733f72299d70addb6d11
> 10:50:20,580 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
was=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. state=CLOSED,
ts=1308106125379
> 10:50:20,580 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000
Creating (or updating) unassigned node for 612342de1fe4733f72299d70addb6d11 with OFFLINE state
> 10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing
plan for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.;
plan=hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174,
dest=167-6-1-11,20020,1308105402003
> 10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing
plan for region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.;
plan=hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11., src=167-6-1-12,20020,1308105397174,
dest=167-6-1-11,20020,1308105402003
> 10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 167-6-1-11,20020,1308105402003
> 10:50:20,692 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE,
server=167-6-1-11:20000, region=612342de1fe4733f72299d70addb6d11
> 10:50:20,720 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING,
server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
> 10:50:20,743 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING,
server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
> 10:50:20,779 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED,
server=167-6-1-11,20020,1308106004710, region=612342de1fe4733f72299d70addb6d11
> 10:51:33,245 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
OPENED event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
> 10:51:33,245 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000
Deleting existing unassigned node for 612342de1fe4733f72299d70addb6d11 that is in expected
state RS_ZK_REGION_OPENED
> 10:51:33,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x330912836370000
Successfully deleted unassigned node for region 612342de1fe4733f72299d70addb6d11 in expected
state RS_ZK_REGION_OPENED
> 10:51:33,312 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened
region Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 167-6-1-11,20020,1308106004710
>

Mime
View raw message