hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HBASE-2781) ZKW.createUnassignedRegion doesn't make sure existing znode is in the right state
Date Wed, 14 Jul 2010 00:11:50 GMT

     [ https://issues.apache.org/jira/browse/HBASE-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-2781.
---------------------------------------

    Hadoop Flags: [Reviewed]
      Resolution: Fixed

Committed to trunk, thanks for the patch Karthik!

> ZKW.createUnassignedRegion doesn't make sure existing znode is in the right state
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2781
>                 URL: https://issues.apache.org/jira/browse/HBASE-2781
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: Karthik Ranganathan
>            Priority: Critical
>             Fix For: 0.90.0
>
>         Attachments: HBASE-2781-0.21.patch
>
>
> In ZKW.createUnassignedRegion I see this comment:
> {code}
>       // check if this node already exists - 
>       //   - it should not exist
>       //   - if it does, it should be in the CLOSED state
> {code}
> But this is what I actually got:
> {noformat}
> 2010-06-23 15:42:05,823 INFO  [IPC Server handler 3 on 60362] master.ServerManager(457):
Processing MSG_REPORT_PROCESS_OPEN: test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
from h136.sfo.stumble.net,60365,1277332849712; 1 of 4
> 2010-06-23 15:42:05,867 INFO  [RegionServer:1.worker] regionserver.HRegionServer$Worker(1338):
Worker: MSG_REGION_OPEN: test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:05,870 DEBUG [RegionServer:1.worker] regionserver.RSZookeeperUpdater(157):
Updating ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 with [RS2ZK_REGION_OPENING]
expected version = 0
> 2010-06-23 15:42:05,871 DEBUG [main-EventThread] master.HMaster(1158): Event NodeDataChanged
with state SyncConnected with path /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,871 DEBUG [main-EventThread] master.ZKMasterAddressWatcher(64): Got
event NodeDataChanged with path /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,871 DEBUG [main-EventThread] master.ZKUnassignedWatcher(95): ZK-EVENT-PROCESS:
Got zkEvent NodeDataChanged state:SyncConnected path:/1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,872 INFO  [main-EventThread] regionserver.HRegionServer(379): Got
ZooKeeper event, state: SyncConnected, type: NodeDataChanged, path: /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,872 DEBUG [MASTER_OPENREGION-10.10.1.136:60362-1] handler.MasterOpenRegionHandler(77):
Event = RS2ZK_REGION_OPENING, region = 13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,874 DEBUG [RegionServer:1.worker] regionserver.HRegion(297): Creating
region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,154 INFO  [RegionServer:1.worker] regionserver.HRegion(366): Onlined
test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.; next sequenceid=1
> 2010-06-23 15:42:06,154 DEBUG [RegionServer:1.worker] regionserver.RSZookeeperUpdater(157):
Updating ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 with [RS2ZK_REGION_OPENED] expected
version = 1
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:06,249 ERROR [RegionServer:1.worker] regionserver.HRegionServer(1488):
Failed to mark region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. as opened
> java.io.IOException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode
= ConnectionLoss for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode
= ConnectionLoss for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegionServer(1569): closing
region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegion(487): Closing test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.:
disabling compactions & flushes
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegion(512): Updates disabled
for region, no outstanding scanners on test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegion(519): No more row
locks outstanding on region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,994 INFO  [RegionServer:1] regionserver.HRegion(531): Closed test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:09,105 INFO  [master] master.ProcessServerShutdown(126): Region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
was in transition 
> name=test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464., state=PENDING_OPEN on
dead server h136.sfo.stumble.net,60365,1277332849712 - marking unassigned
> 2010-06-23 15:42:10,065 INFO  [IPC Server handler 2 on 60362] master.RegionManager(340):
Assigning region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. to h136.sfo.stumble.net,60363,1277332849671
> 2010-06-23 15:42:10,067 DEBUG [IPC Server handler 2 on 60362] zookeeper.ZooKeeperWrapper(1079):
While creating UNASSIGNED region 13bef4950ac6827ac32d87682b8b2464 exists, state = RS2ZK_REGION_OPENING
> 2010-06-23 15:42:10,126 WARN  [IPC Server handler 2 on 60362] zookeeper.ZooKeeperWrapper(1024):
<localhost:/1,org.apache.hadoop.hbase.master.HMaster>Failed to create ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
in ZooKeeper
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:10,127 DEBUG [IPC Server handler 2 on 60362] master.RegionManager(350):
Created UNASSIGNED zNode test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. in state
M2ZK_REGION_OFFLINE
> 2010-06-23 15:42:10,245 INFO  [RegionServer:0] regionserver.HRegionServer(511): MSG_REGION_OPEN:
test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:11,248 INFO  [IPC Server handler 1 on 60362] master.ServerManager(457):
Processing MSG_REPORT_PROCESS_OPEN: test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
from h136.sfo.stumble.net,60363,1277332849671; 7 of 13
> 2010-06-23 15:42:13,795 INFO  [RegionServer:0.worker] regionserver.HRegionServer$Worker(1338):
Worker: MSG_REGION_OPEN: test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:13,797 ERROR [RegionServer:0.worker] regionserver.RSZookeeperUpdater(107):
ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 is not in CLOSED/OFFLINE state (state
= RS2ZK_REGION_OPENING), will NOT open region.
> 2010-06-23 15:42:13,798 ERROR [RegionServer:0.worker] regionserver.HRegionServer(814):
Error opening test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> java.io.IOException: ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 is not in CLOSED/OFFLINE
state (state = RS2ZK_REGION_OPENING), will NOT open region.
> 2010-06-23 15:42:13,800 ERROR [RegionServer:0.worker] regionserver.RSZookeeperUpdater(141):
Aborting open of region 13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:13,800 DEBUG [RegionServer:0.worker] regionserver.RSZookeeperUpdater(157):
Updating ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 with [RS2ZK_REGION_CLOSED] expected
version = 0
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:13,802 ERROR [RegionServer:0.worker] regionserver.HRegionServer(1473):
Failed to abort open region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> java.io.IOException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode
= BadVersion for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode
= BadVersion for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> {noformat}
> Basically:
>  # A region server was opening the region.
>  # It expired just before reporting that the region was opened, leaving the znode in the state RS2ZK_REGION_OPENING.
>  # The region gets reassigned; createUnassignedRegion sees that state and doesn't change it, but the master still logs "Created UNASSIGNED zNode test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. in state M2ZK_REGION_OFFLINE" at the end.
>  # When the new region server tries to open the region, it sees that the state is wrong and aborts the open.
> I think the way to fix it is to change the state of the existing znode to what it should be.
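
The fix proposed above can be sketched roughly as follows. This is not the committed patch and does not use the HBase or ZooKeeper APIs: the class `UnassignedZNodeSketch`, its in-memory map, and the methods `createOrForceOffline`, `write`, `stateOf` and `versionOf` are all hypothetical names introduced for illustration. Only the state names come from the log output. The sketch assumes that an existing znode is always overwritten with M2ZK_REGION_OFFLINE, and that each write bumps the node's version, as ZooKeeper does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the /UNASSIGNED znodes.
// It models only the state payload and the per-node version counter.
public class UnassignedZNodeSketch {

    // State names as they appear in the log output.
    enum RegionState {
        M2ZK_REGION_OFFLINE, RS2ZK_REGION_OPENING, RS2ZK_REGION_OPENED, RS2ZK_REGION_CLOSED
    }

    // A znode payload plus a version that starts at 0 and increments on every write.
    static final class ZNode {
        RegionState state;
        int version;

        ZNode(RegionState state) {
            this.state = state;
            this.version = 0;
        }
    }

    private final Map<String, ZNode> nodes = new HashMap<>();

    // Generic write, standing in for a region server updating the znode.
    void write(String region, RegionState state) {
        ZNode node = nodes.get(region);
        if (node == null) {
            nodes.put(region, new ZNode(state));
        } else {
            node.state = state;
            node.version++;
        }
    }

    // Master-side creation: if the znode already exists (for example, left in
    // RS2ZK_REGION_OPENING by an expired server), force it to OFFLINE instead of
    // leaving the stale state in place. Returns the node's version after the
    // call, so that a caller could use it as the expected version of its next update.
    int createOrForceOffline(String region) {
        write(region, RegionState.M2ZK_REGION_OFFLINE);
        return nodes.get(region).version;
    }

    RegionState stateOf(String region) {
        return nodes.get(region).state;
    }

    int versionOf(String region) {
        return nodes.get(region).version;
    }

    public static void main(String[] args) {
        UnassignedZNodeSketch zk = new UnassignedZNodeSketch();
        String region = "13bef4950ac6827ac32d87682b8b2464";

        zk.createOrForceOffline(region);                       // first assignment
        zk.write(region, RegionState.RS2ZK_REGION_OPENING);    // server starts opening, then expires
        int version = zk.createOrForceOffline(region);         // reassignment resets the state

        System.out.println(zk.stateOf(region) + " at version " + version);
    }
}
```

Returning the version is a design choice made for this sketch: in the log, the second region server's abort fails with BadVersion because it expects version 0 on a znode that has already been written to, so whoever resets the state also needs a way to learn the current version.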

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
