hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] Reopened: (HBASE-2781) ZKW.createUnassignedRegion doesn't make sure existing znode is in the right state
Date Wed, 14 Jul 2010 04:37:50 GMT

     [ https://issues.apache.org/jira/browse/HBASE-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans reopened HBASE-2781:
---------------------------------------


Hate to do this, but the patch was missing a few things. TestReplication failed again for
the same reason: some parts of RegionManager still call createUnassignedRegion instead of
createOrUpdateUnassignedRegion. Should we just redirect all the calls to the latter and
delete the former?
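
To make the "redirect" option concrete, here is a minimal sketch of the interim step, where the old entry point simply forwards to the new one so that no remaining caller can reach the old behaviour. Everything in it is an assumption for illustration: the class `ZkWrapperSketch`, the single-String signatures, and the in-memory map are stand-ins, not the real ZooKeeperWrapper code or its method signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the wrapper class; not the real ZooKeeperWrapper.
class ZkWrapperSketch {
    static final String OFFLINE = "M2ZK_REGION_OFFLINE";

    // In-memory substitute for the UNASSIGNED znodes, keyed by encoded region name.
    private final Map<String, String> unassigned = new HashMap<>();

    // Interim step: the old method forwards to the new one, so callers that
    // have not been migrated yet still get the create-or-update behaviour.
    @Deprecated
    void createUnassignedRegion(String encodedName) {
        createOrUpdateUnassignedRegion(encodedName);
    }

    // Leaves the entry in the OFFLINE state whether or not it already existed.
    void createOrUpdateUnassignedRegion(String encodedName) {
        unassigned.put(encodedName, OFFLINE);
    }

    // Stand-in for a region server writing its own state to the znode.
    void regionServerSets(String encodedName, String state) {
        unassigned.put(encodedName, state);
    }

    String stateOf(String encodedName) {
        return unassigned.get(encodedName);
    }
}
```

Once every caller has been moved over, the deprecated forwarder can be deleted outright, which is the second half of the question above.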

> ZKW.createUnassignedRegion doesn't make sure existing znode is in the right state
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2781
>                 URL: https://issues.apache.org/jira/browse/HBASE-2781
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: Karthik Ranganathan
>            Priority: Critical
>             Fix For: 0.90.0
>
>         Attachments: HBASE-2781-0.21.patch
>
>
> In ZKW.createUnassignedRegion I see this comment:
> {code}
>       // check if this node already exists - 
>       //   - it should not exist
>       //   - if it does, it should be in the CLOSED state
> {code}
> And what I got is:
> {noformat}
> 2010-06-23 15:42:05,823 INFO  [IPC Server handler 3 on 60362] master.ServerManager(457):
Processing MSG_REPORT_PROCESS_OPEN: test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
from h136.sfo.stumble.net,60365,1277332849712; 1 of 4
> 2010-06-23 15:42:05,867 INFO  [RegionServer:1.worker] regionserver.HRegionServer$Worker(1338):
Worker: MSG_REGION_OPEN: test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:05,870 DEBUG [RegionServer:1.worker] regionserver.RSZookeeperUpdater(157):
Updating ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 with [RS2ZK_REGION_OPENING]
expected version = 0
> 2010-06-23 15:42:05,871 DEBUG [main-EventThread] master.HMaster(1158): Event NodeDataChanged
with state SyncConnected with path /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,871 DEBUG [main-EventThread] master.ZKMasterAddressWatcher(64): Got
event NodeDataChanged with path /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,871 DEBUG [main-EventThread] master.ZKUnassignedWatcher(95): ZK-EVENT-PROCESS:
Got zkEvent NodeDataChanged state:SyncConnected path:/1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,872 INFO  [main-EventThread] regionserver.HRegionServer(379): Got
ZooKeeper event, state: SyncConnected, type: NodeDataChanged, path: /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,872 DEBUG [MASTER_OPENREGION-10.10.1.136:60362-1] handler.MasterOpenRegionHandler(77):
Event = RS2ZK_REGION_OPENING, region = 13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,874 DEBUG [RegionServer:1.worker] regionserver.HRegion(297): Creating
region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,154 INFO  [RegionServer:1.worker] regionserver.HRegion(366): Onlined
test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.; next sequenceid=1
> 2010-06-23 15:42:06,154 DEBUG [RegionServer:1.worker] regionserver.RSZookeeperUpdater(157):
Updating ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 with [RS2ZK_REGION_OPENED] expected
version = 1
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:06,249 ERROR [RegionServer:1.worker] regionserver.HRegionServer(1488):
Failed to mark region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. as opened
> java.io.IOException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode
= ConnectionLoss for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode
= ConnectionLoss for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegionServer(1569): closing
region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegion(487): Closing test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.:
disabling compactions & flushes
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegion(512): Updates disabled
for region, no outstanding scanners on test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegion(519): No more row
locks outstanding on region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,994 INFO  [RegionServer:1] regionserver.HRegion(531): Closed test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:09,105 INFO  [master] master.ProcessServerShutdown(126): Region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
was in transition 
> name=test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464., state=PENDING_OPEN on
dead server h136.sfo.stumble.net,60365,1277332849712 - marking unassigned
> 2010-06-23 15:42:10,065 INFO  [IPC Server handler 2 on 60362] master.RegionManager(340):
Assigning region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. to h136.sfo.stumble.net,60363,1277332849671
> 2010-06-23 15:42:10,067 DEBUG [IPC Server handler 2 on 60362] zookeeper.ZooKeeperWrapper(1079):
While creating UNASSIGNED region 13bef4950ac6827ac32d87682b8b2464 exists, state = RS2ZK_REGION_OPENING
> 2010-06-23 15:42:10,126 WARN  [IPC Server handler 2 on 60362] zookeeper.ZooKeeperWrapper(1024):
<localhost:/1,org.apache.hadoop.hbase.master.HMaster>Failed to create ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
in ZooKeeper
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:10,127 DEBUG [IPC Server handler 2 on 60362] master.RegionManager(350):
Created UNASSIGNED zNode test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. in state
M2ZK_REGION_OFFLINE
> 2010-06-23 15:42:10,245 INFO  [RegionServer:0] regionserver.HRegionServer(511): MSG_REGION_OPEN:
test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:11,248 INFO  [IPC Server handler 1 on 60362] master.ServerManager(457):
Processing MSG_REPORT_PROCESS_OPEN: test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
from h136.sfo.stumble.net,60363,1277332849671; 7 of 13
> 2010-06-23 15:42:13,795 INFO  [RegionServer:0.worker] regionserver.HRegionServer$Worker(1338):
Worker: MSG_REGION_OPEN: test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:13,797 ERROR [RegionServer:0.worker] regionserver.RSZookeeperUpdater(107):
ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 is not in CLOSED/OFFLINE state (state
= RS2ZK_REGION_OPENING), will NOT open region.
> 2010-06-23 15:42:13,798 ERROR [RegionServer:0.worker] regionserver.HRegionServer(814):
Error opening test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> java.io.IOException: ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 is not in CLOSED/OFFLINE
state (state = RS2ZK_REGION_OPENING), will NOT open region.
> 2010-06-23 15:42:13,800 ERROR [RegionServer:0.worker] regionserver.RSZookeeperUpdater(141):
Aborting open of region 13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:13,800 DEBUG [RegionServer:0.worker] regionserver.RSZookeeperUpdater(157):
Updating ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 with [RS2ZK_REGION_CLOSED] expected
version = 0
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:13,802 ERROR [RegionServer:0.worker] regionserver.HRegionServer(1473):
Failed to abort open region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> java.io.IOException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode
= BadVersion for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode
= BadVersion for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> {noformat}
> Basically:
>  # A region server was opening the region.
>  # It was expired just before reporting that the region was opened, leaving the znode in the state RS2ZK_REGION_OPENING.
>  # The region gets reassigned; the master sees that state and doesn't change it, but still logs at the end "Created UNASSIGNED zNode test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. in state M2ZK_REGION_OFFLINE".
>  # When the new region server tries to open the region, it sees that the state is wrong and aborts the open.
> I think the way to fix it is to change the state to what it should be.
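
The four steps in the quoted description can be replayed with a small in-memory model. This is only a sketch under stated assumptions: `UnassignedZnodeSketch`, `createOnly`, `createOrUpdate`, `regionServerSets` and `canOpen` are hypothetical names, a plain map stands in for ZooKeeper, and znode versions are left out. The state names and the CLOSED/OFFLINE check are taken from the log above.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the /1/UNASSIGNED znodes. Hypothetical names throughout;
// this is not the ZooKeeperWrapper or ZooKeeper client API.
class UnassignedZnodeSketch {
    enum State { M2ZK_REGION_OFFLINE, RS2ZK_REGION_OPENING, RS2ZK_REGION_OPENED, RS2ZK_REGION_CLOSED }

    private final Map<String, State> znodes = new HashMap<>();

    // Reported behaviour: if the znode already exists, its state is left as-is.
    // Returns true only when a new znode was created.
    boolean createOnly(String region) {
        return znodes.putIfAbsent(region, State.M2ZK_REGION_OFFLINE) == null;
    }

    // Suggested behaviour: the znode ends up OFFLINE whether or not it existed.
    void createOrUpdate(String region) {
        znodes.put(region, State.M2ZK_REGION_OFFLINE);
    }

    // Stand-in for a region server writing its progress to the znode.
    void regionServerSets(String region, State state) {
        znodes.put(region, state);
    }

    // The check the second region server performs before opening (step 4).
    boolean canOpen(String region) {
        State s = znodes.get(region);
        return s == State.M2ZK_REGION_OFFLINE || s == State.RS2ZK_REGION_CLOSED;
    }

    public static void main(String[] args) {
        UnassignedZnodeSketch zk = new UnassignedZnodeSketch();
        String region = "13bef4950ac6827ac32d87682b8b2464";

        zk.createOnly(region);                                   // master assigns the region
        zk.regionServerSets(region, State.RS2ZK_REGION_OPENING); // steps 1-2: server dies mid-open

        zk.createOnly(region);                                   // step 3: reassignment, state untouched
        System.out.println("createOnly     -> canOpen = " + zk.canOpen(region));

        zk.createOrUpdate(region);                               // suggested fix
        System.out.println("createOrUpdate -> canOpen = " + zk.canOpen(region));
    }
}
```

In this model the create-only path leaves the znode in RS2ZK_REGION_OPENING, so `canOpen` is false and the open would be aborted as in step 4; after `createOrUpdate` the state is M2ZK_REGION_OFFLINE and `canOpen` is true.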

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

