hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7799) reassigning region stuck in open still may not work correctly due to leftover ZK node
Date Mon, 11 Feb 2013 17:37:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575925#comment-13575925
] 

ramkrishna.s.vasudevan commented on HBASE-7799:
-----------------------------------------------

Okie.  If you can attach the logs it would be great.  I can take a more closer look at this
if you don't mind.
                
> reassigning region stuck in open still may not work correctly due to leftover ZK node
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-7799
>                 URL: https://issues.apache.org/jira/browse/HBASE-7799
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> (logs grepped by region name, and abridged.
> META server was dead so OpenRegionHandler for the region took a while, and was interrupted:
> {code}
> 2013-02-08 14:35:01,555 DEBUG [RS_OPEN_REGION-10.11.2.92,64485,1360362800564-2] handler.OpenRegionHandler(255):
Interrupting thread Thread[PostOpenDeployTasks:871d1c3bdf98a2c93b527cb6cc61327d,5,main]
> {code}
> Then master tried to force region offline and reassign:
> {code}
> 2013-02-08 14:35:06,500 INFO  [MASTER_SERVER_OPERATIONS-10.11.2.92,64483,1360362800340-1]
master.RegionStates(347): Found opening region {IntegrationTestRebalanceAndKillServersTargeted,7333332c,1360362805563.871d1c3bdf98a2c93b527cb6cc61327d.
state=OPENING, ts=1360362901596, server=10.11.2.92,64485,1360362800564} to be reassigned by
SSH for 10.11.2.92,64485,1360362800564
> 2013-02-08 14:35:06,500 INFO  [MASTER_SERVER_OPERATIONS-10.11.2.92,64483,1360362800340-1]
master.RegionStates(242): Region {NAME => 'IntegrationTestRebalanceAndKillServersTargeted,7333332c,1360362805563.871d1c3bdf98a2c93b527cb6cc61327d.',
STARTKEY => '7333332c', ENDKEY => '7ffffff8', ENCODED => 871d1c3bdf98a2c93b527cb6cc61327d,}
transitioned from {IntegrationTestRebalanceAndKillServersTargeted,7333332c,1360362805563.871d1c3bdf98a2c93b527cb6cc61327d.
state=OPENING, ts=1360362901596, server=10.11.2.92,64485,1360362800564} to {IntegrationTestRebalanceAndKillServersTargeted,7333332c,1360362805563.871d1c3bdf98a2c93b527cb6cc61327d.
state=CLOSED, ts=1360362906500, server=null}
> 2013-02-08 14:35:06,505 DEBUG [10.11.2.92,64483,1360362800340-GeneralBulkAssigner-1]
master.AssignmentManager(1530): Forcing OFFLINE; was={IntegrationTestRebalanceAndKillServersTargeted,7333332c,1360362805563.871d1c3bdf98a2c93b527cb6cc61327d.
state=CLOSED, ts=1360362906500, server=null}
> 2013-02-08 14:35:06,506 DEBUG [10.11.2.92,64483,1360362800340-GeneralBulkAssigner-1]
zookeeper.ZKAssign(176): master:64483-0x13cbbf1025d0000 Async create of unassigned node for
871d1c3bdf98a2c93b527cb6cc61327d with OFFLINE state
> {code}
> But didn't delete the original ZK node?
> {code}
> 2013-02-08 14:35:06,509 WARN  [main-EventThread] master.OfflineCallback(59): Node for
/hbase/region-in-transition/871d1c3bdf98a2c93b527cb6cc61327d already exists
> 2013-02-08 14:35:06,509 DEBUG [main-EventThread] master.OfflineCallback(69): rs={IntegrationTestRebalanceAndKillServersTargeted,7333332c,1360362805563.871d1c3bdf98a2c93b527cb6cc61327d.
state=OFFLINE, ts=1360362906506, server=null}, server=10.11.2.92,64488,1360362800651
> 2013-02-08 14:35:06,512 DEBUG [main-EventThread] master.OfflineCallback$ExistCallback(106):
rs={IntegrationTestRebalanceAndKillServersTargeted,7333332c,1360362805563.871d1c3bdf98a2c93b527cb6cc61327d.
state=OFFLINE, ts=1360362906506, server=null}, server=10.11.2.92,64488,1360362800651
> {code}
> So it went into infinite cycle of failing to assign due to this:
> {code}
> 2013-02-08 14:35:06,517 INFO  [PRI IPC Server handler 7 on 64488] regionserver.HRegionServer(3435):
Received request to open region: IntegrationTestRebalanceAndKillServersTargeted,7333332c,1360362805563.871d1c3bdf98a2c93b527cb6cc61327d.
on 10.11.2.92,64488,1360362800651
> 2013-02-08 14:35:06,521 WARN  [RS_OPEN_REGION-10.11.2.92,64488,1360362800651-0] zookeeper.ZKAssign(762):
regionserver:64488-0x13cbbf1025d0004 Attempt to transition the unassigned node for 871d1c3bdf98a2c93b527cb6cc61327d
from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the node existed but was in the state
RS_ZK_REGION_OPENING set by the server [wrong server name redacted, see HBASE-7798]
> {code}
> Transitioning failed-to-open similarly fails.
> It seems like master needs to nuke ZK node unconditionally to offline?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message