hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13061) RegionStates can remove wrong region from server holdings
Date Wed, 18 Feb 2015 02:06:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325306#comment-14325306 ]

Ted Yu commented on HBASE-13061:


> RegionStates can remove wrong region from server holdings
> ---------------------------------------------------------
>                 Key: HBASE-13061
>                 URL: https://issues.apache.org/jira/browse/HBASE-13061
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 1.0.0, 2.0.0
>            Reporter: Andrey Stepachev
>            Assignee: Andrey Stepachev
>         Attachments: HBASE-13061.patch
> A test failed in HBASE-13017. It seems that with ZK the nodes were ordered one way and the test didn't trigger the error, but with the new meta rows ordered differently the test became flaky.
> That leads to an interesting sequence of region offline/online transitions, which triggers the bug and an NPE in the AssignmentManager (as seen in TestZKLessAMOnCluster).
> This can happen if a region was moved from RS1 to another region server, RS2, and RS2 then failed. The region remains in PENDING_OPEN. SSH (ServerShutdownHandler) offlines it from RS1 without removing it from oldAssignments, because the table is disabled. When the AssignmentManager later assigns the region, it removes the region from the serverHoldings entry of the region's old assignment, and that old assignment happens to be RS1, the server the region was just assigned to.
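To make the bookkeeping concrete, here is a minimal Java sketch of the failure mode under a heavily simplified model. `HoldingsModel`, `regionOnline`, `offlineKeepingOldAssignment` and `replay` are hypothetical names; only the map names `oldAssignments` and `serverHoldings` come from the description above, and this is not the real `RegionStates` code. The scenario is collapsed to its essence: the region's `oldAssignments` entry still names RS1 when the region is onlined on RS1 again. The `guardSameServer` flag shows one possible guard (skip the removal when the old and new servers are the same); it is an illustration, not necessarily what the attached patch does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the master's region bookkeeping.
class HoldingsModel {
  final Map<String, String> regionAssignments = new HashMap<>();    // region -> current server
  final Map<String, String> oldAssignments = new HashMap<>();       // region -> last known server
  final Map<String, Set<String>> serverHoldings = new HashMap<>();  // server -> regions it holds

  // Offline a region but remember its server (as described for a disabled table).
  void offlineKeepingOldAssignment(String region) {
    String server = regionAssignments.remove(region);
    if (server != null) {
      oldAssignments.put(region, server);
      Set<String> held = serverHoldings.get(server);
      if (held != null) held.remove(region);
    }
  }

  // Online a region on a server; guardSameServer enables the hypothetical guard.
  void regionOnline(String region, String server, boolean guardSameServer) {
    String oldServer = regionAssignments.put(region, server);
    serverHoldings.computeIfAbsent(server, s -> new HashSet<>()).add(region);
    if (oldServer == null) {
      oldServer = oldAssignments.remove(region); // falls back to the remembered server
    }
    if (oldServer != null && !(guardSameServer && oldServer.equals(server))) {
      Set<String> held = serverHoldings.get(oldServer);
      if (held != null) held.remove(region); // unguarded: undoes the add when oldServer == server
    }
  }

  boolean holds(String server, String region) {
    Set<String> held = serverHoldings.get(server);
    return held != null && held.contains(region);
  }

  // Replays the collapsed scenario and reports whether RS1 still holds the region.
  static boolean replay(boolean guardSameServer) {
    HoldingsModel m = new HoldingsModel();
    m.regionOnline("r1", "RS1", guardSameServer); // region initially on RS1
    m.offlineKeepingOldAssignment("r1");          // offlined; oldAssignments keeps RS1
    m.regionOnline("r1", "RS1", guardSameServer); // assigned back to RS1
    return m.holds("RS1", "r1");
  }

  public static void main(String[] args) {
    System.out.println("without guard, RS1 holds r1: " + replay(false));
    System.out.println("with guard,    RS1 holds r1: " + replay(true));
  }
}
```

Running `main` should print `false` for the unguarded replay and `true` for the guarded one, mirroring the "added to server and immediately removed from it" symptom.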
> Below is a small excerpt of the logs. The most interesting are the last three lines: region b73fe9f1185361e846b0e1ceb7d6d64e is added to a server and immediately removed from it. Later, this triggers an NPE when the table is disabled.
> {code}
> 2015-02-18 01:21:18,338 INFO  [Thread-436] master.RegionStates(1109): Transition {b73fe9f1185361e846b0e1ceb7d6d64e state=PENDING_OPEN, ts=1424222478324, server=octobook.home,65370,1424222474885} to {b73fe9f1185361e846b0e1ceb7d6d64e state=OFFLINE, ts=1424222478338, server=octobook.home,65370,1424222474885}
> 2015-02-18 01:21:18,339 INFO  [Thread-436] master.RegionStateStore(218): Updating row with state=OFFLINE
> 2015-02-18 01:21:18,340 DEBUG [Thread-436] master.RegionStates(591): Old server name for {ENCODED => b73fe9f1185361e846b0e1ceb7d6d64e, NAME => 'testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.', STARTKEY => 'I', ENDKEY => 'Q'} is null
> 2015-02-18 01:21:18,340 INFO  [Thread-436] master.RegionStates(1109): Transition {b73fe9f1185361e846b0e1ceb7d6d64e state=OFFLINE, ts=1424222478338, server=octobook.home,65370,1424222474885} to {b73fe9f1185361e846b0e1ceb7d6d64e state=OPEN, ts=1424222478340, server=octobook.home,65359,1424222474743}
> 2015-02-18 01:21:18,341 INFO  [Thread-436] master.RegionStateStore(218): Updating row with state=OPEN&sn=octobook.home,65359,1424222474743
> 2015-02-18 01:21:18,342 DEBUG [Thread-436] master.RegionStates(457): Onlined b73fe9f1185361e846b0e1ceb7d6d64e on octobook.home,65359,1424222474743 {ENCODED => b73fe9f1185361e846b0e1ceb7d6d64e, NAME => 'testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.',
> 2015-02-18 01:21:18,342 DEBUG [Thread-436] master.RegionStates(481): Adding  b73fe9f1185361e846b0e1ceb7d6d64e to server octobook.home,65359,1424222474743
> 2015-02-18 01:21:18,342 INFO  [Thread-436] master.RegionStates(467): Offlined b73fe9f1185361e846b0e1ceb7d6d64e from octobook.home,65359,1424222474743
> 2015-02-18 01:21:18,342 DEBUG [Thread-436] master.RegionStates(496): Removing b73fe9f1185361e846b0e1ceb7d6d64e from server octobook.home,65359,1424222474743
> {code}
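The add-then-remove pattern in the last three log lines can also be spotted mechanically. The following Java sketch is a hypothetical helper (`AddRemoveScanner` is not part of HBase). It assumes one log entry per line and the `Adding ... to server ...` / `Removing ... from server ...` message formats shown in the excerpt, and flags a region that is removed from the same server it was most recently added to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log scanner; message formats are taken from the excerpt above.
class AddRemoveScanner {
  private static final Pattern ADD =
      Pattern.compile("Adding\\s+(\\S+)\\s+to server\\s+(\\S+)");
  private static final Pattern REMOVE =
      Pattern.compile("Removing\\s+(\\S+)\\s+from server\\s+(\\S+)");

  // Returns "region on server" for each region removed from the server it was last added to.
  static List<String> scan(List<String> lines) {
    Map<String, String> lastAdded = new HashMap<>(); // region -> server from its latest "Adding"
    List<String> flagged = new ArrayList<>();
    for (String line : lines) {
      Matcher add = ADD.matcher(line);
      if (add.find()) {
        lastAdded.put(add.group(1), add.group(2));
        continue;
      }
      Matcher rem = REMOVE.matcher(line);
      if (rem.find()) {
        String region = rem.group(1);
        String server = rem.group(2);
        if (server.equals(lastAdded.remove(region))) {
          flagged.add(region + " on " + server);
        }
      }
    }
    return flagged;
  }

  public static void main(String[] args) {
    // The last three entries of the excerpt, one log entry per line.
    List<String> lines = Arrays.asList(
        "DEBUG [Thread-436] master.RegionStates(481): Adding  b73fe9f1185361e846b0e1ceb7d6d64e to server octobook.home,65359,1424222474743",
        "INFO  [Thread-436] master.RegionStates(467): Offlined b73fe9f1185361e846b0e1ceb7d6d64e from octobook.home,65359,1424222474743",
        "DEBUG [Thread-436] master.RegionStates(496): Removing b73fe9f1185361e846b0e1ceb7d6d64e from server octobook.home,65359,1424222474743");
    // prints [b73fe9f1185361e846b0e1ceb7d6d64e on octobook.home,65359,1424222474743]
    System.out.println(scan(lines));
  }
}
```

Keying on the region rather than on the previous line keeps the check working when log entries from other regions are interleaved.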

This message was sent by Atlassian JIRA
