Date: Wed, 18 Feb 2015 15:59:12 +0000 (UTC)
From: "Hudson (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-13061) RegionStates can remove wrong region from server holdings


    [ https://issues.apache.org/jira/browse/HBASE-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326079#comment-14326079 ]

Hudson commented on HBASE-13061:
--------------------------------

SUCCESS: Integrated in HBase-1.0 #756 (See [https://builds.apache.org/job/HBase-1.0/756/])
HBASE-13061 RegionStates can remove wrong region from server holdings (Andrey Stepachev) (tedyu: rev 78594e8691638764d6dfaf55514068a46dc507a5)
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java


> RegionStates can remove wrong region from server holdings
> ---------------------------------------------------------
>
>                 Key: HBASE-13061
>                 URL: https://issues.apache.org/jira/browse/HBASE-13061
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 1.0.0, 2.0.0
>            Reporter: Andrey Stepachev
>            Assignee: Andrey Stepachev
>             Fix For: 2.0.0, 1.0.1, 1.1.0
>
>         Attachments: HBASE-13061.patch
>
>
> A test failed in HBASE-13017. It seems that with ZK, nodes were ordered in a way that did not trigger the error, but with the new meta, rows are ordered differently and the test became flaky.
> That leads to an interesting sequence of region offline/online events and triggers the bug and an NPE in AM (seen in TestZKLessAMOnCluster).
> This can happen if a region was moved from RS1 to another region server, RS2, and RS2 then failed. The region remains in PENDING_OPEN. SSH will offline it from RS1 (without removing it from oldAssignments, because the table is disabled). When AssignmentManager then assigns the region, it removes the region's oldAssignment from serverHoldings, and that happens to be the just-assigned RS1.
> A small excerpt of the logs follows. The most interesting are the last three lines: region b73fe9f1185361e846b0e1ceb7d6d64e is added to the server and immediately removed from it. Later that triggers an NPE in the disable table handler.
> {code}
> 2015-02-18 01:21:18,338 INFO [Thread-436] master.RegionStates(1109): Transition {b73fe9f1185361e846b0e1ceb7d6d64e state=PENDING_OPEN, ts=1424222478324, server=octobook.home,65370,1424222474885} to {b73fe9f1185361e846b0e1ceb7d6d64e state=OFFLINE, ts=1424222478338, server=octobook.home,65370,1424222474885}
> 2015-02-18 01:21:18,339 INFO [Thread-436] master.RegionStateStore(218): Updating row testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e. with state=OFFLINE
> 2015-02-18 01:21:18,340 DEBUG [Thread-436] master.RegionStates(591): Old server name for {ENCODED => b73fe9f1185361e846b0e1ceb7d6d64e, NAME => 'testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.', STARTKEY => 'I', ENDKEY => 'Q'} is null
> 2015-02-18 01:21:18,340 INFO [Thread-436] master.RegionStates(1109): Transition {b73fe9f1185361e846b0e1ceb7d6d64e state=OFFLINE, ts=1424222478338, server=octobook.home,65370,1424222474885} to {b73fe9f1185361e846b0e1ceb7d6d64e state=OPEN, ts=1424222478340, server=octobook.home,65359,1424222474743}
> 2015-02-18 01:21:18,341 INFO [Thread-436] master.RegionStateStore(218): Updating row testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e. with state=OPEN&sn=octobook.home,65359,1424222474743
> 2015-02-18 01:21:18,342 DEBUG [Thread-436] master.RegionStates(457): Onlined b73fe9f1185361e846b0e1ceb7d6d64e on octobook.home,65359,1424222474743 {ENCODED => b73fe9f1185361e846b0e1ceb7d6d64e, NAME => 'testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState,I,1424222477651.b73fe9f1185361e846b0e1ceb7d6d64e.', STARTKEY => 'I', ENDKEY => 'Q'}
> 2015-02-18 01:21:18,342 DEBUG [Thread-436] master.RegionStates(481): Adding b73fe9f1185361e846b0e1ceb7d6d64e to server octobook.home,65359,1424222474743
> 2015-02-18 01:21:18,342 INFO [Thread-436] master.RegionStates(467): Offlined b73fe9f1185361e846b0e1ceb7d6d64e from octobook.home,65359,1424222474743
> 2015-02-18 01:21:18,342 DEBUG [Thread-436] master.RegionStates(496): Removing b73fe9f1185361e846b0e1ceb7d6d64e from server octobook.home,65359,1424222474743
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
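
The add-then-remove sequence described in the issue can be reproduced with a small toy model. This is only a sketch under stated assumptions, not the code in RegionStates.java and not the attached patch. The class ServerHoldingsSketch, the flag guardSameServer, the helper replay, the map regionAssignments, and the short names RS1 and b73fe9f1 are all hypothetical. Only the names oldAssignments and serverHoldings come from the description above. The guard shown is one plausible way to avoid the problem, and the actual fix may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the bookkeeping described in the issue; NOT HBase's RegionStates. */
class ServerHoldingsSketch {
    // region -> server it is currently assigned to (hypothetical name)
    final Map<String, String> regionAssignments = new HashMap<>();
    // region -> server remembered from an earlier assignment
    final Map<String, String> oldAssignments = new HashMap<>();
    // server -> regions the master believes it holds
    final Map<String, Set<String>> serverHoldings = new HashMap<>();
    // hypothetical switch: skip the removal when old and new server are the same
    final boolean guardSameServer;

    ServerHoldingsSketch(boolean guardSameServer) {
        this.guardSameServer = guardSameServer;
    }

    void regionOnline(String region, String server) {
        String oldServer = regionAssignments.put(region, server);
        if (server.equals(oldServer)) {
            return; // already recorded on this server
        }
        serverHoldings.computeIfAbsent(server, s -> new HashSet<>()).add(region);
        if (oldServer == null) {
            // fall back to the remembered (possibly stale) assignment
            oldServer = oldAssignments.remove(region);
        }
        if (oldServer == null) {
            return;
        }
        if (guardSameServer && oldServer.equals(server)) {
            return; // stale entry points at the server we just assigned
        }
        Set<String> held = serverHoldings.get(oldServer);
        if (held != null) {
            held.remove(region); // without the guard this can undo the add above
        }
    }

    boolean holds(String server, String region) {
        Set<String> held = serverHoldings.get(server);
        return held != null && held.contains(region);
    }

    /** Replays the scenario: a stale oldAssignments entry for RS1, then assign to RS1. */
    static boolean replay(boolean guard) {
        ServerHoldingsSketch states = new ServerHoldingsSketch(guard);
        states.oldAssignments.put("b73fe9f1", "RS1"); // left behind by the earlier offline
        states.regionOnline("b73fe9f1", "RS1");
        return states.holds("RS1", "b73fe9f1");
    }

    public static void main(String[] args) {
        System.out.println("RS1 holds region without guard: " + replay(false));
        System.out.println("RS1 holds region with guard:    " + replay(true));
    }
}
```

In this model, the unguarded replay leaves RS1 without the region it was just given, which mirrors the "Adding ... to server" line followed immediately by "Removing ... from server" in the log excerpt. The guarded replay keeps the region in RS1's holdings.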