Date: Mon, 25 Aug 2014 06:45:58 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-11536) Puts of region location to Meta may be out of order which causes inconsistent of region location

    [ https://issues.apache.org/jira/browse/HBASE-11536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108818#comment-14108818 ]

Hadoop QA commented on HBASE-11536:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12664095/11536-trunk.txt
  against trunk revision .
  ATTACHMENT ID: 12664095

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 6 warning messages.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

    {color:green}+1 site{color}. The mvn site goal succeeds with this patch.

    {color:red}-1 core tests{color}. The patch failed these unit tests:
                       org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpoint

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10559//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10559//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10559//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10559//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10559//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10559//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10559//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/10559//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10559//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10559//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10559//console

This message is automatically generated.

> Puts of region location to Meta may be out of order which causes inconsistent of region location
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-11536
>                 URL: https://issues.apache.org/jira/browse/HBASE-11536
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>            Reporter: Liu Shaohui
>            Assignee: Liu Shaohui
>            Priority: Critical
>             Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6
>
>         Attachments: 10.237.12.13.log, 10.237.12.15.log, 11536-trunk.txt, HBASE-11536-0.94-v1.diff
>
>
> In a production HBase cluster, we found an inconsistent region location in the meta table. Region cdfa2ed711bbdf054d9733a92fd43eb5 is online on regionserver 10.237.12.13:11600, but the region location in the meta table is 10.237.12.15:11600.
> This is caused by out-of-order puts to the meta table.
> # HMaster tries to assign the region to 10.237.12.15:11600.
> # RegionServer: 10.237.12.15:11600. While opening the region, the put of the region location (10.237.12.15:11600) to the meta table times out (60s) and the htable retries a second time. (The regionserver serving meta has already received the put request. The timeout occurs because there is a bad disk in that regionserver and the hlog sync is very slow.)
> While the htable is retrying, the OpenRegionHandler times out (100s) and the PostOpenDeployTasksThread is interrupted. Though the htable is eventually closed in the MetaEditor, the shared connection the htable used is not closed, and the put to the meta table is still in flight on that connection. Call this in-flight put to meta call A.
> # RegionServer: 10.237.12.15:11600. Because of the OpenRegionHandler timeout, the OpenRegionHandler marks the assignment state of this region as FAILED_OPEN.
> # HMaster watches the FAILED_OPEN event and assigns the region to another regionserver: 10.237.12.13:11600
> # RegionServer: 10.237.12.13:11600. This regionserver opens the region successfully. Call the put of the region location (10.237.12.13:11600) to the meta table from this regionserver call B.
> There is no ordering guarantee between calls A and B. If call A is processed after call B in the regionserver serving the meta region, the region location in the meta table will be wrong.
> The raw scan of the meta table shows:
> {code}
> scan '.META.', {RAW => true, LIMIT => 1, VERSIONS => 10, STARTROW => 'xxx.adfa2ed711bbdf054d9733a92fd43eb5.'}
> {code}
> {quote}
> xxx.adfa2ed711bbdf054d9733a92fd43eb5. column=info:server, timestamp=1404885460553(=> Wed Jul 09 13:57:40 +0800 2014), value=10.237.12.15:11600 --> Retry put from 10.237.12.15
> xxx.adfa2ed711bbdf054d9733a92fd43eb5. column=info:server, timestamp=1404885456731(=> Wed Jul 09 13:57:36 +0800 2014), value=10.237.12.13:11600 --> put from 10.237.12.13
> xxx.adfa2ed711bbdf054d9733a92fd43eb5. column=info:server, timestamp=1404885353122(=> Wed Jul 09 13:55:53 +0800 2014), value=10.237.12.15:11600 --> First put from 10.237.12.15
> {quote}
> Related HBase logs are attached to this issue, and discussion is welcome.
> Since there is no ordering guarantee for puts from different htables, one solution is to give each assignment of a region an increasing id and use this id as the timestamp of the put of the region location to the meta table. HBase clients will then read the region location with the largest assignment id.
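
The ordering argument in the description can be sketched in a few lines of Java. This is a toy model only: it is not HBase code and not the attached patch. The class MetaOrderingSketch, the VersionedCell type, the logical clock value, and the assignment ids 1 and 2 are all hypothetical names and values chosen for illustration. It assumes only what the description states: a cell keeps several timestamped versions and a reader sees the version with the largest timestamp.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the info:server cell of one meta row (hypothetical, not HBase code). */
class MetaOrderingSketch {

    /** Versions keyed by timestamp; a read returns the value with the largest timestamp. */
    static final class VersionedCell {
        private final TreeMap<Long, String> versions = new TreeMap<>();

        void put(long timestamp, String server) {
            versions.put(timestamp, server);
        }

        String latest() {
            Map.Entry<Long, String> newest = versions.lastEntry();
            return newest == null ? null : newest.getValue();
        }
    }

    /** Timestamps are taken from a clock when each put is processed, so the last arrival wins. */
    static String withArrivalTimestamps(String... serversInArrivalOrder) {
        VersionedCell cell = new VersionedCell();
        long clock = 1000L; // hypothetical logical clock on the server hosting meta
        for (String server : serversInArrivalOrder) {
            cell.put(clock++, server);
        }
        return cell.latest();
    }

    /** Timestamps are the assignment ids carried by the puts, so arrival order is irrelevant. */
    static String withAssignIds(long[] assignIdsInArrivalOrder, String[] serversInArrivalOrder) {
        VersionedCell cell = new VersionedCell();
        for (int i = 0; i < serversInArrivalOrder.length; i++) {
            cell.put(assignIdsInArrivalOrder[i], serversInArrivalOrder[i]);
        }
        return cell.latest();
    }

    public static void main(String[] args) {
        String a = "10.237.12.15:11600"; // call A: delayed retry from the first assignment (id 1)
        String b = "10.237.12.13:11600"; // call B: put from the second assignment (id 2)

        // Call B is processed first, then the stale call A: the stale location wins.
        System.out.println(withArrivalTimestamps(b, a)); // prints 10.237.12.15:11600

        // Same arrival order, but each put carries its assignment id as the timestamp.
        System.out.println(withAssignIds(new long[] {2L, 1L}, new String[] {b, a})); // prints 10.237.12.13:11600
    }
}
```

With arrival-time timestamps the result depends on which put is processed last, which matches the raw scan in the description. With the assignment id as the timestamp, the put from the newer assignment is the one a reader sees in either arrival order.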
-- This message was sent by Atlassian JIRA (v6.2#6252)