Date: Wed, 19 Jul 2017 01:34:02 +0000 (UTC)
From: "huaxiang sun (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-18363) Hbck option to undeploy in memory replica parent region

[ https://issues.apache.org/jira/browse/HBASE-18363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092454#comment-16092454 ]

huaxiang sun commented on HBASE-18363:
--------------------------------------

I checked the hbck code; "-fixAssignments" should be able to fix this in-memory state. I simulated this case:

{code}
2017-07-18 18:19:10,192 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-07-18 18:19:10,192 INFO [main] zookeeper.ZooKeeper: Session: 0x15d5869d2f50014 closed
2017-07-18 18:19:10,192 INFO [main] util.HBaseFsck: Checking and fixing region consistency
*ERROR: Region { meta => null, hdfs => null, deployed => dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520;t1,r1,1500328224175_0001.d761ef3cc03d8a0124bb751f216f9285., replicaId => 1 } not in META, but deployed on dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
ERROR: No regioninfo in Meta or HDFS.
{ meta => null, hdfs => null, deployed => dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520;t1,r1,1500328224175_0001.d761ef3cc03d8a0124bb751f216f9285., replicaId => 1 }*
2017-07-18 18:19:10,200 INFO [main] util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
2017-07-18 18:19:10,205 INFO [main] util.HBaseFsck: Computing mapping of all store files
2017-07-18 18:19:10,214 INFO [main] util.HBaseFsck: Validating mapping using HDFS state
2017-07-18 18:19:10,220 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=localhost:2181
2017-07-18 18:19:10,220 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hbase Fsck0x0, quorum=localhost:2181, baseZNode=/hbase
2017-07-18 18:19:10,221 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-07-18 18:19:10,222 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:60970, server: localhost/127.0.0.1:2181
2017-07-18 18:19:10,223 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15d5869d2f50016, negotiated timeout = 40000
2017-07-18 18:19:10,230 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-07-18 18:19:10,230 INFO [main] zookeeper.ZooKeeper: Session: 0x15d5869d2f50016 closed
2017-07-18 18:19:10,231 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=localhost:2181
2017-07-18 18:19:10,231 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hbase Fsck0x0, quorum=localhost:2181, baseZNode=/hbase
2017-07-18 18:19:10,232 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-07-18 18:19:10,233 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:60971, server: localhost/127.0.0.1:2181
2017-07-18 18:19:10,234 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15d5869d2f50017, negotiated timeout = 40000
2017-07-18 18:19:10,236 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-07-18 18:19:10,236 INFO [main] zookeeper.ZooKeeper: Session: 0x15d5869d2f50017 closed
2017-07-18 18:19:10,236 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=localhost:2181
2017-07-18 18:19:10,236 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hbase Fsck0x0, quorum=localhost:2181, baseZNode=/hbase
2017-07-18 18:19:10,238 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-07-18 18:19:10,238 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:60972, server: localhost/127.0.0.1:2181
2017-07-18 18:19:10,239 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15d5869d2f50018, negotiated timeout = 40000
2017-07-18 18:19:10,258 INFO [main] zookeeper.ZooKeeper: Session: 0x15d5869d2f50018 closed
Summary:
2017-07-18 18:19:10,258 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
Table hbase:meta is okay.
    Number of regions: 1
    Deployed on: dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
Table t1 is okay.
    Number of regions: 4
    Deployed on: dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
Table hbase:quota is okay.
    Number of regions: 1
    Deployed on: dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
Table hbase:namespace is okay.
    Number of regions: 1
    Deployed on: dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
1 inconsistencies detected.
{code}

I was able to fix this issue by running "hbase hbck -fixAssignments". Resolving it as invalid.

> Hbck option to undeploy in memory replica parent region
> --------------------------------------------------------
>
> Key: HBASE-18363
> URL: https://issues.apache.org/jira/browse/HBASE-18363
> Project: HBase
> Issue Type: Bug
> Components: hbck
> Affects Versions: 1.4.0, 2.0.0-alpha-1
> Reporter: huaxiang sun
> Assignee: huaxiang sun
> Priority: Minor
>
> We ran into cases where, with read replicas, the parent replica region is sometimes left in the master's in-memory onlineRegion list after a split. This results in the region getting assigned to a region server. Although the root cause will be fixed by HBASE-18025, we need to enhance the hbck tool to fix this in-memory state. Currently, hbck only allows the fix for the primary region (in this case, the primary region is gone) with the fixAssignments option; please see the following line of code. We will enhance it so it can be applied to replica regions as well.
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java#L2216

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)