Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 19 Jul 2017 01:34:02 +0000 (UTC)
From: "huaxiang sun (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.13086355.1499800634000.277552.1500428042077@Atlassian.JIRA>
In-Reply-To: <JIRA.13086355.1499800634000@Atlassian.JIRA>
References: <JIRA.13086355.1499800634000@Atlassian.JIRA> <JIRA.13086355.1499800634743@jira-lw-us.apache.org>
Subject: [jira] [Commented] (HBASE-18363) Hbck option to undeploy in memory
 replica parent region
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Wed, 19 Jul 2017 01:34:09 -0000


    [ https://issues.apache.org/jira/browse/HBASE-18363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092454#comment-16092454 ] 

huaxiang sun commented on HBASE-18363:
--------------------------------------

I checked the hbck code: "-fixAssignments" should already be able to fix this in-memory state. I simulated the case, and hbck reported the following:
{code}
2017-07-18 18:19:10,192 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-07-18 18:19:10,192 INFO  [main] zookeeper.ZooKeeper: Session: 0x15d5869d2f50014 closed
2017-07-18 18:19:10,192 INFO  [main] util.HBaseFsck: Checking and fixing region consistency
ERROR: Region { meta => null, hdfs => null, deployed => dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520;t1,r1,1500328224175_0001.d761ef3cc03d8a0124bb751f216f9285., replicaId => 1 } not in META, but deployed on dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
ERROR: No regioninfo in Meta or HDFS. { meta => null, hdfs => null, deployed => dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520;t1,r1,1500328224175_0001.d761ef3cc03d8a0124bb751f216f9285., replicaId => 1 }
2017-07-18 18:19:10,200 INFO  [main] util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
2017-07-18 18:19:10,205 INFO  [main] util.HBaseFsck: Computing mapping of all store files

2017-07-18 18:19:10,214 INFO  [main] util.HBaseFsck: Validating mapping using HDFS state
2017-07-18 18:19:10,220 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=localhost:2181
2017-07-18 18:19:10,220 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hbase Fsck0x0, quorum=localhost:2181, baseZNode=/hbase
2017-07-18 18:19:10,221 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-07-18 18:19:10,222 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:60970, server: localhost/127.0.0.1:2181
2017-07-18 18:19:10,223 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15d5869d2f50016, negotiated timeout = 40000
2017-07-18 18:19:10,230 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-07-18 18:19:10,230 INFO  [main] zookeeper.ZooKeeper: Session: 0x15d5869d2f50016 closed
2017-07-18 18:19:10,231 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=localhost:2181
2017-07-18 18:19:10,231 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hbase Fsck0x0, quorum=localhost:2181, baseZNode=/hbase
2017-07-18 18:19:10,232 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-07-18 18:19:10,233 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:60971, server: localhost/127.0.0.1:2181
2017-07-18 18:19:10,234 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15d5869d2f50017, negotiated timeout = 40000
2017-07-18 18:19:10,236 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-07-18 18:19:10,236 INFO  [main] zookeeper.ZooKeeper: Session: 0x15d5869d2f50017 closed
2017-07-18 18:19:10,236 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=localhost:2181
2017-07-18 18:19:10,236 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hbase Fsck0x0, quorum=localhost:2181, baseZNode=/hbase
2017-07-18 18:19:10,238 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-07-18 18:19:10,238 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:60972, server: localhost/127.0.0.1:2181
2017-07-18 18:19:10,239 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15d5869d2f50018, negotiated timeout = 40000
2017-07-18 18:19:10,258 INFO  [main] zookeeper.ZooKeeper: Session: 0x15d5869d2f50018 closed
2017-07-18 18:19:10,258 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
Summary:

Table hbase:meta is okay.
    Number of regions: 1
    Deployed on:  dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
Table t1 is okay.
    Number of regions: 4
    Deployed on:  dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
Table hbase:quota is okay.
    Number of regions: 1
    Deployed on:  dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
Table hbase:namespace is okay.
    Number of regions: 1
    Deployed on:  dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
1 inconsistencies detected.

{code}

I was able to fix this issue by running "hbase hbck -fixAssignments".

Since the existing option already handles this case, I am resolving this issue as invalid.
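
For anyone who wants to scan saved hbck output for this kind of leftover replica before running the fix, here is a minimal Python sketch. It is not part of hbck or HBase: the names `NOT_IN_META` and `find_orphan_replicas` are hypothetical, and the pattern assumes the ERROR line layout shown in the output above, which other hbck versions may format differently.

```python
import re

# Matches the "not in META, but deployed" ERROR line from the hbck output above.
# Assumption: the field layout (deployed => <server>;<region name>, replicaId => <n>)
# is taken from that one run only.
NOT_IN_META = re.compile(
    r"ERROR: Region \{ meta => (?P<meta>\S+), hdfs => (?P<hdfs>\S+), "
    r"deployed => (?P<server>[^;]+);(?P<region>\S+), "
    r"replicaId => (?P<replica>\d+) \} not in META, but deployed on"
)


def find_orphan_replicas(hbck_output):
    """Return (region_name, replica_id, server) for each deployed
    non-primary replica that hbck reports as missing from META."""
    hits = []
    for line in hbck_output.splitlines():
        match = NOT_IN_META.search(line)
        # Replica id 0 is the primary region; anything above 0 is a read replica.
        if match and int(match.group("replica")) > 0:
            hits.append(
                (match.group("region"), int(match.group("replica")), match.group("server"))
            )
    return hits


# Sample input copied from the ERROR line in the output above.
sample = (
    "ERROR: Region { meta => null, hdfs => null, deployed => "
    "dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520;"
    "t1,r1,1500328224175_0001.d761ef3cc03d8a0124bb751f216f9285., "
    "replicaId => 1 } not in META, but deployed on "
    "dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520\n"
)

for region, replica_id, server in find_orphan_replicas(sample):
    print(f"replica {replica_id} of {region} still deployed on {server}")
```

A non-empty result would suggest the cluster is in the state described in this issue, in which case "hbase hbck -fixAssignments" is the fix that worked in my test.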

> Hbck option to undeploy in memory replica parent region 
> --------------------------------------------------------
>
>                 Key: HBASE-18363
>                 URL: https://issues.apache.org/jira/browse/HBASE-18363
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck
>    Affects Versions: 1.4.0, 2.0.0-alpha-1
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>            Priority: Minor
>
> We have run into cases where, with read replicas enabled, the parent's replica region is sometimes left in the master's in-memory onlineRegion list after a split. As a result, the region gets assigned to a region server. Although the root cause will be fixed by HBASE-18025, we need to enhance the hbck tool to fix this in-memory state. Currently, hbck only allows the fix for the primary region (in this case, the primary region is gone) with the fixAssignments option; please see the following line of code. We will enhance it so that it can be applied to replica regions as well.
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java#L2216

