Return-Path: X-Original-To: apmail-hbase-dev-archive@www.apache.org Delivered-To: apmail-hbase-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7BE06171C6 for ; Tue, 9 Jun 2015 20:46:04 +0000 (UTC) Received: (qmail 62585 invoked by uid 500); 9 Jun 2015 20:46:01 -0000 Delivered-To: apmail-hbase-dev-archive@hbase.apache.org Received: (qmail 62341 invoked by uid 500); 9 Jun 2015 20:46:01 -0000 Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hbase.apache.org Delivered-To: mailing list dev@hbase.apache.org Received: (qmail 62113 invoked by uid 99); 9 Jun 2015 20:46:01 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 09 Jun 2015 20:46:01 +0000 Date: Tue, 9 Jun 2015 20:46:01 +0000 (UTC) From: "Ted Yu (JIRA)" To: dev@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Created] (HBASE-13875) Clock skew between master and region server may render restored region without server address MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 Ted Yu created HBASE-13875: ------------------------------ Summary: Clock skew between master and region server may render restored region without server address Key: HBASE-13875 URL: https://issues.apache.org/jira/browse/HBASE-13875 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu We observed the following issue in cluster testing on a restored table (table_gwbh9rxyz3). {code} 2015-06-08 14:29:47,313|beaver.component.hbase|INFO|6196|140144585275136|MainThread| 'get 'table_gwbh9rxyz3','row1', {COLUMN => 'family1'}' ... 2015-06-08 14:31:38,203|beaver.machine|INFO|6196|140144585275136|MainThread|ERROR: No server address listed in hbase:meta for region table_gwbh9rxyz3,,1433773371699. 48652273628a291653d8c43aaa02179a. containing row row1 {code} Here was related log snippet from master - part for RestoreSnapshotHandler#handleTableOperation(): {code} 2015-06-08 14:28:41,968 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: starting restore 2015-06-08 14:28:41,969 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: get table regions: hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3 2015-06-08 14:28:41,984 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: found 1 regions for table=table_gwbh9rxyz3 2015-06-08 14:28:41,984 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: region to restore: 48652273628a291653d8c43aaa02179a 2015-06-08 14:28:42,001 DEBUG [RestoreSnapshot-pool584-t1] backup.HFileArchiver: Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, file:hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66, to hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/archive/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [] 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 0 2015-06-08 14:28:42,014 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [{ENCODED => 48652273628a291653d8c43aaa02179a, NAME => 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY => '', ENDKEY => ''}] 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: Verify snapshot=table_gwbh9rxyz3-ru-20150608 against=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: Sentinel is not yet finished with restoring snapshot={ ss=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 type=FLUSH } 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 2 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Overwritten [{ENCODED => 48652273628a291653d8c43aaa02179a, NAME => 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY => '', ENDKEY => ''}] {code} Here was log snippet from region server - corresponding to table being enabled after snapshot restore: {code} 2015-06-08 14:28:41,914 DEBUG [RS_OPEN_REGION-ip-172-31-46-239:51852-2] zookeeper.ZKAssign: regionserver:51852-0x24dd2833c34000b, quorum=ip-172-31-46-239.ec2.internal:2181,ip-172-31-46-241.ec2.internal:2181,ip-172-31-46-242.ec2.internal:2181, baseZNode=/services/slider/users/hbase/hbasesliderapp Attempting to retransition opening state of node 48652273628a291653d8c43aaa02179a 2015-06-08 14:28:41,916 INFO [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] regionserver.HRegionServer: Post open deploy tasks for table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a. 2015-06-08 14:28:41,920 INFO [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] hbase.MetaTableAccessor: Updated row table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a. with server=ip-172-31-46-239.ec2.internal,51852,1433758173941 2015-06-08 14:28:41,920 DEBUG [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] regionserver.HRegionServer: Finished post open deploy task for table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a {code} What happened was that due to clock skew, server location (ip-172-31-46-239.ec2.internal) for the region was eclipsed by the delete marker put in by MetaTableAccessor#overwriteRegions() Thanks for [~devaraj] for pair debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)