Date: Fri, 15 May 2015 15:40:37 -0700
Subject: hbase 0.94.7 snapshot problem
From: Neutron sharc
To: user@hbase.apache.org

Hi HBase community,

I'm seeing a problem with HBase snapshots on 0.94.7 (CDH 4.2.0). When I manually run "snapshot <table>, <snapshot_name>" to take a snapshot, I keep getting an error: "Failed taking snapshot { ss=ss_rich_pin_data_v1 table=rich_pin_data_v1 type=SKIPFLUSH } due to exception:No region directory found for region {xyz...}".

I tried moving the affected region, but the next attempt fails with the same error on a different region.

I also tried a workaround suggested elsewhere (setting hbase.regionserver.ipc.address to 0.0.0.0), but that didn't help. Here is the link:
https://groups.google.com/a/cloudera.org/forum/#!topic/scm-users/B3fSsY6BgWI

Below is an excerpt from the master log:

2015-05-15 22:17:18,807 INFO org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Running SKIPFLUSH table snapshot ss_rich_pin_data_v1 C_M_SNAPSHOT_TABLE on table rich_pin_data_v1
2015-05-15 22:17:19,308 INFO org.apache.hadoop.hbase.procedure.Procedure: Starting procedure 'ss_rich_pin_data_v1'
2015-05-15 22:17:54,346 ERROR org.apache.hadoop.hbase.procedure.Procedure: Procedure 'ss_rich_pin_data_v1' execution failed!
org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via timer-java.util.Timer@14004920:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed!
Source:Timeout caused Foreign Exception Start:1431728239316, End:1431728274317, diff:35001, max:35000 ms
    at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:85)
    at org.apache.hadoop.hbase.procedure.Procedure.waitForLatch(Procedure.java:369)
    at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:208)
    at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:68)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! Source:Timeout caused Foreign Exception Start:1431728239316, End:1431728274317, diff:35001, max:35000 ms
    at org.apache.hadoop.hbase.errorhandling.TimeoutExceptionInjector$1.run(TimeoutExceptionInjector.java:71)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)
2015-05-15 22:17:54,347 INFO org.apache.hadoop.hbase.procedure.ZKProcedureUtil: Clearing all znodes for procedure ss_rich_pin_data_v1including nodes /hbase/online-snapshot/acquired /hbase/online-snapshot/reached /hbase/online-snapshot/abort
2015-05-15 22:17:54,383 INFO org.apache.hadoop.hbase.master.snapshot.EnabledTableSnapshotHandler: Done waiting - snapshot for ss_rich_pin_data_v1 finished!
2015-05-15 22:17:54,841 ERROR org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Failed taking snapshot { ss=ss_rich_pin_data_v1 table=rich_pin_data_v1 type=SKIPFLUSH } due to exception:No region directory found for region:{NAME => 'rich_pin_data_v1,,1389326617112.081c4e6d88c46ff9be61b231b8ed2aca.', STARTKEY => '', ENDKEY => '0030a5c15b50587297a8fa0bd585a12b', ENCODED => 081c4e6d88c46ff9be61b231b8ed2aca,}
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: No region directory found for region:{NAME => 'rich_pin_data_v1,,1389326617112.081c4e6d88c46ff9be61b231b8ed2aca.', STARTKEY => '', ENDKEY => '0030a5c15b50587297a8fa0bd585a12b', ENCODED => 081c4e6d88c46ff9be61b231b8ed2aca,}
    at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegion(MasterSnapshotVerifier.java:167)
    at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegions(MasterSnapshotVerifier.java:152)
    at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshot(MasterSnapshotVerifier.java:115)
    at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.process(TakeSnapshotHandler.java:156)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
2015-05-15 22:17:54,841 INFO org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Stop taking snapshot={ ss=ss_rich_pin_data_v1 table=rich_pin_data_v1 type=SKIPFLUSH } because: Failed to take snapshot '{ ss=ss_rich_pin_data_v1 table=rich_pin_data_v1 type=SKIPFLUSH }' due to exception

Appreciate any help!

-Neutronsharc