Date: Wed, 6 Jun 2012 20:23:23 +0000 (UTC)
From: "Zhihong Ted Yu (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1025803722.44720.1339014203442.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <1216440114.29283.1338599303196.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (HBASE-6153) RS aborted due to rename problem (maybe a race)

    [ https://issues.apache.org/jira/browse/HBASE-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290367#comment-13290367 ]

Zhihong Ted Yu commented on HBASE-6153:
---------------------------------------

ip-10-68-7-146.ec2.internal went down:
{code}
2012-05-31 18:34:42,541 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server ip-10-68-7-146.ec2.internal,60020,1338343120038: Replay of HLog required. Forcing server shutdown
{code}
The above lagged the other log snippets by 3 hours. More log around 05-31 15:11 from ip-10-68-7-146.ec2.internal should help clarify.

> RS aborted due to rename problem (maybe a race)
> -----------------------------------------------
>
>                 Key: HBASE-6153
>                 URL: https://issues.apache.org/jira/browse/HBASE-6153
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>
> I had a RS crash with the following:
> 2012-05-31 18:34:42,534 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/.tmp/294a7a31f04949b8bf07682a43157b35 to hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
> 2012-05-31 18:34:42,536 WARN org.apache.hadoop.hbase.regionserver.Store: Unable to rename hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/.tmp/294a7a31f04949b8bf07682a43157b35 to hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
> 2012-05-31 18:34:42,541 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server ip-10-68-7-146.ec2.internal,60020,1338343120038: Replay of HLog required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: TestLoadAndVerify_1338488017181,\x15\xD9\x01\x00\x00\x00\x00\x00/000087_0,1338491364569.8974506aa04c5a04e5cc23c11de0039d.
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1288)
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1172)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1114)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:400)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:374)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.FileNotFoundException: File does not exist: /apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1901)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1892)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:636)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
>         at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:387)
>         at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1008)
>         at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:470)
>         at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:548)
>         at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:595)
> On the NameNode logs:
> 2012-05-31 18:34:42,588 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/.tmp/294a7a31f04949b8bf07682a43157b35 to /apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35 because destination's parent does not exist
> I haven't looked deeply yet but I guess it is a race of some sort.
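
In the quoted logs, the rename into the f1 directory is reported only as a WARN ("Unable to rename"), and the flush then fails with a FileNotFoundException when it opens the destination path. The following is a minimal sketch of that failure mode on the local filesystem, using java.io.File rather than HDFS. It is not HBase code: the class RenameSketch, the method commitFile, and the file names are hypothetical, and it only illustrates treating a failed rename as an error at the point where it happens instead of continuing to open the destination.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class RenameSketch {

    /**
     * Hypothetical helper (not HBase code): moves src to dst and throws if
     * the rename fails, so the caller never goes on to open a file that was
     * not actually moved.
     */
    public static void commitFile(File src, File dst) throws IOException {
        // File.renameTo reports failure through its return value only.
        if (!src.renameTo(dst)) {
            throw new IOException("Unable to rename " + src + " to " + dst
                + " (destination parent exists: " + dst.getParentFile().isDirectory() + ")");
        }
    }

    public static void main(String[] args) throws IOException {
        // Local stand-ins for the region directory, its .tmp directory, and the flushed file.
        File region = Files.createTempDirectory("region").toFile();
        File tmpDir = new File(region, ".tmp");
        tmpDir.mkdirs();
        File flushed = new File(tmpDir, "flushed-file");
        Files.write(flushed.toPath(), new byte[] {1, 2, 3});

        // The "f1" directory is deliberately missing, as in the NameNode message.
        File dst = new File(new File(region, "f1"), "flushed-file");
        try {
            commitFile(flushed, dst);
        } catch (IOException e) {
            System.out.println("rename rejected: " + e.getMessage());
        }
    }
}
```

This sketch does not address why the destination's parent went missing, which is the open question in this issue. It only shows the rename failure being surfaced directly rather than as a later FileNotFoundException.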