From: "Ben Maurer (JIRA)"
To: hbase-dev@hadoop.apache.org
Reply-To: hbase-dev@hadoop.apache.org
Date: Sat, 28 Feb 2009 09:36:13 -0800 (PST)
Subject: [jira] Created: (HBASE-1228) Hang after crash
Message-ID: <1117567651.1235842573254.JavaMail.jira@brutus>
Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm

Hang after crash
----------------

                 Key: HBASE-1228
                 URL: https://issues.apache.org/jira/browse/HBASE-1228
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.19.0
            Reporter: Ben Maurer
             Fix For: 0.19.1

After an exception that forced an HRegionServer to shut down, I'm seeing it hang in the following method for at least a few minutes:

"regionserver/0:0:0:0:0:0:0:0:60020" prio=10 tid=0x00002aaaf41a9000 nid=0x10f6 in Object.wait() [0x00000000422dd000..0x00000000422ddb10]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:485)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3025)
	- locked <0x00002aaad8fa2410> (a java.util.LinkedList)
	- locked <0x00002aaad8fa2078> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3105)
	- locked <0x00002aaad8fa2078> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3054)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
	at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:959)
	- locked <0x00002aaad8fa1f10> (a org.apache.hadoop.io.SequenceFile$Writer)
	at org.apache.hadoop.hbase.regionserver.HLog.close(HLog.java:431)
	- locked <0x00002aaab378b290> (a java.lang.Integer)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:498)
	at java.lang.Thread.run(Thread.java:619)

I believe the file system may already have been closed, which would explain the trouble flushing the HLog. The HLog should be proactively closed before shutdown begins, to maximize the chances of it surviving the crash.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
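
The ordering suggested in the report (close the HLog first, then let the rest of shutdown proceed) could be sketched roughly as follows. This is only an illustration under stated assumptions: `OrderedShutdown`, `shutdown`, and the timeout parameter are hypothetical names, and plain `java.io.Closeable` objects stand in for the HLog and the file system. It is not the HBase or HDFS code, and the report does not establish whether a real HDFS output stream would respond to the interrupt used here.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; none of these names are HBase or HDFS APIs. */
final class OrderedShutdown {
    private OrderedShutdown() {}

    /**
     * Closes the log first, waiting at most timeoutMillis for it,
     * and only then closes the file system the log writes to.
     * Returns the steps taken, in order, so callers can inspect them.
     */
    static List<String> shutdown(Closeable log, Closeable fileSystem, long timeoutMillis) {
        List<String> steps = new ArrayList<>();
        ExecutorService closer = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "log-closer");
            t.setDaemon(true); // a stuck close should not keep the JVM alive
            return t;
        });
        // Run the close on a helper thread so the caller can bound the wait.
        Future<?> pending = closer.submit(() -> {
            log.close();
            return null;
        });
        try {
            pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
            steps.add("log closed");
        } catch (TimeoutException e) {
            pending.cancel(true); // interrupt the blocked close and move on
            steps.add("log close timed out");
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
            steps.add("log close failed");
        } finally {
            closer.shutdownNow();
        }
        // The file system is closed only after the log has had its chance.
        try {
            fileSystem.close();
            steps.add("file system closed");
        } catch (IOException e) {
            steps.add("file system close failed");
        }
        return steps;
    }
}
```

A caller would pass the log and file system handles plus a timeout, for example `OrderedShutdown.shutdown(log, fs, 30000)`. The bounded wait is a design choice added for the sketch so that a close that blocks, as in the thread dump above, cannot stall the remaining shutdown steps indefinitely.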