Date: Fri, 13 Jul 2012 17:57:36 +0000 (UTC)
From: "Mayank Bansal (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <1481043401.49344.1342202256590.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <1563621462.45303.1342139675455.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Updated] (HDFS-3655) datenode recoverRbw could hang sometime

     [ https://issues.apache.org/jira/browse/HDFS-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated HDFS-3655:
--------------------------------

    Affects Version/s: 2.0.1-alpha
                       0.22.0
                       1.0.3

> datenode recoverRbw could hang sometime
> ---------------------------------------
>
>                 Key: HDFS-3655
>                 URL: https://issues.apache.org/jira/browse/HDFS-3655
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.22.0, 1.0.3, 2.0.1-alpha
>            Reporter: Ming Ma
>             Fix For: 0.22.1
>
>         Attachments: HDFS-3655-0.22.patch
>
>
> This bug seems to apply to 0.22 and Hadoop 2.0. I will upload the initial fix done by my colleague Xiaobo Peng shortly (a logistics issue is being worked out so that he can upload the patch himself later).
> recoverRbw tries to kill the old writer thread, but it does so while holding the lock (the FSDataset monitor object) that the old writer thread is waiting on (for example, in the call to data.getTmpInputStreams).
> "DataXceiver for client /10.110.3.43:40193 [Receiving block blk_-3037542385914640638_57111747 client=DFSClient_attempt_201206021424_0001_m_000401_0]" daemon prio=10 tid=0x00007facf8111800 nid=0x6b64 in Object.wait() [0x00007facd1ddb000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Thread.join(Thread.java:1186)
>     - locked <0x00000007856c1200> (a org.apache.hadoop.util.Daemon)
>     at java.lang.Thread.join(Thread.java:1239)
>     at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:158)
>     at org.apache.hadoop.hdfs.server.datanode.FSDataset.recoverRbw(FSDataset.java:1347)
>     - locked <0x00000007838398c0> (a org.apache.hadoop.hdfs.server.datanode.FSDataset)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:119)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlockInternal(DataXceiver.java:391)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:327)
>     at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:405)
>     at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:344)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
>     at java.lang.Thread.run(Thread.java:662)
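
The locking pattern in the trace above (one thread joins the old writer while holding the monitor that the writer itself needs) can be reproduced in miniature. The following is only a sketch under stated assumptions, not HDFS code and not the attached patch: datasetLock stands in for the FSDataset monitor, and startWriter, stopWriterWhileHoldingLock and stopWriterOutsideLock are hypothetical names made up for illustration. The join is given a timeout purely so that the demo terminates; an untimed join, as in the trace, would wait forever.

```java
import java.util.concurrent.CountDownLatch;

public class RecoverRbwHangSketch {
    // Hypothetical stand-in for the FSDataset monitor object.
    private static final Object datasetLock = new Object();

    /** Starts a writer that, once interrupted, needs datasetLock before it can exit. */
    static Thread startWriter(CountDownLatch started) {
        Thread writer = new Thread(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000); // pretend to be receiving a block
            } catch (InterruptedException e) {
                // interrupted by the recovering thread; fall through to cleanup
            }
            synchronized (datasetLock) {
                // cleanup that needs the dataset monitor, analogous to the
                // data.getTmpInputStreams call mentioned in the report
            }
        }, "old-writer");
        writer.setDaemon(true);
        writer.start();
        return writer;
    }

    /** Reported pattern: interrupt and join the writer while holding the monitor. */
    static boolean stopWriterWhileHoldingLock(Thread writer, long timeoutMs)
            throws InterruptedException {
        synchronized (datasetLock) {
            writer.interrupt();
            writer.join(timeoutMs); // timeoutMs must be > 0; join(0) waits forever
            return !writer.isAlive();
        }
    }

    /** One possible alternative (an assumption): join without holding the monitor. */
    static boolean stopWriterOutsideLock(Thread writer, long timeoutMs)
            throws InterruptedException {
        writer.interrupt();
        writer.join(timeoutMs);
        return !writer.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch s1 = new CountDownLatch(1);
        Thread w1 = startWriter(s1);
        s1.await();
        System.out.println("stopped while holding lock: "
                + stopWriterWhileHoldingLock(w1, 500));

        CountDownLatch s2 = new CountDownLatch(1);
        Thread w2 = startWriter(s2);
        s2.await();
        System.out.println("stopped outside lock: "
                + stopWriterOutsideLock(w2, 5_000));
    }
}
```

In the first call the writer cannot finish until the caller leaves the synchronized block, so the join can only time out. In the second call the writer is free to take the monitor, and the join returns as soon as it exits. Whether moving the join outside the monitor is safe in the real recoverRbw depends on invariants this sketch does not model.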