Date: Tue, 22 Sep 2009 14:44:16 -0700 (PDT)
From: "Kan Zhang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] Commented: (HDFS-637) DataNode sends an Success ack when block write fails
Message-ID: <435190932.1253655856341.JavaMail.jira@brutus>
In-Reply-To: <1421257280.1253512818716.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HDFS-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758456#action_12758456 
]

Kan Zhang commented on HDFS-637:
--------------------------------

I like Raghu's suggestion, which is to simply set a boolean flag in the catch clause (the clause has to be moved to the outset) and add a check of the flag to the if (Thread.interrupted()) {} block. That way the exit logic is easier to understand (with the comments there). Having the exit logic in one place also makes it easier to add further checks on the reason for the interrupt (i.e., local error or downstream error).

> DataNode sends an Success ack when block write fails
> ----------------------------------------------------
>
>                 Key: HDFS-637
>                 URL: https://issues.apache.org/jira/browse/HDFS-637
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: interrupted.patch
>
>
> While working on HDFS-624, I saw TestFileAppend3#TC7 fail occasionally. After a lot of debugging, I found that the client unexpectedly received a response of "-2 SUCCESS SUCCESS", in which -2 is the packet sequence number. This happened in a pipeline of 2 datanodes, one of which failed. It turned out that when the block receiver fails, it shuts itself down and interrupts the packet responder. The responder tries to handle the interruption with the condition "Thread.isInterrupted()", but unfortunately a thread's interrupt status is not set in some cases, as explained in the Thread#interrupt javadoc:
> If this thread is blocked in an invocation of the wait(), wait(long), or wait(long, int) methods of the Object class, or of the join(), join(long), join(long, int), sleep(long), or sleep(long, int), methods of this class, then its interrupt status will be cleared and it will receive an InterruptedException.
> So the datanode does not detect the interruption and continues as if no error had occurred.
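
The behaviour described in the issue, and the flag-based approach suggested in the comment, can be illustrated outside Hadoop with a small standalone sketch. This is not the DataNode or PacketResponder code and not the contents of interrupted.patch; the class InterruptFlagDemo, the Outcome holder, the runResponder method, and the isInterrupted flag are all hypothetical names chosen for illustration. The sketch only assumes the documented JDK behaviour that Object.wait() clears the interrupt status when it throws InterruptedException.

```java
import java.util.concurrent.CountDownLatch;

public class InterruptFlagDemo {

    /** What a (hypothetical) responder thread observed before exiting. */
    public static final class Outcome {
        public volatile boolean caughtInterruptedException;
        public volatile boolean statusAfterCatch; // Thread.interrupted() after the catch
        public volatile boolean exitDetected;     // did the exit check fire?
    }

    /**
     * Starts a thread that blocks in wait(), interrupts it, and reports
     * whether its exit check noticed the interruption.
     * useFlag=false: check only the thread's interrupt status.
     * useFlag=true:  also check a boolean flag set in the catch clause.
     */
    public static Outcome runResponder(boolean useFlag) throws InterruptedException {
        Outcome out = new Outcome();
        Object lock = new Object();
        CountDownLatch started = new CountDownLatch(1);

        Thread responder = new Thread(() -> {
            boolean isInterrupted = false; // hypothetical flag, as suggested in the comment
            synchronized (lock) {
                try {
                    started.countDown();
                    while (true) {   // loop guards against spurious wakeups
                        lock.wait(); // throws InterruptedException and clears the status
                    }
                } catch (InterruptedException e) {
                    out.caughtInterruptedException = true;
                    isInterrupted = true; // remember the interrupt explicitly
                }
            }
            // Single exit check: interrupt status OR the flag from the catch clause.
            boolean status = Thread.interrupted();
            out.statusAfterCatch = status;
            out.exitDetected = status || (useFlag && isInterrupted);
        });

        responder.start();
        started.await();       // responder is at (or about to enter) wait()
        responder.interrupt();
        responder.join();
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("status-only check detected exit: "
                + runResponder(false).exitDetected);
        System.out.println("status + flag check detected exit: "
                + runResponder(true).exitDetected);
    }
}
```

With the status-only check the thread catches the InterruptedException but then sees a cleared interrupt status, so the exit check does not fire, which matches the symptom in the issue description. With the flag, the same single check detects the interruption. Restoring the status with Thread.currentThread().interrupt() in the catch clause would be another common option; the sketch uses the flag only because that is what the comment proposes.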