From: "Hairong Kuang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Wed, 9 Dec 2009 18:22:18 +0000 (UTC)
Subject: [jira] Commented: (HDFS-101) DFS write pipeline : DFSClient sometimes does not detect second datanode failure
Message-ID: <752874469.1260382938263.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HDFS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788202#action_12788202 ]

Hairong Kuang commented on HDFS-101:
------------------------------------

When I thought more about it, it may make sense to let the client decide how to handle the case where DNi has a communication problem with DNi+1, because the client is the one that decides the pipeline recovery policy. DNi itself has no problem receiving packets and storing them to disk; it just cannot talk to DNi+1. I think in this case it is OK to let DNi continue to run.
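To make that concrete, here is a minimal sketch (hypothetical names, not the actual BlockReceiver/DataXceiver code) of how DNi could keep accepting and storing packets after its downstream connection breaks, and merely report the failed position upstream so the client can apply whatever recovery policy it chooses:

{code}
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical sketch, not the real BlockReceiver: DNi keeps accepting and
 * storing packets when its connection to DNi+1 fails, and only records the
 * failure so it can be reported upstream in the ack.  The client, which owns
 * the pipeline recovery policy, then decides whether to drop DNi+1 and
 * rebuild the pipeline.
 */
public class DownstreamAwareReceiver {
    private final int myIndex;                 // position i of this datanode in the pipeline
    private final OutputStream localStorage;   // where this DN persists packet data
    private final OutputStream mirror;         // connection to DNi+1; null for the last DN
    private volatile boolean downstreamFailed; // set once DNi+1 becomes unreachable

    public DownstreamAwareReceiver(int myIndex, OutputStream localStorage, OutputStream mirror) {
        this.myIndex = myIndex;
        this.localStorage = localStorage;
        this.mirror = mirror;
    }

    /** Receive one packet: always store it locally, forward downstream on a best-effort basis. */
    public void receivePacket(byte[] packet) throws IOException {
        localStorage.write(packet);             // DNi has no problem storing the data itself
        if (mirror != null && !downstreamFailed) {
            try {
                mirror.write(packet);
                mirror.flush();
            } catch (IOException e) {
                downstreamFailed = true;        // remember the failure, but keep DNi running
            }
        }
    }

    /** Index of the first failed node downstream of this DN, or -1 if none; sent back in the ack. */
    public int failedDownstreamIndex() {
        return downstreamFailed ? myIndex + 1 : -1;
    }
}
{code}

The point is only that the downstream IOException is recorded and reported, not treated as a fatal error on DNi.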
> DFS write pipeline : DFSClient sometimes does not detect second datanode failure
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-101
>                 URL: https://issues.apache.org/jira/browse/HDFS-101
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Raghu Angadi
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.21.0
>
>
> When the first datanode's write to the second datanode fails or times out, DFSClient ends up marking the first datanode as the bad one and removes it from the pipeline. A similar problem exists on the DataNode as well and was fixed in HADOOP-3339. From HADOOP-3339:
> "The main issue is that BlockReceiver thread (and DataStreamer in the case of DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty coarse control. We don't know what state the responder is in and interrupting has different effects depending on responder state. To fix this properly we need to redesign how we handle these interactions."
> When the first datanode closes its socket from DFSClient, DFSClient should properly read all the data left in the socket. Also, DataNode's closing of the socket should not result in a TCP reset; otherwise I think DFSClient will not be able to read from the socket.
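On that last point, the behaviour being asked for is roughly the following (a plain java.net sketch, not actual DataNode code): if a socket is closed while unread bytes are still queued in its receive buffer, the close typically goes out as a RST rather than a FIN, and a RST can leave the peer unable to read the data it had not consumed yet. Draining the input before close() avoids that:

{code}
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;

/**
 * Illustrative helper, not HDFS code: close a socket without provoking a TCP
 * reset.  Closing while unread data is still queued in the receive buffer
 * makes the stack send RST instead of FIN, and the peer (here the DFSClient)
 * may then fail to read whatever it had not consumed yet.
 */
public final class GracefulClose {
    private GracefulClose() {}

    public static void close(Socket socket) throws IOException {
        socket.setSoTimeout(10000);       // bound the drain so a silent peer cannot hang us
        socket.shutdownOutput();          // half-close: send FIN, tell the peer we are done writing
        InputStream in = socket.getInputStream();
        byte[] scratch = new byte[8192];
        while (in.read(scratch) != -1) {
            // discard: we only care that nothing is left unread in the receive buffer
        }
        socket.close();                   // no pending unread data, so the close is a clean FIN
    }
}
{code}

The flip side, as the description says, is that DFSClient should keep reading the socket until EOF so it picks up whatever the datanode managed to send before closing.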