Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <1971322274.1260210678109.JavaMail.jira@brutus>
Date: Mon, 7 Dec 2009 18:31:18 +0000 (UTC)
From: "Todd Lipcon (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Subject: [jira] Updated: (HDFS-101) DFS write pipeline : DFSClient sometimes
 does not detect second datanode failure
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HDFS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-101:
-----------------------------

    Affects Version/s: 0.20.1

That plan sounds entirely reasonable. I'm adding 0.20.1 as an "affects version" since I can reproduce this in that version. Feel free to let me know if there's anything I can do to help.

> DFS write pipeline : DFSClient sometimes does not detect second datanode failure 
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-101
>                 URL: https://issues.apache.org/jira/browse/HDFS-101
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Raghu Angadi
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.21.0
>
>
> When the first datanode's write to the second datanode fails or times out, DFSClient ends up marking the first datanode as the bad one and removes it from the pipeline. A similar problem exists on the DataNode as well, and it is fixed in HADOOP-3339. From HADOOP-3339:
> "The main issue is that BlockReceiver thread (and DataStreamer in the case of DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty coarse control. We don't know what state the responder is in and interrupting has different effects depending on responder state. To fix this properly we need to redesign how we handle these interactions."
> When the first datanode closes its socket to DFSClient, DFSClient should properly read all the data left in the socket. Also, the DataNode's closing of the socket should not result in a TCP reset; otherwise I think DFSClient will not be able to read from the socket.
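
To make the "read all the data left in the socket" point concrete, here is a minimal sketch of the drain step. It is not the actual DFSClient code: the class DrainSketch and the helper readRemaining are hypothetical names used only for illustration, and the sketch assumes the peer closes the connection in an orderly way so that read() eventually returns -1.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; not part of HDFS.
final class DrainSketch {

    // Reads everything still buffered on the stream until the peer's
    // orderly close is observed (read() returns -1), and returns the bytes.
    // If the peer's close produced a TCP reset instead, read() typically
    // throws an IOException and any unread data may be lost, which is the
    // concern raised in the issue description.
    static byte[] readRemaining(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

With a real connection, the argument would be socket.getInputStream(). The client could then inspect the drained bytes for any final acknowledgement before deciding which datanode to drop from the pipeline, rather than assuming the first datanode is at fault.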
