hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-915) Hung DN stalls write pipeline for far longer than its timeout
Date Wed, 14 Mar 2012 20:12:37 GMT

     [ https://issues.apache.org/jira/browse/HDFS-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-915:

    Attachment: hdfs-915-0.20.txt

Here's a patch that we've tested for a long time in an 0.20-based build. We need to re-investigate
this to see if it's still relevant for branch-1 and trunk, as well as add a test case.

> Hung DN stalls write pipeline for far longer than its timeout
> -------------------------------------------------------------
>                 Key: HDFS-915
>                 URL: https://issues.apache.org/jira/browse/HDFS-915
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.20.1
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-915-0.20.txt, local-dn.log
> After running kill -STOP on the datanode in the middle of a write pipeline, the client
> takes far longer to recover than it should. The ResponseProcessor times out in the correct
> interval, but doesn't interrupt the DataStreamer, which does not appear to be subject to the same
> timeout. The client only recovers once the OS actually declares the TCP stream dead, which
> can take a very long time.
> I've experienced this on 0.20.1 but haven't tried it yet on trunk or 0.21.
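
The failure mode described above can be illustrated with a small stand-alone simulation. This is only a sketch under stated assumptions; it is not the attached patch and not the actual DFSClient code. The class PipelineStallSketch, the method runOnce, and both queues are hypothetical stand-ins: a bounded queue that nobody drains plays the role of the socket to the stopped datanode, a "streamer" thread blocks while writing into it, and a "responder" thread times out waiting for an ack. The boolean flag controls whether the responder interrupts the streamer when its timeout fires, which is the step the description reports as missing.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation; not Hadoop code and not the attached patch.
public class PipelineStallSketch {

    /**
     * Simulates one write against a peer that never drains data or sends acks.
     * Returns the milliseconds the streamer thread needed to notice the failure,
     * or -1 if it was still blocked when maxWaitMs elapsed.
     */
    public static long runOnce(boolean interruptStreamerOnTimeout,
                               long ackTimeoutMs, long maxWaitMs) {
        // Stand-in for the send buffer of a connection to a stopped datanode:
        // nothing ever removes packets, so put() blocks once the queue is full.
        final BlockingQueue<byte[]> sendBuffer = new ArrayBlockingQueue<>(4);
        // Stand-in for the ack stream: the stopped peer never offers an ack.
        final BlockingQueue<Long> acks = new ArrayBlockingQueue<>(4);
        final long start = System.nanoTime();
        final long[] streamerExitNanos = { -1L };

        final Thread streamer = new Thread(() -> {
            try {
                while (true) {
                    sendBuffer.put(new byte[512]); // blocks indefinitely once full
                }
            } catch (InterruptedException e) {
                // Only reached if some other thread interrupts the streamer.
                streamerExitNanos[0] = System.nanoTime();
            }
        }, "streamer");

        final Thread responder = new Thread(() -> {
            try {
                // The responder has its own timeout and honours it correctly.
                Long ack = acks.poll(ackTimeoutMs, TimeUnit.MILLISECONDS);
                if (ack == null && interruptStreamerOnTimeout) {
                    // The missing step: tell the blocked writer that the pipeline failed.
                    streamer.interrupt();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "responder");

        streamer.setDaemon(true);
        responder.setDaemon(true);
        streamer.start();
        responder.start();

        try {
            streamer.join(maxWaitMs);
            if (streamer.isAlive()) {
                // Still blocked: clean up the simulation and report the stall.
                streamer.interrupt();
                streamer.join();
                return -1L;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return (streamerExitNanos[0] - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        System.out.println("with interrupt:    " + runOnce(true, 200, 5000) + " ms");
        System.out.println("without interrupt: " + runOnce(false, 200, 1000)
                + " ms (-1 means still blocked)");
    }
}
```

With the flag set, the simulated streamer unblocks shortly after the ack timeout. Without it, the streamer stays blocked until something else intervenes, which in the reported case is the OS declaring the TCP stream dead.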
