hadoop-hdfs-issues mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2182) Exceptions in DataXceiver#run can result in a zombie datanode
Date Thu, 21 Jul 2011 21:24:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069220#comment-13069220 ]

Eli Collins commented on HDFS-2182:

Ah, yeah, it's calling start(), not run(). DataXceiver has a reference to the datanode, so it can
just set shouldRun to false in the case of a non-IOE. Much simpler.
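
A minimal sketch of that idea, not the attached patch. SketchDataNode, SketchXceiver and Op are hypothetical stand-ins for the real classes; only the shouldRun flag and the IOE / non-IOE split come from the discussion above.

```java
import java.io.IOException;

// Hypothetical stand-in for the datanode; only the shouldRun flag matters here.
class SketchDataNode {
    // volatile so a write from a worker thread is visible to the server loop
    volatile boolean shouldRun = true;
}

// Hypothetical stand-in for DataXceiver: one worker per connection.
class SketchXceiver implements Runnable {
    // Hypothetical hook representing the work done for one connection.
    interface Op {
        void execute() throws IOException;
    }

    private final SketchDataNode datanode;
    private final Op op;

    SketchXceiver(SketchDataNode datanode, Op op) {
        this.datanode = datanode;
        this.op = op;
    }

    @Override
    public void run() {
        try {
            op.execute();
        } catch (IOException ioe) {
            // Tolerated: likely a problem with this one thread or an
            // intermittent failure, so the datanode keeps running.
            System.err.println("xceiver I/O error (tolerated): " + ioe);
        } catch (Throwable t) {
            // Non-IOE, e.g. NoClassDefFoundError: ask the datanode to stop
            // instead of leaving a zombie that fails every request.
            datanode.shouldRun = false;
            System.err.println("xceiver fatal error, stopping datanode: " + t);
        }
    }
}
```

Because the worker is launched with start(), nothing thrown inside run() can propagate to the server's own run() method, which is why the sketch signals through the shared flag. The sketch assumes the server loop checks shouldRun before accepting the next connection.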

> Exceptions in DataXceiver#run can result in a zombie datanode 
> --------------------------------------------------------------
>                 Key: HDFS-2182
>                 URL: https://issues.apache.org/jira/browse/HDFS-2182
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>            Reporter: Eli Collins
>             Fix For: 0.23.0
>         Attachments: hdfs-2182-1.patch
> DataXceiver#run currently swallows all exceptions; it should instead plumb them up to
> DataXceiverServer#run so it can decide whether the exception should be tolerated or the daemon
> should exit. An IOE should be tolerated (because it's likely just an issue with a particular
> thread, or an intermittent failure), as it is today, but e.g. java.lang.Error should not.
> This came up in the following bug I'm seeing on a test cluster: if there's e.g. a NoClassDefFoundError
> thrown in DataXceiver#run (because the host jars were replaced out from underneath it, it
> ran out of descriptors, etc.) we'll end up with a datanode that is alive but always fails
> because it can't create any DataXceiver threads. In this case the datanode should shut itself
> down rather than continue to run.


