hadoop-hdfs-issues mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-2182) Exceptions in DataXceiver#run can result in a zombie datanode
Date Thu, 21 Jul 2011 20:14:58 GMT

     [ https://issues.apache.org/jira/browse/HDFS-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-2182:
------------------------------

    Description: 
DataXceiver#run currently swallows all exceptions. It should instead propagate them to DataXceiverServer#run
so that the server can decide whether the exception should be tolerated or the daemon should exit. An IOException
should be tolerated (because it's likely just an issue with a particular thread, or an intermittent
failure), as it is today, but e.g. a java.lang.Error should not.

This came up in the following bug I'm seeing on a test cluster: if there's e.g. a NoClassDefFoundError
thrown in DataXceiver#run (because the host jars were replaced out from underneath it, it
ran out of descriptors, etc.) we'll end up with a datanode that is alive but always fails
because it can't create any DataXceiver threads. In this case the datanode should shut itself
down rather than continue to run.

  was:
DataXceiver#run currently swallows all exceptions, it should instead plumb them up to DataXceiverServer#run
so it can decide whether the exception should be tolerated or the daemon should exit. An IOE
should be tolerated (because it's likely just an issue with a particular thread, or an intermittent
failure), as it is today, but eg j.l.Error should be not. 

This came up in the following bug I'm seeing on a test cluster: if there's eg a NoClassDefFoundError
thrown in DataXceiver#run (because the host jars were replaced out from underneath it, it
ran out of descriptors, etc.) we'll end up with a datanode that is alive but always fails
because it can't create any DataXceiver threads. In this case the datanode should shut itself
down rather than continue to run.


> Exceptions in DataXceiver#run can result in a zombie datanode 
> --------------------------------------------------------------
>
>                 Key: HDFS-2182
>                 URL: https://issues.apache.org/jira/browse/HDFS-2182
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>            Reporter: Eli Collins
>             Fix For: 0.23.0
>
>
> DataXceiver#run currently swallows all exceptions. It should instead propagate them to
> DataXceiverServer#run so that the server can decide whether the exception should be tolerated
> or the daemon should exit. An IOException should be tolerated (because it's likely just an issue
> with a particular thread, or an intermittent failure), as it is today, but e.g. a java.lang.Error
> should not.
>
> This came up in the following bug I'm seeing on a test cluster: if there's e.g. a NoClassDefFoundError
> thrown in DataXceiver#run (because the host jars were replaced out from underneath it, it
> ran out of descriptors, etc.) we'll end up with a datanode that is alive but always fails
> because it can't create any DataXceiver threads. In this case the datanode should shut itself
> down rather than continue to run.
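
The policy described above could be sketched roughly as follows. This is a minimal illustration, not the actual Hadoop code or the eventual patch: XceiverServerSketch, IoTask, classify, worker and onWorkerFailure are all hypothetical names. The issue only says that an IOException should be tolerated and a java.lang.Error should not; tolerating every other exception type here is an assumption that mirrors today's swallow-everything behavior.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the server side; not the real DataXceiverServer. */
final class XceiverServerSketch {

    /** What the server does after a worker reports a failure. */
    enum Action { TOLERATE, SHUT_DOWN }

    /** Per-connection work that may fail with an IOException (hypothetical type). */
    @FunctionalInterface
    interface IoTask {
        void run() throws IOException;
    }

    // Flag that a real accept loop would check before taking more connections.
    private final AtomicBoolean shouldRun = new AtomicBoolean(true);

    /** Decides whether a failure is local to one thread or fatal to the daemon. */
    static Action classify(Throwable t) {
        if (t instanceof Error) {
            // e.g. NoClassDefFoundError: later workers would likely fail the same way.
            return Action.SHUT_DOWN;
        }
        // IOException is tolerated per the issue text. Tolerating all other
        // exceptions as well is an assumption of this sketch.
        return Action.TOLERATE;
    }

    /** Wraps the work so that failures are reported to the server, not swallowed. */
    Runnable worker(IoTask task) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                onWorkerFailure(t);
            }
        };
    }

    /** Called by workers; stops the server when the failure is not tolerable. */
    void onWorkerFailure(Throwable t) {
        if (classify(t) == Action.SHUT_DOWN) {
            shouldRun.set(false);
        }
    }

    boolean isRunning() {
        return shouldRun.get();
    }
}
```

In this sketch a worker that throws an IOException leaves isRunning() true, while one that throws a NoClassDefFoundError flips it to false, so the process can exit instead of lingering as a zombie. How the real datanode would act on such a flag is outside what the issue describes.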

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
