hadoop-hdfs-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-890) Have a way of creating datanodes that throws an meaningful exception on failure
Date Sat, 09 Jan 2010 14:52:54 GMT

    [ https://issues.apache.org/jira/browse/HDFS-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798364#action_12798364 ]

Steve Loughran commented on HDFS-890:
-------------------------------------

{{DataNode.makeInstance()}} is used in

# {{MiniDFSCluster.startDataNodes()}}; this code mistakenly assumes it can never get a null
reference back; it should move to any new method call.
# {{DataNode.createDataNode()}}, which in turn is used in {{MiniDFSCluster.restartDataNode()}},
which similarly assumes it never sees a null
# {{DataNode.main()}}, which catches and logs any exception, and handles a null value by
skipping daemon startup and exiting with a 0 exit code.
# {{TestHDFSServerPorts}}
# Mapreduce's {{TestMRServerPorts}} tests, which also assume that they don't see null back

Most of this code expects to see exceptions on failure, so it will handle a stricter startup
operation with ease. The interesting one is {{DataNode.main()}}, which, if it caught the exception,
would now exit with a -1 code rather than a 0 exit code. This would be a change in behaviour
visible to shell scripts: it would now be an error to attempt to start a datanode
none of whose data dirs were usable.

I would argue this is a feature: such an exit code would be beneficial to people wondering
why their datanodes weren't coming up and weren't being reported. It is the Unix way, and
is much easier to test for. However, it would be a change in behaviour.
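
To make the trade-off concrete, here is a minimal sketch of the pattern under discussion: a strict factory that throws, a legacy-style wrapper that logs and returns null, and the exit code a stricter {{main()}} would produce. Every name here ({{DataNodeStartupSketch}}, {{Node}}, {{instantiateOrThrow}}, {{instantiateOrNull}}, {{exitCodeFor}}) is hypothetical and is not the existing DataNode API; the directory check is a stand-in for whatever validation the real code performs.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these names exist in Hadoop.
class DataNodeStartupSketch {

    /** Stand-in for a datanode; it just records the usable data dirs. */
    static final class Node {
        final List<File> dataDirs;

        Node(List<File> dataDirs) {
            this.dataDirs = dataDirs;
        }
    }

    /** Strict variant: throws a descriptive exception instead of returning null. */
    static Node instantiateOrThrow(List<String> dirs) throws IOException {
        List<File> usable = new ArrayList<>();
        for (String dir : dirs) {
            File f = new File(dir);
            // Assumed check: a data dir is usable if it exists and is writable.
            if (f.isDirectory() && f.canWrite()) {
                usable.add(f);
            }
        }
        if (usable.isEmpty()) {
            throw new IOException("No usable data directories among: " + dirs);
        }
        return new Node(usable);
    }

    /** Legacy-style variant: catches the failure, logs it and returns null. */
    static Node instantiateOrNull(List<String> dirs) {
        try {
            return instantiateOrThrow(dirs);
        } catch (IOException e) {
            System.err.println("Datanode creation failed: " + e.getMessage());
            return null;
        }
    }

    /** Exit code a stricter main() would hand to System.exit(). */
    static int exitCodeFor(List<String> dirs) {
        try {
            instantiateOrThrow(dirs);
            return 0;
        } catch (IOException e) {
            System.err.println("Exiting: " + e.getMessage());
            return -1;
        }
    }
}
```

A {{main()}} built on this would call {{System.exit(exitCodeFor(dirs))}}. On a POSIX shell an exit code of -1 is reported as status 255, so a script can test for it with a plain non-zero check.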



> Have a way of creating datanodes that throws an meaningful exception on failure
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-890
>                 URL: https://issues.apache.org/jira/browse/HDFS-890
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>    Affects Versions: 0.22.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> In HDFS-884, I proposed printing out more details on why things fail. This is hard to
> test, because you need to subvert the log4j back end that your test harness will itself have
> grabbed.
> There is a way to make it testable, and to make it easier for anyone creating datanodes
> in process to recognise and handle failure: have a static CreateDatanode() method that throws
> exceptions when directories cannot be created or other problems arise. Right now some problems
> trigger failure, others just return a null reference saying "something went wrong but we won't
> tell you what - hope you know where the logs go".
> The HDFS-884 patch would be replaced by something that threw an exception; the existing
> methods would catch this, log it and return null. The new method would pass it straight up.
>
> This is easier to test, better for others. If people think this is good, I will code
> it up and mark the old API as deprecated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

