hadoop-common-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3464) [HOD] HOD can improve error messages by reporting failures on compute nodes back to hod client
Date Mon, 02 Jun 2008 11:21:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601595#action_12601595 ]

Hemanth Yamijala commented on HADOOP-3464:

A few comments:

- When ringmaster fails, we are printing out the errors as an array of strings on a single
line. For better readability, they should be printed one per line.
- When ringmaster fails due to problems with hadoop pkgs, the error message is not helpful.
It says something like int cannot be NoneType or some such. This should be improved.
- We use ringmaster.addMasterParams to report errors from the hodrings. This is confusing.
We should define a new API, something like setHodRingError, and report errors back using that.
- The PID of the hodring process is part of the 'host' reporting the error. It appears this
is important, as removing the PID caused the functionality to break. However, when we print
these messages to the client, the name is printed as hostname_pid, which does not make too
much sense. So, we can try and see if the pid part can be avoided.
- In a few places we are constructing an XML-RPC client object. If one is already constructed, can
we reuse it?
- When hodrings fail due to a config error, we don't report this back. This is because error
reporting happens only if the getCommand method has been called by a hodring. In case of config
errors, getCommand is not called and so these errors are not caught. The requirement is that
we should be able to report Master command failures, that is, if an internal HDFS daemon fails,
or a MapRed daemon fails. If there are n nodes in the ring, at least 2 (in case of internal)
or 1 hodring should come up successfully for the masters. If the number of reported failures
exceeds this, we can report a failure to the service registry client.
- When a hadoop daemon fails, the message simply says failed to launch hadoop command. Typically
the daemon.err file has more useful information. If possible, this should be fetched and displayed
to the client.
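
To make the threshold and formatting points above concrete, here is a rough sketch
in Python. It is not taken from the HOD code or from any attached patch: the function
names, the (host, message) tuple shape, and the reading of the rule as "report a
failure once fewer than the required number of hodrings remain" are all assumptions
made for illustration.

```python
# Hypothetical helpers; none of these names exist in HOD itself.

def required_hodrings(internal_hdfs):
    """Hodrings that must come up for the masters.

    Assumed from the comment: 2 when HDFS is internal (HDFS and MapRed
    masters), otherwise 1.
    """
    return 2 if internal_hdfs else 1


def should_report_failure(num_nodes, num_failures, internal_hdfs):
    """True once too few hodrings remain to host the master daemons."""
    remaining = num_nodes - num_failures
    return remaining < required_hodrings(internal_hdfs)


def format_errors(errors):
    """Render (host, message) pairs one per line.

    A trailing numeric "_pid" suffix on the host name is dropped for
    display only; the caller keeps the original key for bookkeeping.
    """
    lines = []
    for host, message in errors:
        name, sep, suffix = host.rpartition("_")
        if sep and suffix.isdigit():
            host = name
        lines.append("%s: %s" % (host, message))
    return "\n".join(lines)
```

Keeping hostname_pid as the internal key and stripping the pid only when printing
would avoid the breakage seen when the pid was removed outright.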

Will try and submit a patch addressing these points.

> [HOD] HOD can improve error messages by reporting failures on compute nodes back to hod
> ----------------------------------------------------------------------------------------------
>                 Key: HADOOP-3464
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3464
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hod
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>             Fix For: 0.18.0
>         Attachments: HADOOP-3464, HADOOP-3464.1
> This issue addresses error messages w.r.t. failures on compute nodes, while HADOOP-3151
> addresses error messages in hod client.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
