hadoop-common-dev mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5478) Provide a node health check script and run it periodically to check the node health status
Date Tue, 12 May 2009 18:01:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708535#action_12708535 ]

Allen Wittenauer commented on HADOOP-5478:

Torque specifically looks for a line that begins with ERROR on stdout, but reports the whole
line in the node status.  So running pbsnodes shows the full status message and provides
an easy way to audit all nodes on a given Torque server.  We really need the equivalent of
dfsadmin -report for the JobTracker to provide this same level of output.

Additionally, Torque ignores the exit status. In the vast majority of cases, the node is going
to be good.  So the approach they take is that if a script has a syntax error (and would therefore
exit with a failure code), the node should be considered good anyway.
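
To make the convention concrete, here is a minimal sketch of how a caller could interpret a health script's output in the way described above: only a stdout line beginning with ERROR marks the node unhealthy, and the exit status is never consulted. This is not Torque's code and not a proposed Hadoop implementation; the function names first_error_line and check_node are hypothetical and exist only for illustration.

```python
import subprocess
import sys
from typing import Optional, Sequence


def first_error_line(stdout: str) -> Optional[str]:
    """Return the first stdout line that begins with ERROR, or None."""
    for line in stdout.splitlines():
        if line.startswith("ERROR"):
            # Keep the whole line so it can be reported as the node status.
            return line
    return None


def check_node(command: Sequence[str]) -> Optional[str]:
    """Run a health script; return its ERROR line, or None if the node is good.

    The exit status is deliberately ignored (hypothetical helper mirroring the
    convention described in the comment): a script that crashes or has a
    syntax error prints no ERROR line on stdout, so the node stays good.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # only stdout is inspected
        universal_newlines=True,
    )
    return first_error_line(result.stdout)


if __name__ == "__main__":
    # Stand-in for a real health script: exits non-zero without printing ERROR.
    broken_script = [sys.executable, "-c", "import sys; sys.exit(2)"]
    status = check_node(broken_script)
    print("node is good" if status is None else status)
```

The trade-off is the one noted above: a broken script can never take a healthy node out of service, at the cost of a broken script also never reporting a genuinely bad node.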

> Provide a node health check script and run it periodically to check the node health status
> ------------------------------------------------------------------------------------------
>                 Key: HADOOP-5478
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5478
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Aroop Maliakkal
>            Assignee: Vinod K V
> Hadoop must have some mechanism to find the health status of a node. It should run the
health check script periodically and, if there are any errors, it should blacklist the node.
This will be really helpful when we run static mapred clusters. Otherwise we may have to run some
scripts/daemons periodically to find the node status and take it offline manually.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
