hadoop-common-dev mailing list archives

From "Sreekanth Ramakrishnan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5478) Provide a node health check script and run it periodically to check the node health status
Date Mon, 15 Jun 2009 09:27:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719484#action_12719484 ]

Sreekanth Ramakrishnan commented on HADOOP-5478:
------------------------------------------------

To add a little more to the discussion, the following is the approach I am taking to generate
a new patch:

* Introduce a new health monitor service that the task tracker spawns when it starts.
* The service periodically reports the status of the node to the task tracker.
* The protocol is modeled on {{TaskUmbilicalProtocol}}.
* The service receives the task tracker's host address and port as command-line arguments
when it starts up.
* The service then periodically sends status updates to the task tracker at the host and
port it was given.
* When the {{TaskTracker}} shuts down, the {{NodeHealthChecker}} can no longer contact it
and shuts itself down. This is done because the task tracker's {{shutdown()}} or {{close()}}
is not called when we run {{stop-mapred.sh}}, and the task tracker can also be killed directly
with {{kill -9 ttpid}}. In these cases the TT might not inform the clients that contact it
to report their status.
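
To make the lifecycle above concrete, here is a minimal sketch of the reporting loop, not the code from the patch. Everything named in it is an assumption made for illustration: the {{HealthMonitorSketch}} class, the {{StatusReporter}} interface, the {{socketReporter}} and {{runUntilUnreachable}} helpers, the plain-text status line sent over a raw socket, and the 5 second interval. The real service would presumably use Hadoop RPC with a protocol modeled on {{TaskUmbilicalProtocol}} instead.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

public class HealthMonitorSketch {

    /** Hypothetical stand-in for the RPC call the monitor makes to the task tracker. */
    public interface StatusReporter {
        void report(boolean healthy) throws IOException;
    }

    /** Assumed wire format: one text line per update over a fresh TCP connection. */
    public static StatusReporter socketReporter(String host, int port) {
        return healthy -> {
            try (Socket socket = new Socket(host, port);
                 OutputStream out = socket.getOutputStream()) {
                String line = healthy ? "HEALTHY\n" : "UNHEALTHY\n";
                out.write(line.getBytes(StandardCharsets.UTF_8));
            }
        };
    }

    /**
     * Sends a status update every intervalMillis. Returns the number of updates
     * delivered once the task tracker can no longer be reached (or the thread is
     * interrupted), which is the point where the service shuts itself down.
     */
    public static int runUntilUnreachable(StatusReporter reporter,
                                          BooleanSupplier healthCheck,
                                          long intervalMillis) {
        int sent = 0;
        while (true) {
            try {
                reporter.report(healthCheck.getAsBoolean());
                sent++;
                Thread.sleep(intervalMillis);
            } catch (IOException e) {
                // Task tracker is gone (stopped or killed with kill -9): stop reporting.
                return sent;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return sent;
            }
        }
    }

    /** Host and port of the task tracker arrive as command-line arguments. */
    public static void main(String[] args) {
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        // Placeholder health check that always reports healthy.
        int sent = runUntilUnreachable(socketReporter(host, port), () -> true, 5000L);
        System.out.println("Task tracker unreachable after " + sent + " updates; exiting.");
    }
}
```

Because the loop exits on the first failed report, the service does not depend on the task tracker calling {{shutdown()}} or {{close()}}; a real implementation might want to tolerate a few transient failures before giving up.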


> Provide a node health check script and run it periodically to check the node health status
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5478
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5478
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Aroop Maliakkal
>            Assignee: Vinod K V
>         Attachments: hadoop-5478-1.patch, hadoop-5478-2.patch
>
>
> Hadoop must have some mechanism to find the health status of a node. It should run the
health check script periodically and, if there are any errors, it should blacklist the node.
This will be really helpful when we run static mapred clusters. Otherwise we may have to run
scripts/daemons periodically to find the node status and take it offline manually.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

