hadoop-common-dev mailing list archives

From "Sreekanth Ramakrishnan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-5478) Provide a node health check script and run it periodically to check the node health status
Date Thu, 18 Jun 2009 10:03:07 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Sreekanth Ramakrishnan updated HADOOP-5478:

    Attachment: hadoop-5478-6.patch

Attaching the latest patch, incorporating Hemanth's comments:

Major Changes:

* {{NodeHealthChecker}} now reports health status on the same port that tasks use to report status.
* {{NodeHealthChecker}} can now be run in a separate VM or as a thread; thread-based startup
can be used in {{MiniMRCluster}}.
* Changed the test case to also cover blacklisting across jobs and to verify
cluster capacity after a tracker is blacklisted.
* Added a new configuration entry that takes the node health script's arguments.
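
For readers who have not opened the patch, the general idea of a periodic, script-driven health check can be sketched as follows. This is only an illustration, not the code in hadoop-5478-6.patch: the class name {{NodeHealthSketch}}, the method names, the 60-second interval, the 30-second timeout, and the rule that a nonzero exit code or an output line starting with "ERROR" marks the node unhealthy are all assumptions made for the example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the NodeHealthChecker from the patch.
final class NodeHealthSketch {
    enum Status { HEALTHY, UNHEALTHY }

    // Builds the command line from a script path plus a whitespace-separated
    // argument string, as a configuration entry might supply it.
    static List<String> buildCommand(String scriptPath, String scriptArgs) {
        List<String> command = new ArrayList<>();
        command.add(scriptPath);
        for (String arg : scriptArgs.trim().split("\\s+")) {
            if (!arg.isEmpty()) {
                command.add(arg);
            }
        }
        return command;
    }

    // Assumed convention: a nonzero exit code, or any output line that
    // starts with "ERROR", means the node is unhealthy.
    static Status classify(int exitCode, List<String> outputLines) {
        if (exitCode != 0) {
            return Status.UNHEALTHY;
        }
        for (String line : outputLines) {
            if (line.startsWith("ERROR")) {
                return Status.UNHEALTHY;
            }
        }
        return Status.HEALTHY;
    }

    // Runs the script once. Output goes to a temporary file so that a hung
    // script cannot block this thread; a timeout counts as unhealthy.
    static Status runOnce(List<String> command, long timeoutMs)
            throws IOException, InterruptedException {
        File out = File.createTempFile("node-health", ".out");
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(out)
                    .start();
            if (!process.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
                process.destroyForcibly();
                return Status.UNHEALTHY;
            }
            return classify(process.exitValue(), Files.readAllLines(out.toPath()));
        } finally {
            out.delete();
        }
    }

    // Usage: java NodeHealthSketch <script> ["arg1 arg2 ..."]
    // Runs until the JVM is stopped.
    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: NodeHealthSketch <script> [\"script args\"]");
            return;
        }
        List<String> command = buildCommand(args[0], args.length > 1 ? args[1] : "");
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Wait 60 seconds after each check finishes before starting the next.
        timer.scheduleWithFixedDelay(() -> {
            try {
                System.out.println("node status: " + runOnce(command, 30_000L));
            } catch (Exception e) {
                // A script that cannot be started is reported as unhealthy.
                System.out.println("node status: " + Status.UNHEALTHY + " (" + e + ")");
            }
        }, 0, 60, TimeUnit.SECONDS);
    }
}
```

In a real tracker the status would be sent along with the regular status report instead of being printed, and the scheduler would decide whether to blacklist the node; the sketch stops short of both.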

> Provide a node health check script and run it periodically to check the node health status
> ------------------------------------------------------------------------------------------
>                 Key: HADOOP-5478
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5478
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Aroop Maliakkal
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: active.png, blacklist1.png, cluster_setup.pdf, hadoop-5478-1.patch,
hadoop-5478-2.patch, hadoop-5478-3.patch, hadoop-5478-4.patch, hadoop-5478-5.patch, hadoop-5478-6.patch
> Hadoop must have some mechanism to find the health status of a node. It should run the
health check script periodically and, if there are any errors, it should blacklist the node.
This will be really helpful when we run static mapred clusters. Otherwise we may have to run some
scripts/daemons periodically to find the node status and take it offline manually.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
