hadoop-common-dev mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-3893) Add hadoop health check/diagnostics to run from command line, JSP pages, other tools
Date Mon, 04 Aug 2008 10:05:44 GMT
Add hadoop health check/diagnostics to run from command line, JSP pages, other tools

                 Key: HADOOP-3893
                 URL: https://issues.apache.org/jira/browse/HADOOP-3893
             Project: Hadoop Core
          Issue Type: New Feature
          Components: dfs, mapred
    Affects Versions: 0.19.0
            Reporter: Steve Loughran
            Priority: Minor

If the lifecycle ping() is for short-duration "are we still alive" checks, Hadoop still needs
something bigger to check overall system health. This would be for end users, but also
for automated cluster deployment: a complete validation of the cluster.

It could be a command-line tool, and something that runs on different nodes, checked via IPC
or JSP. The idea would be to do thorough checks with good diagnostics. Oh, and they should
be executable through JUnit too.

For example:
 -if running on Windows, check that Cygwin is on the path; fail with a pointer to a wiki page
if not
 -datanodes should check that they can create locks and files on the filesystem, and that file
timestamps are (roughly) aligned with local time
 -namenodes should try to create files/locks in the filesystem
 -task trackers should try to exec() something
 -run through the classpath and look for problems: duplicate JARs, unsupported Java or Xerces
versions, etc.
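
As an illustration of the classpath item above, here is a minimal sketch of a duplicate-JAR
scan. Everything in it is an assumption made for the example: the class name ClasspathCheck,
its methods, and the rule for stripping version suffixes are hypothetical and not part of
Hadoop. A real check would probably need to inspect JAR manifests as well.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper: flags artifacts that appear more than once on a classpath. */
final class ClasspathCheck {

    /** Drops the directory, the ".jar" suffix, and a trailing version such as "-2.9.1". */
    static String artifactName(String entry) {
        String name = new File(entry).getName();
        if (name.endsWith(".jar")) {
            name = name.substring(0, name.length() - 4);
        }
        // Assumption: a version is a final "-<digit>..." segment, e.g. "xerces-2.9.1".
        return name.replaceFirst("-\\d[\\w.]*$", "");
    }

    /** Returns artifact name -> classpath entries, for artifacts listed at least twice. */
    static Map<String, List<String>> findDuplicates(String classpath, String separator) {
        Map<String, List<String>> byArtifact = new LinkedHashMap<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (!entry.endsWith(".jar")) {
                continue; // directories and empty entries are ignored in this sketch
            }
            byArtifact.computeIfAbsent(artifactName(entry), k -> new ArrayList<>()).add(entry);
        }
        byArtifact.values().removeIf(paths -> paths.size() < 2);
        return byArtifact;
    }

    public static void main(String[] args) {
        String classpath = System.getProperty("java.class.path");
        findDuplicates(classpath, File.pathSeparator).forEach((artifact, paths) ->
                System.out.println("Duplicate " + artifact + ": " + paths));
    }
}
```

Run against the JVM's own classpath, this prints one line per artifact found in more than one
JAR, which is the kind of diagnostic a failure message could point at.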

* The set of tests should be extensible: rather than one single class with all the tests,
there'd be something separate for name, task, data, and job tracker nodes
* They can't be in the nodes themselves, as they should be executable even if the nodes don't
come up
* Output could be in human-readable text or HTML, and in a form that could be processed through
Hadoop itself in future
* These tests could have side effects, such as actually trying to submit work to a cluster
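
One way the points above could fit together is sketched below: each diagnostic is a small
class behind a common interface, a runner outside the nodes executes them, and the report is
plain text. The names HealthCheck, LocalDirCheck and HealthCheckRunner are hypothetical,
invented for this sketch; they are not existing Hadoop classes, and the 60-second tolerance
used in main is an arbitrary assumption.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical contract for one diagnostic; not an existing Hadoop API. */
interface HealthCheck {
    String name();

    /** Throws an exception carrying a diagnostic message if the check fails. */
    void run() throws Exception;
}

/** Example check: can a file be created, and is its timestamp close to local time? */
final class LocalDirCheck implements HealthCheck {
    private final File dir;
    private final long maxSkewMillis;

    LocalDirCheck(File dir, long maxSkewMillis) {
        this.dir = dir;
        this.maxSkewMillis = maxSkewMillis;
    }

    public String name() {
        return "local-dir:" + dir;
    }

    public void run() throws IOException {
        // createTempFile throws IOException if the directory is missing or not writable.
        File probe = File.createTempFile("healthcheck", ".tmp", dir);
        try {
            long skew = Math.abs(System.currentTimeMillis() - probe.lastModified());
            if (skew > maxSkewMillis) {
                throw new IOException("file timestamp is " + skew + " ms away from local time");
            }
        } finally {
            probe.delete();
        }
    }
}

/** Runs registered checks independently of any node and builds a text report. */
final class HealthCheckRunner {
    private final List<HealthCheck> checks = new ArrayList<>();

    HealthCheckRunner add(HealthCheck check) {
        checks.add(check);
        return this;
    }

    /** Runs every check, even after a failure, and returns one report line per check. */
    List<String> runAll() {
        List<String> report = new ArrayList<>();
        for (HealthCheck check : checks) {
            try {
                check.run();
                report.add("OK   " + check.name());
            } catch (Exception e) {
                report.add("FAIL " + check.name() + ": " + e.getMessage());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        new HealthCheckRunner()
                .add(new LocalDirCheck(tmp, 60_000L))
                .runAll()
                .forEach(System.out::println);
    }
}
```

Because each check only throws on failure, the same classes could be called from a JUnit test
method, a command-line main, or a JSP page, with only the report rendering differing.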

