hadoop-common-user mailing list archives

From "Meng Mao" <meng...@gmail.com>
Subject Re: best command line way to check up/down status of HDFS?
Date Wed, 02 Jul 2008 14:55:59 GMT
I really like method 3.

I am doing screen scraping of the jobtracker JSP page, but I thought that was
only a partial solution, since the format of the page could change at any
moment, and because it's potentially much more computationally intensive,
depending on how much information I want to extract. One thing I thought of
would be to create a custom 'naked' JSP that has very little formatting.
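
A rough sketch of what a Nagios check built on method 3 might look like, assuming only that the service answers a plain GET on some status URL. The function name check_http and the 200-means-healthy rule are my assumptions, not anything Hadoop defines; the exit codes are the standard Nagios plugin ones (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The no-cache headers are there because of the caching caveat in the quoted mail.

```python
import http.client
import sys
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_http(url, timeout=10.0):
    """Return (nagios_code, message) for a single GET against url."""
    request = urllib.request.Request(
        url,
        # Ask intermediaries not to answer from cache, so a stale
        # "happy" page is not mistaken for a live service.
        headers={"Cache-Control": "no-cache", "Pragma": "no-cache"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return CRITICAL, f"CRITICAL - HTTP {exc.code} from {url}"
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # Connection refused, timeout, DNS failure, garbled reply, ...
        return CRITICAL, f"CRITICAL - no response from {url}: {exc}"
    if status == 200:
        return OK, f"OK - HTTP 200 from {url}"
    return WARNING, f"WARNING - unexpected HTTP {status} from {url}"


def main(argv):
    """Nagios-style entry point: print one status line, return the exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_http.py <status-url>")
        return UNKNOWN
    code, message = check_http(argv[1])
    print(message)
    return code
```

To run it as a plugin, call sys.exit(main(sys.argv)) at the bottom of the script and pass the datanode or jobtracker status URL as the only argument; the host and port depend on your cluster configuration.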

On Wed, Jul 2, 2008 at 6:19 AM, Steve Loughran <stevel@apache.org> wrote:

> Meng Mao wrote:
>
>> For a Nagios script I'm writing, I'd like a command-line method that
>> checks
>> if HDFS is up and running.
>> Is there a better way than to attempt a hadoop dfs command and check the
>> error code?
>>
>
> 1. There is JMX support built in to Hadoop. If you can bring up Hadoop
> running a JMX agent that is compatible with Nagios, you can keep a close eye
> on the internals.
>
> 2. I'm making some lifecycle changes to Hadoop; if/when accepted, every
> service (name, data, job, ...) will have an internal ping() operation to check
> its health - this can be checked in-process only. I'm also adding the
> SmartFrog support to do that in-process pinging, fallback etc.; I don't
> know how Nagios would work there, but JMX support for these ops should also
> be possible.
>
> 3. When a datanode comes up it starts jetty on a specific port -you can do
> a GET against that jetty instance to see if it is responding. This is a good
> test as it really does verify that the service is live and responding.
> Indeed, that is the official definition of "liveness", at least according to
> Lamport.
>  * review the code to make sure it turns caching off, or you can be burned
> probing for health long-haul, seeing the happy page and thinking all is
> well. I forgot to do that in happyaxis.jsp, which is why Axis 1.x health
> checks don't work long-haul.
>  * I could imagine improving those pages with better ones, like something
> that checks that the available freespace is within a certain range, and
> returns an error code if there is less, e.g.
>  http://datanode7:5000/checkDiskSpace?mingb=1500
> would test for a min disk space of 1500GB.
>
> There are also web pages for job trackers & the like; better for remote
> health checking than jps checks. JPS (and killall) is better for fallback
> when the things stop responding, but not adequate for liveness checks.
>
>
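
The checkDiskSpace idea quoted above is hypothetical (no such page exists in Hadoop as far as this thread says), but the test behind it is simple enough to sketch. The following assumes the check runs on the datanode itself against a local data directory; check_disk_space and its min_gb parameter are illustrative names, mirroring the mingb query parameter in the example URL, and GB is taken here to mean GiB.

```python
import shutil

# Nagios plugin exit codes used by this check.
OK, CRITICAL = 0, 2

GIB = 1024 ** 3


def check_disk_space(path, min_gb):
    """Return (nagios_code, message) depending on free space under path."""
    free_gb = shutil.disk_usage(path).free / GIB
    if free_gb >= min_gb:
        return OK, f"OK - {free_gb:.1f} GB free on {path}"
    return CRITICAL, (
        f"CRITICAL - {free_gb:.1f} GB free on {path}, need {min_gb} GB"
    )
```

A call such as check_disk_space("/path/to/dfs/data", 1500) would correspond to the mingb=1500 example; the path is a placeholder for wherever the datanode keeps its blocks.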


-- 
hustlin, hustlin, everyday I'm hustlin
