hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: nagios to monitor hadoop datanodes!
Date Wed, 08 Oct 2008 15:42:57 GMT
Edward Capriolo wrote:
> The simple way would be to use nrpe and check_proc. I have never
> tested, but a command like 'ps -ef | grep java  | grep NameNode' would
> be a fairly decent check. That is not very robust but it should let
> you know if the process is alive.
> 
> You could also monitor the web interfaces associated with the
> different servers remotely.
> 
> check_tcp!hadoop1:56070
> 
> Both the methods I suggested are quick hacks. I am going to
> investigate the JMX options as well  and work them into cacti

We're developing liveness and pings under a couple of JIRA issues; 
nothing will be released before 0.20

https://issues.apache.org/jira/browse/HADOOP-3628
https://issues.apache.org/jira/browse/HADOOP-3969

I don't consider hitting the web page a quick hack; for HADOOP-3969 I'd 
quite like to have the public liveness test a page you can GET or HEAD, 
as that way it becomes trivial for your existing web page health 
checking code to pull in all the hadoop services. The best bit: when it 
fails, the ops team can point their browser at the same URL and see what 
is up. And if you are a standalone developer, you are the ops team!
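
To illustrate the idea, here is a minimal sketch of a Nagios-style check that issues a HEAD request against a liveness page. It is not the HADOOP-3969 implementation: no liveness URL has been defined yet, so the URL passed in is whatever web page the service already exposes, and the names classify, check_liveness and main are hypothetical. The only convention assumed is the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import sys
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def classify(status):
    """Map an HTTP status code to a Nagios exit code (2xx is healthy)."""
    return OK if 200 <= status < 300 else CRITICAL


def check_liveness(url, timeout=10.0):
    """HEAD the given URL and return (exit_code, one-line message).

    The URL is an assumption: any page the service serves will do
    until a dedicated liveness page exists.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        status = err.code
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, DNS failure, timeout and similar.
        return CRITICAL, "CRITICAL: %s unreachable (%s)" % (url, err)
    code = classify(status)
    label = "OK" if code == OK else "CRITICAL"
    return code, "%s: %s returned HTTP %d" % (label, url, status)


def main(argv):
    """Command-line wrapper: print one status line, return the exit code."""
    if len(argv) != 2:
        print("usage: %s URL" % argv[0])
        return UNKNOWN
    code, message = check_liveness(argv[1])
    print(message)
    return code
```

To run it as a plugin, end the script with sys.exit(main(sys.argv)) and pass the URL of the page to probe. Because it is a plain HTTP request, the same URL can be opened in a browser when the check fails.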

-steve

-- 
Steve Loughran                  http://www.1060.org/blogxter/publish/5
Author: Ant in Action           http://antbook.org/
