hadoop-common-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: monit? daemontools? jsvc? something else?
Date Thu, 06 Jan 2011 08:34:11 GMT
So Allen, what do you use to monitor those processes/nodes?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message ----
> From: Allen Wittenauer <awittenauer@linkedin.com>
> To: "<common-user@hadoop.apache.org>" <common-user@hadoop.apache.org>
> Sent: Wed, January 5, 2011 11:54:22 PM
> Subject: Re: monit? daemontools? jsvc? something else?
> 
> 
> On Jan 5, 2011, at 7:57 PM, Lance Norskog wrote:
> 
> > Isn't this what Ganglia is for?
> > 
> 
>     No.
> 
>      Ganglia does metrics, not monitoring.
> 
> 
> > On 1/5/11, Allen Wittenauer <awittenauer@linkedin.com> wrote:
> >> 
> >> On Jan 4, 2011, at 10:29 PM, Otis Gospodnetic wrote:
> >> 
> >>> Ah, more manual work! :(
> >>> 
> >>> You guys never have JVM die "just because"? I just had a DN's JVM die the
> >>> other day "just because and with no obvious cause".  Restarting it brought it
> >>> back to life, everything recovered smoothly.  Had some automated tool done
> >>> the restart for me, I'd be even happier.
> >> 
> >>     In the case of Hadoop, no.  There has usually been at least a core dump,
> >> message in syslog, message in datanode log, etc, etc.  [You *do* have cores
> >> enabled, right?]
> >> 
> >>     We also have in place a monitor that checks the # of active nodes.  If it
> >> falls below a certain percentage, then we get alerted and check on them en
> >> masse.  Worrying about one or two nodes going down probably means you need
> >> more nodes. :D
> >> 
> >> 
> > 
> > 
> > -- 
> > Lance Norskog
> > goksron@gmail.com
> 
> 
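
The threshold check Allen describes (alert when the share of active nodes
falls below a certain percentage) could be sketched roughly as follows. This
is only an illustration: the function name, the 90 percent default, and the
way the node counts are obtained are assumptions, since the thread does not
say how the monitor is actually implemented.

```python
def should_alert(active_nodes: int, expected_nodes: int, min_percent: int = 90) -> bool:
    """Return True when the share of active nodes is below min_percent.

    Hypothetical helper: the caller supplies both counts (for example from
    whatever cluster report is available); nothing here talks to Hadoop.
    """
    if expected_nodes <= 0:
        raise ValueError("expected_nodes must be positive")
    if active_nodes < 0:
        raise ValueError("active_nodes must not be negative")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return active_nodes * 100 < expected_nodes * min_percent


if __name__ == "__main__":
    # One or two nodes down in a 100-node cluster does not trigger an alert.
    print(should_alert(active_nodes=98, expected_nodes=100))
    # A larger drop does, and the nodes can then be checked en masse.
    print(should_alert(active_nodes=85, expected_nodes=100))
```

A check like this deliberately ignores individual node failures and only
fires when enough nodes are down to matter for the cluster as a whole.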
