hadoop-common-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: monit? daemontools? jsvc? something else?
Date Thu, 06 Jan 2011 08:39:14 GMT

----- Original Message ----
> From: Allen Wittenauer <awittenauer@linkedin.com>

> > You guys never have a JVM die "just because"? I just had a DN's JVM die the
> > other day "just because and with no obvious cause". Restarting it brought it
> > back to life, and everything recovered smoothly. Had some automated tool done
> > the restart for me, I'd be even happier.
>     In the case of Hadoop, no. There has usually been at least a core dump, a
> message in syslog, a message in the datanode log, etc. [You *do* have cores
> enabled, right?]

Hm, "cores enabled"... what do you mean by that? Are you referring to the JVM
heap dump argument (-XX:+HeapDumpOnOutOfMemoryError)? If not, I'm all

>     We also have in place a monitor that checks the # of active nodes. If it
> falls below a certain percentage, then we get alerted and check on them en
> masse. Worrying about one or two nodes going down probably means you need
> more nodes. :D

That's probably right. :)
So what do you use for monitoring the # of active nodes?
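
To make the "alert when the # of active nodes falls below a certain percentage" idea concrete, here is a minimal sketch of such a check. It is only an illustration, not the monitor described in the quoted message: the function names and the 0.9 threshold are assumptions, and how the live and total node counts are gathered is left to the caller.

```python
def live_fraction(live_nodes: int, total_nodes: int) -> float:
    """Return the fraction of expected nodes that are currently live."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    return live_nodes / total_nodes


def should_alert(live_nodes: int, total_nodes: int,
                 min_fraction: float = 0.9) -> bool:
    """Return True when the live fraction drops below the threshold.

    min_fraction is a hypothetical default; pick whatever percentage
    matches how much capacity the cluster can afford to lose.
    """
    return live_fraction(live_nodes, total_nodes) < min_fraction


if __name__ == "__main__":
    # Example counts only; a real check would obtain them from the cluster.
    if should_alert(live_nodes=80, total_nodes=100):
        print("ALERT: too few live nodes")
```

The counts might come from something like the datanode summary printed by `hadoop dfsadmin -report`, but that wiring is not shown here. Alerting on a cluster-wide percentage rather than on each node matches the point above: one or two dead nodes should not page anyone.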

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
