hadoop-common-user mailing list archives

From Allen Wittenauer <awittena...@linkedin.com>
Subject Re: monit? daemontools? jsvc? something else?
Date Thu, 06 Jan 2011 04:54:22 GMT

On Jan 5, 2011, at 7:57 PM, Lance Norskog wrote:

> Isn't this what Ganglia is for?
> 

	No.

	Ganglia does metrics, not monitoring.


> On 1/5/11, Allen Wittenauer <awittenauer@linkedin.com> wrote:
>> 
>> On Jan 4, 2011, at 10:29 PM, Otis Gospodnetic wrote:
>> 
>>> Ah, more manual work! :(
>>> 
>>> You guys never have a JVM die "just because"? I just had a DN's JVM die the
>>> other day "just because and with no obvious cause". Restarting it brought it
>>> back to life, and everything recovered smoothly. Had some automated tool done
>>> the restart for me, I'd be even happier.
>> 
>> 	In the case of Hadoop, no. There has usually been at least a core dump, a
>> message in syslog, a message in the datanode log, etc. [You *do* have cores
>> enabled, right?]
>> 
>> 	We also have in place a monitor that checks the number of active nodes. If
>> it falls below a certain percentage, we get alerted and check on them en
>> masse. Worrying about one or two nodes going down probably means you need
>> more nodes. :D
>> 
>> 
> 
> 
> -- 
> Lance Norskog
> goksron@gmail.com
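The node-count check described in the quoted message (alert when the share of active nodes drops below a certain percentage) could look roughly like the minimal Python sketch below. The function name `should_alert`, the 90% default threshold, and the assumption that the live and expected node counts are obtained elsewhere (for example from the NameNode's own report) are all hypothetical; the message does not say how that monitor is actually implemented.

```python
def should_alert(live: int, expected: int, threshold: float = 0.9) -> bool:
    """Return True when the fraction of live nodes falls below `threshold`.

    Hypothetical sketch: `live` and `expected` are assumed to be supplied by
    whatever collects cluster state; the 0.9 default is an illustrative
    stand-in for "a certain percentage".
    """
    if expected <= 0:
        raise ValueError("expected node count must be positive")
    # Compare the cluster-wide ratio, not individual nodes, against the threshold.
    return live / expected < threshold


if __name__ == "__main__":
    print(should_alert(live=98, expected=100))  # a couple of nodes down: no alert
    print(should_alert(live=85, expected=100))  # a large drop: alert
```

Alerting on a cluster-wide ratio rather than on each node reflects the point in the message: losing one or two nodes out of many is not worth an alert on its own, while a large simultaneous drop is.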

