hadoop-common-user mailing list archives

From Allen Wittenauer <awittena...@linkedin.com>
Subject Re: monit? daemontools? jsvc? something else?
Date Wed, 05 Jan 2011 08:12:10 GMT

On Jan 4, 2011, at 10:29 PM, Otis Gospodnetic wrote:

> Ah, more manual work! :(
> You guys never have JVM die "just because"? I just had a DN's JVM die the 
> other day "just because and with no obvious cause".  Restarting it brought it 
> back to life, everything recovered smoothly.  Had some automated tool done the 
> restart for me, I'd be even happier.

	In the case of Hadoop, no.  There has usually been at least a core dump, a message in syslog,
a message in the datanode log, etc.   [You *do* have cores enabled, right?]
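
	A quick way to answer the cores question is to look at the core file size limit. This is only a rough sketch using Python's standard resource module (Unix only). It reports the limit of the process that runs it, so it only says something about the datanode if it is run from the same environment that launches the daemon. The function names are made up for illustration.

```python
import resource


def core_dumps_enabled(soft_limit):
    """Return True if a soft RLIMIT_CORE value allows a core file to be written."""
    # A soft limit of 0 disables core files; any other value,
    # including resource.RLIM_INFINITY, permits them.
    return soft_limit != 0


def current_core_limit():
    """Return the (soft, hard) core file size limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = current_core_limit()
    print("core dumps enabled: %s (soft=%s, hard=%s)"
          % (core_dumps_enabled(soft), soft, hard))
```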

	We also have in place a monitor that checks the # of active nodes.  If it falls below a certain
percentage, then we get alerted and check on them en masse.   Worrying about one or two nodes
going down probably means you need more nodes. :D
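
	The percentage check itself is simple. Below is a minimal sketch, not the actual monitor: the function names and the 0.95 threshold are assumptions for illustration, and how you obtain the live and total node counts (namenode report, your own inventory, etc.) is left out.

```python
def active_fraction(live_nodes, total_nodes):
    """Return the fraction of expected nodes that are currently live."""
    # An empty or unknown cluster is treated as 0% live so it triggers an alert.
    if total_nodes <= 0:
        return 0.0
    return live_nodes / float(total_nodes)


def should_alert(live_nodes, total_nodes, min_fraction=0.95):
    """Return True when the live fraction drops below min_fraction.

    min_fraction=0.95 is a hypothetical default; pick whatever
    percentage matches the amount of node loss you can tolerate.
    """
    return active_fraction(live_nodes, total_nodes) < min_fraction


if __name__ == "__main__":
    # Hypothetical counts: 2 of 100 nodes down is within tolerance,
    # 10 of 100 down is not.
    print(should_alert(98, 100))
    print(should_alert(90, 100))
```

	With a threshold like this, a single dead node on a large cluster stays quiet and only a broader loss pages someone.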
