hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "ManagementTools" by JeffHammerbacher
Date Mon, 16 Nov 2009 11:58:48 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "ManagementTools" page has been changed by JeffHammerbacher.
http://wiki.apache.org/hadoop/ManagementTools?action=diff&rev1=1&rev2=2

--------------------------------------------------

  On a big cluster you don't want to have your phone page you every time a node goes down.
The only individual machines you care about are the NameNode, the Secondary NameNode and the
JobTracker. Worker nodes come and go. What matters there is the total cluster availability,
the availability of the live data, and whether the rate of node failure is too high to get
useful work done.
  
  The other thing to be aware of is that the troublesome workers are not the dead ones; those
are easy to detect because they don't report for duty. The troublesome ones are the nodes where the
disk is playing up so badly that the system is really slow, so their work takes too long.
Or their RAM isn't working properly so only 1GB of it appears there, and every job fails with
memory problems. Or some strange motherboard/CPU/OS combination causes a machine to find race
conditions in code where none surface elsewhere. That's what you need to identify: the troublemakers.
Once found, you can set up Hadoop to blacklist nodes.
+ 
+ For detailed information, see Ed Capriolo's [[presentation|http://www.cloudera.com/blog/2009/11/09/hadoop-world-monitoring-best-practices-from-ed-capriolo/]]
from Hadoop World NYC 2009.
  
  == Nagios ==
  
@@ -16, +18 @@

  
  == JMX Support ==
  
- Hadoop has JMX support, so with the right JMX bridge for your chosen management tools, it
should be possible to keep an eye on Hadoop from your favorite management console.
+ Hadoop has JMX support, so with the right JMX bridge for your chosen management tools, it
should be possible to keep an eye on Hadoop from your favorite management console. For more
on JMX and Hadoop, see Philip Zeyliger's [[blog post|http://www.cloudera.com/blog/2009/03/12/hadoop-metrics/]]
on the Cloudera blog.
  
  === JMX Bridging to Zenoss ===
  
@@ -36, +38 @@

  While monitoring individual nodes is useful in a pro-active sense, the
  bigger your grid gets, the less important it becomes"
  
+ == Cacti ==
+ 
+ See Ed Capriolo's [[blog post|http://www.cloudera.com/blog/2009/07/07/hadoop-graphing-with-cacti/]]
about how About.com uses Cacti for monitoring their Hadoop cluster.
+ 
