hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Hadoop Cluster Administration Tools?
Date Wed, 30 Apr 2008 09:52:06 GMT
Bradford Stephens wrote:
> Greetings,
> 
> I'm compiling a list of (free/OSS) tools commonly used to administer Linux
> clusters to help my company transition away from Win solutions.
> 
> I use Ganglia for monitoring the general stats of the machines (Although I
> didn't get the hadoop metrics to work). I also use ntop to check out network
> performance (especially with Nutch).

Once you move to larger farms, you have to move away from running stuff
by hand to even more automation. You don't really want to work with
individual machines; you want some central configuration that you
adjust and let propagate out. The management tools can detect
machines refusing to play, and Hadoop should stop placing data and work
on them.

-LinuxCOE is how we build images; InstaLinux (http://www.instalinux.com/)
is a public instance of this. It can create .iso kickstart images that
pull RPM or deb packages down off local/remote servers.

-Configuration Management becomes your next problem. A lot of the CM
tools let you declare the state of the machines; they then work to keep
the machines in that state, detect when they are out of it, and push
your machines back into the desired state or, failing that, start
paging you. The line between CM and monitoring tools gets kind of blurred.
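
As a rough illustration of that declare/detect/repair/page cycle, here is a
minimal Python sketch. Everything in it is hypothetical: the Machine class,
the reconcile() function, the setting names and the host names are made up
for the example and do not correspond to the API of any of the tools
mentioned in this mail.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical stand-in for a managed host; not any real tool's API."""
    name: str
    state: dict = field(default_factory=dict)
    reachable: bool = True

    def apply(self, key, value):
        # Pretend to push one setting; an unreachable host silently ignores it.
        if self.reachable:
            self.state[key] = value


def drift(desired, machine):
    """Return the declared settings whose actual value differs on the machine."""
    return {k: v for k, v in desired.items() if machine.state.get(k) != v}


def reconcile(desired, machines, page):
    """One pass: detect drift, try to repair it, page a human if repair fails."""
    for m in machines:
        for key, value in drift(desired, m).items():
            m.apply(key, value)
        remaining = drift(desired, m)
        if remaining:
            page(m.name, sorted(remaining))


# Illustrative declared state and machines (assumed values, not real data).
desired = {"ntpd": "running", "java": "installed"}
machines = [
    Machine("node1", {"ntpd": "running", "java": "installed"}),
    Machine("node2", {"ntpd": "stopped"}),
    Machine("node3", {}, reachable=False),
]

alerts = []
reconcile(desired, machines, lambda name, keys: alerts.append((name, keys)))
print(alerts)
```

In this toy run node2 is quietly pushed back into the declared state, while
node3 cannot be repaired and ends up in the alert list. A real tool would run
such a pass continuously, which is where the overlap with monitoring comes
from.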

There are a few open source tools that can do this:
http://en.wikipedia.org/wiki/Comparison_of_open_source_configuration_management_software

I'd point you at
  -Smartfrog (personal bias there,  as I work on it)
  -puppet
  -bcfg2
  -LCFG
  -Quattor

Then I'd go search the LISA archives to see what other people are up to; 
there are some good papers there. Like this one, "On Designing and 
Deploying Internet-Scale Services":
http://research.microsoft.com/~jamesrh/TalksAndPapers/JamesRH_Lisa.pdf

-steve

-- 
Steve Loughran                  http://www.1060.org/blogxter/publish/5
Author: Ant in Action           http://antbook.org/
