hadoop-common-user mailing list archives

From Allen Wittenauer <awittena...@linkedin.com>
Subject Re: Cluster Machines
Date Tue, 03 Nov 2009 22:49:48 GMT
On 11/3/09 2:29 PM, "John Martyniak" <john@beforedawnsolutions.com> wrote:
> Would you mind telling me the kinds of configured servers that you are
> running?

Our 'real' grid is made up of shiny Sun 4275s.  But our 'non-real' grid is
composed of two types of machines with radically different disk
configurations (size *and* number!).  Keeping the two different types of
machines is a bit of a pain.  We're going to be replacing that grid in the
next month or so with a homogeneous config and giving those machines back to
wherever they came from.

> Also have you had any experience running namenodes or zookeeper on a
> VM?  I have a couple of much larger boxes that are being used to run
> VMs, and was thinking of putting both of those on dedicated VM
> instances, in order to build redundancy/fault tolerance.

I haven't, but I'll admit I've been thinking about it. Especially for the
JobTracker since it seems to like to fall over if you blow on it. [Of
course, I also have higher expectations of my software stack, much to the
chagrin of the developers around here. :) ]

In the case of Solaris, we'd use a zone which makes the IO hit negligible.
But a full blown instance of Xen or VMware or whatever is a bit scarier.
I'm concerned about the typically slow IO that one can encounter when VM'ing
a service.

> Regarding the dual drive, I wasn't thinking of doing that for
> upgradeability, it was more for spindle separation, 1 drive would be
> for Hadoop/HDFS etc functions and the other would be for OS
> operations, so there would be no contention between the drives, just
> on the bus.

This is another spot where "know your workload" comes in.  Unless you are
doing streaming or taxing memory by paging, I suspect your OS disk is going
to be bored.  

> So I take your point about the drives and Hadoop/HDFS being able to
> handle what was necessary.  Since I don't have a pool, I should make
> two volumes on one physical drive, something like 750 GB and 750 GB
> and dedicate one for HDFS and one for MR.

Waaaaaaaaaaaaaaay too much for MR.  But that's the idea.  We're currently
toying with 100GB for MR.  Which is still -very- high.  [But we really don't
know our workload that well..... soooo :) ]
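
As a rough sketch of the two-volume split discussed above: assuming a
0.20-era Hadoop, the HDFS volume is pointed to by dfs.data.dir and the MR
scratch volume by mapred.local.dir.  The mount points /data/hdfs and
/data/mapred are hypothetical names for the two volumes, not anything from
this thread; substitute wherever the large HDFS volume and the small
(roughly 100GB) MR volume are actually mounted.

```xml
<!-- hdfs-site.xml: DataNode block storage lives on the large volume.
     /data/hdfs is a hypothetical mount point for that volume. -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/hdfs/dn</value>
</property>

<!-- mapred-site.xml: MapReduce intermediate (spill/shuffle) data lives on
     the small volume. /data/mapred is a hypothetical mount point. -->
<property>
  <name>mapred.local.dir</name>
  <value>/data/mapred/local</value>
</property>
```

Keeping the two on separate volumes means a runaway job can fill the MR
scratch space without eating into the space HDFS blocks need, and vice
versa.  With more than one disk, both properties take a comma-separated
list of directories.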
