hadoop-common-user mailing list archives

From John Martyniak <j...@beforedawnsolutions.com>
Subject Re: Cluster Machines
Date Wed, 04 Nov 2009 14:16:13 GMT

Those are some nice "toys" to play with.

I agree about the VM IO issues, but I am going to try it and see; my cluster
won't be that big to start, so any IO issues might not manifest
themselves until it grows.  But we will see.

So I will probably also allocate 100 or 200 GB for MR and the rest for
data.  We will see how it goes.
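As a rough sketch of that split, the MR scratch space and the HDFS data
directory would be pointed at separate locations in the Hadoop config
(property names are from the 0.20-era releases; the mount points
/data/mr and /data/hdfs are hypothetical examples, not anything from
this thread):

```xml
<!-- hdfs-site.xml: DataNode block storage (the bulk of the disk) -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/hdfs</value> <!-- hypothetical mount point -->
</property>

<!-- mapred-site.xml: scratch space for map spills and shuffle
     (the ~100-200 GB area discussed above) -->
<property>
  <name>mapred.local.dir</name>
  <value>/data/mr</value> <!-- hypothetical mount point -->
</property>
```

Both properties accept a comma-separated list of directories, so the same
split works whether the two areas are separate partitions or separate
spindles.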

Thanks for the info, it was very helpful.


On Nov 3, 2009, at 5:49 PM, Allen Wittenauer wrote:

> On 11/3/09 2:29 PM, "John Martyniak" <john@beforedawnsolutions.com> wrote:
>> Would you mind telling me the kinds of server configurations that you
>> are running?
> Our 'real' grid comprises shiny Sun 4275s.  But our 'non-real' grid is
> composed of two types of machines with radically different disk
> configurations (size *and* number!).  Keeping the two different types of
> machines is a bit of a pain.  We're going to be replacing that grid in the
> next month or so with a homogeneous config and giving those machines back
> to wherever they came from.
>> Also, have you had any experience running NameNodes or ZooKeeper on a
>> VM?  I have a couple of much larger boxes that are being used to run
>> VMs, and was thinking of putting both of those on dedicated VM
>> instances in order to build redundancy/fault tolerance.
> I haven't, but I'll admit I've been thinking about it, especially for the
> JobTracker, since it seems to fall over if you blow on it.  [Of course, I
> also have higher expectations of my software stack, much to the chagrin
> of the developers around here. :) ]
> In the case of Solaris, we'd use a zone, which makes the IO hit
> negligible.  But a full-blown instance of Xen or VMware or whatever is a
> bit scarier.  I'm concerned about the typically slow IO that one can
> encounter when VM'ing a service.
>> Regarding the dual drive, I wasn't thinking of doing that for
>> upgradeability; it was more for spindle separation: one drive would be
>> for Hadoop/HDFS functions and the other would be for OS operations, so
>> there would be no contention between the drives, just on the bus.
> This is another spot where "know your workload" comes in.  Unless you
> are doing streaming or taxing memory by paging, I suspect your OS disk
> is going to be bored.
>> So I take your point about the drives and Hadoop/HDFS being able to
>> handle what is necessary.  Since I don't have a pool, I should make
>> two volumes on one physical drive, something like 750 GB and 750 GB,
>> and dedicate one for HDFS and one for MR.
> Waaaaaaaaaaaaaaay too much for MR.  But that's the idea.  We're
> currently toying with 100GB for MR, which is still -very- high.  [But we
> really don't know our workload that well..... soooo :) ]
