hadoop-common-user mailing list archives

From John Martyniak <j...@beforedawnsolutions.com>
Subject Re: Cluster Machines
Date Wed, 04 Nov 2009 14:16:13 GMT
Allen,

Those are some nice "toys" to play with.

I agree on the VM IO issues, but I am going to try it and see. My cluster
won't be that big to start with, so any IO issues might not manifest
themselves until it grows.  But we will see.

So I will probably also do 100 or 200 GB for MR, and the rest for  
data.  We will see how it goes.
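For reference, here is roughly what that split looks like against the
0.20-era config properties (a minimal sketch only; the class name and mount
points below are made up, and in practice the same two property names just
go into hdfs-site.xml and mapred-site.xml on each node):

    import org.apache.hadoop.conf.Configuration;

    // Sketch of the HDFS-vs-MR disk split; needs hadoop-core on the classpath.
    public class DiskSplitSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // DataNode block storage on the big volume ("the rest for data")
            conf.set("dfs.data.dir", "/data/hdfs/data");
            // MapReduce intermediate/spill space on the 100-200 GB volume
            conf.set("mapred.local.dir", "/data/mapred/local");
            System.out.println("dfs.data.dir     = " + conf.get("dfs.data.dir"));
            System.out.println("mapred.local.dir = " + conf.get("mapred.local.dir"));
        }
    }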

Thanks for the info, it was very helpful.

-John

On Nov 3, 2009, at 5:49 PM, Allen Wittenauer wrote:

> On 11/3/09 2:29 PM, "John Martyniak" <john@beforedawnsolutions.com> wrote:
>> Would you mind telling me what kind of server configurations you are
>> running?
>
> Our 'real' grid is comprised of shiny Sun 4275s.  But our 'non-real' grid
> is composed of two types of machines with radically different disk
> configurations (size *and* number!).  Keeping the two different types of
> machines is a bit of a pain.  We're going to be replacing that grid in
> the next month or so with a homogeneous config and giving those machines
> back to wherever they came from.
>
>> Also, have you had any experience running namenodes or ZooKeeper on a
>> VM?  I have a couple of much larger boxes that are being used to run
>> VMs, and was thinking of putting both of those on dedicated VM
>> instances, in order to build redundancy/fault tolerance.
>
> I haven't, but I'll admit I've been thinking about it.  Especially for the
> JobTracker, since it seems to like to fall over if you blow on it.  [Of
> course, I also have higher expectations of my software stack, much to the
> chagrin of the developers around here. :) ]
>
> In the case of Solaris, we'd use a zone, which makes the IO hit
> negligible.  But a full-blown instance of Xen or VMware or whatever is a
> bit scarier.  I'm concerned about the typically slow IO that one can
> encounter when VM'ing a service.
>
>> Regarding the dual drives, I wasn't thinking of doing that for
>> upgradeability; it was more for spindle separation.  One drive would be
>> for Hadoop/HDFS functions and the other for OS operations, so there
>> would be no contention between the drives, just on the bus.
>
> This is another spot where "know your workload" comes in.  Unless you are
> doing streaming or taxing memory by paging, I suspect your OS disk is
> going to be bored.
>
>> So I take your point about the drives and Hadoop/HDFS being able to
>> handle what was necessary.  Since I don't have a pool, I should make
>> two volumes on one physical drive, something like 750 GB and 750 GB
>> and dedicate one for HDFS and one for MR.
>
> Waaaaaaaaaaaaaaay too much for MR.  But that's the idea.  We're currently
> toying with 100GB for MR.  Which is still -very- high.  [But we really
> don't know our workload that well..... soooo :) ]
>
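
One more note on the spindle question, mostly for my own reference: both
the DataNode and the TaskTracker will spread their IO across a
comma-separated list of directories, so the individual disks can be handed
to Hadoop directly rather than pooled first.  A rough sketch, with the same
caveat that the property names are 0.20-era and the paths and class name
are made up:

    import org.apache.hadoop.conf.Configuration;

    // Sketch of pointing HDFS and MR at one directory per physical disk.
    public class MultiSpindleSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // No RAID/LVM pooling needed; Hadoop rotates across the list.
            conf.set("dfs.data.dir", "/disk1/hdfs/data,/disk2/hdfs/data");
            conf.set("mapred.local.dir", "/disk1/mapred/local,/disk2/mapred/local");
            for (String dir : conf.get("dfs.data.dir").split(",")) {
                System.out.println("HDFS data volume: " + dir);
            }
        }
    }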

