hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Cluster hard drive ratios
Date Thu, 05 May 2011 11:48:04 GMT
On 04/05/11 19:59, Matt Goeke wrote:
> Mike,
>
> Thanks for the response. It looks like this discussion forked on the CDH
> list so I have two different conversations now. Also, you're dead on
> that one of the presentations I was referencing was Ravi's.
>
> With your setup I agree that it would have made no sense to go the 2.5"
> drive route, given it would have forced you into the 500-750GB SATA
> drives, and all it would buy is more spindles but less capacity at a
> higher cost. The servers we have been considering are actually the
> R710s, so dual hexacore with 12 spindles of actual capacity is closer to
> 1:1 in terms of cores to spindles vs the 2:1 I have been reviewing. My
> original question was really about at what point you actually see a
> plateau in write performance as you add spindles per core, but since we
> are headed that direction anyway it looks like it was more to sate
> curiosity than to drive the specification.

Some people are using this setup as it gives the best storage density. 
You can also go for single-hexacore servers, as in a big cluster the 
savings there translate into even more storage. It all depends on the 
application.
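
FWIW, if you want to measure that plateau rather than guess at it: list 
every spindle as its own dfs.data.dir entry (no RAID) so the datanode 
spreads block writes across all of them, then push the cluster with the 
bundled TestDFSIO benchmark and watch where aggregate write bandwidth 
stops scaling as you add writers. Rough sketch only; the mount points and 
jar name below are illustrative and will differ on your install:

  # hdfs-site.xml on each datanode: one entry per physical disk
  #   <property>
  #     <name>dfs.data.dir</name>
  #     <value>/data/1/dfs/dn,/data/2/dfs/dn,...,/data/12/dfs/dn</value>
  #   </property>

  # drive it with the bundled benchmark, raising -nrFiles (concurrent
  # writers) until aggregate MB/s stops climbing; -fileSize is in MB
  hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO \
      -write -nrFiles 24 -fileSize 1000
  # per-run throughput is appended to TestDFSIO_results.log; clean up with
  hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -clean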

> As to your point, I forgot to include the issue of rebalancing in the
> original email, but you are absolutely right. That was another major
> concern, especially as we would get closer to filling the capacity of a
> 24TB box. I think the original plan was bonded GbE, but our
> infrastructure team has told us 10GbE would be the standard.


1. If you want to play with bonded GbE then I have some notes I can send 
you; it's harder than you think.
2. I don't know anyone who is running 10GbE + Hadoop, though I see hints 
that StumbleUpon are doing this with Arista switches. You'd have to have 
a very chatty app or 10GbE on the mainboard to justify it.
3. I do know of installations with 24TB HDD and GbE. Yes, the overhead 
of a node failure is higher, but with fewer nodes, P(failure) may be 
lower. The big fear is loss of a whole rack, which can come from ToR 
switch failure or from network config errors. Hadoop isn't 
partition-aware and will treat a rack outage as the loss of 40+ servers, 
try to re-replicate all that data, and that's when you're in trouble 
(look at the AWS EBS outage for an example of a cascading failure). See 
the rack-topology sketch after this list.
4. There are JIRA issues for better handling of drive failure, including 
hotswapping and rebalancing data within a single machine.
5. I'd like support for the ability to say "this node is going down, 
don't re-replicate", and the same for a rack, to ease maintenance; the 
closest thing today is the decommission flow sketched after this list, 
which re-replicates instead.

-Steve
