hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: Cluster hard drive ratios
Date Wed, 04 May 2011 19:19:57 GMT


Sorry, my math is off. I keep thinking in terms of TB per core and not drives per core. :-)

To be honest, I don't know if I would recommend 6-core CPUs.

We're running on what is now considered 'old hardware' (Intel Xeon E5500 series).
Yes, we saw that with 8 cores and 4 drives, we were limited by the number of drives.
Pushing that up to 8 or 12 drives would mean that the disks would be less of a bottleneck.
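
To make the ratio arithmetic explicit, here is a minimal sketch that computes drives per core for the configurations mentioned in this thread. It assumes dual-socket nodes (as with the dual hexacore R710 in the quoted message); the function name and the labels are only illustrative.

```python
def drives_per_core(drives: int, sockets: int, cores_per_socket: int) -> float:
    """Return the number of data drives per physical core for one node."""
    total_cores = sockets * cores_per_socket
    return drives / total_cores


# Configurations discussed in the thread; the socket count is an assumption.
configs = {
    "4 drives, 8 cores (1U nodes)": (4, 2, 4),
    "12 drives, dual quad-core": (12, 2, 4),
    "12 drives, dual hex-core": (12, 2, 6),
}

for label, (drives, sockets, cores) in configs.items():
    ratio = drives_per_core(drives, sockets, cores)
    print(f"{label}: {ratio:.2f} drives per core")
```

Under that assumption, the 4-drive nodes sit at 0.5 drives per core (the 1 HDD per 2 cores guideline cited in the original question), while 12 drives with dual hex-core CPUs gives the 1:1 ratio Matt describes.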

But then you're looking at memory, which is also a limiting factor...

You're going to have to look at 10GbE. And then your ToR (top-of-rack) switch is going to be an issue.
Not all networking hardware vendors are equal. You'll want to make sure that your trunk
between racks is more than 10GbE if all of your ports are running 10GbE.
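
As a rough illustration of why the trunk matters, the sketch below computes a rack's uplink oversubscription ratio and a naive single-link time to move one node's worth of data. The rack size (20 nodes) and the 40 Gb/s uplink are hypothetical values, not figures from this thread; the 24 TB figure is the per-node capacity mentioned in the quoted messages. Real re-replication is spread across many nodes and links, so the transfer time is only a back-of-the-envelope figure.

```python
def oversubscription(ports: int, port_gbps: float, uplink_gbps: float) -> float:
    """Ratio of total server-facing bandwidth to rack uplink bandwidth."""
    return (ports * port_gbps) / uplink_gbps


def transfer_hours(terabytes: float, link_gbps: float) -> float:
    """Hours to push `terabytes` (decimal TB) over one link at full line rate."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 3600


# Hypothetical rack: 20 nodes at 10 Gb/s each, 40 Gb/s trunk to other racks.
print(f"oversubscription: {oversubscription(20, 10, 40):.1f}:1")

# 24 TB per node, as discussed in the thread.
for gbps in (1, 10):
    print(f"24 TB over {gbps} Gb/s: {transfer_hours(24, gbps):.1f} hours")
```

If every port runs at 10 Gb/s and the trunk is a single 10 Gb/s link, the same function gives a ratio equal to the number of nodes in the rack, which is the situation the paragraph above warns about.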

Everybody has an opinion on this. Outside of Facebook and Yahoo!, I don't know of anyone who
is really running large clouds and is willing to talk about it.

> Date: Wed, 4 May 2011 13:59:38 -0500
> Subject: RE: Cluster hard drive ratios
> From: msgdh8@gmail.com
> To: common-user@hadoop.apache.org
> Mike,
> Thanks for the response. It looks like this discussion forked on the CDH
> list so I have two different conversations now. Also, you're dead on
> that one of the presentations I was referencing was Ravi's.
> With your setup I agree that it would have made no sense to go the 2.5"
> drive route given it would have forced you into the 500-750GB SATA
> drives and all it would allow is more spindles but less capacity at a
> higher cost. The servers we have been considering are actually the
> R710's so dual hexacore with 12 spindles of actual capacity is more of a
> 1:1 in terms of cores to spindles vs the 2:1 I have been reviewing. My
> original question was about the point at which you actually see a
> plateau in write performance as the cores:spindles ratio changes, but
> since we are headed in that direction anyway, it looks like it was more
> to sate curiosity than to drive specifications.
> As to your point, I forgot to include the issue of rebalancing in the
> original email but you are absolutely right. That was another major
> concern especially as we would get closer to filling capacity of a 24TB
> box. I think the original plan was bonded GbE, but our
> infrastructure team has told us 10GbE would be standard.
> Matt
> -----Original Message-----
> From: Michael Segel [mailto:michael_segel@hotmail.com]
> Sent: Wednesday, May 04, 2011 1:26 PM
> To: common-user@hadoop.apache.org
> Subject: RE: Cluster hard drive ratios
> Hi Matt.
> I think you attended Ravi's presentation....
> One of the reasons we used 4 drives per node is that our nodes are in 1U
> boxes and you can only fit 4 3.5" SATA drives in those boxes. Could we
> have gone for more drives using 2.5" SATA drives? Yes, but then you will
> reduce the amount of disk per node and you would increase your cost per
> node.
> Looking at newer boxes (the C Series from Dell, which didn't exist when
> we started...), 12 drives would be 2 drives per core if you went with 6
> cores, or 3 drives per core if you went with 4 core cpus.
> The issue raised here is that using 2TB drives, you now have 24TB of
> disk per node.
> So if you lost a node, that's a lot of background replication occurring.
> IMHO, this would be less of an issue if you went with 10GbE (Solarflare
> cards, for which Dell is a reseller) and then a good 10GbE ToR.
> I haven't tried this configuration, so I don't know how well it would
> perform.
> My guess is that with 10GbE, you'd be OK...
> -Mike
> ----------------------------------------
> > Date: Wed, 4 May 2011 08:43:33 -0700
> > Subject: Cluster hard drive ratios
> > From: msgdh8@gmail.com
> > To: cdh-user@cloudera.org
> > CC: common-user@hadoop.apache.org
> >
> > I have been reviewing quite a few presentations on the web from
> > various businesses, in addition to the ones I watched first hand at
> > the Cloudera data summit last week, and I am curious as to others'
> > thoughts around hard drive ratios. Various sources including Cloudera
> > have cited 1 HDD x 2 cores x 4 GB ECC but this makes me wonder what
> > the upper bound for HDDs is in this ratio. We have specced out various
> > machines from Dell and it is possible to get dual hexacores with 14
> > drives (2 raided for OS and 12x2TB) but this seems to conflict with
> > that original ratio and some of the specs I have witnessed in
> > presentations (which are mostly 4 drive configurations). I would
> > assume all you incur is additional complexity and more potential for
> > hardware failure on a specific machine but I have seen little to no
> > data stating at what point there is a plateau in write speed
> > performance. Can anyone give personal experience around this type of
> > setup?
> >
> > If we accept that we are incurring the negatives I stated above but we
> > gain higher data density in the cluster then is this setup fine or are
> > we overlooking something?
> >
> > Thanks,
> > Matt