From: Michael Segel
Subject: RE: Cluster hard drive ratios
Date: Wed, 4 May 2011 14:19:57 -0500
Reply-To: common-user@hadoop.apache.org

Hey!

Sorry my math is off. I keep thinking in terms of TB per core and not drives. :-)

To be honest I don't know if I would recommend 6-core CPUs.
We're running on what is now considered 'old hardware' (Intel Xeon e5500 series).

Yes, we saw that with 8 cores and 4 drives, we were limited by the # of drives.

Pushing that up to 8 or 12 drives would mean that the disks would be less of a bottleneck.
But then you're looking at memory.
Which is also a limiting factor...

You're going to have to look at 10GbE. And then your ToR is going to be an issue.
Not all hardware vendors (networking) are equal. You'll want to make sure that your trunk between racks is more than 10GbE if all of your ports are running 10GbE.

Everybody has an opinion on this. Outside of Facebook and Yahoo! I don't know of anyone who is really running large clouds and is willing to talk about it.

----------------------------------------
> Date: Wed, 4 May 2011 13:59:38 -0500
> Subject: RE: Cluster hard drive ratios
> From: msgdh8@gmail.com
> To: common-user@hadoop.apache.org
>
> Mike,
>
> Thanks for the response. It looks like this discussion forked on the CDH
> list so I have two different conversations now. Also, you're dead on
> that one of the presentations I was referencing was Ravi's.
>
> With your setup I agree that it would have made no sense to go the 2.5"
> drive route given it would have forced you into the 500-750GB SATA
> drives and all it would allow is more spindles but less capacity at a
> higher cost. The servers we have been considering are actually the
> R710's so dual hexacore with 12 spindles of actual capacity is more of a
> 1:1 in terms of cores to spindles vs the 2:1 I have been reviewing.
> My original question was about the point at which you actually see a
> plateau in write performance as the cores:spindles ratio changes, but
> since we are headed in that direction anyway it looks like it was more
> to sate curiosity than to drive specifications.
>
> As to your point, I forgot to include the issue of rebalancing in the
> original email but you are absolutely right. That was another major
> concern especially as we would get closer to filling capacity of a 24TB
> box. I think the original plan was bonded GbE but I think our
> infrastructure team has told us 10GbE would be standard.
>
> Matt
>
> -----Original Message-----
> From: Michael Segel [mailto:michael_segel@hotmail.com]
> Sent: Wednesday, May 04, 2011 1:26 PM
> To: common-user@hadoop.apache.org
> Subject: RE: Cluster hard drive ratios
>
> Hi Matt.
>
> I think you attended Ravi's presentation....
>
> One of the reasons we used 4 drives per node is that our nodes are in 1U
> boxes and you can only fit 4 3.5" SATA drives in those boxes. Could we
> have gone for more drives using 2.5" SATA drives? Yes, but then you will
> reduce the amount of disk per node and you would increase your cost per
> node.
>
> Looking at newer boxes. (C Series from Dell which didn't exist when we
> started...) 12 drives would be 2 drives per core if you went with 6
> cores, or 3 drives per core if you went with 4-core CPUs.
>
> The issue raised here is that using 2TB drives, you now have 24TB of
> disk per node.
>
> So if you lost a node, that's a lot of background replication occurring.
> IMHO, this would be less of an issue if you went with 10GbE (Solarflare
> cards, for which Dell is a reseller) and then a good 10GbE ToR.
>
> I haven't tried this configuration, so I don't know how well it would
> perform.
> My guess is that with 10GbE, you'd be ok...
>
> HTH
>
> -Mike
>
> ----------------------------------------
> > Date: Wed, 4 May 2011 08:43:33 -0700
> > Subject: Cluster hard drive ratios
> > From: msgdh8@gmail.com
> > To: cdh-user@cloudera.org
> > CC: common-user@hadoop.apache.org
> >
> > I have been reviewing quite a few presentations on the web from
> > various businesses, in addition to the ones I watched firsthand at
> > the Cloudera data summit last week, and I am curious as to others'
> > thoughts around hard drive ratios. Various sources including Cloudera
> > have cited 1 HDD x 2 cores x 4 GB ECC but this makes me wonder what
> > the upper bound for HDDs is in this ratio. We have specced out various
> > machines from Dell and it is possible to get dual hexacores with 14
> > drives (2 raided for OS and 12x2TB) but this seems to conflict with
> > that original ratio and some of the specs I have witnessed in
> > presentations (which are mostly 4 drive configurations). I would
> > assume all you incur is additional complexity and more potential for
> > hardware failure on a specific machine but I have seen little to no
> > data stating at what point there is a plateau in write speed
> > performance. Can anyone give personal experience around this type of
> > setup?
> >
> > If we accept that we are incurring the negatives I stated above but we
> > gain higher data density in the cluster then is this setup fine or are
> > we overlooking something?
> >
> > Thanks,
> > Matt
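
A quick way to sanity-check the ratios discussed in this thread is to work them out explicitly. The sketch below is only a back-of-the-envelope illustration. The function names and the aggregate_gbit parameter are hypothetical, the node shapes (dual hexacore or dual quad-core with 12 x 2TB data drives) are the ones mentioned above, and the transfer-time estimate assumes ideal line-rate throughput with no protocol overhead, throttling, or competing traffic, so it is a lower bound rather than a measurement.

```python
# Back-of-the-envelope node sizing. All names here are illustrative
# assumptions, not part of Hadoop or any vendor tool.

def per_core_ratios(sockets, cores_per_socket, data_drives, drive_tb):
    """Return core count, drives per core, and TB per core for one node."""
    cores = sockets * cores_per_socket
    return {
        "cores": cores,
        "drives_per_core": data_drives / cores,
        "tb_per_core": data_drives * drive_tb / cores,
    }


def rereplication_hours(node_tb, aggregate_gbit):
    """Lower bound (hours) to copy node_tb terabytes at aggregate_gbit Gbit/s.

    Assumes decimal units (1 TB = 1e12 bytes) and perfect line rate.
    """
    bits = node_tb * 1e12 * 8
    seconds = bits / (aggregate_gbit * 1e9)
    return seconds / 3600


if __name__ == "__main__":
    for label, cores_per_socket in (("dual hexacore", 6), ("dual quad-core", 4)):
        r = per_core_ratios(2, cores_per_socket, data_drives=12, drive_tb=2.0)
        print(f"{label}: {r['cores']} cores, "
              f"{r['drives_per_core']:.2f} drives/core, "
              f"{r['tb_per_core']:.2f} TB/core")

    for gbit in (1, 10):
        hours = rereplication_hours(24, gbit)
        print(f"24 TB at {gbit} Gbit/s aggregate: at least {hours:.1f} hours")
```

Under these assumptions a dual hexacore box with 12 x 2TB drives works out to 1 drive and 2 TB per core, and a dual quad-core box to 1.5 drives and 3 TB per core, which lines up with the "TB per core" mix-up mentioned in the reply. Moving 24 TB at an aggregate 1 Gbit/s would take at least about 53 hours, versus about 5.3 hours at 10 Gbit/s. In practice HDFS spreads re-replication across the surviving nodes, so the effective aggregate bandwidth depends on the cluster and its network.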