From: Rahul Bhattacharjee
Date: Mon, 6 May 2013 09:11:52 +0530
Subject: Re: Hardware Selection for Hadoop
To: "user@hadoop.apache.org"
Reply-To: user@hadoop.apache.org
References: <1367252654.91414.YahooMailNeo@web162202.mail.bf1.yahoo.com> <993F1D70-F3A3-4F63-9D30-8DC40C0C7231@turn.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=bcaec517234d70336004dc047c22

--bcaec517234d70336004dc047c22
Content-Type: text/plain; charset=UTF-8

OK. I am not sure I understand the spindle / core thing. I will dig more
into that.

Thanks for the info.

One more thing: what is the significance of multiple NICs?

Thanks,
Rahul


On Mon, May 6, 2013 at 12:17 AM, Ted Dunning <tdunning@maprtech.com> wrote:

>
> Data nodes normally are also task nodes. With 8 physical cores it isn't
> that unreasonable to have 64GB whereas 24GB really is going to pinch.
>
> Achieving highest performance requires that you match the capabilities of
> your nodes including CPU, memory, disk and networking. The standard wisdom
> is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of
> disk bandwidth available as network bandwidth.
>
> If you look at the different configurations mentioned in this thread, you
> will see different limitations.
>
> For instance:
>
> 2 x Quad cores Intel
>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>> 64GB mem                <==== slightly larger than necessary
>> 2 1GbE NICs teaming     <==== 2 x 100 MB << 400MB = 2/3 x 6 x 100MB
>
>
> This configuration is mostly limited by networking bandwidth.
>
> 2 x Quad cores Intel
>> 2-3 TB x 6 SATA         <==== 6 disk < desired 8 or more
>> 24GB mem                <==== 24GB << 8 x 6GB
>> 2 10GbE NICs teaming    <==== 2 x 1000 MB > 400MB = 2/3 x 6 x 100MB
>
>
> This configuration is weak on disk relative to CPU and very weak on disk
> relative to network speed. The worst problem, however, is likely to be
> small memory. This will likely require us to decrease the number of slots
> by half or more, making it impossible to even use the 6 disks that we have
> and making the network even more outrageously over-provisioned.
>
>
> On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee
> <rahul.rec.dgp@gmail.com> wrote:
>
>> IMHO, 64 GB looks a bit high for a DN. 24 GB should be good enough for a DN.
>>
>>
>> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum
>> <Patai.Sangbutsarakum@turn.com> wrote:
>>
>>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA
>>> 64GB mem
>>> 2 NICs teaming
>>>
>>> my 2 cents
>>>
>>>
>>> On Apr 29, 2013, at 9:24 AM, Raj Hadoop <hadoopraj@yahoo.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> I have to propose some hardware requirements in my company for a Proof
>>> of Concept with Hadoop. I was reading Hadoop Operations and also looked
>>> at the Cloudera website, but I wanted to ask the group: what are the
>>> requirements if I have to plan for a 5 node cluster? I don't know yet
>>> how much data will need to be processed for the Proof of Concept.
>>> So, can you suggest something to me?
>>>
>>> Regards,
>>> Raj
>>>
>>>
>>>
>>
>
--bcaec517234d70336004dc047c22--
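
Ted's rules of thumb in the message above (4-6GB of RAM per core, at least one spindle per core, and network bandwidth of 1/2 to 2/3 of disk bandwidth) can be checked mechanically. The following is only a rough sketch of that arithmetic, not anything posted in the thread: the names Node, classify and check_balance are made up for illustration, and the throughput figures (about 100 MB/s per SATA disk and per 1GbE link, about 1000 MB/s per 10GbE link) are the round numbers Ted uses in his annotations.

```python
from dataclasses import dataclass

# Assumption: ~100 MB/s per SATA disk, the round figure used in the thread.
DISK_MB_S = 100


@dataclass
class Node:
    """Hypothetical description of one data/task node."""
    cores: int      # physical cores
    disks: int      # number of data disks (spindles)
    ram_gb: int     # installed RAM in GB
    nics: int       # number of teamed network links
    nic_mb_s: int   # assumed throughput per link in MB/s


def classify(value, low, high):
    """Return 'low', 'ok' or 'high' for value against the range [low, high]."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "ok"


def check_balance(node):
    """Compare a node against the rules of thumb quoted in the thread."""
    disk_bw = node.disks * DISK_MB_S      # aggregate disk bandwidth, MB/s
    net_bw = node.nics * node.nic_mb_s    # aggregate network bandwidth, MB/s
    return {
        # 4-6GB of RAM per core
        "ram": classify(node.ram_gb, 4 * node.cores, 6 * node.cores),
        # at least one spindle per core
        "disks": "low" if node.disks < node.cores else "ok",
        # network bandwidth between 1/2 and 2/3 of disk bandwidth
        "network": classify(net_bw, disk_bw / 2, disk_bw * 2 / 3),
    }


if __name__ == "__main__":
    # The two configurations discussed in the thread.
    first = Node(cores=8, disks=6, ram_gb=64, nics=2, nic_mb_s=100)
    second = Node(cores=8, disks=6, ram_gb=24, nics=2, nic_mb_s=1000)
    print(check_balance(first))
    print(check_balance(second))
```

For the first configuration this flags RAM as above the 4-6GB per core range and both disks and network as short, which matches Ted's reading that networking is the main limit. For the second it flags RAM and disks as short and the network as over-provisioned. The thresholds are only as good as the assumed per-device throughput, so measured numbers for real hardware should replace the constants.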