From: John Lilley
To: "user@hadoop.apache.org"
Subject: RE: Hadoop throughput question
Date: Thu, 3 Jan 2013 23:09:22 +0000

Unless the Hadoop processing and the OneFS storage are co-located, MapReduce can’t schedule tasks to take advantage of data locality. You would basically be doing a distributed computation against a separate NAS, so throughput would be limited by the performance of the Isilon NAS and the network switch architecture. Still, 26MB/sec in aggregate is far worse than what I’d expect Isilon to deliver, even over a single 1Gb connection.

john
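As a rough sanity check on the point above, the sketch below compares the reported 26MB/sec against the raw ceiling of a single 1 Gb/s link. The function name is hypothetical, and protocol overhead is deliberately ignored, so the real ceiling would be somewhat lower.

```python
def link_ceiling_mb_per_s(gigabits_per_s: float) -> float:
    """Raw link ceiling in MB/s (decimal units), ignoring protocol overhead."""
    return gigabits_per_s * 1000 / 8


# 26 MB/s is the aggregate figure quoted in this reply.
observed_mb_per_s = 26.0
ceiling_mb_per_s = link_ceiling_mb_per_s(1.0)
fraction_of_link = observed_mb_per_s / ceiling_mb_per_s

print(f"ceiling: {ceiling_mb_per_s:.0f} MB/s, "
      f"observed: {fraction_of_link:.0%} of one link")
```

Even one link, under these assumptions, leaves most of its capacity unused at the observed rate, which is why the figure looks low.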

 

From: Artem Ervits [mailto:are9004@nyp.org]
Sent: Thursday, January 03, 2013 4:02 PM
To: user@hadoop.apache.org
Subject: RE: Hadoop throughput question

 

Hadoop is using OneFS, not HDFS, in our configuration. The Isilon NAS and the Hadoop nodes are in the same datacenter, but I cannot tell how they are placed across racks.

 

From: John Lilley [mailto:john.lilley@redpoint.net]
Sent: Thursday, January 03, 2013 5:15 PM
To: user@hadoop.apache.org
Subject: RE: Hadoop throughput question

 

Let’s suppose you are doing a read-intensive job such as counting records. This will be disk-bandwidth limited. On a 4-node cluster with 2 local SATA disks on each node, you should easily read 400MB/sec in aggregate. When you are running the Hadoop cluster, is the Hadoop processing co-located with the Isilon nodes? Is Hadoop configured to use OneFS or HDFS?

John
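The 400MB/sec estimate above works out to about 50MB/sec per disk across 4 nodes with 2 disks each. A minimal sketch of that arithmetic follows; the per-disk rate is an assumption back-derived from the 400MB/sec figure, not a measurement, and the function name is hypothetical.

```python
def aggregate_read_mb_per_s(nodes: int, disks_per_node: int,
                            mb_per_s_per_disk: float) -> float:
    """Aggregate sequential read rate if every local disk streams in parallel."""
    return nodes * disks_per_node * mb_per_s_per_disk


# Assumed per-disk rate implied by 400 MB/s over 4 nodes x 2 disks.
PER_DISK_MB_PER_S = 400 / (4 * 2)

four_nodes = aggregate_read_mb_per_s(4, 2, PER_DISK_MB_PER_S)
two_nodes = aggregate_read_mb_per_s(2, 2, PER_DISK_MB_PER_S)

print(f"4 nodes: {four_nodes:.0f} MB/s, 2 nodes: {two_nodes:.0f} MB/s")
```

With local disks, this model scales linearly with node count. A job that runs no faster on 4 nodes than on 2 is therefore more consistent with a shared bottleneck than with a disk-bound workload.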

 

From: Artem Ervits [mailto:are9004@nyp.org]
Sent: Thursday, January 03, 2013 3:00 PM
To: user@hadoop.apache.org
Subject: Hadoop throughput question

 

Hello all,

 

I’d like to pick the community brain on average throughput speeds for a moderately specced 4-node Hadoop cluster with 1GigE networking. Is it reasonable to expect constant average speeds of 150-200MB/sec on such a setup? Forgive me if the question is loaded, but we’re running a Hadoop cluster with HDFS served via EMC Isilon storage. We’re getting about 30MB/sec with our machines, and we do not see a difference in job speed between a 2-node cluster and a 4-node cluster.

 

Thank you.

 
 
--------------------
 
This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.
 
 