On Thu, Jan 3, 2013 at 4:00 PM, Artem Ervits <are9004@nyp.org> wrote:

I will follow up on that certainly, thank you for the information.

Further investigation showed that counting SequenceFile records runs at about 26 MB/sec. If I simply read bytes from the same file on the same cluster, the speed is 70 MB/sec. Is there a configuration for optimizing SequenceFile processing?
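One rough way to read these two numbers, as a back-of-the-envelope sketch rather than a measurement: if the raw read and the record count move the same bytes over the same path, the gap between 70 MB/sec and 26 MB/sec is time spent outside raw I/O (record parsing, deserialization, decompression if it is enabled). The function name and the assumption that the two costs simply add up are hypothetical, not something established in this thread.

```python
def per_mb_breakdown(raw_mb_per_sec, record_mb_per_sec):
    """Split the time to process 1 MB into raw-read time and everything else.

    Assumes (hypothetically) that I/O and record processing run serially,
    so their per-MB times add up.
    """
    raw_ms = 1000.0 / raw_mb_per_sec        # ms to read 1 MB as plain bytes
    total_ms = 1000.0 / record_mb_per_sec   # ms to read 1 MB as records
    overhead_ms = total_ms - raw_ms         # ms not explained by raw I/O
    return raw_ms, overhead_ms, overhead_ms / total_ms


# Figures reported in this message: 70 MB/sec raw, 26 MB/sec counting records.
raw_ms, overhead_ms, overhead_share = per_mb_breakdown(70.0, 26.0)
print(f"raw read: {raw_ms:.1f} ms/MB")
print(f"non-I/O overhead: {overhead_ms:.1f} ms/MB ({overhead_share:.0%} of total)")
```

Under that assumption, roughly 63% of the record-counting time is not raw I/O, which would point at the reader side (for example, compression or deserialization cost) rather than at the storage path alone.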

Thank you.

From: John Lilley [mailto:john.lilley@redpoint.net]
Sent: Thursday, January 03, 2013 6:09 PM
To: user@hadoop.apache.org
Subject: RE: Hadoop throughput question

Unless the Hadoop processing and the OneFS storage are co-located, MapReduce can't schedule tasks so as to take advantage of data locality. You would basically be doing a distributed computation against a separate NAS, so throughput would be limited by the performance properties of the Isilon NAS and the network switch architecture. Still, 26 MB/sec in aggregate is far worse than what I'd expect Isilon to deliver, even over a single 1Gb connection.
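To put a number on that expectation, here is a minimal sketch of the wire ceiling, assuming the single connection mentioned above is 1 Gbit/sec Ethernet (the 1GigE from the original question) and ignoring protocol overhead. The function name is hypothetical.

```python
def wire_ceiling_mb_per_sec(link_gbit_per_sec):
    """Upper bound on payload throughput for a link, ignoring protocol overhead."""
    bits_per_sec = link_gbit_per_sec * 1_000_000_000
    return bits_per_sec / 8 / 1_000_000   # decimal MB/sec


ceiling = wire_ceiling_mb_per_sec(1)      # assumed single 1 Gbit/sec connection
observed = 26.0                           # MB/sec reported in this thread
print(f"ceiling: {ceiling:.0f} MB/sec, observed: {observed / ceiling:.0%} of it")
```

A 1 Gbit/sec link tops out near 125 MB/sec before overhead, so 26 MB/sec uses only about a fifth of one link.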

john

From: Artem Ervits [mailto:are9004@nyp.org]
Sent: Thursday, January 03, 2013 4:02 PM
To: user@hadoop.apache.org
Subject: RE: Hadoop throughput question

Hadoop is using OneFS, not HDFS, in our configuration. The Isilon NAS and the Hadoop nodes are in the same datacenter, but I cannot tell how they are placed across racks.

From: John Lilley [mailto:john.lilley@redpoint.net]
Sent: Thursday, January 03, 2013 5:15 PM
To: user@hadoop.apache.org
Subject: RE: Hadoop throughput question

Let's suppose you are doing a read-intensive job like, for example, counting records. This will be disk-bandwidth limited. On a 4-node cluster with 2 local SATA disks on each node, you should easily read 400 MB/sec in aggregate. When you are running the Hadoop cluster, is the Hadoop processing co-located with the Isilon nodes? Is Hadoop configured to use OneFS or HDFS?
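The 400 MB/sec figure is simple multiplication, sketched here for reference. The per-disk rate of 50 MB/sec is only what that figure implies when spread over 4 x 2 disks; it is an assumption, not a measured value, and the function name is hypothetical.

```python
def aggregate_read_mb_per_sec(nodes, disks_per_node, mb_per_sec_per_disk):
    """Best-case aggregate sequential read rate when every disk streams locally."""
    return nodes * disks_per_node * mb_per_sec_per_disk


# 50 MB/sec per disk is implied by 400 MB/sec over 4 x 2 disks (assumption).
print(aggregate_read_mb_per_sec(nodes=4, disks_per_node=2, mb_per_sec_per_disk=50))
```

This only holds when tasks read from local disks; with a separate NAS the shared storage path, not the disk count, sets the limit.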

John

From: Artem Ervits [mailto:are9004@nyp.org]
Sent: Thursday, January 03, 2013 3:00 PM
To: user@hadoop.apache.org
Subject: Hadoop throughput question

Hello all,

I'd like to pick the community brain on average throughput speeds for a moderately specced 4-node Hadoop cluster with 1GigE networking. Is it reasonable to expect constant average speeds of 150-200 MB/sec on such a setup? Forgive me if the question is loaded, but we're running a Hadoop cluster with HDFS served via EMC Isilon storage. We're getting about 30 MB/sec with our machines, and we do not see a difference in job speed between a 2-node cluster and a 4-node cluster.

Thank you.

--------------------

This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.
