hadoop-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: Hadoop throughput question
Date Thu, 03 Jan 2013 23:09:22 GMT
Unless the Hadoop processing and the OneFS storage are co-located, MapReduce can't schedule
tasks so as to take advantage of data locality.  You would basically be doing a distributed
computation against a separate NAS, so throughput would be limited by the performance properties
of the Isilon NAS and the network switch architecture.  Still, 26MB/sec in aggregate is far
worse than what I'd expect Isilon to deliver, even over a single 1Gb connection.
john
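
[Note: a rough sketch of the comparison made above, not a measurement. It assumes decimal
units (1 Gb = 1000 Mb, 8 bits per byte) and ignores all protocol overhead, so the real
ceiling of a 1Gb link is somewhat lower. The function name is hypothetical.]

```python
# Rough ceiling for a single 1 Gb/s link, ignoring protocol overhead (assumption).
LINK_GBITS_PER_SEC = 1.0
OBSERVED_MB_PER_SEC = 26.0  # aggregate figure quoted in the message above


def link_ceiling_mb_per_sec(gbits_per_sec: float) -> float:
    """Convert a link rate in Gb/s to MB/s (decimal units, 8 bits per byte)."""
    return gbits_per_sec * 1000.0 / 8.0


ceiling = link_ceiling_mb_per_sec(LINK_GBITS_PER_SEC)
utilization = OBSERVED_MB_PER_SEC / ceiling
print(f"ceiling: {ceiling:.0f} MB/s, observed: {utilization:.0%} of ceiling")
# prints "ceiling: 125 MB/s, observed: 21% of ceiling"
```

Under these assumptions the observed aggregate is roughly a fifth of what one link could
carry, which is consistent with the remark that the link itself is unlikely to be the limit.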

From: Artem Ervits [mailto:are9004@nyp.org]
Sent: Thursday, January 03, 2013 4:02 PM
To: user@hadoop.apache.org
Subject: RE: Hadoop throughput question

Hadoop is using OneFS, not HDFS in our configuration. Isilon NAS and the Hadoop nodes are
in the same datacenter but as far as rack locations, I cannot tell.

From: John Lilley [mailto:john.lilley@redpoint.net]
Sent: Thursday, January 03, 2013 5:15 PM
To: user@hadoop.apache.org
Subject: RE: Hadoop throughput question

Let's suppose you are doing a read-intensive job like, for example, counting records.  This
will be disk-bandwidth limited.  On a 4-node cluster with 2 local SATA disks on each node you
should easily read 400MB/sec in aggregate.  When you are running the Hadoop cluster, is the
Hadoop processing co-located with the Isilon nodes?  Is Hadoop configured to use OneFS or
HDFS?
John
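
[Note: a minimal sketch of the arithmetic behind the 400MB/sec estimate above. The per-disk
rate of 50 MB/sec is not stated in the message; it is only what the quoted total implies for
4 nodes with 2 disks each, and it is labeled as an assumption here. The function name is
hypothetical.]

```python
# Per-disk sequential read rate implied by the 400 MB/s estimate (assumption,
# not a figure given in the thread): 400 / (4 nodes * 2 disks) = 50 MB/s.
ASSUMED_MB_PER_SEC_PER_DISK = 50.0


def aggregate_read_mb_per_sec(nodes: int, disks_per_node: int,
                              mb_per_sec_per_disk: float) -> float:
    """Aggregate read rate when every task reads from its node's local disks."""
    return nodes * disks_per_node * mb_per_sec_per_disk


for nodes in (2, 4):
    rate = aggregate_read_mb_per_sec(nodes, 2, ASSUMED_MB_PER_SEC_PER_DISK)
    print(f"{nodes} nodes: {rate:.0f} MB/s aggregate")
```

With local disks the aggregate grows with the node count, so a job that runs no faster on
4 nodes than on 2 is probably not limited by local disk bandwidth.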

From: Artem Ervits [mailto:are9004@nyp.org]
Sent: Thursday, January 03, 2013 3:00 PM
To: user@hadoop.apache.org
Subject: Hadoop throughput question

Hello all,

I'd like to pick the community's brain on average throughput speeds for a moderately specced
4-node Hadoop cluster with 1GigE networking. Is it reasonable to expect sustained average speeds
of 150-200MB/sec on such a setup? Forgive me if the question is loaded, but we're running a Hadoop
cluster with HDFS served via EMC Isilon storage. We're getting about 30MB/sec with our machines, and
we do not see a difference in job speed between a 2-node cluster and a 4-node cluster.

Thank you.





--------------------



This electronic message is intended to be for the use only of the named recipient, and may
contain information that is confidential or privileged.  If you are not the intended recipient,
you are hereby notified that any disclosure, copying, distribution or use of the contents
of this message is strictly prohibited.  If you have received this message in error or are
not the named recipient, please notify us immediately by contacting the sender at the electronic
mail address noted above, and delete and destroy all copies of this message.  Thank you.









