hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Hadoop throughput question
Date Fri, 04 Jan 2013 00:11:27 GMT
You can't really say that.

Too many variables in terms of networking. (Like what other traffic is occurring at the same
time? Or who else is attached to the NAS?)

On Jan 3, 2013, at 5:09 PM, John Lilley <john.lilley@redpoint.net> wrote:

> Unless the Hadoop processing and the OneFS storage are co-located, MapReduce can’t
schedule tasks so as to take advantage of data locality.  You would basically be doing a distributed
computation against a separate NAS, so throughput would be limited by the performance properties
of the Isilon NAS and the network switch architecture.  Still, 26MB/sec in aggregate is far
worse than what I’d expect Isilon to deliver, even over a single 1Gb connection.
> john
>  
> From: Artem Ervits [mailto:are9004@nyp.org] 
> Sent: Thursday, January 03, 2013 4:02 PM
> To: user@hadoop.apache.org
> Subject: RE: Hadoop throughput question
>  
> Hadoop is using OneFS, not HDFS, in our configuration. The Isilon NAS and the Hadoop nodes
are in the same datacenter, but as for rack locations, I cannot tell.
>  
> From: John Lilley [mailto:john.lilley@redpoint.net] 
> Sent: Thursday, January 03, 2013 5:15 PM
> To: user@hadoop.apache.org
> Subject: RE: Hadoop throughput question
>  
> Let’s suppose you are doing a read-intensive job like, for example, counting records.
 This will be disk bandwidth limited.  On a 4-node cluster with 2 local SATA disks on each node
you should easily read 400MB/sec in aggregate.  When you are running the Hadoop cluster, is
the Hadoop processing co-located with the Isilon nodes?  Is Hadoop configured to use OneFS
or HDFS?
> John
>  
> From: Artem Ervits [mailto:are9004@nyp.org] 
> Sent: Thursday, January 03, 2013 3:00 PM
> To: user@hadoop.apache.org
> Subject: Hadoop throughput question
>  
> Hello all,
>  
> I’d like to pick the community brain on average throughput speeds for a moderately
specced 4-node Hadoop cluster with 1GigE networking. Is it reasonable to expect constant average
speeds of 150-200MB/sec on such a setup? Forgive me if the question is loaded, but we’re running a Hadoop
cluster with HDFS served via EMC Isilon storage. We’re getting about 30MB/sec with our machines
and we do not see a difference in job speed between a 2-node cluster and a 4-node cluster.
>  
> Thank you.
>  
>  
> --------------------
>  
> This electronic message is intended to be for the use only of the named recipient, and
may contain information that is confidential or privileged.  If you are not the intended recipient,
you are hereby notified that any disclosure, copying, distribution or use of the contents
of this message is strictly prohibited.  If you have received this message in error or are
not the named recipient, please notify us immediately by contacting the sender at the electronic
mail address noted above, and delete and destroy all copies of this message.  Thank you.
>  
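
The figures quoted in this thread can be checked with some back-of-the-envelope arithmetic. The sketch below is not from any of the posters. It assumes roughly 50 MB/sec of sequential read per SATA disk (the rate implied by John's 400MB/sec across 8 disks) and treats 1GigE as a raw 125 MB/sec line rate, ignoring protocol overhead and the competing traffic Michael mentions. The function names are hypothetical.

```python
def aggregate_disk_mb_per_sec(nodes, disks_per_node, mb_per_sec_per_disk):
    """Best-case aggregate read rate when every task reads from a local disk."""
    return nodes * disks_per_node * mb_per_sec_per_disk


def link_ceiling_mb_per_sec(gigabits_per_sec):
    """Raw line rate of a network link in MB/sec (1 Gb/s = 1000 Mb/s = 125 MB/s)."""
    return gigabits_per_sec * 1000 / 8


# ASSUMPTION: 50 MB/sec per disk, back-derived from the 400MB/sec estimate in the thread.
local_reads = aggregate_disk_mb_per_sec(nodes=4, disks_per_node=2, mb_per_sec_per_disk=50)

# ASSUMPTION: a single 1GigE path to the NAS with no overhead and no other traffic.
single_link = link_ceiling_mb_per_sec(1)

observed = 26  # MB/sec in aggregate, the figure John cites

print(f"local-disk estimate:  {local_reads} MB/sec")
print(f"single 1GigE ceiling: {single_link} MB/sec")
print(f"observed / ceiling:   {observed / single_link:.0%}")
```

Under these assumptions the observed rate is about a fifth of what one gigabit link could carry, which is consistent with John's remark. It does not identify the cause, since shared network traffic or other clients on the NAS could account for the gap.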

