hadoop-common-user mailing list archives

From "Sathya" <sat...@morisonmenon.com>
Subject RE: Data Locality Importance
Date Sun, 23 Mar 2014 04:07:25 GMT
"VOTE FOR MODI" or teach me how not to get mails

-----Original Message-----
From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com] On Behalf Of
Vinod Kumar Vavilapalli
Sent: Sunday, March 23, 2014 12:20 AM
To: common-user@hadoop.apache.org
Subject: Re: Data Locality Importance

Like you said, it depends both on the kind of network you have and the type
of your workload.

Given your point about S3, I'd guess your input files/blocks are not large
enough that moving code to data trumps moving data itself to the code. When
that balance tilts a lot, especially when moving large input data
files/blocks, data-locality will help improve performance significantly.
That, or when the read throughput from a remote disk is much lower than
(<<) the read throughput from a local disk.
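
To make that trade-off concrete, here is a rough back-of-the-envelope
sketch. It assumes a task's time is simply the time to stream one block plus
a fixed compute time. The function names and every number below (block size,
throughputs, compute time) are hypothetical and chosen only for
illustration; they are not measurements from any real cluster.

```python
def read_seconds(block_mb: float, throughput_mb_s: float) -> float:
    """Time to stream one block at a given sustained throughput."""
    return block_mb / throughput_mb_s


def remote_to_local_ratio(block_mb: float, local_mb_s: float,
                          remote_mb_s: float, compute_s: float) -> float:
    """Task time with a remote read divided by task time with a local read.

    A value near 1.0 means locality barely matters for this task;
    a value well above 1.0 means the remote read dominates.
    """
    local = read_seconds(block_mb, local_mb_s) + compute_s
    remote = read_seconds(block_mb, remote_mb_s) + compute_s
    return remote / local


# Hypothetical case 1: fast network, remote read almost as fast as local.
fast_net = remote_to_local_ratio(block_mb=128, local_mb_s=100,
                                 remote_mb_s=80, compute_s=10)

# Hypothetical case 2: remote read throughput << local read throughput.
slow_net = remote_to_local_ratio(block_mb=128, local_mb_s=100,
                                 remote_mb_s=10, compute_s=10)

print(f"fast network: {fast_net:.2f}x, slow network: {slow_net:.2f}x")
```

Under these made-up numbers the first case costs only a few percent more
without locality, while the second roughly doubles the task time, which is
the "balance tilting" described above.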

HTH
+Vinod

On Mar 21, 2014, at 7:06 PM, Mike Sam <mikesam460@gmail.com> wrote:

> How important is Data Locality to Hadoop? I mean, if we prefer to 
> separate the HDFS cluster from the MR cluster, we will lose data 
> locality, but my question is how bad is this assuming we provide a 
> reasonable network connection between the two clusters? EMR kills data 
> locality when using S3 as storage but we do not see a significant job 
> time difference running same job from the HDFS cluster of the same 
> setup. So, I am wondering how important is Data Locality to Hadoop in
> practice?
> 
> Thanks,
> Mike





