hadoop-common-user mailing list archives

From Mike Sam <mikesam...@gmail.com>
Subject Data Locality Importance
Date Sat, 22 Mar 2014 02:06:38 GMT
How important is Data Locality to Hadoop? If we separate the HDFS cluster
from the MR cluster, we will lose data locality, but my question is how bad
this is, assuming we provide a reasonable network connection between the two
clusters. EMR kills data locality when using S3 as storage, but we do not see
a significant difference in job time compared with running the same job
against an HDFS cluster on the same setup. So I am wondering how important
Data Locality is to Hadoop in practice.

