spark-user mailing list archives

From gtanguy <g.tanguy.claravi...@gmail.com>
Subject Re: How does Spark handle RDD via HDFS ?
Date Thu, 10 Apr 2014 10:56:08 GMT
Yes, that helps me understand better how Spark works. But it is also what I
was afraid of: I think the network communication will take too much time for
my job.

I will keep looking for a trick to avoid network communication.

I saw on the Hadoop website that: "To minimize global bandwidth consumption
and read latency, HDFS tries to satisfy a read request from a replica that
is closest to the reader. If there exists a replica on the same rack as the
reader node, then that replica is preferred to satisfy the read request"
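
To make the quoted rule concrete, here is a minimal sketch of "prefer the closest replica". It is only an illustration, not HDFS code: the names Node, distance and closest_replica are hypothetical, and the weights 0, 2 and 4 are assumed values chosen so that the same host beats the same rack, which beats a remote rack.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """A machine identified by its host name and the rack it sits in."""
    host: str
    rack: str


def distance(reader: Node, replica: Node) -> int:
    """Assumed weights: 0 = same host, 2 = same rack, 4 = different rack."""
    if reader.host == replica.host:
        return 0
    if reader.rack == replica.rack:
        return 2
    return 4


def closest_replica(reader: Node, replicas: List[Node]) -> Node:
    """Pick the replica with the smallest distance to the reader."""
    return min(replicas, key=lambda replica: distance(reader, replica))


reader = Node(host="worker-1", rack="rack-a")
replicas = [
    Node(host="worker-7", rack="rack-b"),  # remote rack
    Node(host="worker-2", rack="rack-a"),  # same rack as the reader
]
chosen = closest_replica(reader, replicas)
```

With these inputs the same-rack replica on worker-2 is chosen. If one of the replicas lived on worker-1 itself, that one would win and the read would not cross the network at all.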

Maybe if I manage to combine part of Spark with some of this, it could
work.

Thank you very much for your answer.

Germain.




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-does-Spark-handle-RDD-via-HDFS-tp4003p4058.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
