spark-dev mailing list archives

From Jinfeng Li <liji...@gmail.com>
Subject Loading Files from HDFS Incurs Network Communication
Date Mon, 26 Oct 2015 08:57:20 GMT
Hi, I find that loading files from HDFS can incur a huge amount of network
traffic. The input size is 90 GB and the network traffic is about 80 GB. As I
understand it, the data should be read from local files, so no network
communication should be needed.

I am using Spark 1.5.1, and the following is my code:

val textRDD = sc.textFile("hdfs://master:9000/inputDir")
textRDD.count
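
A minimal sketch of one way to inspect the locality information Spark has for this input, assuming a spark-shell session where `sc` is already defined and the same input path as above. It reads the directory through `sc.hadoopFile` rather than `sc.textFile`, because the RDD returned by `textFile` is a mapped RDD that does not itself report preferred locations, and it only looks at the first 10 partitions (an arbitrary choice for illustration):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

// Assumes spark-shell, where `sc` (the SparkContext) already exists.
// Same path as in the code above; the key/value types are those of TextInputFormat.
val hadoopRDD =
  sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs://master:9000/inputDir")

// Print the hosts that each of the first 10 input splits prefers to be read from.
hadoopRDD.partitions.take(10).foreach { p =>
  val hosts = hadoopRDD.preferredLocations(p)
  println(s"partition ${p.index}: ${hosts.mkString(", ")}")
}
```

If the host names printed here do not match the host names the executors register with, the scheduler may not be able to place tasks next to their data, which is one possible (not confirmed) source of remote reads.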

Jeffrey
