spark-user mailing list archives

From Wisc Forum <wiscfo...@gmail.com>
Subject Spark map function question
Date Tue, 22 Oct 2013 04:18:49 GMT
Hi, we have tried integrating Spark with our existing code and are seeing some issues.

The issue is that when we use the function below (where func is a function that processes elem)

rdd.map{ elem => {func.apply(elem)} }

the log shows that the apply function is invoked several times for the same element elem instead
of once.

When I execute this sequentially instead (see below), everything works just fine.

sparkContext.parallelize(rdd.toArray.map(elem => proj.apply(elem)))

(The only reason I used sparkContext.parallelize in the above line is that the method must
return RDD[MyDataType].)

Why does this happen? Does the map function require anything special from the RDD?
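[Editor's note: the repeated invocations described above are usually a consequence of Spark's lazy evaluation: an RDD's lineage, including the map function, is recomputed for every action (count, collect, saveAsTextFile, ...) unless the RDD is persisted. A minimal, illustrative sketch, assuming func is a pure function (the names func, MapOnceExample, and the local master URL are placeholders, not from the original message):]

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MapOnceExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("map-once").setMaster("local[2]"))

    // Stand-in for the poster's `func`; prints so repeated calls are visible.
    val func: Int => Int = x => { println(s"apply($x)"); x * 2 }

    val rdd = sc.parallelize(1 to 4)

    // Without cache(), each action below would recompute the map,
    // calling `func` again for every element.
    val mapped = rdd.map(elem => func(elem)).cache()

    mapped.count()    // first action: computes the map and caches the result
    mapped.collect()  // second action: served from the cache; func is not re-run

    sc.stop()
  }
}
```

[Note also that a single task can legitimately run more than once (speculative execution, task retries after failures), so side effects such as logging inside a map function may fire multiple times even for one logical pass.]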

Thanks,
Xiaobing