Hi,
I'd like to perform an operation on an RDD that changes ONLY the values of some items, without making a full copy or doing a full scan of the data.
This is useful when I have a large RDD and each time need to change only a small fraction of the data while keeping the rest unchanged. I certainly don't want to copy all of the data into a new RDD.
For example, suppose I have an RDD that contains the integers from 0 to 99. What I want is to change the first element of the RDD from 0 to 1 and leave the other elements untouched.
I tried this, but it doesn't work:
var rdd = parallelize(Range(0,100))
rdd.mapPartitions({iter=> iter(0) = 1})
The reported error is: value update is not a member of Iterator[Int]
Does anyone know how to make it work?
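
For reference, the error comes from the fact that iter(0) = 1 is shorthand for iter.update(0, 1), and Iterator has no update method; mapPartitions also expects the function to return a new Iterator. A minimal sketch of one possible direction is below. It is plain Scala with no Spark dependency, and replaceFirst is a hypothetical helper name introduced only for illustration, not part of Spark or the Scala library.

```scala
object ReplaceFirst {
  // Hypothetical helper (an assumption for illustration, not a Spark API):
  // lazily swaps the first element of an iterator for newValue and passes
  // the remaining elements through unchanged.
  def replaceFirst[T](iter: Iterator[T], newValue: T): Iterator[T] =
    if (iter.hasNext) {
      iter.next()                       // discard the old first element
      Iterator.single(newValue) ++ iter // emit the replacement, then the rest
    } else iter                         // empty input: nothing to replace

  def main(args: Array[String]): Unit = {
    // Small stand-in for the contents of one partition.
    val result = replaceFirst(Iterator.range(0, 5), 1).toList
    println(result) // prints List(1, 1, 2, 3, 4)
  }
}
```

With Spark, this could be plugged in as rdd.mapPartitionsWithIndex((idx, iter) => if (idx == 0) replaceFirst(iter, 1) else iter), assuming partition 0 is non-empty and holds the first element of the RDD. Note that this still produces a new RDD, since RDDs are immutable and cannot be updated in place. The transformation is lazy, though, and the other partitions pass their iterators through as they are.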