spark-user mailing list archives

From Yanbo Liang <>
Subject Re: Is it possible to just change the value of the items in RDD without making a full copy?
Date Tue, 02 Dec 2014 11:48:27 GMT
You cannot modify an RDD in place inside mapPartitions, because RDDs are
immutable. Applying a transformation to an RDD always produces a new RDD.
If you only want to modify a small fraction of the RDD, try collecting the
new values to the driver, or shipping them in a broadcast variable after
each iteration, instead of updating the RDD itself. This is similar to how
SGD is implemented in MLlib.
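
A minimal sketch of the broadcast idea, under a few assumptions: the
local[2] master, the object and helper names (BroadcastOverridesSketch,
readThrough), and the per-iteration update rule are all made up for
illustration. The large RDD is built once and cached; only a small map of
changed values is re-broadcast on each iteration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastOverridesSketch {
  // Hypothetical helper: use the overridden value if one exists, else the original.
  def readThrough(index: Long, value: Int, overrides: Map[Long, Int]): Int =
    overrides.getOrElse(index, value)

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch can run on a single machine.
    val conf = new SparkConf().setAppName("broadcast-overrides-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // The large data set: built once, cached, never rewritten.
      // zipWithIndex pairs each value with its position; swap turns it into (index, value).
      val base = sc.parallelize(0 until 100).zipWithIndex().map(_.swap).cache()

      // The small set of changed values is kept on the driver.
      var overrides = Map.empty[Long, Int]

      for (iteration <- 1 to 3) {
        // Made-up update rule: each iteration changes one more element.
        overrides += ((iteration - 1).toLong -> (iteration * 1000))

        // Ship the current overrides to the executors.
        val patch = sc.broadcast(overrides)

        // Read the cached data through the overrides.
        val total = base
          .map { case (i, v) => readThrough(i, v, patch.value).toLong }
          .reduce(_ + _)
        println(s"iteration $iteration: sum = $total")

        // Release the executor-side copies of this iteration's overrides.
        patch.unpersist()
      }
    } finally {
      sc.stop()
    }
  }
}
```

Each action still reads every element of the cached RDD, so this avoids
materializing a modified copy per iteration, not the scan itself.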

2014-12-02 19:18 GMT+08:00 Xuelin Cao <>:

> Hi,
>      I'd like to perform an operation on an RDD that changes *ONLY* the value
> of some items, without making a full copy or a full scan of the data.
>      This is useful when I need to handle a large RDD and, each time, change
> only a small fraction of the data while keeping the rest unchanged.
> Certainly I don't want to make a full copy of the data into a new RDD.
>      For example, suppose I have an RDD that contains integer data from 0
> to 100. What I want is to change the first element of the RDD from 0
> to 1 and leave the other elements untouched.
>      I tried this, but it doesn't work:
>      var rdd = parallelize(Range(0,100))
>      rdd.mapPartitions({iter=> iter(0) = 1})
>      The reported error is :   value update is not a member of
> Iterator[Int]
>      Does anyone know how to make it work?
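
On the quoted error: in Scala, `iter(0) = 1` is shorthand for
`iter.update(0, 1)`, and Iterator has no update method. A rough sketch of
the immutable equivalent follows; the local[2] master and the names
ReplaceFirstElementSketch and replaceHead are made up for illustration. It
also assumes partition 0 is non-empty, which holds for this small
parallelized range but is not guaranteed in general.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReplaceFirstElementSketch {
  // Hypothetical helper: return an iterator whose first element is replaced.
  def replaceHead(iter: Iterator[Int], newValue: Int): Iterator[Int] =
    if (iter.hasNext) {
      iter.next() // drop the original first element
      Iterator(newValue) ++ iter
    } else iter

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch can run on a single machine.
    val conf = new SparkConf().setAppName("replace-first-element-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(0 until 100)

      // Describes a NEW RDD; only partition 0 gets a different first element.
      val patched = rdd.mapPartitionsWithIndex { (partitionIndex, iter) =>
        if (partitionIndex == 0) replaceHead(iter, 1) else iter
      }

      println(patched.take(3).mkString(", ")) // 1, 1, 2
      println(rdd.take(3).mkString(", "))     // 0, 1, 2 (the original is unchanged)
    } finally {
      sc.stop()
    }
  }
}
```

Transformations are lazy, so `patched` is computed from `rdd` only when an
action runs, and it is not stored as a second copy unless it is cached.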
