spark-user mailing list archives

From Xuelin Cao
Subject Is it possible to just change the value of the items in RDD without making a full copy?
Date Tue, 02 Dec 2014 11:18:55 GMT

     I'd like to perform an operation on an RDD that changes the values of ONLY some items,
without making a full copy or a full scan of the data.
     This would be useful when handling a large RDD where each update changes only
a small fraction of the data and leaves the rest unchanged. I certainly don't want to copy
all of the data into a new RDD.
     For example, suppose I have an RDD that contains the integers from 0 to 99. What I
want is to change the first element of the RDD from 0 to 1 and leave the other elements untouched.

     I tried this, but it doesn't work:

         var rdd = parallelize(Range(0,100))
         rdd.mapPartitions({iter=> iter(0) = 1})

     The reported error is:

         value update is not a member of Iterator[Int]

     Does anyone know how to make it work?
