From: Yanbo Liang
To: Xuelin Cao
Cc: "user@spark.incubator.apache.org"
Date: Tue, 2 Dec 2014 19:48:27 +0800
Subject: Re: Is it possible to just change the value of the items in RDD without making a full copy?
In-Reply-To: <709378518.2259528.1417519135965.JavaMail.yahoo@jws100182.mail.ne1.yahoo.com>
References: <709378518.2259528.1417519135965.JavaMail.yahoo@jws100182.mail.ne1.yahoo.com>

You cannot modify an RDD inside mapPartitions, because RDDs are immutable. Applying a transformation to an RDD always produces a new RDD.

If you only want to modify a small fraction of the data, try collecting the new values to the driver, or sharing them through a broadcast variable after each iteration, instead of updating the RDD. This is similar to how SGD works in MLlib.

2014-12-02 19:18 GMT+08:00 Xuelin Cao <xuelincao@yahoo.com.invalid>:

> Hi,
>
> I'd like to perform an operation on an RDD that changes the value of
> *ONLY* some items, without making a full copy or a full scan of the data.
>
> This is useful when I need to handle a large RDD and, each time, change
> only a small fraction of the data while keeping the rest unchanged.
> Certainly I don't want to make a full copy of the data into a new RDD.
>
> For example, suppose I have an RDD that contains the integers from 0 to
> 100. What I want is to change the first element of the RDD from 0 to 1
> and leave the other elements untouched.
>
> I tried this, but it doesn't work:
>
>     var rdd = parallelize(Range(0,100))
>     rdd.mapPartitions({iter=> iter(0) = 1})
>
> The reported error is: value update is not a member of Iterator[Int]
>
> Does anyone know how to make it work?
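To make the point about immutability concrete, here is a minimal sketch of the "produce a new RDD" route for the example in the question. It is not code from the thread. Plain Scala collections stand in for the RDD's partitions so that it runs without a cluster, and the names FirstElementUpdate, bumpFirst, partitions and updated are made up for illustration. The per-partition function has the shape that Spark's mapPartitionsWithIndex accepts, so in a real job it could presumably be passed as rdd.mapPartitionsWithIndex(bumpFirst).

```scala
// Sketch only: plain Scala collections stand in for an RDD's partitions.
object FirstElementUpdate {

  // Per-partition function: (partition index, iterator) => iterator.
  // An Iterator has no update method, so we return a new iterator
  // instead of assigning to iter(0).
  def bumpFirst(partitionIndex: Int, iter: Iterator[Int]): Iterator[Int] =
    if (partitionIndex == 0 && iter.hasNext) {
      // Drop the first element of the first partition and emit 1 in its place.
      iter.next()
      Iterator.single(1) ++ iter
    } else {
      // Every other partition is passed through as-is.
      iter
    }

  // Hypothetical stand-in for an RDD of 0 until 100 split into 4 partitions.
  val partitions: Vector[Seq[Int]] =
    Range(0, 100).grouped(25).map(_.toSeq).toVector

  // Stand-in for applying the function per partition and collecting the result.
  val updated: Vector[Int] = partitions.zipWithIndex.flatMap {
    case (part, idx) => bumpFirst(idx, part.iterator)
  }
}
```

The original data is never changed: partitions still starts with 0, while updated starts with 1 and is otherwise identical. Elements that are not modified simply flow through the iterator, but the result is still a separate dataset, which is the behaviour the reply describes.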