From: "Jakub Dubovsky"
To: "Jonathan Coveney"
Cc: "Igor Berman", user
Subject: Re: RDD of ImmutableList
Date: Wed, 07 Oct 2015 12:56:50 +0200 (CEST)

I did not realize that Scala's and Java's immutable collections use different APIs, which is what causes this. Thank you for the reminder. This makes some sense now...

---------- Original message ----------
From: Jonathan Coveney <jcoveney@gmail.com>
To: Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz>
Date: 7. 10. 2015 1:29:34
Subject: Re: RDD of ImmutableList


Nobody is saying not to use immutable data structures, only that Guava's aren't natively supported.

Scala's default collections (List, Vector, Map) are all immutable. This is what people generally use, especially in Scala code!

On Tuesday, 6 October 2015, Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz> wrote:
Thank you for the quick reaction.

I have to say this is very surprising to me. I have never been advised to stop using an immutable approach. The whole RDD is designed to be immutable (which is sort of sabotaged by not being able to (de)serialize immutable classes properly). I will ask on the dev list whether this is going to change.

OK, I have let go of my initial feelings, so now let's be pragmatic. This is still addressed to everyone, not just Igor:

I use an immutable class from a library. I want to use this class to represent my data in an RDD because this saves me a huge amount of work. The class has an ImmutableList as one of its fields, which is why it fails. But isn't there a way to work around this? I ask because I have exactly zero knowledge of Kryo and how it works. For example, would either of these two work?

1) Change the external class so that it implements the writeObject and readObject methods (it's Java). Will Kryo use these methods? (I can ask the library's maintainers to change the class if the change is reasonable. Adding these methods would be reasonable, while dropping immutability certainly wouldn't.)

2) Wrap the class in a Scala class that translates the data during (de)serialization?
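One possible shape for option 2 is sketched below, purely as an illustration. It assumes the data can be copied into a plain Scala case class that holds only Scala immutable collections, so that the RDD never contains the library type. LibraryRecord, RecordData, fromLibrary and toLibrary are hypothetical names, and an unmodifiable java.util.List stands in for Guava's ImmutableList so that the sketch runs without extra jars.

```scala
import java.util.{ArrayList, Collections, List => JList}
import scala.collection.JavaConverters._ // deprecated but still available in Scala 2.13

// Hypothetical stand-in for the immutable library class; the real class
// would hold a Guava ImmutableList rather than an unmodifiable java.util.List.
final class LibraryRecord(val name: String, values: JList[Integer]) {
  val readOnlyValues: JList[Integer] =
    Collections.unmodifiableList(new ArrayList[Integer](values))
}

// Plain Scala mirror that contains only Scala immutable collections.
final case class RecordData(name: String, values: Vector[Int])

object RecordData {
  // Copy the library object into the Scala mirror before it enters an RDD.
  def fromLibrary(record: LibraryRecord): RecordData =
    RecordData(record.name, record.readOnlyValues.asScala.map(_.intValue).toVector)

  // Rebuild the library object when the original type is needed again.
  def toLibrary(data: RecordData): LibraryRecord =
    new LibraryRecord(data.name, data.values.map(i => Integer.valueOf(i)).asJava)
}
```

Under these assumptions the RDD would be built from RecordData values, and toLibrary would be called only at the points where the library type is really required. The cost is one copy in each direction.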

Thanks!
Jakub Dubovsky

---------- Original message ----------
From: Igor Berman <igor.berman@gmail.com>
To: Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz>
Date: 5. 10. 2015 20:11:35
Subject: Re: RDD of ImmutableList


Kryo doesn't support Guava's collections by default.
I remember coming across a project on GitHub that fixes this (not sure though). I ended up no longer using Guava collections wherever Spark RDDs are concerned.

On 5 October 2015 at 21:04, Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz> wrote:
Hi all,

I would like some advice on how to use ImmutableList with an RDD. Here is a small demonstration of the essence of my problem in spark-shell with the guava jar added:

scala> import com.google.common.collect.ImmutableList
import com.google.common.collect.ImmutableList

scala> val arr = Array(ImmutableList.of(1,2), ImmutableList.of(2,4), ImmutableList.of(3,6))
arr: Array[com.google.common.collect.ImmutableList[Int]] = Array([1, 2], [2, 4], [3, 6])

scala> val rdd = sc.parallelize(arr)
rdd: org.apache.spark.rdd.RDD[com.google.common.collect.ImmutableList[Int]] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> rdd.count

This results in a Kryo exception saying that it cannot add a new element to the list instance during deserialization:

java.io.IOException: java.lang.UnsupportedOperationException
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
        at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
        ...
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsupportedOperationException
        at com.google.common.collect.ImmutableCollection.add(ImmutableCollection.java:91)
        at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
        at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
        ...
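The trace shows ImmutableCollection.add being called from CollectionSerializer.read, which suggests the reader fills its target collection element by element through add(). The sketch below only imitates that pattern and does not use Kryo itself. An unmodifiable java.util.List stands in for Guava's ImmutableList, and AddBasedReadSketch, fillByAdd, rejectsAdd and demo are hypothetical names.

```scala
import java.util.{ArrayList, Collections, List => JList}

object AddBasedReadSketch {
  // Imitates an add-based collection reader: take a target collection
  // and append each decoded element through add().
  def fillByAdd(target: JList[Integer], decoded: Seq[Int]): JList[Integer] = {
    decoded.foreach { e => target.add(Integer.valueOf(e)); () }
    target
  }

  // True when the target rejects add(), as a read-only list does.
  def rejectsAdd(target: JList[Integer]): Boolean =
    try { fillByAdd(target, Seq(1, 2)); false }
    catch { case _: UnsupportedOperationException => true }

  // Tries a mutable ArrayList first, then a read-only list.
  def demo(): (Boolean, Boolean) = {
    val mutableTarget = new ArrayList[Integer]()
    val readOnlyTarget = Collections.unmodifiableList(new ArrayList[Integer]())
    (rejectsAdd(mutableTarget), rejectsAdd(readOnlyTarget))
  }
}
```

Here demo() returns (false, true): the mutable list accepts the elements, while the read-only list throws UnsupportedOperationException on the first add(), the same exception type as in the trace above.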

It somehow makes sense. But I cannot think of a workaround, and I do not believe that using ImmutableList with an RDD is impossible. How is this solved?

Thank you in advance!
Jakub Dubovsky
