spark-user mailing list archives

From "Jakub Dubovsky" <spark.dubovsky.ja...@seznam.cz>
Subject Re: RDD of ImmutableList
Date Wed, 07 Oct 2015 10:56:50 GMT
I did not realize that Scala's and Java's immutable collections use 
different APIs, which is what causes this. Thank you for the reminder. This 
makes some sense now...


---------- Original message ----------
From: Jonathan Coveney <jcoveney@gmail.com>
To: Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz>
Date: 7 Oct 2015 1:29:34
Subject: Re: RDD of ImmutableList

"
Nobody is saying not to use immutable data structures, only that Guava's 
aren't natively supported.

The collections Scala gives you by default are all immutable: List, Vector, 
Map. This is what people generally use, especially in Scala code!

On Tuesday, 6 October 2015, Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz> 
wrote:
"

Thank you for the quick reaction.




I have to say this is very surprising to me. I have never been advised to 
stop using an immutable approach. The whole RDD abstraction is designed to 
be immutable (which is somewhat undermined by not being able to 
(de)serialize immutable classes properly). I will ask on the dev list 
whether this is going to change.




OK, I have let go of my initial feelings, so let's be pragmatic. This is 
still addressed to everyone, not just Igor:




I use an immutable class from a library. I want to use this class to 
represent my data in an RDD because it saves me a huge amount of work. The 
class has an ImmutableList as one of its fields, which is why it fails. But 
isn't there a way to work around this? I ask because I have exactly zero 
knowledge of Kryo and how it works. For example, would either of these two 
work?




1) Change the external class so that it implements the writeObject and 
readObject methods (it's Java). Will these methods be used by Kryo? (I can 
ask the maintainers of the library to change the class if the change is 
reasonable. Adding these methods would be, while dropping immutability 
certainly wouldn't.)




2) Wrap the class in a Scala class that translates the data during 
(de)serialization?
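One way to picture option 2, as a minimal sketch rather than a confirmed fix: 
convert the immutable Java list to a Scala Vector before the data goes into 
the RDD, and convert back when the library class is needed again. The object 
ImmutableListBridge and its methods are hypothetical names, and 
java.util.Collections.unmodifiableList stands in for Guava's ImmutableList so 
that the snippet runs without extra jars. Whether Kryo then handles the 
Vector without further registration is an assumption, not something verified 
here.

```scala
import java.util.{ArrayList, Collections, List => JList}

// Hypothetical helper: names are illustrative, not part of Spark, Kryo or Guava.
object ImmutableListBridge {

  // Stand-in for an immutable Java list such as Guava's ImmutableList:
  // calling add() on the result throws UnsupportedOperationException.
  def immutableJavaList(xs: Seq[Int]): JList[Integer] = {
    val buf = new ArrayList[Integer]()
    xs.foreach(x => buf.add(Integer.valueOf(x)))
    Collections.unmodifiableList(buf)
  }

  // Copy the elements into a Scala Vector before they enter an RDD.
  def toScala(list: JList[Integer]): Vector[Int] = {
    val builder = Vector.newBuilder[Int]
    val it = list.iterator()
    while (it.hasNext) builder += it.next().intValue()
    builder.result()
  }

  // Rebuild the immutable Java list when the library class is needed again.
  def toJava(v: Vector[Int]): JList[Integer] = immutableJavaList(v)
}
```

With something like this, the RDD would hold Vector[Int] values (for example 
by mapping toScala over the input before calling sc.parallelize), and toJava 
would be applied only at the point where the library class is rebuilt.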




  Thanks!

  Jakub Dubovsky


---------- Original message ----------
From: Igor Berman <igor.berman@gmail.com>
To: Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz>
Date: 5 Oct 2015 20:11:35
Subject: Re: RDD of ImmutableList

"

Kryo doesn't support Guava's collections by default.
I remember coming across a project on GitHub that fixes this (not sure, 
though). I ended up not using Guava collections wherever Spark RDDs are 
concerned.




On 5 October 2015 at 21:04, Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz> 
wrote:
"
Hi all,



  I would like some advice on how to use ImmutableList with an RDD. Here is 
a small demonstration of the essence of my problem in spark-shell with the 
Guava jar added:




scala> import com.google.common.collect.ImmutableList
import com.google.common.collect.ImmutableList

scala> val arr = Array(ImmutableList.of(1,2), ImmutableList.of(2,4), ImmutableList.of(3,6))
arr: Array[com.google.common.collect.ImmutableList[Int]] = Array([1, 2], [2, 4], [3, 6])

scala> val rdd = sc.parallelize(arr)
rdd: org.apache.spark.rdd.RDD[com.google.common.collect.ImmutableList[Int]] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> rdd.count





 This results in a Kryo exception saying that it cannot add a new element to 
the list instance during deserialization:




java.io.IOException: java.lang.UnsupportedOperationException
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
        at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
        ...
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsupportedOperationException
        at com.google.common.collect.ImmutableCollection.add(ImmutableCollection.java:91)
        at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
        at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
        ...





  It makes some sense. But I cannot think of a workaround, and I do not 
believe that using ImmutableList with an RDD is impossible. How is this 
solved?
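The failure mode in the trace, where a collection is refilled through add() 
while it is being read back, can be reproduced without Spark, Kryo or Guava. 
The sketch below is only an illustration of that pattern: AddBasedReadDemo 
and refill are hypothetical names, and java.util.Collections.unmodifiableList 
is used as a stand-in for ImmutableList because it rejects add() with the 
same exception type.

```scala
import java.util.{ArrayList, Collection => JCollection, Collections}

// Hypothetical demo: names are illustrative, not part of Kryo or Spark.
object AddBasedReadDemo {

  // Mimics a reader that repopulates an existing collection via add().
  // Returns true if every element was added, false if the target
  // rejected mutation with UnsupportedOperationException.
  def refill(target: JCollection[Integer], elements: Seq[Int]): Boolean =
    try {
      elements.foreach(e => target.add(Integer.valueOf(e)))
      true
    } catch {
      case _: UnsupportedOperationException => false
    }

  def main(args: Array[String]): Unit = {
    val mutable = new ArrayList[Integer]()
    val immutable = Collections.unmodifiableList(new ArrayList[Integer]())
    println("mutable target refilled:   " + refill(mutable, Seq(1, 2)))
    println("immutable target refilled: " + refill(immutable, Seq(1, 2)))
  }
}
```

A mutable target accepts the elements, while the unmodifiable one fails on 
the first add(), which matches the UnsupportedOperationException reported 
above.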




  Thank you in advance!




   Jakub Dubovsky





"



"
"

"