I did not realize that Scala's and Java's immutable collections use
different APIs, which is what causes this. Thank you for the reminder. It
makes some sense now...
---------- Original message ----------
From: Jonathan Coveney <jcoveney@gmail.com>
To: Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz>
Date: 7. 10. 2015 1:29:34
Subject: Re: RDD of ImmutableList
"
Nobody is saying not to use immutable data structures, only that Guava's
aren't natively supported.
Scala's default collections library is all immutable: List, Vector, Map.
This is what people generally use, especially in Scala code!
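To make that point concrete, here is a minimal, dependency-free sketch (not from the thread) of how Scala's default collections behave: every "modification" returns a new collection and leaves the original untouched.

```scala
// Scala's default collections are immutable: "modifying" one
// returns a new collection and leaves the original unchanged.
val xs = List(1, 2)
val ys = 0 :: xs                 // new List; xs is unchanged
val v  = Vector(1, 2) :+ 3       // new Vector(1, 2, 3)
val m  = Map("a" -> 1)
val m2 = m + ("b" -> 2)          // new Map; m still has one entry

println(xs)  // List(1, 2)
println(ys)  // List(0, 1, 2)
```

Because these types are what Spark's Scala API is built around, they serialize without any special registration.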
On Tuesday, October 6, 2015, Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz> wrote:
"
Thank you for the quick reaction.
I have to say this is very surprising to me. I have never been advised to
stop using an immutable approach. The whole RDD is designed to be immutable
(which is sort of sabotaged by not being able to (de)serialize immutable
classes properly). I will ask on the dev list whether this is to be changed.
Ok, I have let go of my initial feelings, so let's be pragmatic. This is
still for everyone, not just Igor:
I use a class from a library which is immutable. I want to use this class
to represent my data in an RDD because it saves me a huge amount of work.
The class uses ImmutableList as one of its fields, and that's why it fails.
Isn't there a way to work around this? I ask because I have exactly zero
knowledge about Kryo and how it works. For example, would either of these
work?
1) Change the external class so that it implements writeObject/readObject
methods (it's Java). Will these methods be used by Kryo? (I can ask the
maintainers of the library to change the class if the change is reasonable.
Adding these methods would be, while dropping immutability certainly
wouldn't.)
2) Wrap the class in a Scala class which would translate the data during
(de)serialization?
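On option 2, here is a dependency-free sketch (not from the thread) of what such a translating wrapper could look like. All names are hypothetical, and a JDK unmodifiable list stands in for Guava's ImmutableList (which is not assumed to be on the classpath); with Guava you would rebuild via ImmutableList.copyOf instead. Note that Kryo honors writeObject/readObject only when the class is registered with Kryo's JavaSerializer; by default it uses its own field serializers.

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}
import java.util.Collections

// Hypothetical wrapper: stores elements as a plain Array on the wire,
// so deserialization never calls add() on an immutable list instance.
class ImmutableListBox(@transient private var inner: java.util.List[Integer])
    extends Serializable {
  def get: java.util.List[Integer] = inner

  // Java serialization hook: write only the elements.
  private def writeObject(out: ObjectOutputStream): Unit =
    out.writeObject(inner.toArray)

  // Rebuild the immutable list on the way back in.
  private def readObject(in: ObjectInputStream): Unit = {
    val elems = in.readObject().asInstanceOf[Array[AnyRef]]
    val tmp = new java.util.ArrayList[Integer]()
    elems.foreach(e => tmp.add(e.asInstanceOf[Integer]))
    inner = Collections.unmodifiableList(tmp)
  }
}
```

A round trip through Java serialization then reproduces the list without ever mutating an immutable instance.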
Thanks!
Jakub Dubovsky
---------- Original message ----------
From: Igor Berman <igor.berman@gmail.com>
To: Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz>
Date: 5. 10. 2015 20:11:35
Subject: Re: RDD of ImmutableList
"
Kryo doesn't support Guava's collections by default.
I remember encountering a project on GitHub that fixes this (not sure,
though).
I've ended up not using Guava collections at all where Spark RDDs are
concerned.
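If the GitHub project Igor has in mind is the third-party kryo-serializers library (an assumption; it ships Guava-aware serializers such as an ImmutableListSerializer), the usual Spark wiring is a custom Kryo registrator. This is a configuration sketch, not runnable as-is: the artifact and class names below are assumptions to verify against that project.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator
// Assumption: de.javakaffee:kryo-serializers is on the classpath.
import de.javakaffee.kryoserializers.guava.ImmutableListSerializer

// Registers a Guava-aware serializer so Kryo no longer tries to
// rebuild ImmutableList by calling add() on an immutable instance.
class GuavaKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit =
    ImmutableListSerializer.registerSerializers(kryo)
}

// Then point Spark at it, e.g. in SparkConf:
//   spark.serializer       = org.apache.spark.serializer.KryoSerializer
//   spark.kryo.registrator = <fully qualified name of GuavaKryoRegistrator>
```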
On 5 October 2015 at 21:04, Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz>
wrote:
"
Hi all,
I would like some advice on how to use ImmutableList with an RDD. A small
demonstration of the essence of my problem in spark-shell, with the guava
jar added:
scala> import com.google.common.collect.ImmutableList
import com.google.common.collect.ImmutableList
scala> val arr = Array(ImmutableList.of(1,2), ImmutableList.of(2,4), ImmutableList.of(3,6))
arr: Array[com.google.common.collect.ImmutableList[Int]] = Array([1, 2], [2, 4], [3, 6])
scala> val rdd = sc.parallelize(arr)
rdd: org.apache.spark.rdd.RDD[com.google.common.collect.ImmutableList[Int]] = ParallelCollectionRDD[0] at parallelize at <console>:24
scala> rdd.count
This results in a Kryo exception saying that it cannot add a new element to
the list instance during deserialization:
java.io.IOException: java.lang.UnsupportedOperationException
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
...
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsupportedOperationException
at com.google.common.collect.ImmutableCollection.add(ImmutableCollection.java:91)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
...
It somehow makes sense, but I cannot think of a workaround, and I do not
believe that using ImmutableList with an RDD is simply impossible. How is
this solved?
Thank you in advance!
Jakub Dubovsky
"
"
"
"