spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: RDD of ImmutableList
Date Wed, 07 Oct 2015 13:08:58 GMT
I think Java's immutable collections are fine with respect to kryo --
that's not the same as Guava.

On Wed, Oct 7, 2015 at 11:56 AM, Jakub Dubovsky
<spark.dubovsky.jakub@seznam.cz> wrote:
> I did not realize that Scala's and Java's immutable collections use
> different APIs, which causes this. Thank you for the reminder. This makes some
> sense now...
>
> ---------- Original message ----------
> From: Jonathan Coveney <jcoveney@gmail.com>
> To: Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz>
> Date: 7. 10. 2015 1:29:34
> Subject: Re: RDD of ImmutableList
>
> Nobody is saying not to use immutable data structures, only that guava's
> aren't natively supported.
>
> Scala's default collections library is all immutable: List, Vector, Map.
> This is what people generally use, especially in Scala code!
>
> On Tuesday, October 6, 2015, Jakub Dubovsky
> <spark.dubovsky.jakub@seznam.cz> wrote:
>
> Thank you for the quick reaction.
>
> I have to say this is very surprising to me. I have never been advised to
> stop using an immutable approach. The whole RDD is designed to be immutable
> (which is sort of sabotaged by not being able to (de)serialize immutable
> classes properly). I will ask on the dev list whether this is going to change.
>
> OK, I have let go of my initial feelings, so now let's be pragmatic. This is
> still addressed to everyone, not just Igor:
>
> I use an immutable class from a library. I want to use this
> class to represent my data in an RDD because this saves me a huge amount of
> work. The class uses ImmutableList as one of its fields. That's why it
> fails. But isn't there a way to work around this? I ask because I have
> exactly zero knowledge about Kryo and how it works. So, for example,
> would either of these two work?
>
> 1) Change the external class so that it implements the writeObject and
> readObject methods (it's Java). Will these methods be used by Kryo? (I can ask
> the maintainers of the library to change the class if the change is
> reasonable. Adding these methods would be reasonable, while dropping
> immutability certainly wouldn't.)
>
> 2) Wrap the class in a Scala class which would translate the data during
> (de)serialization?
>
>   Thanks!
>   Jakub Dubovsky
>
> ---------- Original message ----------
> From: Igor Berman <igor.berman@gmail.com>
> To: Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz>
> Date: 5. 10. 2015 20:11:35
> Subject: Re: RDD of ImmutableList
>
>
> Kryo doesn't support Guava's collections by default.
> I remember coming across a project on GitHub that fixes this (not sure, though).
> I ended up no longer using Guava collections wherever Spark RDDs are
> concerned.
>
> On 5 October 2015 at 21:04, Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz>
> wrote:
>
> Hi all,
>
>   I would like some advice on how to use ImmutableList with an RDD. Here is a
> small demonstration of the essence of my problem in spark-shell with the Guava
> jar added:
>
> scala> import com.google.common.collect.ImmutableList
> import com.google.common.collect.ImmutableList
>
> scala> val arr = Array(ImmutableList.of(1,2), ImmutableList.of(2,4), ImmutableList.of(3,6))
> arr: Array[com.google.common.collect.ImmutableList[Int]] = Array([1, 2], [2, 4], [3, 6])
>
> scala> val rdd = sc.parallelize(arr)
> rdd: org.apache.spark.rdd.RDD[com.google.common.collect.ImmutableList[Int]] = ParallelCollectionRDD[0] at parallelize at <console>:24
>
> scala> rdd.count
>
>  This results in a Kryo exception saying that it cannot add a new element to
> the list instance during deserialization:
>
> java.io.IOException: java.lang.UnsupportedOperationException
>         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
>         at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
>         ...
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.UnsupportedOperationException
>         at com.google.common.collect.ImmutableCollection.add(ImmutableCollection.java:91)
>         at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
>         at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>         ...
>
>   It somehow makes sense. But I cannot think of a workaround, and I do not
> believe that using ImmutableList with an RDD is impossible. How is this
> solved?
>
>   Thank you in advance!
>
>    Jakub Dubovsky
>
>
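
The stack trace in the original question shows CollectionSerializer.read calling add() on the list it is rebuilding, and ImmutableCollection.add rejecting the call. The mechanism can be sketched with the standard library alone. This is only an illustration: `AddBasedFill` and `fillByAdd` are hypothetical names, and `Collections.emptyList` merely stands in for an immutable list such as Guava's.

```scala
import java.util.{ArrayList, Collection => JCollection, Collections}
import scala.util.Try

object AddBasedFill {
  // Hypothetical simplification of an add-based deserializer: it fills an
  // already-created collection by calling add() once per decoded element.
  def fillByAdd(target: JCollection[Integer], elements: Seq[Integer]): JCollection[Integer] = {
    elements.foreach(e => target.add(e))
    target
  }

  // A mutable target accepts every add() call.
  val mutableResult: JCollection[Integer] =
    fillByAdd(new ArrayList[Integer](), Seq[Integer](1, 2))

  // An immutable target throws UnsupportedOperationException on the first add().
  val immutableResult: Try[JCollection[Integer]] =
    Try(fillByAdd(Collections.emptyList[Integer](), Seq[Integer](1, 2)))
}
```

An immutable list therefore has to be rebuilt in one step (for example through a builder or a factory method) rather than filled element by element.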

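One possible workaround for the question quoted above is a custom Kryo serializer that rebuilds the list through a builder instead of calling add() on it. The following is a minimal sketch, not a tested solution: `ImmutableListSerializer` and `GuavaRegistrator` are hypothetical names, the `read` signature assumes the Kryo 2.x to 4.x `Serializer` API (Kryo 5 changes the type of the last parameter), and it has not been checked against any particular Spark release.

```scala
import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import com.google.common.collect.ImmutableList
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical serializer: writes the element count followed by each element.
class ImmutableListSerializer extends Serializer[ImmutableList[AnyRef]] {
  override def write(kryo: Kryo, output: Output, list: ImmutableList[AnyRef]): Unit = {
    output.writeInt(list.size)
    val it = list.iterator
    while (it.hasNext) kryo.writeClassAndObject(output, it.next())
  }

  // Assumption: Kryo 2.x to 4.x signature, where the last parameter is Class[T].
  override def read(kryo: Kryo, input: Input,
                    tpe: Class[ImmutableList[AnyRef]]): ImmutableList[AnyRef] = {
    val size = input.readInt()
    // Collect elements in a builder; the immutable list is created only once.
    val builder = ImmutableList.builder[AnyRef]()
    var i = 0
    while (i < size) {
      builder.add(kryo.readClassAndObject(input))
      i += 1
    }
    builder.build()
  }
}

// Hypothetical registrator that Spark would call when it creates a Kryo instance.
class GuavaRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // A default serializer is matched by assignability, so it is also meant to
    // cover the non-public subclasses that ImmutableList.of(...) returns.
    kryo.addDefaultSerializer(classOf[ImmutableList[_]], new ImmutableListSerializer)
  }
}
```

To try it, set `spark.serializer` to `org.apache.spark.serializer.KryoSerializer` and `spark.kryo.registrator` to the fully qualified name of the registrator class, with the class available on the driver and executor classpaths.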

