spark-user mailing list archives

From Jonathan Coveney <jcove...@gmail.com>
Subject Re: RDD of ImmutableList
Date Tue, 06 Oct 2015 23:29:16 GMT
Nobody is saying not to use immutable data structures, only that guava's
aren't natively supported.

Scala's default collections are all immutable: List, Vector, Map.
This is what people generally use, especially in Scala code!
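
A minimal sketch of that approach, assuming it is acceptable to copy the
data out of the Guava list before it goes into an RDD. GuavaToScala and
toScalaList are hypothetical names, not part of Spark or Guava. The helper
accepts any java.util.List, which Guava's ImmutableList implements, so the
example uses a plain unmodifiable Java list as a stand-in and runs without
Guava on the classpath:

```scala
// Hypothetical helper (not part of Spark or Guava): copies any java.util.List,
// including a Guava ImmutableList, into an immutable Scala List.
object GuavaToScala {
  def toScalaList[A](xs: java.util.List[A]): List[A] = {
    val builder = List.newBuilder[A]
    val it = xs.iterator()
    while (it.hasNext) builder += it.next()
    builder.result()
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for ImmutableList.of(1, 2) so the sketch runs without Guava.
    val javaList: java.util.List[Int] =
      java.util.Collections.unmodifiableList[Int](
        java.util.Arrays.asList[Int](1, 2))
    println(toScalaList(javaList)) // prints List(1, 2)
  }
}
```

In the spark-shell session quoted below, this would presumably be used as
sc.parallelize(arr.map(x => GuavaToScala.toScalaList(x))), giving an
RDD[List[Int]] instead of an RDD of ImmutableList.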

On Tuesday, 6 October 2015, Jakub Dubovsky <
spark.dubovsky.jakub@seznam.cz> wrote:

> Thank you for the quick reaction.
>
> I have to say this is very surprising to me. I have never been advised to
> stop using an immutable approach. The whole RDD is designed to be immutable
> (which is sort of sabotaged by not being able to (de)serialize immutable
> classes properly). I will ask on the dev list whether this is to be changed.
>
> OK, I have let go of my initial feelings, so now let's be pragmatic. This
> is still for everyone, not just Igor:
>
> I use a class from a library which is immutable. Now I want to use this
> class to represent my data in RDD because this saves me a huge amount of
> work. The class uses ImmutableList as one of its fields. That's why it
> fails. But isn't there a way to work around this? I ask because I
> have exactly zero knowledge about Kryo and how it works. So, for
> example, would either of these two work?
>
> 1) Change the external class so that it implements writeObject and
> readObject methods (it's Java). Will these methods be used by Kryo? (I can
> ask the maintainers of the library to change the class if the change is
> reasonable. Adding these methods would be reasonable, while dropping
> immutability certainly wouldn't be.)
>
> 2) Wrap the class in a Scala class which would translate the data during
> (de)serialization?
>
>   Thanks!
>   Jakub Dubovsky
>
> ---------- Original message ----------
> From: Igor Berman <igor.berman@gmail.com>
> To: Jakub Dubovsky <spark.dubovsky.jakub@seznam.cz>
> Date: 5. 10. 2015 20:11:35
> Subject: Re: RDD of ImmutableList
>
> Kryo doesn't support Guava's collections by default.
> I remember encountering a project on GitHub that fixes this (not sure
> though). I ended up not using Guava collections wherever Spark RDDs are
> concerned.
>
> On 5 October 2015 at 21:04, Jakub Dubovsky
> <spark.dubovsky.jakub@seznam.cz> wrote:
>
> Hi all,
>
>   I would like some advice on how to use ImmutableList with RDD. Here is
> a small demonstration of the essence of my problem in spark-shell with the
> guava jar added:
>
> scala> import com.google.common.collect.ImmutableList
> import com.google.common.collect.ImmutableList
>
> scala> val arr = Array(ImmutableList.of(1,2), ImmutableList.of(2,4),
> ImmutableList.of(3,6))
> arr: Array[com.google.common.collect.ImmutableList[Int]] = Array([1, 2],
> [2, 4], [3, 6])
>
> scala> val rdd = sc.parallelize(arr)
> rdd:
> org.apache.spark.rdd.RDD[com.google.common.collect.ImmutableList[Int]] =
> ParallelCollectionRDD[0] at parallelize at <console>:24
>
> scala> rdd.count
>
>  This results in a Kryo exception saying that it cannot add a new element
> to the list instance during deserialization:
>
> java.io.IOException: java.lang.UnsupportedOperationException
>         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
>         at
> org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
>         ...
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.UnsupportedOperationException
>         at
> com.google.common.collect.ImmutableCollection.add(ImmutableCollection.java:91)
>         at
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
>         at
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>         ...
>
>   It somehow makes sense. But I cannot think of a workaround, and I do not
> believe that using ImmutableList with RDD is impossible. How is this
> solved?
>
>   Thank you in advance!
>
>    Jakub Dubovsky
>
>
>
