spark-dev mailing list archives

From Rahul Palamuttam
Subject Serialization troubles with mutable.LinkedHashMap
Date Tue, 23 Aug 2016 17:51:31 GMT

I initially sent this to the user mailing list, but I didn't get any response.
I figured this could be a bug, so it might be of more concern to the dev list.

I recently switched to using Kryo serialization and I've been running into
problems with the mutable.LinkedHashMap class.

If I don't register the mutable.LinkedHashMap class then I get an
ArrayStoreException seen below.
If I do register the class, then when the LinkedHashMap is collected on the
driver, it does not contain any elements.

Here is the snippet of code I used :

import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable

val sc = new SparkContext(new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[mutable.LinkedHashMap[String, String]])))

// Each element of the RDD is a LinkedHashMap with two entries.
val collect = sc.parallelize(0 to 10)
  .map(p => new mutable.LinkedHashMap[String, String]() ++=
    Array(("hello", "bonjour"), ("good", "bueno")))

// Size measured on the executors vs. size after the map reaches the driver.
val mapSideSizes = collect.map(p => p.size).collect()(0)
val driverSideSizes = collect.collect()(0).size

println("The sizes before collect : " + mapSideSizes)
println("The sizes after collect : " + driverSideSizes)

** The following only occurs if I did not register the
mutable.LinkedHashMap class **
16/08/20 18:10:38 ERROR TaskResultGetter: Exception while getting task
java.lang.ArrayStoreException: scala.collection.mutable.HashMap
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$
at com.esotericsoftware.kryo.Kryo.readClassAndObject(
at org.apache.spark.serializer.KryoSerializerInstance.
at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:97)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
at org.apache.spark.scheduler.TaskResultGetter$$anon$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$

I hope this is a known issue and/or that I'm missing something important.
I'd appreciate any help or advice!

As a bit of background, this was encountered in the SciSpark project being
developed at NASA JPL.
The mutable.LinkedHashMap is necessary because it lets us handle NetCDF
attributes in the order they appear in the original NetCDF files.
The test case I posted above was just to show the error I'm seeing more clearly.
Our actual use case is slightly different, but we see the same result
(empty HashMaps).
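
Since the only thing needed from the LinkedHashMap is the attribute order, one possible stopgap is to ship the entries as an ordered array of pairs and rebuild the map on the driver. The sketch below is plain Scala with no Spark dependency. The object name OrderedAttributes and its two helpers are hypothetical names made up for illustration, and whether an Array of tuples round-trips cleanly through Kryo in this setup is an assumption that has not been verified here.

```scala
import scala.collection.mutable

// Hypothetical helper (not part of Spark or SciSpark): converts between a
// LinkedHashMap and an array of pairs that keeps the insertion order.
object OrderedAttributes {
  // Flatten the map into an array of pairs; iteration follows insertion order.
  def toPairs(m: mutable.LinkedHashMap[String, String]): Array[(String, String)] =
    m.toArray

  // Rebuild a LinkedHashMap by inserting the pairs in array order.
  def fromPairs(pairs: Array[(String, String)]): mutable.LinkedHashMap[String, String] = {
    val m = new mutable.LinkedHashMap[String, String]()
    m ++= pairs
    m
  }
}
```

In the test case above, this would mean mapping each LinkedHashMap through OrderedAttributes.toPairs before calling collect(), then applying OrderedAttributes.fromPairs to each collected array on the driver. It only avoids serializing the map itself; it does not explain why the registered class comes back empty.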

Rahul Palamuttam
