spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Prasanth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x
Date Tue, 04 Apr 2017 20:28:41 GMT

    [ https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955747#comment-15955747
] 

Prasanth commented on SPARK-18737:
----------------------------------

We are using 2.0.1. Can you tell us how to "disable the Kryo auto-pick for streaming from
the Java API" as a workaround?

> Serialization setting "spark.serializer" ignored in Spark 2.x
> -------------------------------------------------------------
>
>                 Key: SPARK-18737
>                 URL: https://issues.apache.org/jira/browse/SPARK-18737
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.0.1
>            Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 77, ip-10-121-14-147.eu-central-1.compute.internal,
partition 1, RACK_LOCAL, 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 77 on
executor id: 2 hostname: ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on ip-10-121-14-147.eu-central-1.compute.internal:45059
(size: 879.0 B, free: 410.4 MB)
> 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, ip-10-121-14-147.eu-central-1.compute.internal):
com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 13994
>         at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
>         at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
>         at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
>         at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
>         at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>         at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
>         at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>         at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>         at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>         at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>         at org.apache.spark.util.NextIterator.to(NextIterator.scala:21)
>         at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>         at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21)
>         at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>         at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21)
>         at org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
>         at org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>         at org.apache.spark.scheduler.Task.run(Task.scala:86)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 2.0.1, we see
the Kyro deserialization exception and over time the Spark streaming job stops processing
since too many tasks failed.
> Our action was to use conf.set("spark.serializer", "org.apache.spark.serializer.JavaSerializer")
and to disable Kryo class registration with conf.set("spark.kryo.registrationRequired", false).
We hope to identify the root cause of the exception. 
> However, setting the serializer to JavaSerializer is oviously ignored by the Spark-internals.
Despite the setting we still see the exception printed in the log and tasks fail. The occurence
seems to be non-deterministic, but to become more frequent over time.
> Several questions we could not answer during our troubleshooting:
> 1. How can the debug log for Kryo be enabled? -- We tried following the minilog documentation,
but no output can be found.
> 2. Is the serializer setting effective for Spark internal serializations? How can the
JavaSerialize be forced on internal serializations for worker to driver communication?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message