beam-commits mailing list archives

From "JC (JIRA)" <>
Subject [jira] [Created] (BEAM-4597) Serialization problem using SparkRunner and KryoSerializer from spark
Date Wed, 20 Jun 2018 15:32:00 GMT
JC created BEAM-4597:

             Summary: Serialization problem using SparkRunner and KryoSerializer from spark
                 Key: BEAM-4597
             Project: Beam
          Issue Type: Bug
          Components: runner-spark
    Affects Versions: 2.4.0
            Reporter: JC
            Assignee: Amit Sela

When using the SparkRunner and specifying Spark to use the 'KryoSerializer', as in:
{quote}spark-submit --class org.apache.beam.examples.BugWithKryoOnSpark --master yarn --deploy-mode client --conf spark.serializer=org.apache.spark.serializer.KryoSerializer /tmp/kafka-sdk-beam-example-bundled-0.1.jar{quote}
we get an exception after 10 or 15 seconds:
{quote}Exception in thread "main" java.lang.RuntimeException: org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.beam.runners.core.metrics.MetricsContainerImpl$$Lambda$31/1875283985
Serialization trace:
factory (org.apache.beam.runners.core.metrics.MetricsMap)
counters (org.apache.beam.runners.core.metrics.MetricsContainerImpl)
metricsContainers (org.apache.beam.runners.core.metrics.MetricsContainerStepMap)
metricsContainers ($Metadata)
 at org.apache.beam.runners.spark.SparkPipelineResult.runtimeExceptionFrom(
 at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(
 at org.apache.beam.runners.spark.SparkPipelineResult.access$000(
 at org.apache.beam.runners.spark.SparkPipelineResult$StreamingMode.stop(
 at org.apache.beam.runners.spark.SparkPipelineResult.offerNewState(
 at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(
 at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(
 at org.apache.beam.examples.BugWithKryoOnSpark.main(
{quote}
But when using the SparkRunner and specifying Spark to use the 'JavaSerializer', as in:
{quote}spark-submit --class org.apache.beam.examples.BugWithKryoOnSpark --master yarn --deploy-mode client --conf spark.serializer=org.apache.spark.serializer.JavaSerializer /tmp/kafka-sdk-beam-example-bundled-0.1.jar{quote}
the pipeline works correctly.
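The class name in the Kryo error, MetricsContainerImpl$$Lambda$31/1875283985, is a JVM-synthesized lambda class (the counter factory held by MetricsMap). A minimal, Beam-independent sketch of the mechanism (the class and method names below are hypothetical, for illustration only): Kryo's default deserialization resolves classes by name, but a lambda's runtime class cannot be looked up that way, while JavaSerializer goes through the lambda's serialized form instead.

```java
import java.util.function.Supplier;

// Sketch (not Beam code) of why Kryo's class-name-based lookup fails for
// lambdas: the runtime class of a lambda is synthetic, and Class.forName
// on its reported name throws ClassNotFoundException.
public class LambdaLookupDemo {

    // Returns true iff the lambda's runtime class can be re-resolved by name.
    static boolean canResolveLambdaClassByName() {
        Supplier<Long> factory = () -> 0L; // stand-in for MetricsMap's factory lambda
        String runtimeName = factory.getClass().getName();
        // runtimeName looks like "LambdaLookupDemo$$Lambda$1/1234567",
        // matching the "MetricsContainerImpl$$Lambda$31/1875283985" in the trace.
        try {
            Class.forName(runtimeName);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("lambda class resolvable by name: " + canResolveLambdaClassByName());
    }
}
```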

Our deployment consists of CDH 5.14.2 (Parcels) and Spark 2.

{quote}spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0.cloudera2
      /_/

Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_151
Branch HEAD
Compiled by user jenkins on 2018-04-10T23:08:17Z
Revision 9f5baab06f127486a030024877fc13a3992f100f
Url git://
Type --help for more information.{quote}

I have attached a sample Maven project which reads data from Kafka (localhost) and just echoes the incoming data, to reproduce this bug. Please refer to the README for the full stack trace and for instructions on how to build the sample.
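A possible workaround short of switching the whole job to the JavaSerializer: Spark lets you register per-class serializers through a custom org.apache.spark.serializer.KryoRegistrator. The sketch below (the class name BeamMetricsRegistrator is hypothetical, and this is untested against this bug) routes the Beam metrics classes from the serialization trace through Kryo's Java-serialization fallback instead of Kryo's default class-name-based serializer, which cannot resolve the lambda field.

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.serializers.JavaSerializer;
import org.apache.beam.runners.core.metrics.MetricsContainerImpl;
import org.apache.beam.runners.core.metrics.MetricsContainerStepMap;
import org.apache.beam.runners.core.metrics.MetricsMap;
import org.apache.spark.serializer.KryoRegistrator;

// Hypothetical workaround sketch: serialize the Beam metrics classes with
// Kryo's JavaSerializer so the lambda-valued "factory" field is handled by
// Java serialization rather than Kryo's class-name lookup.
public class BeamMetricsRegistrator implements KryoRegistrator {
    @Override
    public void registerClasses(Kryo kryo) {
        kryo.register(MetricsContainerImpl.class, new JavaSerializer());
        kryo.register(MetricsContainerStepMap.class, new JavaSerializer());
        kryo.register(MetricsMap.class, new JavaSerializer());
    }
}
```

This would be enabled by adding --conf spark.kryo.registrator=org.apache.beam.examples.BeamMetricsRegistrator (package name assumed) to the spark-submit command alongside the existing KryoSerializer setting.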


