beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aviem Zur (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-1145) Remove classifier from shaded spark runner artifact
Date Tue, 13 Dec 2016 15:19:58 GMT
Aviem Zur created BEAM-1145:
-------------------------------

             Summary: Remove classifier from shaded spark runner artifact
                 Key: BEAM-1145
                 URL: https://issues.apache.org/jira/browse/BEAM-1145
             Project: Beam
          Issue Type: Improvement
          Components: runner-spark
            Reporter: Aviem Zur
            Assignee: Amit Sela


Shade plugin configured in spark runner's pom adds a classifier to spark runner shaded jar
{code:xml}
<shadedArtifactAttached>true</shadedArtifactAttached>
<shadedClassifierName>spark-app</shadedClassifierName>
{code}
This means, that in order for a user application that is dependent on spark-runner to work
in cluster mode, they have to add the classifier in their dependency declaration, like so:
{code:xml}
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-spark</artifactId>
            <version>0.4.0-incubating-SNAPSHOT</version>
            <classifier>spark-app</classifier>
        </dependency>
{code}
Otherwise, if they do not specify classifier, the jar they get is unshaded, which in cluster
mode, causes collisions between different guava versions.
Example exception in cluster mode when adding the dependency without classifier:
{code}
16/12/12 06:58:56 WARN TaskSetManager: Lost task 4.0 in stage 8.0 (TID 153, lvsriskng02.lvs.paypal.com):
java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;
	at org.apache.beam.runners.spark.stateful.StateSpecFunctions$1.apply(StateSpecFunctions.java:137)
	at org.apache.beam.runners.spark.stateful.StateSpecFunctions$1.apply(StateSpecFunctions.java:98)
	at org.apache.spark.streaming.StateSpec$$anonfun$1.apply(StateSpec.scala:180)
	at org.apache.spark.streaming.StateSpec$$anonfun$1.apply(StateSpec.scala:179)
	at org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:57)
	at org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:55)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at org.apache.spark.streaming.rdd.MapWithStateRDDRecord$.updateRecordWithData(MapWithStateRDD.scala:55)
	at org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:155)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:275)
	at org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:149)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:275)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:275)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}
I would suggest that the classifier be removed from the shaded jar, to avoid confusion among
users, and have a better user experience.

P.S. Looks like Dataflow runner does not add a classifier to its shaded jar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message