Have you tried the following options?

--conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true


On Mon, Oct 19, 2015 at 5:07 AM, YiZhi Liu <javelinjs@gmail.com> wrote:
I'm trying to read a Thrift object from SequenceFile, using
elephant-bird's ThriftWritable. My code looks like

val rawData = sc.sequenceFile[BooleanWritable,
val samples = rawData.map { case (key, value) => {
  val conversion = if (key.get) 1 else 0
  val sample = value.get
  (conversion, sample)
}}

When I run spark-submit in local mode, it fails with

(Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times,
most recent failure: Lost task 0.0 in stage 1.0 (TID 2, localhost):
... ...

I'm pretty sure it is caused by a libthrift version conflict: I use
thrift-0.6.1 while Spark uses 0.9.2, which requires TUnion objects to
implement the abstract 'standardSchemeReadValue' method.

But when I set spark.files.userClassPathFirst=true, it failed even earlier:

(Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times,
most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost):
java.lang.ClassCastException: cannot assign instance of scala.None$ to
field org.apache.spark.scheduler.Task.metrics of type scala.Option in
instance of org.apache.spark.scheduler.ResultTask
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

It seems I introduced more conflicts, but I couldn't figure out which
one caused this failure.
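
One way to narrow this down is to print where a suspect class is
actually loaded from. The following is only a minimal sketch:
ClassOrigin and locationOf are hypothetical names, not part of Spark,
and it assumes the class of interest is on the classpath of the JVM
being inspected.

```scala
// Hypothetical helper (not part of Spark): reports the jar or directory
// that a class was loaded from, to help spot duplicate library versions.
object ClassOrigin {
  def locationOf(className: String): String = {
    val cls = Class.forName(className)
    // getCodeSource is null for classes loaded by the bootstrap class loader.
    Option(cls.getProtectionDomain.getCodeSource)
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)
      .getOrElse("<bootstrap or unknown>")
  }
}
```

Calling ClassOrigin.locationOf("org.apache.thrift.TUnion") on the driver,
and again inside a map over an RDD, should show whether the driver and
the tasks resolve libthrift from the same jar.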

Interestingly, when I ran mvn test in my project, which tests the Spark
job in local mode, everything worked fine.

So what is the right way to give user jars precedence over Spark jars?

Yizhi Liu
Senior Software Engineer / Data Mining
www.mvad.com, Shanghai, China

