spark-user mailing list archives

From Patrick McGloin <mcgloin.patr...@gmail.com>
Subject Spark SQL + Hive + JobConf NoClassDefFoundError
Date Mon, 29 Sep 2014 15:41:14 GMT
Hi,

I have an error when submitting a Spark SQL application to our Spark
cluster:

14/09/29 16:02:11 WARN scheduler.TaskSetManager: Loss was due to java.lang.NoClassDefFoundError
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
        at org.apache.spark.sql.hive.SparkHiveHadoopWriter.setIDs(SparkHadoopWriter.scala:169)
        at org.apache.spark.sql.hive.SparkHiveHadoopWriter.setup(SparkHadoopWriter.scala:69)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(hiveOperators.scala:260)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(hiveOperators.scala:274)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(hiveOperators.scala:274)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

I assume this is because the executor does not have hadoop-core.jar on its
classpath.  I've tried adding it to the SparkContext using addJar, but this
didn't help.
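
For reference, this is roughly what I tried (the jar path below is
illustrative, not our exact location):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("AAC")
      .setMaster("spark://localhost:7077")
    val sc = new SparkContext(conf)

    // Supposed to ship the jar to every executor for this application;
    // the path here is just an example.
    sc.addJar("/usr/lib/hadoop/hadoop-core.jar")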

I also see that the documentation says you must rebuild Spark if you want
to use Hive.

https://spark.apache.org/docs/1.0.2/sql-programming-guide.html#hive-tables

Is this really true, or can we just package the jar files with the Spark
application we build?  Rebuilding Spark itself isn't possible for us, as it
is installed on a VM without internet access and we are using the Cloudera
distribution (Spark 1.0).

Is it possible to assemble the Hive dependencies into our Spark application
and submit this to the cluster?  I've tried to do this with spark-submit
(and the Hadoop JobConf class is in AAC-assembly-1.0.jar), but the executor
doesn't find the class.  Here is the command:

sudo ./spark-submit --class aac.main.SparkDriver \
  --master spark://localhost:7077 \
  --jars AAC-assembly-1.0.jar \
  aacApp_2.10-1.0.jar
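
For completeness, the assembly is built with sbt-assembly along these lines
(a sketch only; the versions and the "provided" scoping are illustrative of
what I mean, not necessarily our exact build):

    // build.sbt fragment (illustrative versions; uses the sbt-assembly plugin)
    name := "aacApp"

    version := "1.0"

    scalaVersion := "2.10.4"

    libraryDependencies ++= Seq(
      // Supplied by the cluster at runtime, so kept out of the fat jar:
      "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
      // Hive support for Spark SQL:
      "org.apache.spark" %% "spark-hive" % "1.0.0",
      // Transitively provides org.apache.hadoop.mapred.JobConf:
      "org.apache.hadoop" % "hadoop-client" % "2.3.0"
    )

The idea is that everything Hive and Hadoop need travels with the
application jar, so the Spark build installed on the cluster stays
untouched.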

Any pointers would be appreciated!

Best regards,
Patrick
