Subject: Spark SQL + Hive + JobConf NoClassDefFoundError
From: Patrick McGloin <mcgloin.patrick@gmail.com>
To: user@spark.apache.org
Date: Mon, 29 Sep 2014 17:41:14 +0200

Hi,

I have an error when submitting a Spark SQL application to our Spark cluster:

14/09/29 16:02:11 WARN scheduler.TaskSetManager: Loss was due to java.lang.NoClassDefFoundError
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
        at org.apache.spark.sql.hive.SparkHiveHadoopWriter.setIDs(SparkHadoopWriter.scala:169)
        at org.apache.spark.sql.hive.SparkHiveHadoopWriter.setup(SparkHadoopWriter.scala:69)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(hiveOperators.scala:260)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(hiveOperators.scala:274)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(hiveOperators.scala:274)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

I assume this is because the Executor does not have the hadoop-core.jar file. I've tried adding it to the SparkContext using addJar but this didn't help.
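
For reference, the relevant part of the application looks roughly like this (a simplified sketch, not our exact code -- the jar path and table names are placeholders):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  val conf = new SparkConf().setAppName("AAC")
  val sc = new SparkContext(conf)

  // ship the Hadoop jar to the executors -- this did not make the error go away
  sc.addJar("/path/to/hadoop-core.jar")

  val hiveContext = new HiveContext(sc)
  // the failure happens when inserting into a Hive table
  hiveContext.hql("INSERT INTO TABLE results SELECT * FROM staging")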

I also see that the documentation says you must rebuild Spark if you want to use Hive:
https://spark.apache.org/docs/1.0.2/sql-programming-guide.html#hive-tables

Is this really true or can we just package the jar files with the Spark Application we build? Rebuilding Spark itself isn't possible for us as it is installed on a VM without internet access and we are using the Cloudera distribution (Spark 1.0).

Is it possible to assemble the Hive dependencies into our Spark Application and submit this to the cluster? I've tried to do this with spark-submit (and the Hadoop JobConf class is in AAC-assembly-1.0.jar) but the Executor doesn't find the class. Here is the command:

sudo ./spark-submit --class aac.main.SparkDriver --master spark://localhost:7077 --jars AAC-assembly-1.0.jar aacApp_2.10-1.0.jar
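
For completeness, the assembly jar is built with the sbt-assembly plugin from a build definition roughly like this (again only a sketch -- the version numbers are illustrative rather than exactly what we use, and the plugin and merge-strategy settings are omitted):

  // build.sbt (sketch only)
  name := "AAC"

  version := "1.0"

  scalaVersion := "2.10.4"

  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-sql"  % "1.0.0",
    "org.apache.spark" %% "spark-hive" % "1.0.0",
    // this is what puts org.apache.hadoop.mapred.JobConf into AAC-assembly-1.0.jar
    "org.apache.hadoop" % "hadoop-client" % "2.3.0"
  )

The idea was that bundling the Hive and Hadoop client classes into the fat jar would make JobConf available on the Executors, but as described above the class still isn't found at runtime.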

Any pointers would be appreciated!

Best regards,
Patrick

