Subject: Spark SQL + Hive + JobConf NoClassDefFoundError
From: Patrick McGloin <mcgloin.patrick@gmail.com>
To: user@spark.apache.org
Date: Mon, 29 Sep 2014 17:41:14 +0200

Hi,

I have an error when submitting a Spark SQL application to our Spark cluster:

14/09/29 16:02:11 WARN scheduler.TaskSetManager: Loss was due to java.lang.NoClassDefFoundError
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
        at org.apache.spark.sql.hive.SparkHiveHadoopWriter.setIDs(SparkHadoopWriter.scala:169)
        at org.apache.spark.sql.hive.SparkHiveHadoopWriter.setup(SparkHadoopWriter.scala:69)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(hiveOperators.scala:260)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(hiveOperators.scala:274)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(hiveOperators.scala:274)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

I assume this is because the Executor does not have the hadoop-core.jar file. I've tried adding it to the SparkContext using addJar but this didn't help.
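
For reference, the relevant part of the application looks roughly like this (a simplified sketch, not our exact code -- the jar path and table names are placeholders):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  val conf = new SparkConf().setAppName("AAC")
  val sc = new SparkContext(conf)

  // ship the Hadoop jar to the executors -- this did not make the error go away
  sc.addJar("/path/to/hadoop-core.jar")

  val hiveContext = new HiveContext(sc)
  // the failure happens when inserting into a Hive table
  hiveContext.hql("INSERT INTO TABLE results SELECT * FROM staging")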

I also see that the documentation says you must rebuild Spark if you want to use Hive:
https://spark.apache.org/docs/1.0.2/sql-programming-guide.html#hive-tables

Is this really true or can we just package the jar files with the Spark Application we build? Rebuilding Spark itself isn't possible for us as it is installed on a VM without internet access and we are using the Cloudera distribution (Spark 1.0).

Is it possible to assemble the Hive dependencies into our Spark Application and submit this to the cluster? I've tried to do this with spark-submit (and the Hadoop JobConf class is in AAC-assembly-1.0.jar) but the Executor doesn't find the class. Here is the command:

sudo ./spark-submit --class aac.main.SparkDriver --master spark://localhost:7077 --jars AAC-assembly-1.0.jar aacApp_2.10-1.0.jar
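
For completeness, the assembly jar is built with the sbt-assembly plugin from a build definition roughly like this (again only a sketch -- the version numbers are illustrative rather than exactly what we use, and the plugin and merge-strategy settings are omitted):

  // build.sbt (sketch only)
  name := "AAC"

  version := "1.0"

  scalaVersion := "2.10.4"

  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-sql"  % "1.0.0",
    "org.apache.spark" %% "spark-hive" % "1.0.0",
    // this is what puts org.apache.hadoop.mapred.JobConf into AAC-assembly-1.0.jar
    "org.apache.hadoop" % "hadoop-client" % "2.3.0"
  )

The idea was that bundling the Hive and Hadoop client classes into the fat jar would make JobConf available on the Executors, but as described above the class still isn't found at runtime.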

Any pointers would be appreciated!

Best regards,
Patrick

