hive-user mailing list archives

From Sahil Takiar <takiar.sa...@gmail.com>
Subject Re: hive on spark - why is it so hard?
Date Tue, 26 Sep 2017 21:44:05 GMT
Hey Stephen,

Can you send the full stack trace for the NoClassDefFoundError? For Hive
2.3.0, we only support Spark 2.0.0. Hive may work with more recent versions
of Spark, but we only test with Spark 2.0.0.
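A quick way to capture the full trace, assuming a default Apache Hive tarball install where the client log defaults to /tmp/$USER/hive.log (paths may differ on your setup):

```
# Re-run the failing query with console logging at DEBUG:
hive --hiveconf hive.root.logger=DEBUG,console \
  -e 'set hive.execution.engine=spark; <your query>'

# The same stack trace is also appended to the client log file:
tail -n 200 /tmp/$USER/hive.log
```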

--Sahil

On Tue, Sep 26, 2017 at 2:35 PM, Stephen Sprague <spragues@gmail.com> wrote:

> * i've installed hive 2.3 and spark 2.2
>
> * i've read this doc plenty of times -> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
>
> * i run this query:
>
>    hive --hiveconf hive.root.logger=DEBUG,console -e 'set
> hive.execution.engine=spark; select date_key, count(*) from
> fe_inventory.merged_properties_hist group by 1 order by 1;'
>
>
> * i get this error:
>
> *   Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/spark/scheduler/SparkListenerInterface*
>
>
> * this class is in:
>   /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar
>
> * i have copied all the spark jars to hdfs://dwrdevnn1/spark-2.2-jars
>
> * i have updated hive-site.xml to set spark.yarn.jars to it.
>
> * i see this in the console:
>
> 2017-09-26T13:34:15,505  INFO [334aa7db-ad0c-48c3-9ada-467aaf05cff3 main]
> spark.HiveSparkClientFactory: load spark property from hive configuration
> (spark.yarn.jars -> hdfs://dwrdevnn1.sv2.trulia.com:8020/spark-2.2-jars/*
> ).
>
> * i see this on the console
>
> 2017-09-26T14:04:45,678  INFO [4cb82b6d-9568-4518-8e00-f0cf7ac58cd3 main]
> client.SparkClientImpl: Running client driver with argv:
> /usr/lib/spark-2.2.0-bin-hadoop2.6/bin/spark-submit --properties-file
> /tmp/spark-submit.6105784757200912217.properties --class
> org.apache.hive.spark.client.RemoteDriver /usr/lib/apache-hive-2.3.0-bin/lib/hive-exec-2.3.0.jar
> --remote-host dwrdevnn1.sv2.trulia.com --remote-port 53393 --conf
> hive.spark.client.connect.timeout=1000 --conf hive.spark.client.server.connect.timeout=90000
> --conf hive.spark.client.channel.log.level=null --conf
> hive.spark.client.rpc.max.size=52428800 --conf
> hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256
> --conf hive.spark.client.rpc.server.address=null
>
> * i even print out CLASSPATH in this script: /usr/lib/spark-2.2.0-bin-hadoop2.6/bin/spark-submit
>
> and /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar is
> in it.
>
> so i ask... what am i missing?
>
> thanks,
> Stephen
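For reference, the jar-staging steps described above (copying the Spark jars into HDFS and pointing spark.yarn.jars at them) can be sketched as follows. Host names and paths are taken from the thread, so adjust them for your cluster; treat this as an illustrative fragment, not a verified fix:

```
# Stage the Spark jars in HDFS so YARN containers can load them:
hdfs dfs -mkdir -p /spark-2.2-jars
hdfs dfs -put /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/* /spark-2.2-jars/

# Then point Hive at them via hive-site.xml:
#   <property>
#     <name>spark.yarn.jars</name>
#     <value>hdfs://dwrdevnn1.sv2.trulia.com:8020/spark-2.2-jars/*</value>
#   </property>
```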


-- 
Sahil Takiar
Software Engineer at Cloudera
takiar.sahil@gmail.com | (510) 673-0309
