Subject: Re: hive on spark - why is it so hard?
From: Sahil Takiar
Date: Tue, 26 Sep 2017 14:44:05 -0700
To: user@hive.apache.org

Hey Stephen,

Can you send the full stack trace for the NoClassDefFoundError? For Hive 2.3.0, we only support Spark 2.0.0. Hive may work with more recent versions of Spark, but we only test with Spark 2.0.0.
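
To confirm that independently: the Spark version a Hive release is built against is declared as the spark.version property in the top-level pom.xml of the Hive source tree, so a rough check (assuming you have the Hive 2.3.0 source checked out) is:

    # look up the Spark version Hive 2.3.0 was compiled against
    grep 'spark.version' pom.xml

For the 2.3.0 release that should come back as 2.0.0.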

--Sahil

On Tue, Sep 26, 2017 at 2:35 PM, Stephen Sprague <spragues@gmail.com> wrote:
* i've installed hive 2.3 and spark 2.2

* i've read this doc plenty of times -> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

* i run this query:

  hive --hiveconf hive.root.logger=DEBUG,console -e 'set hive.execution.engine=spark; select date_key, count(*) from fe_inventory.merged_properties_hist group by 1 order by 1;'

* i get this error:

  Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/scheduler/SparkListenerInterface


* this class is in:
  /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar
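
fwiw, a quick way to confirm the class really is packaged in that jar (assuming the JDK's jar tool is on the PATH):

    # list the jar contents and look for the class the error complains about
    jar tf /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar | grep SparkListenerInterface

which should list org/apache/spark/scheduler/SparkListenerInterface.class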

* i have copied all the spark jars to hdfs://dwrdevnn1/spark-2.2-jars
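
roughly what that copy looks like (a sketch of the commands, not a transcript of exactly what was run):

    # stage the Spark jars in HDFS so the YARN containers can pull them
    hdfs dfs -mkdir -p /spark-2.2-jars
    hdfs dfs -put /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/*.jar /spark-2.2-jars/
    # sanity check that spark-core made it up there
    hdfs dfs -ls /spark-2.2-jars | grep spark-core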

* i have updated hive-site.xml to set spark.yarn.jars to it.
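
for reference, that hive-site.xml entry is along these lines (property value taken from the log line below):

    <property>
      <name>spark.yarn.jars</name>
      <value>hdfs://dwrdevnn1.sv2.trulia.com:8020/spark-2.2-jars/*</value>
    </property>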

* i see this in the console:

2017-09-26T13:34:15,505  INFO [334aa7db-ad0c-48c3-9ada-467aaf05cff3 main] spark.HiveSparkClientFactory: load spark property from hive configuration (spark.yarn.jars -> hdfs://dwrdevnn1.sv2.trulia.com:8020/spark-2.2-jars/*).

* i see this on the console:

2017-09-26T14:04:45,678  INFO [4cb82b6d-9568-4518-8e00-f0cf7ac58cd3 main] client.SparkClientImpl: Running client driver with argv: /usr/lib/spark-2.2.0-bin-hadoop2.6/bin/spark-submit --properties-file /tmp/spark-submit.6105784757200912217.properties --class org.apache.hive.spark.client.RemoteDriver /usr/lib/apache-hive-2.3.0-bin/lib/hive-exec-2.3.0.jar --remote-host dwrdevnn1.sv2.trulia.com --remote-port 53393 --conf hive.spark.client.connect.timeout=1000 --conf hive.spark.client.server.connect.timeout=90000 --conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256 --conf hive.spark.client.rpc.server.address=null

* i even print out CLASSPATH in this script: /usr/lib/spark-2.2.0-bin-hadoop2.6/bin/spark-submit

and /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar is in it.
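
another way to see exactly what classpath the driver JVM ends up with, without editing the script (assuming spark-class in 2.2 still honors this env var):

    # makes spark-class echo the full java command, -cp included, before launching
    SPARK_PRINT_LAUNCH_COMMAND=1 /usr/lib/spark-2.2.0-bin-hadoop2.6/bin/spark-submit --version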

so i ask... what am i missing?

thanks,
Stephen

--
Sahil Takiar
Software Engineer at Cloudera
takiar.sahil@gmail.com | (510) 673-0309