From: "Mich Talebzadeh"
Subject: RE: The Hive shell and Spark issue
Date: Wed, 9 Dec 2015 00:06:12 -0000
Message-ID: <0b4601d13215$6a11aba0$3e3502e0$@peridale.co.uk>
In-Reply-To: <0a3901d13088$cd3f7ab0$67be7010$@peridale.co.uk>
References: <0a3901d13088$cd3f7ab0$67be7010$@peridale.co.uk>
Reply-To: user@hive.apache.org
Delivered-To: mailing list user@hive.apache.org
Mailing-List: contact user-help@hive.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_NextPart_000_0B47_01D13215.6A133240"

------=_NextPart_000_0B47_01D13215.6A133240
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

A few other observations, based on my experience getting Hive 1.2.1 to use spark-1.3.1-bin-hadoop2.6 together with the jar file built from source code (Spark version 1.3.1), spark-assembly-1.3.1-hadoop2.4.0.jar:

 

1.    Simply putting spark-assembly-1.3.1-hadoop2.4.0.jar in $HIVE_HOME/lib is not going to work; you will get all sorts of stack traces. This is because of the CLASSPATH that the shell script $HIVE_HOME/bin/hive builds, which results in Hive not starting.

 

2.    The code simply does:

 

for f in ${HIVE_LIB}/*.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done

# add Spark assembly jar to the classpath
if [[ -n "$SPARK_HOME" ]]
then
  sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
  CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"
fi

 

The first loop adds every jar file in ${HIVE_LIB} to the CLASSPATH, so spark-assembly-1.3.1-hadoop2.4.0.jar ends up ahead of the Hadoop-related jar files. The file spark-assembly-1.3.1-hadoop2.4.0.jar is a rather old version!

 

The second block (an if statement rather than a loop) says that if $SPARK_HOME is set, spark-assembly-*.jar from $SPARK_HOME/lib is added to the CLASSPATH, which we know will not work because of class dependencies.
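To check where the assembly jar actually lands relative to the Hadoop jars, one option is to print the matching CLASSPATH entries with their positions. This is only a minimal sketch: the function name list_classpath_matches and the sample paths are hypothetical and are not part of bin/hive.

```shell
# Hypothetical helper (not part of bin/hive): print the 1-based position of
# every CLASSPATH entry whose path contains the given substring.
list_classpath_matches() {
  # $1 = classpath string, $2 = substring to look for
  printf '%s\n' "$1" | tr ':' '\n' | awk -v needle="$2" \
    'index($0, needle) > 0 { print NR, $0 }'
}

# Example with made-up paths: the assembly jar sits ahead of the Hadoop jar.
sample="/opt/hive/lib/spark-assembly-1.3.1-hadoop2.4.0.jar:/opt/hadoop/hadoop-common-2.6.0.jar"
list_classpath_matches "$sample" "spark-assembly"
# prints: 1 /opt/hive/lib/spark-assembly-1.3.1-hadoop2.4.0.jar
```

Running the same helper with "hadoop-common" as the second argument shows how far behind the Hadoop jars are.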

 

 

The proposed solution

 

1.    Before starting Hive, do the following.

2.    unset SPARK_HOME

3.    Create a new environment variable to indicate that you want to use Spark as the execution engine for Hive --> HIVE_ON_SPARK='Y'
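The steps above can be sketched as a short session with a pre-flight check. The function hive_on_spark_env_ok is a hypothetical helper, not something Hive ships; it only mirrors the non-empty tests that bin/hive applies to these variables.

```shell
# Hypothetical pre-flight check (not part of Hive): succeeds only when
# SPARK_HOME is unset or empty and HIVE_ON_SPARK is non-empty.
hive_on_spark_env_ok() {
  if [ -n "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is still set to ${SPARK_HOME}; run: unset SPARK_HOME" >&2
    return 1
  fi
  if [ -z "${HIVE_ON_SPARK:-}" ]; then
    echo "HIVE_ON_SPARK is not set; run: export HIVE_ON_SPARK='Y'" >&2
    return 1
  fi
  return 0
}

# Typical session before launching the Hive CLI:
unset SPARK_HOME            # note: no leading '$' on the variable name
export HIVE_ON_SPARK='Y'
hive_on_spark_env_ok && echo "environment ready for Hive on Spark"
```

Note that unset takes the variable name itself; writing unset $SPARK_HOME would try to unset whatever name the variable's value expands to.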

 

Modify the hive shell script to do the following:

 

# Exclude any spark-assembly*.jar from the normal build for hive
for f in `ls ${HIVE_LIB}/*.jar|grep -v spark-assembly-1.3.1-hadoop2.4.0.jar`
do
  CLASSPATH=${CLASSPATH}:$f;
done
CLASSPATH=${CLASSPATH}:

 

# Add Spark assembly jar to the classpath for future work. Otherwise ensure SPARK_HOME is unset outside of this shell
if [[ -n "$SPARK_HOME" ]]
then
  sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
  CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"
fi

 

# Add Spark assembly jar to the classpath for Hive on Spark engine as a work-around! Set HIVE_ON_SPARK='Y' outside of this shell
if [[ -n "$HIVE_ON_SPARK" ]]
then
  sparkAssemblyPath=`ls ${HIVE_HOME}/lib/spark-assembly-*.jar`
  CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"
fi
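As an alternative to piping ls through grep, the exclusion loop could be written with a plain glob and a case pattern, which excludes any spark-assembly-*.jar rather than one hard-coded version. This is a sketch under assumptions, not the script shipped with Hive: the function name build_classpath_without_spark and the demo file names are hypothetical.

```shell
# Hypothetical rewrite of the exclusion loop: builds a classpath from every
# *.jar in a directory, skipping spark-assembly-*.jar, without parsing ls.
# The body runs in a subshell so its variables do not leak into the caller.
build_classpath_without_spark() (
  # $1 = directory holding the jars (for example ${HIVE_LIB})
  cp_out=""
  for f in "$1"/*.jar; do
    [ -e "$f" ] || continue             # glob matched nothing
    case "${f##*/}" in
      spark-assembly-*.jar) continue ;; # leave the assembly jar out
    esac
    cp_out="${cp_out:+${cp_out}:}$f"    # join with ':' and no leading colon
  done
  printf '%s\n' "$cp_out"
)

# Demo in a throwaway directory with two empty, made-up jar files.
demo_dir=$(mktemp -d)
touch "$demo_dir/hive-exec-1.2.1.jar" "$demo_dir/spark-assembly-1.3.1-hadoop2.4.0.jar"
build_classpath_without_spark "$demo_dir"   # prints only the hive-exec jar path
rm -rf "$demo_dir"
```

Inside bin/hive the result could then be appended with something like CLASSPATH="${CLASSPATH}:$(build_classpath_without_spark "${HIVE_LIB}")".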

 

HTH,

 

Mich Talebzadeh

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly

http://talebzadehmich.wordpress.com

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.

 

From: Mich Talebzadeh [mailto:mich@peridale.co.uk]
Sent: 07 December 2015 00:47
To: user@hive.apache.org
Subject: The Hive shell and Spark issue

 

Hi,

 

It sounds like the issue with Hive using Spark as its execution engine comes from the following lines in $HIVE_HOME/bin/hive:

 

# add Spark assembly jar to the classpath
if [[ -n "$SPARK_HOME" ]]
then
  sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
  CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"
fi

 

As we know, Hive will not be able to use Spark with the spark-assembly-*.jar that ships in the pre-built Spark download. It will not work! For now, as a work-around, you need to build Spark from source code and exclude the Hive libraries. Then copy the spark-assembly-*.jar file from $SPARK_BUILT_FROM_SOURCE_CODE_HOME/lib to $HIVE_HOME/lib. That is, if you want to test Hive with the Spark engine.

 

So you either have to unset the SPARK_HOME environment variable when connecting to the Hive CLI, or comment out CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}" in $HIVE_HOME/bin/hive.
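If SPARK_HOME should stay set for the rest of the session, the first option can be limited to a single command by unsetting the variable in a subshell. A minimal sketch; the wrapper name without_spark_home is hypothetical, and it assumes the hive launcher is on the PATH.

```shell
# Hypothetical wrapper (not part of Hive): run one command with SPARK_HOME
# removed from its environment, leaving the calling shell untouched.
# The parentheses make the function body run in a subshell.
without_spark_home() (
  unset SPARK_HOME
  "$@"
)

# Usage (assumes the hive launcher is on PATH):
#   without_spark_home hive
```

Because the unset happens in a subshell, Spark tools started later from the same terminal still see the original SPARK_HOME.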

 

But leaving the file “spark-assembly-*.jar” in $HIVE_HOME/bin/hive seems to stop the Hive server from starting properly, so client connections such as beeline do not seem to work either.

 

I am still investigating.

 

 

HTH

 

 

Mich Talebzadeh


 

------=_NextPart_000_0B47_01D13215.6A133240--