hive-user mailing list archives

From Xuefu Zhang <xzh...@cloudera.com>
Subject Re: Answers to recent questions on Hive on Spark
Date Sat, 28 Nov 2015 04:34:47 GMT
Okay. I think I know what problem you have now. To run Hive on Spark,
spark-assembly.jar is needed, and it's also recommended that you have a
Spark installation (identified by spark.home) on the same host where HS2 is
running. You only need spark-assembly.jar in HS2's /lib directory. Other
than those, Hive on Spark doesn't have any other dependency at the service
level. At the job level, Hive on Spark jobs of course run on a Spark
cluster, which could be standalone, yarn-cluster, etc. However, how you get
the binaries for your Spark cluster and how you start them is completely
independent of Hive.
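
As a rough illustration of the service-level requirements above, the
following Python sketch checks that a spark-assembly jar is present in Hive's
lib directory and that the directory named by spark.home exists. The function
name, its arguments, and the `spark-assembly*.jar` glob pattern are
assumptions made for this example; none of them is part of Hive or Spark.

```python
from pathlib import Path


def check_service_level_setup(hive_home, spark_home):
    """Return a list of problems; an empty list means both checks passed.

    hive_home and spark_home are hypothetical arguments for this sketch.
    """
    problems = []

    # Hive on Spark needs a spark-assembly jar in Hive's lib directory.
    # The glob pattern is an assumption about how the jar is named.
    hive_lib = Path(hive_home) / "lib"
    if not any(hive_lib.glob("spark-assembly*.jar")):
        problems.append(f"no spark-assembly jar found in {hive_lib}")

    # A Spark installation on the HS2 host is recommended (spark.home).
    if not Path(spark_home).is_dir():
        problems.append(f"spark.home directory does not exist: {spark_home}")

    return problems
```

It could be called as `check_service_level_setup("/usr/lib/hive",
"/usr/lib/spark_1.5.2_bin")`, where the Hive path is a placeholder. It only
inspects the file system and does not verify that the jar itself is usable.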

Thus, you only need to build the spark-assembly.jar without Hive and put it
in Hive's /lib directory. The one in the existing Spark build may contain Hive
classes, and that's why you need to build your own. Your Spark installation
can still have a jar that's different from what you build for Hive on
Spark. Your spark.home can still point to your existing Spark installation.
In fact, Hive on Spark only needs spark-submit from your Spark
installation. Therefore, you should be okay even if your Spark installation
contains Hive classes.
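
One way to tell whether a given assembly jar bundles Hive classes is to list
its entries, since a jar is a zip archive. The sketch below is only an
illustration: the function name is hypothetical, and it assumes that Hive
classes live under the `org/apache/hadoop/hive/` package path.

```python
import zipfile

# Assumed package path for Hive classes inside a jar.
HIVE_PREFIX = "org/apache/hadoop/hive/"


def classes_in_jar(jar_path, prefix=HIVE_PREFIX):
    """Return the names of .class entries in the jar that start with prefix."""
    with zipfile.ZipFile(jar_path) as jar:
        return [
            name
            for name in jar.namelist()
            if name.startswith(prefix) and name.endswith(".class")
        ]
```

An empty result for the jar you place in Hive's /lib directory would suggest
that it was built without Hive. The same helper can be pointed at another
package, for example `classes_in_jar(path, "org/slf4j/")`, to see whether the
class named in a NoClassDefFoundError is actually inside the jar.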

By following this, I'm sure you will get your Hive on Spark to work.
Depending on the Hive version that your Spark installation contains, you
may have problems with Spark applications such as SparkSQL, but that
shouldn't be a concern if you decide to use Hive through Hive itself.

Let me know if you are still confused.

Thanks,
Xuefu

On Fri, Nov 27, 2015 at 4:34 PM, Mich Talebzadeh <mich@peridale.co.uk>
wrote:

> Hi,
>
>
>
> Thanks for heads up and comments.
>
>
>
> Sounds like when it comes to using Spark as the execution engine for Hive,
> we are in no man’s land, so to speak. I have opened questions in both the
> Hive and Spark user forums. Not much luck, for reasons that you alluded to.
>
>
>
> OK, just to clarify: the prebuilt version of Spark (as opposed to getting
> the source code and building to your own spec) works fine for me.
>
>
>
> Components are:
>
> hadoop version: Hadoop 2.6.0
> hive --version: Hive 1.2.1
> Spark: version 1.5.2
>
>
>
> It does what it says on the tin. For example, I can start the master node
> OK with start-master.sh.
>
>
>
>
>
> Spark Command: /usr/java/latest/bin/java -cp
> /usr/lib/spark_1.5.2_bin/sbin/../conf/:/usr/lib/spark_1.5.2_bin/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/usr/lib/spark_1.5.2_bin/lib/datanucleus-core-3.2.10.jar:/usr/lib/spark_1.5.2_bin/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/spark_1.5.2_bin/lib/datanucleus-rdbms-3.2.9.jar:/home/hduser/hadoop-2.6.0/etc/hadoop/
> -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master
> --ip 127.0.0.1 --port 7077 --webui-port 8080
>
> ========================================
>
> 15/11/28 00:05:23 INFO master.Master: Registered signal handlers for
> [TERM, HUP, INT]
>
> 15/11/28 00:05:23 WARN util.Utils: Your hostname, rhes564 resolves to a
> loopback address: 127.0.0.1; using 50.140.197.217 instead (on interface
> eth0)
>
> 15/11/28 00:05:23 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind
> to another address
>
> 15/11/28 00:05:24 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
>
> 15/11/28 00:05:24 INFO spark.SecurityManager: Changing view acls to: hduser
>
> 15/11/28 00:05:24 INFO spark.SecurityManager: Changing modify acls to:
> hduser
>
> 15/11/28 00:05:24 INFO spark.SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users with view permissions:
> Set(hduser); users with modify permissions: Set(hduser)
>
> 15/11/28 00:05:25 INFO slf4j.Slf4jLogger: Slf4jLogger started
>
> 15/11/28 00:05:25 INFO Remoting: Starting remoting
>
> 15/11/28 00:05:25 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://sparkMaster@127.0.0.1:7077]
>
> 15/11/28 00:05:25 INFO util.Utils: Successfully started service
> 'sparkMaster' on port 7077.
>
> 15/11/28 00:05:25 INFO master.Master: Starting Spark master at
> spark://127.0.0.1:7077
>
> 15/11/28 00:05:25 INFO master.Master: Running Spark version 1.5.2
>
> 15/11/28 00:05:25 INFO server.Server: jetty-8.y.z-SNAPSHOT
>
> 15/11/28 00:05:25 INFO server.AbstractConnector: Started
> SelectChannelConnector@0.0.0.0:8080
>
> 15/11/28 00:05:25 INFO util.Utils: Successfully started service 'MasterUI'
> on port 8080.
>
> 15/11/28 00:05:25 INFO ui.MasterWebUI: Started MasterWebUI at
> http://50.140.197.217:8080
>
> 15/11/28 00:05:25 INFO server.Server: jetty-8.y.z-SNAPSHOT
>
> 15/11/28 00:05:25 INFO server.AbstractConnector: Started
> SelectChannelConnector@rhes564:6066
>
> 15/11/28 00:05:25 INFO util.Utils: Successfully started service on port
> 6066.
>
> 15/11/28 00:05:25 INFO rest.StandaloneRestServer: Started REST server for
> submitting applications on port 6066
>
> 15/11/28 00:05:25 INFO master.Master: I have been elected leader! New
> state: ALIVE
>
>
>
> However, I cannot use Spark in place of the MapReduce engine with this
> build. It fails.
>
>
>
> The instructions say to download the source code for Spark and build it
> excluding the Hive jar files so that you can use Spark as the execution
> engine.
>
>
>
> Ok
>
>
>
> I downloaded spark 1.5.2 source code and used the following to create the
> tarred and zipped file
>
>
>
> ./make-distribution.sh --name "hadoop2-without-hive" --tgz
> "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"
>
>
>
> After unpacking the file, I attempted to start the master node as above
> with start-master.sh. However, regrettably, it fails with the following
> error:
>
>
>
>
>
> Spark Command: /usr/java/latest/bin/java -cp
> /usr/lib/spark_1.5.2_build/sbin/../conf/:/usr/lib/spark_1.5.2_build/lib/spark-assembly-1.5.2-hadoop2.4.0.jar:/home/hduser/hadoop-2.6.0/etc/hadoop/
> -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master
> --ip 127.0.0.1 --port 7077 --webui-port 8080
>
> ========================================
>
> Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
>         at java.lang.Class.getDeclaredMethods0(Native Method)
>         at java.lang.Class.privateGetDeclaredMethods(Class.java:2521)
>         at java.lang.Class.getMethod0(Class.java:2764)
>         at java.lang.Class.getMethod(Class.java:1653)
>         at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
>         at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
> Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 6 more
>
>
>
>
>
> I believe the problem lies in the spark-assembly-1.5.2-hadoop2.4.0.jar
> file. Case in point: if I copy the jar file
> spark-assembly-1.5.2-hadoop2.6.0.jar to the lib directory above, I can
> start the master node.
>
>
>
> hduser@rhes564::/usr/lib/spark_1.5.2_build/lib> mv spark-assembly-1.5.2-hadoop2.4.0.jar spark-assembly-1.5.2-hadoop2.4.0.jar_old
>
> hduser@rhes564::/usr/lib/spark_1.5.2_build/lib> cp /usr/lib/spark_1.5.2_bin/lib/spark-assembly-1.5.2-hadoop2.6.0.jar .
>
> hduser@rhes564::/usr/lib/spark_1.5.2_build/lib> cd ../sbin
>
> hduser@rhes564::/usr/lib/spark_1.5.2_build/sbin> start-master.sh
>
> starting org.apache.spark.deploy.master.Master, logging to /usr/lib/spark_1.5.2_build/sbin/../logs/spark-hduser-org.apache.spark.deploy.master.Master-1-rhes564.out
>
> hduser@rhes564::/usr/lib/spark_1.5.2_build/sbin> cat /usr/lib/spark_1.5.2_build/sbin/../logs/spark-hduser-org.apache.spark.deploy.master.Master-1-rhes564.out
>
> Spark Command: /usr/java/latest/bin/java -cp
> /usr/lib/spark_1.5.2_build/sbin/../conf/:/usr/lib/spark_1.5.2_build/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/home/hduser/hadoop-2.6.0/etc/hadoop/
> -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master
> --ip 50.140.197.217 --port 7077 --webui-port 8080
>
> ========================================
>
> 15/11/28 00:31:24 INFO master.Master: Registered signal handlers for
> [TERM, HUP, INT]
>
> 15/11/28 00:31:25 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
>
> 15/11/28 00:31:25 INFO spark.SecurityManager: Changing view acls to: hduser
>
> 15/11/28 00:31:25 INFO spark.SecurityManager: Changing modify acls to:
> hduser
>
> 15/11/28 00:31:25 INFO spark.SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users with view permissions:
> Set(hduser); users with modify permissions: Set(hduser)
>
> 15/11/28 00:31:25 INFO slf4j.Slf4jLogger: Slf4jLogger started
>
> 15/11/28 00:31:26 INFO Remoting: Starting remoting
>
> 15/11/28 00:31:26 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://sparkMaster@50.140.197.217:7077]
>
> 15/11/28 00:31:26 INFO util.Utils: Successfully started service
> 'sparkMaster' on port 7077.
>
> 15/11/28 00:31:26 INFO master.Master: Starting Spark master at
> spark://50.140.197.217:7077
>
> 15/11/28 00:31:26 INFO master.Master: Running Spark version 1.5.2
>
> 15/11/28 00:31:26 INFO server.Server: jetty-8.y.z-SNAPSHOT
>
> 15/11/28 00:31:26 INFO server.AbstractConnector: Started
> SelectChannelConnector@0.0.0.0:8080
>
> 15/11/28 00:31:26 INFO util.Utils: Successfully started service 'MasterUI'
> on port 8080.
>
> 15/11/28 00:31:26 INFO ui.MasterWebUI: Started MasterWebUI at
> http://50.140.197.217:8080
>
> 15/11/28 00:31:26 INFO server.Server: jetty-8.y.z-SNAPSHOT
>
> 15/11/28 00:31:26 INFO server.AbstractConnector: Started
> SelectChannelConnector@c-50-140-197-217.hsd1.fl.comcast.net:6066
>
> 15/11/28 00:31:26 INFO util.Utils: Successfully started service on port
> 6066.
>
> 15/11/28 00:31:26 INFO rest.StandaloneRestServer: Started REST server for
> submitting applications on port 6066
>
> 15/11/28 00:31:27 INFO master.Master: I have been elected leader! New
> state: ALIVE
>
>
>
> Thanks again.
>
>
>
>
>
> Mich Talebzadeh
>
>
>
> *From:* Xuefu Zhang [mailto:xzhang@cloudera.com]
> *Sent:* 27 November 2015 18:12
> *To:* user@hive.apache.org; dev@hive.apache.org
> *Subject:* Answers to recent questions on Hive on Spark
>
>
>
> Hi there,
>
> There seems to be increasing interest in Hive on Spark from Hive users.
> I understand that there have been a few questions or problems reported and
> I can see some frustration sometimes. It's impossible for the Hive on Spark
> team to respond to every inquiry even though we wish we could. However,
> there are a few items to be noted:
>
> 1. Hive on Spark is being tested as part of the precommit tests.
>
> 2. Hive on Spark is supported in some distributions such as CDH.
>
> 3. I tried a couple of days ago with the latest master and branch-1, and
> they both worked with my Spark 1.5 build.
>
> Therefore, if you are facing some problem, it's likely due to your setup.
> Please refer to the wiki on how to do it right. Nevertheless, I have a few
> suggestions here:
>
> 1. Start simple. Try out a CDH sandbox or distribution first to see it
> work in action before building your own. Comparing it with your setup
> may give you some clues.
>
> 2. Try with spark.master=local first, making sure that you have all the
> necessary dependent jars, and then move to your production setup. Please
> note that yarn-cluster is recommended and mesos is not supported. I tried
> both yarn-cluster and local-cluster and both worked for me.
>
> 3. Check logs beyond hive.log, such as the Spark log and the YARN logs, to
> get more error messages.
>
> When you report your problem, please provide as much info as possible,
> such as your platform, your builds, your configurations, and relevant logs
> so that others can reproduce.
>
> Please note that we are not in a good position to answer questions with
> respect to Spark itself, such as spark-shell. Not only is that beyond the
> scope of Hive on Spark, but the team also may not have the expertise to
> give you meaningful answers. One thing to emphasize: when you build your
> Spark jar, don't include Hive, as it's very likely there is a version
> mismatch. Again, a distribution may have solved the problem for you if
> you'd like to give it a try.
>
> Hope this helps.
>
> Thanks,
>
> Xuefu
>
