hive-user mailing list archives

From Dasun Hegoda <dasunheg...@gmail.com>
Subject Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu
Date Fri, 27 Nov 2015 15:13:22 GMT
Hey!

Thanks for the clarification. I have been struggling to deploy Hive on
Spark for 3 weeks now. Still no luck. I can't believe that even the Hive
experts here don't know about it. I'm wondering what to do.

Any guesses???

On Fri, Nov 27, 2015 at 3:52 PM, Mich Talebzadeh <mich@peridale.co.uk>
wrote:

> This should work as long as $SPARK_HOME has been set up and your CLASSPATH
> includes the Spark jars.
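>
> For example, a minimal sketch of that setup in ~/.bashrc, assuming Spark
> lives under /usr/lib/spark with the 1.5.2 assembly jar referenced later in
> this thread (adjust the paths to your own layout):
>
>   # where the Spark installation lives (path is an assumption)
>   export SPARK_HOME=/usr/lib/spark
>   export PATH=$SPARK_HOME/bin:$PATH
>   # put the Spark assembly jar on the CLASSPATH
>   export CLASSPATH=$CLASSPATH:$SPARK_HOME/lib/spark-assembly-1.5.2-hadoop2.6.0.jar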
>
>
>
> Also bear in mind that while this will work OK, crucially Hive will not be
> able to use the Spark engine with the pre-built Spark binary downloads.
>
>
>
> Example
>
>
>
> *spark-shell --master spark://rhes564:7077*
>
>
>
> 15/11/27 10:19:25 INFO spark.SecurityManager: Changing view acls to: hduser
>
> 15/11/27 10:19:25 INFO spark.SecurityManager: Changing modify acls to:
> hduser
>
> 15/11/27 10:19:25 INFO spark.SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users with view permissions:
> Set(hduser); users with modify permissions: Set(hduser)
>
> 15/11/27 10:19:25 INFO spark.HttpServer: Starting HTTP Server
>
> 15/11/27 10:19:25 INFO server.Server: jetty-8.y.z-SNAPSHOT
>
> 15/11/27 10:19:25 INFO server.AbstractConnector: Started
> SocketConnector@0.0.0.0:22613
>
> 15/11/27 10:19:25 INFO util.Utils: Successfully started service 'HTTP
> class server' on port 22613.
>
> Welcome to
>
>       ____              __
>
>      / __/__  ___ _____/ /__
>
>     _\ \/ _ \/ _ `/ __/  '_/
>
>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
>
>       /_/
>
>
>
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.7.0_25)
>
> Type in expressions to have them evaluated.
>
> Type :help for more information.
>
> 15/11/27 10:19:29 WARN util.Utils: Your hostname, rhes564 resolves to a
> loopback address: 127.0.0.1; using 50.140.197.217 instead (on interface
> eth0)
>
> 15/11/27 10:19:29 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind
> to another address
>
> 15/11/27 10:19:29 INFO spark.SparkContext: Running Spark version 1.5.2
>
> 15/11/27 10:19:29 INFO spark.SecurityManager: Changing view acls to: hduser
>
> 15/11/27 10:19:29 INFO spark.SecurityManager: Changing modify acls to:
> hduser
>
> 15/11/27 10:19:29 INFO spark.SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users with view permissions:
> Set(hduser); users with modify permissions: Set(hduser)
>
> 15/11/27 10:19:30 INFO slf4j.Slf4jLogger: Slf4jLogger started
>
> 15/11/27 10:19:30 INFO Remoting: Starting remoting
>
> 15/11/27 10:19:30 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://sparkDriver@50.140.197.217:61620]
>
> 15/11/27 10:19:30 INFO util.Utils: Successfully started service
> 'sparkDriver' on port 61620.
>
> 15/11/27 10:19:30 INFO spark.SparkEnv: Registering MapOutputTracker
>
> 15/11/27 10:19:30 INFO spark.SparkEnv: Registering BlockManagerMaster
>
> 15/11/27 10:19:30 INFO storage.DiskBlockManager: Created local directory
> at /tmp/blockmgr-eae28f3e-f878-4591-85f0-e8a66c6acb02
>
> 15/11/27 10:19:30 INFO storage.MemoryStore: MemoryStore started with
> capacity 529.9 MB
>
> 15/11/27 10:19:30 INFO spark.HttpFileServer: HTTP File server directory is
> /tmp/spark-75cd7444-5cf7-4175-a15b-6c3882c9d146/httpd-8dc465d5-664d-4cef-86a8-d4e8b34f4146
>
> 15/11/27 10:19:30 INFO spark.HttpServer: Starting HTTP Server
>
> 15/11/27 10:19:30 INFO server.Server: jetty-8.y.z-SNAPSHOT
>
> 15/11/27 10:19:30 INFO server.AbstractConnector: Started
> SocketConnector@0.0.0.0:44656
>
> 15/11/27 10:19:30 INFO util.Utils: Successfully started service 'HTTP file
> server' on port 44656.
>
> 15/11/27 10:19:30 INFO spark.SparkEnv: Registering OutputCommitCoordinator
>
> 15/11/27 10:19:30 INFO server.Server: jetty-8.y.z-SNAPSHOT
>
> 15/11/27 10:19:30 INFO server.AbstractConnector: Started
> SelectChannelConnector@0.0.0.0:4040
>
> 15/11/27 10:19:30 INFO util.Utils: Successfully started service 'SparkUI'
> on port 4040.
>
> 15/11/27 10:19:30 INFO ui.SparkUI: Started SparkUI at
> http://50.140.197.217:4040
>
> 15/11/27 10:19:30 WARN metrics.MetricsSystem: Using default name
> DAGScheduler for source because spark.app.id is not set.
>
> 15/11/27 10:19:30 INFO client.AppClient$ClientEndpoint: Connecting to
> master spark://rhes564:7077...
>
> 15/11/27 10:19:31 INFO cluster.SparkDeploySchedulerBackend: Connected to
> Spark cluster with app ID app-20151127101931-0001
>
> 15/11/27 10:19:31 INFO client.AppClient$ClientEndpoint: Executor added:
> app-20151127101931-0001/0 on worker-20151127100137-50.140.197.217-38428 (
> 50.140.197.217:38428) with 12 cores
>
> 15/11/27 10:19:31 INFO cluster.SparkDeploySchedulerBackend: Granted
> executor ID app-20151127101931-0001/0 on hostPort 50.140.197.217:38428
> with 12 cores, 1024.0 MB RAM
>
> 15/11/27 10:19:31 INFO client.AppClient$ClientEndpoint: Executor updated:
> app-20151127101931-0001/0 is now LOADING
>
> 15/11/27 10:19:31 INFO client.AppClient$ClientEndpoint: Executor updated:
> app-20151127101931-0001/0 is now RUNNING
>
> 15/11/27 10:19:31 INFO util.Utils: Successfully started service
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 19761.
>
> 15/11/27 10:19:31 INFO netty.NettyBlockTransferService: Server created on
> 19761
>
> 15/11/27 10:19:31 INFO storage.BlockManagerMaster: Trying to register
> BlockManager
>
> 15/11/27 10:19:31 INFO storage.BlockManagerMasterEndpoint: Registering
> block manager 50.140.197.217:19761 with 529.9 MB RAM,
> BlockManagerId(driver, 50.140.197.217, 19761)
>
> 15/11/27 10:19:31 INFO storage.BlockManagerMaster: Registered BlockManager
>
> 15/11/27 10:19:31 INFO cluster.SparkDeploySchedulerBackend:
> SchedulerBackend is ready for scheduling beginning after reached
> minRegisteredResourcesRatio: 0.0
>
> 15/11/27 10:19:31 INFO repl.SparkILoop: Created spark context..
>
> Spark context available as sc.
>
> 15/11/27 10:19:31 INFO hive.HiveContext: Initializing execution hive,
> version 1.2.1
>
> 15/11/27 10:19:31 INFO client.ClientWrapper: Inspected Hadoop version:
> 2.6.0
>
> 15/11/27 10:19:31 INFO client.ClientWrapper: Loaded
> org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
>
> 15/11/27 10:19:32 INFO hive.metastore: Trying to connect to metastore with
> URI thrift://localhost:9083
>
> 15/11/27 10:19:32 INFO hive.metastore: Connected to metastore.
>
> 15/11/27 10:19:32 INFO session.SessionState: Created local directory:
> /tmp/hive/b8bba1a1-646b-4734-bad3-4c1d6cb9344d_resources
>
> 15/11/27 10:19:32 INFO session.SessionState: Created HDFS directory:
> /tmp/hive/hduser/b8bba1a1-646b-4734-bad3-4c1d6cb9344d
>
> 15/11/27 10:19:32 INFO session.SessionState: Created local directory:
> /tmp/hive/b8bba1a1-646b-4734-bad3-4c1d6cb9344d
>
> 15/11/27 10:19:32 INFO session.SessionState: Created HDFS directory:
> /tmp/hive/hduser/b8bba1a1-646b-4734-bad3-4c1d6cb9344d/_tmp_space.db
>
> 15/11/27 10:19:32 INFO hive.HiveContext: default warehouse location is
> /user/hive/warehouse
>
> 15/11/27 10:19:32 INFO hive.HiveContext: Initializing
> HiveMetastoreConnection version 1.2.1 using Spark classes.
>
> 15/11/27 10:19:32 INFO client.ClientWrapper: Inspected Hadoop version:
> 2.6.0
>
> 15/11/27 10:19:33 INFO client.ClientWrapper: Loaded
> org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
>
> 15/11/27 10:19:33 INFO cluster.SparkDeploySchedulerBackend: Registered
> executor: AkkaRpcEndpointRef(Actor[akka.tcp://
> sparkExecutor@50.140.197.217:55017/user/Executor#1724631850]) with ID 0
>
> 15/11/27 10:19:33 INFO storage.BlockManagerMasterEndpoint: Registering
> block manager 50.140.197.217:25122 with 529.9 MB RAM, BlockManagerId(0,
> 50.140.197.217, 25122)
>
> 15/11/27 10:19:33 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
>
> 15/11/27 10:19:33 INFO hive.metastore: Trying to connect to metastore with
> URI thrift://localhost:9083
>
> 15/11/27 10:19:33 INFO hive.metastore: Connected to metastore.
>
> 15/11/27 10:19:34 INFO session.SessionState: Created local directory:
> /tmp/hive/d2b5c2bd-3989-4a72-a99f-885356f02f8b_resources
>
> 15/11/27 10:19:34 INFO session.SessionState: Created HDFS directory:
> /tmp/hive/hduser/d2b5c2bd-3989-4a72-a99f-885356f02f8b
>
> 15/11/27 10:19:34 INFO session.SessionState: Created local directory:
> /tmp/hive/d2b5c2bd-3989-4a72-a99f-885356f02f8b
>
> 15/11/27 10:19:34 INFO session.SessionState: Created HDFS directory:
> /tmp/hive/hduser/d2b5c2bd-3989-4a72-a99f-885356f02f8b/_tmp_space.db
>
> 15/11/27 10:19:34 INFO repl.SparkILoop: Created sql context (with Hive
> support)..
>
> SQL context available as sqlContext.
>
>
>
> *scala>*
>
>
>
> However, that is of little use to me because I want to use Spark as the
> Hive engine for faster performance compared to the MapReduce engine. *Spark
> as a fully built application does not work as an engine alone*! For that I
> need to build Spark WITHOUT the Hive jars and use it as an engine, as
> opposed to a standalone application.
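>
> For reference, the Hive on Spark wiki gives roughly the following recipe
> for such a build (a sketch for pre-2.0 Spark sources; the Hadoop profile is
> an assumption and should match your cluster, e.g. hadoop-2.6 here):
>
>   # run from the Spark source tree; produces a distribution without Hive jars
>   ./make-distribution.sh --name "hadoop2-without-hive" --tgz \
>       "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"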
>
>
>
> HTH
>
>
>
>
>
> Mich Talebzadeh
>
>
>
> *Sybase ASE 15 Gold Medal Award 2008*
>
> A Winning Strategy: Running the most Critical Financial Data on ASE 15
>
>
> http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
>
> Author of the books* "A Practitioner’s Guide to Upgrading to Sybase ASE
> 15", ISBN 978-0-9563693-0-7*.
>
> co-author *"Sybase Transact SQL Guidelines Best Practices", ISBN
> 978-0-9759693-0-4*
>
> *Publications due shortly:*
>
> *Complex Event Processing in Heterogeneous Environments*, ISBN:
> 978-0-9563693-3-8
>
> *Oracle and Sybase, Concepts and Contrasts*, ISBN: 978-0-9563693-1-4, volume
> one out shortly
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Technology
> Ltd, its subsidiaries or their employees, unless expressly so stated. It is
> the responsibility of the recipient to ensure that this email is virus
> free, therefore neither Peridale Ltd, its subsidiaries nor their employees
> accept any responsibility.
>
>
>
> *From:* Dasun Hegoda [mailto:dasunhegoda@gmail.com]
> *Sent:* 27 November 2015 05:11
> *To:* user@hive.apache.org
> *Subject:* Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu
>
>
>
> This works fine for me
>
>
>
> spark-shell --master yarn-client
>
>
>
> On Tue, Nov 24, 2015 at 11:43 AM, Dasun Hegoda <dasunhegoda@gmail.com>
> wrote:
>
> Hey folks,
>
>
>
> Any updates?
>
>
>
> On Mon, Nov 23, 2015 at 5:15 PM, Dasun Hegoda <dasunhegoda@gmail.com>
> wrote:
>
> Do you have any clue how to get this fixed?
>
>
>
> On Mon, Nov 23, 2015 at 4:27 PM, Dasun Hegoda <dasunhegoda@gmail.com>
> wrote:
>
> I get this now. It's different from what you get:
>
>
>
> hduser@master:~/spark-1.5.1-bin-hadoop2.6/bin$ ./spark-shell
>
> 15/11/23 05:56:13 INFO spark.SecurityManager: Changing view acls to: hduser
>
> 15/11/23 05:56:13 INFO spark.SecurityManager: Changing modify acls to:
> hduser
>
> 15/11/23 05:56:13 INFO spark.SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users with view permissions:
> Set(hduser); users with modify permissions: Set(hduser)
>
> 15/11/23 05:56:13 INFO spark.HttpServer: Starting HTTP Server
>
> 15/11/23 05:56:13 INFO server.Server: jetty-8.y.z-SNAPSHOT
>
> 15/11/23 05:56:13 INFO server.AbstractConnector: Started
> SocketConnector@0.0.0.0:34334
>
> 15/11/23 05:56:13 INFO util.Utils: Successfully started service 'HTTP
> class server' on port 34334.
>
> Welcome to
>
>       ____              __
>
>      / __/__  ___ _____/ /__
>
>     _\ \/ _ \/ _ `/ __/  '_/
>
>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
>
>       /_/
>
>
>
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.7.0_55)
>
> Type in expressions to have them evaluated.
>
> Type :help for more information.
>
> 15/11/23 05:56:17 INFO spark.SparkContext: Running Spark version 1.5.1
>
> 15/11/23 05:56:17 WARN spark.SparkConf:
>
> SPARK_JAVA_OPTS was detected (set to '-Dspark.driver.port=53411').
>
> This is deprecated in Spark 1.0+.
>
>
>
> Please instead use:
>
>  - ./spark-submit with conf/spark-defaults.conf to set defaults for an
> application
>
>  - ./spark-submit with --driver-java-options to set -X options for a driver
>
>  - spark.executor.extraJavaOptions to set -X options for executors
>
>  - SPARK_DAEMON_JAVA_OPTS to set java options for standalone daemons
> (master or worker)
>
>
>
> 15/11/23 05:56:17 WARN spark.SparkConf: Setting
> 'spark.executor.extraJavaOptions' to '-Dspark.driver.port=53411' as a
> work-around.
>
> 15/11/23 05:56:17 WARN spark.SparkConf: Setting
> 'spark.driver.extraJavaOptions' to '-Dspark.driver.port=53411' as a
> work-around.
>
> 15/11/23 05:56:17 INFO spark.SecurityManager: Changing view acls to: hduser
>
> 15/11/23 05:56:17 INFO spark.SecurityManager: Changing modify acls to:
> hduser
>
> 15/11/23 05:56:17 INFO spark.SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users with view permissions:
> Set(hduser); users with modify permissions: Set(hduser)
>
> 15/11/23 05:56:18 INFO slf4j.Slf4jLogger: Slf4jLogger started
>
> 15/11/23 05:56:18 INFO Remoting: Starting remoting
>
> 15/11/23 05:56:18 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://sparkDriver@192.168.7.87:53411]
>
> 15/11/23 05:56:18 INFO util.Utils: Successfully started service
> 'sparkDriver' on port 53411.
>
> 15/11/23 05:56:18 INFO spark.SparkEnv: Registering MapOutputTracker
>
> 15/11/23 05:56:18 INFO spark.SparkEnv: Registering BlockManagerMaster
>
> 15/11/23 05:56:18 INFO storage.DiskBlockManager: Created local directory
> at /tmp/blockmgr-0232975c-c76b-444d-b7f7-1ef2f28e388c
>
> 15/11/23 05:56:18 INFO storage.MemoryStore: MemoryStore started with
> capacity 530.3 MB
>
> 15/11/23 05:56:18 INFO spark.HttpFileServer: HTTP File server directory is
> /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593/httpd-311975ea-ac22-493d-8fd5-0f48b562a9a5
>
> 15/11/23 05:56:18 INFO spark.HttpServer: Starting HTTP Server
>
> 15/11/23 05:56:18 INFO server.Server: jetty-8.y.z-SNAPSHOT
>
> 15/11/23 05:56:18 INFO server.AbstractConnector: Started
> SocketConnector@0.0.0.0:60477
>
> 15/11/23 05:56:18 INFO util.Utils: Successfully started service 'HTTP file
> server' on port 60477.
>
> 15/11/23 05:56:18 INFO spark.SparkEnv: Registering OutputCommitCoordinator
>
> 15/11/23 05:56:18 INFO server.Server: jetty-8.y.z-SNAPSHOT
>
> 15/11/23 05:56:18 INFO server.AbstractConnector: Started
> SelectChannelConnector@0.0.0.0:4040
>
> 15/11/23 05:56:18 INFO util.Utils: Successfully started service 'SparkUI'
> on port 4040.
>
> 15/11/23 05:56:18 INFO ui.SparkUI: Started SparkUI at
> http://192.168.7.87:4040
>
> 15/11/23 05:56:18 WARN metrics.MetricsSystem: Using default name
> DAGScheduler for source because spark.app.id is not set.
>
> 15/11/23 05:56:18 INFO client.AppClient$ClientEndpoint: Connecting to
> master spark://master:7077...
>
> 15/11/23 05:56:38 ERROR util.SparkUncaughtExceptionHandler: Uncaught
> exception in thread Thread[appclient-registration-retry-thread,5,main]
>
> java.util.concurrent.RejectedExecutionException: Task
> java.util.concurrent.FutureTask@236f0e3a rejected from
> java.util.concurrent.ThreadPoolExecutor@500f1402[Running, pool size = 1,
> active threads = 0, queued tasks = 0, completed tasks = 1]
>
> at
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>
> at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>
> at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>
> at
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
>
> at
> org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:96)
>
> at
> org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:95)
>
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>
> at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>
> at
> org.apache.spark.deploy.client.AppClient$ClientEndpoint.tryRegisterAllMasters(AppClient.scala:95)
>
> at
> org.apache.spark.deploy.client.AppClient$ClientEndpoint.org$apache$spark$deploy$client$AppClient$ClientEndpoint$$registerWithMaster(AppClient.scala:121)
>
> at
> org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:132)
>
> at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1119)
>
> at
> org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:124)
>
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
> at java.lang.Thread.run(Thread.java:745)
>
> 15/11/23 05:56:38 INFO storage.DiskBlockManager: Shutdown hook called
>
> 15/11/23 05:56:38 INFO util.ShutdownHookManager: Shutdown hook called
>
> 15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory
> /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593/httpd-311975ea-ac22-493d-8fd5-0f48b562a9a5
>
> 15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory
> /tmp/spark-8fefb39a-09b5-443c-b7b4-9c54bce6e245
>
> 15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory
> /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593/userFiles-b593fc93-c23a-4a9e-aede-ed051f149fcb
>
> 15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory
> /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593
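>
> The twenty seconds of "Connecting to master spark://master:7077..."
> followed by the RejectedExecutionException usually means the shell never
> managed to register with the master, typically because nothing is listening
> at that URL or the master is running an incompatible Spark version. A quick
> sanity check, assuming nc is available, using the hostname/port from the
> log above:
>
>   # verify something is actually listening on the master port
>   nc -vz master 7077
>
> and compare the URL against the "spark://..." string shown on the master's
> web UI (port 8080).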
>
>
>
> On Mon, Nov 23, 2015 at 4:19 PM, Mich Talebzadeh <mich@peridale.co.uk>
> wrote:
>
> As the example shows, these are all set in hive-site.xml:
>
>
>
> <property>
>
>     <name>hive.execution.engine</name>
>
>     *<value>spark</value>*
>
>     <description>
>
>       Expects one of [mr, tez, spark].
>
>       Chooses execution engine. Options are: mr (Map reduce, default), tez
> (hadoop 2 only) or spark
>
>     </description>
>
>   </property>
>
>
>
> <property>
>
>     <name>spark.eventLog.enabled</name>
>
>     *<value>true</value>*
>
>     <description>
>
>            Spark event log setting
>
>     </description>
>
>   </property>
>
>
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *From:* Dasun Hegoda [mailto:dasunhegoda@gmail.com]
> *Sent:* 23 November 2015 10:40
>
>
> *To:* user@hive.apache.org
> *Subject:* Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu
>
>
>
> Thank you very much. This is very informative. Do you know how to set
> these in hive-site.xml?
>
>
>
> hive> set spark.master=<Spark Master URL>
>
> hive> set spark.eventLog.enabled=true;
>
> hive> set spark.eventLog.dir=<Spark event log folder (must exist)>
>
> hive> set spark.executor.memory=512m;
>
> hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
>
>
>
> If we set these in hive-site.xml I think we will be able to get through.
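>
> For what it's worth, a sketch of the remaining entries in the same
> hive-site.xml style as the reply above; the master URL and event log
> directory are placeholders to adjust for your cluster:
>
> <property>
>     <name>spark.master</name>
>     <value>spark://your-master-host:7077</value>
> </property>
>
> <property>
>     <name>spark.eventLog.dir</name>
>     <value>hdfs:///spark-logs</value>
> </property>
>
> <property>
>     <name>spark.executor.memory</name>
>     <value>512m</value>
> </property>
>
> <property>
>     <name>spark.serializer</name>
>     <value>org.apache.spark.serializer.KryoSerializer</value>
> </property>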
>
>
>
> On Mon, Nov 23, 2015 at 3:05 PM, Mich Talebzadeh <mich@peridale.co.uk>
> wrote:
>
> Hi,
>
>
>
> I am looking at the set up here
>
>
>
>
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
> .
>
>
>
> First, this is about the configuration of Hive to work with Spark. This is
> my understanding:
>
>
>
> 1.    Hive uses Yarn as its resource manager regardless
>
> 2.    Hive uses MapReduce as its execution engine by default
>
> 3.    The execution engine can be changed to Spark at the configuration
> level. If you look at the Hive configuration file,
> $HIVE_HOME/conf/hive-site.xml, you will see that the default is mr (MapReduce):
>
> <property>
>
>     <name>hive.execution.engine</name>
>
>     *<value>mr</value>*
>
>     <description>
>
>       Expects one of [mr, tez].
>
>       Chooses execution engine. Options are: mr (Map reduce, default) or
> tez (hadoop 2 only)
>
>     </description>
>
>   </property>
>
>
>
> 4.    If you change that to *spark and restart Hive, *you will force Hive
> to use Spark as its engine. So the choice is either to do it at the
> configuration level or at the session level (i.e. set
> hive.execution.engine=spark;). For the rest of the parameters you can do
> the same, i.e. in hive-site.xml or at session level. Personally I would
> still want Hive to use the MR engine, so I will create spark-defaults.conf
> as mentioned (a sketch of such a file follows below).
>
> 5.    I then start spark as standalone that works fine
>
> *hduser@rhes564::/usr/lib/spark> ./sbin/start-master.sh*
>
> starting org.apache.spark.deploy.master.Master, logging to
> /usr/lib/spark/sbin/../logs/spark-hduser-org.apache.spark.deploy.master.Master-1-rhes564.out
>
> hduser@rhes564::/usr/lib/spark> more
> /usr/lib/spark/sbin/../logs/spark-hduser-org.apache.spark.deploy.master.Master-1-rhes564.out
>
> Spark Command: /usr/java/latest/bin/java -cp
> /usr/lib/spark/sbin/../conf/:/usr/lib/spark/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/usr/lib/spark/lib/datanucleus-core-3.2.10.jar:/usr/lib/spark/lib/datanucleus-ap
>
> i-jdo-3.2.6.jar:/usr/lib/spark/lib/datanucleus-rdbms-3.2.9.jar -Xms1g
> -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip
> rhes564 --port 7077 --webui-port 8080
>
> ========================================
>
> Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties
>
> 15/11/21 21:41:58 INFO Master: Registered signal handlers for [TERM, HUP,
> INT]
>
> 15/11/21 21:41:58 WARN Utils: Your hostname, rhes564 resolves to a
> loopback address: 127.0.0.1; using 50.140.197.217 instead (on interface
> eth0)
>
> 15/11/21 21:41:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
> another address
>
> 15/11/21 21:41:59 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
>
> 15/11/21 21:41:59 INFO SecurityManager: Changing view acls to: hduser
>
> 15/11/21 21:41:59 INFO SecurityManager: Changing modify acls to: hduser
>
> 15/11/21 21:41:59 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(hduser); users
> with modify permissions: Set(hduser)
>
> 15/11/21 21:41:59 INFO Slf4jLogger: Slf4jLogger started
>
> 15/11/21 21:42:00 INFO Remoting: Starting remoting
>
> 15/11/21 21:42:00 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://sparkMaster@rhes564:7077]
>
> 15/11/21 21:42:00 INFO Utils: Successfully started service 'sparkMaster'
> on port 7077.
>
> 15/11/21 21:42:00 INFO Master: Starting Spark master at
> spark://rhes564:7077
>
> 15/11/21 21:42:00 INFO Master: Running Spark version 1.5.2
>
> 15/11/21 21:42:00 INFO Utils: Successfully started service 'MasterUI' on
> port 8080.
>
> 15/11/21 21:42:00 INFO MasterWebUI: Started MasterWebUI at
> http://50.140.197.217:8080
>
> 15/11/21 21:42:00 INFO Utils: Successfully started service on port 6066.
>
> 15/11/21 21:42:00 INFO StandaloneRestServer: Started REST server for
> submitting applications on port 6066
>
> 15/11/21 21:42:00 INFO Master: I have been elected leader! New state: ALIVE
>
> 6.    Then I try to start interactive spark-shell and it fails with an
> error that I reported before
>
> *hduser@rhes564::/usr/lib/spark/bin> ./spark-shell --master
> spark://rhes564:7077*
>
> log4j:WARN No appenders could be found for logger
> (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
>
> log4j:WARN Please initialize the log4j system properly.
>
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
>
> Using Spark's repl log4j profile:
> org/apache/spark/log4j-defaults-repl.properties
>
> To adjust logging level use sc.setLogLevel("INFO")
>
> Welcome to
>
>       ____              __
>
>      / __/__  ___ _____/ /__
>
>     _\ \/ _ \/ _ `/ __/  '_/
>
>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
>
>       /_/
>
>
>
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.7.0_25)
>
> Type in expressions to have them evaluated.
>
> Type :help for more information.
>
> 15/11/23 09:33:56 WARN Utils: Your hostname, rhes564 resolves to a
> loopback address: 127.0.0.1; using 50.140.197.217 instead (on interface
> eth0)
>
> 15/11/23 09:33:56 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
> another address
>
> 15/11/23 09:33:57 WARN MetricsSystem: Using default name DAGScheduler for
> source because spark.app.id is not set.
>
> Spark context available as sc.
>
> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name
> hive.server2.thrift.http.min.worker.threads does not exist
>
> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name
> hive.mapjoin.optimized.keys does not exist
>
> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name
> hive.mapjoin.lazy.hashtable does not exist
>
> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name
> hive.server2.thrift.http.max.worker.threads does not exist
>
> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name
> hive.server2.logging.operation.verbose does not exist
>
> 15/11/23 09:34:00 WARN HiveConf: HiveConf of name
> hive.optimize.multigroupby.common.distincts does not exist
>
> *java.lang.RuntimeException: java.lang.RuntimeException: The root scratch
> dir: /tmp/hive on HDFS should be writable. Current permissions are:
> rwx------*
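>
> A common fix for this scratch-dir error is to loosen the HDFS permissions
> on /tmp/hive (777 is the usual quick fix; tighten as appropriate):
>
>   hdfs dfs -chmod -R 777 /tmp/hive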
>
>
>
> That is where I am now, and I have reported this to the Spark user group
> but no luck yet.
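>
> As for the spark-defaults.conf mentioned in point 4, a minimal sketch,
> using the standalone master above and the parameter names from the wiki;
> the event log directory is a placeholder and must already exist:
>
>   spark.master              spark://rhes564:7077
>   spark.eventLog.enabled    true
>   spark.eventLog.dir        hdfs:///spark-logs
>   spark.executor.memory     512m
>   spark.serializer          org.apache.spark.serializer.KryoSerializer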
>
>
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *From:* Dasun Hegoda [mailto:dasunhegoda@gmail.com]
> *Sent:* 23 November 2015 07:05
> *To:* user@hive.apache.org
> *Subject:* Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu
>
>
>
> Anyone????
>
>
>
> On Sat, Nov 21, 2015 at 1:32 PM, Dasun Hegoda <dasunhegoda@gmail.com>
> wrote:
>
> Thank you very much, but I would like to do the integration of these
> components myself rather than using a packaged distribution. I think I have
> come to the right place. Can you please kindly tell me the configuration
> steps to run Hive on Spark?
>
>
>
> At least someone please elaborate these steps.
>
>
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
> .
>
>
>
> Because in the latter part of the guide, configurations are set in the
> Hive runtime shell, which is not permanent as far as I know.
>
>
>
> Please help me to get this done. I'm also planning to write a detailed
> guide with configuration steps to run Hive on Spark, so others can benefit
> from it and not be troubled like me.
>
>
>
> Can someone please kindly tell me the configuration steps run Hive on
> Spark?
>
>
>
>
>
> On Sat, Nov 21, 2015 at 12:28 PM, Sai Gopalakrishnan <
> sai.gopalakrishnan@aspiresys.com> wrote:
>
> Hi everyone,
>
>
>
> Thank you for your responses. I think Mich's suggestion is a great one and
> will go with it. As Alan suggested, using the compactor in Hive should help
> out with managing the delta files.
>
>
>
> @Dasun, pardon me for deviating from the topic. Regarding configuration,
> you could try a packaged distribution (Hortonworks, Cloudera or MapR) as
> Jörn Franke said. I use Hortonworks; it's open-source and compatible with
> Linux and Windows, provides detailed documentation for installation, and
> can be installed in less than a day provided you're all set with the
> hardware. http://hortonworks.com/hdp/downloads/
>
>
>
>
>
>
> Regards,
>
> Sai
>
>
> ------------------------------
>
> *From:* Dasun Hegoda <dasunhegoda@gmail.com>
> *Sent:* Saturday, November 21, 2015 8:00 AM
> *To:* user@hive.apache.org
> *Subject:* Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu
>
>
>
> Hi Mich, Hi Sai, Hi Jorn,
>
> Thank you very much for the information. I think we are deviating from the
> original question. Hive on Spark on Ubuntu. Can you please kindly tell me
> the configuration steps?
>
>
>
>
>
>
>
> On Fri, Nov 20, 2015 at 11:10 PM, Jörn Franke <jornfranke@gmail.com>
> wrote:
>
> I think the most recent versions of Cloudera or Hortonworks should include
> all these components - try their Sandboxes.
>
>
> On 20 Nov 2015, at 12:54, Dasun Hegoda <dasunhegoda@gmail.com> wrote:
>
> Where can I get a Hadoop distribution containing these technologies? Link?
>
>
>
> On Fri, Nov 20, 2015 at 5:22 PM, Jörn Franke <jornfranke@gmail.com> wrote:
>
> I recommend using a Hadoop distribution containing these technologies. I
> think you also get other useful tools for your scenario, such as auditing
> using Sentry or Ranger.
>
>
> On 20 Nov 2015, at 10:48, Mich Talebzadeh <mich@peridale.co.uk> wrote:
>
> Well
>
>
>
> “I'm planning to deploy Hive on Spark but I can't find the installation
> steps. I tried to read the official '[Hive on Spark][1]' guide but it has
> problems. As an example it says under 'Configuring Yarn'
> `yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler`
> but does not say where I should do it. Also, as per the guide,
> configurations are set in the Hive runtime shell, which is not permanent as
> far as I know.”
>
>
>
> You can do that in the yarn-site.xml file, which is normally under
> $HADOOP_HOME/etc/hadoop.
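>
> That is, the property quoted above goes into yarn-site.xml as an ordinary
> Hadoop property entry:
>
> <property>
>     <name>yarn.resourcemanager.scheduler.class</name>
>     <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
> </property>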
>
>
>
>
>
> HTH
>
>
>
>
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *From:* Dasun Hegoda [mailto:dasunhegoda@gmail.com]
>
> *Sent:* 20 November 2015 09:36
> *To:* user@hive.apache.org
> *Subject:* Hive on Spark - Hadoop 2 - Installation - Ubuntu
>
>
>
> Hi,
>
>
>
> What I'm planning to do is develop a reporting platform using existing
> data. I have an existing RDBMS which has a large number of records, so I'm
> using the following stack (
> http://stackoverflow.com/questions/33635234/hadoop-2-7-spark-hive-jasperreports-scoop-architecuture
> ):
>
>
>
>  - Sqoop - Extract data from RDBMS to Hadoop (an example import command
> follows this list)
>
>  - Hadoop - Storage platform -> *Deployment Completed*
>
>  - Hive - Data warehouse
>
>  - Spark - Real-time processing -> *Deployment Completed*
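>
> For the Sqoop step, a sketch of a typical import into Hive; the JDBC URL,
> credentials and table name are placeholders:
>
>   sqoop import \
>     --connect jdbc:mysql://dbhost:3306/salesdb \
>     --username report_user -P \
>     --table orders \
>     --hive-import --hive-table orders \
>     --num-mappers 4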
>
>
>
> I'm planning to deploy Hive on Spark but I can't find the installation
> steps. I tried to read the official '[Hive on Spark][1]' guide but it has
> problems. As an example it says under 'Configuring Yarn'
> `yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler`
> but does not say where I should do it. Also, as per the guide,
> configurations are set in the Hive runtime shell, which is not permanent as
> far as I know.
>
>
>
> Given that, I read [this][2] but it does not have any steps.
>
>
>
> Could you please provide the steps to run Hive on Spark on Ubuntu as a
> production system?
>
>
>
>
>
>   [1]:
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
>
>   [2]:
> http://stackoverflow.com/questions/26018306/how-to-configure-hive-to-use-spark
>
>
>
> --
>
> Regards,
>
> Dasun Hegoda, Software Engineer
> www.dasunhegoda.com | dasunhegoda@gmail.com
>



-- 
Regards,
Dasun Hegoda, Software Engineer
www.dasunhegoda.com | dasunhegoda@gmail.com
