hive-user mailing list archives

From "Mich Talebzadeh" <m...@peridale.co.uk>
Subject RE: Hive on Spark - Hadoop 2 - Installation - Ubuntu
Date Fri, 27 Nov 2015 16:20:54 GMT
Hi,

 

I downloaded the latest version of Hive, 1.2.1, a few days ago and upgraded from Hive 0.14.0.
It was pretty straightforward, with minimal changes to the Metastore schema (mine is on Oracle).

 

Now I have no problem making Spark work with Hive when a pre-compiled version of Spark such as
1.5.2 is downloaded. For example, you can create tables in Spark via Scala and they will be visible through
Hive.

 

However, that is not my primary concern. I don't want to run Spark standalone or as an application
alongside Hive.

 

My prime interest is to see whether I can make Hive use Spark as its execution engine, as opposed
to the long-established MapReduce engine that Hive uses by default.
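
For reference, the switch itself is a single property; a minimal session-level sketch, assuming the Hive on Spark prerequisites are actually in place (some_table here is just any existing Hive table):

hive> set hive.execution.engine=spark;
hive> select count(*) from some_table;

Everything that has to be in place before that set command actually works is the hard part.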

 

As of now I have not succeeded in getting it working. I have downloaded the Spark source code for
versions 1.5.1, 1.4, 1.3 etc. and built it with mvn from the tar files. However, I have
not even succeeded in starting the Spark master (start-master.sh). It just crashes with the errors I
have reported before. The reason seems to be that a jar file in $SPARK_HOME/lib (sorry, I cannot
recall its name now) is fine in the pre-built distribution but is much smaller in the lib directory when
Spark is built from source. Indeed, if you copy the original one from the pre-built lib directory
you will be able to start the master node. However, that is not a solution.

 

I am sure someone in this forum with much better knowledge of Java will be able to come
up with a solution.

 

HTH,

 

Mich

 

 

From: Dasun Hegoda [mailto:dasunhegoda@gmail.com] 
Sent: 27 November 2015 15:13
To: user@hive.apache.org
Subject: Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

 

Hey!

 

Thanks for the clarification. I have been struggling to deploy Hive on Spark for 3 weeks
now. Still no luck. I can't believe that even the Hive experts here don't know about it. I'm wondering
what to do.

 

Any guesses???

 

On Fri, Nov 27, 2015 at 3:52 PM, Mich Talebzadeh <mich@peridale.co.uk> wrote:

This should work as long as $SPARK_HOME has been set up and your CLASSPATH includes the Spark jars.
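
For completeness, on my box that amounts to something along these lines in the shell profile (the paths and the assembly jar name are from my own 1.5.2 install, so adjust to yours):

export SPARK_HOME=/usr/lib/spark
export PATH=$SPARK_HOME/bin:$PATH
export CLASSPATH=$CLASSPATH:$SPARK_HOME/lib/spark-assembly-1.5.2-hadoop2.6.0.jar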

 

Also bear in mind that this will work OK, but crucially Hive will not be able to use the Spark
engine with the pre-built Spark binary downloads.

 

Example

 

spark-shell --master spark://rhes564:7077

 

15/11/27 10:19:25 INFO spark.SecurityManager: Changing view acls to: hduser

15/11/27 10:19:25 INFO spark.SecurityManager: Changing modify acls to: hduser

15/11/27 10:19:25 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui
acls disabled; users with view permissions: Set(hduser); users with modify permissions: Set(hduser)

15/11/27 10:19:25 INFO spark.HttpServer: Starting HTTP Server

15/11/27 10:19:25 INFO server.Server: jetty-8.y.z-SNAPSHOT

15/11/27 10:19:25 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:22613

15/11/27 10:19:25 INFO util.Utils: Successfully started service 'HTTP class server' on port
22613.

Welcome to

      ____              __

     / __/__  ___ _____/ /__

    _\ \/ _ \/ _ `/ __/  '_/

   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2

      /_/

 

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_25)

Type in expressions to have them evaluated.

Type :help for more information.

15/11/27 10:19:29 WARN util.Utils: Your hostname, rhes564 resolves to a loopback address:
127.0.0.1; using 50.140.197.217 instead (on interface eth0)

15/11/27 10:19:29 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address

15/11/27 10:19:29 INFO spark.SparkContext: Running Spark version 1.5.2

15/11/27 10:19:29 INFO spark.SecurityManager: Changing view acls to: hduser

15/11/27 10:19:29 INFO spark.SecurityManager: Changing modify acls to: hduser

15/11/27 10:19:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui
acls disabled; users with view permissions: Set(hduser); users with modify permissions: Set(hduser)

15/11/27 10:19:30 INFO slf4j.Slf4jLogger: Slf4jLogger started

15/11/27 10:19:30 INFO Remoting: Starting remoting

15/11/27 10:19:30 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@50.140.197.217:61620]

15/11/27 10:19:30 INFO util.Utils: Successfully started service 'sparkDriver' on port 61620.

15/11/27 10:19:30 INFO spark.SparkEnv: Registering MapOutputTracker

15/11/27 10:19:30 INFO spark.SparkEnv: Registering BlockManagerMaster

15/11/27 10:19:30 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-eae28f3e-f878-4591-85f0-e8a66c6acb02

15/11/27 10:19:30 INFO storage.MemoryStore: MemoryStore started with capacity 529.9 MB

15/11/27 10:19:30 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-75cd7444-5cf7-4175-a15b-6c3882c9d146/httpd-8dc465d5-664d-4cef-86a8-d4e8b34f4146

15/11/27 10:19:30 INFO spark.HttpServer: Starting HTTP Server

15/11/27 10:19:30 INFO server.Server: jetty-8.y.z-SNAPSHOT

15/11/27 10:19:30 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:44656

15/11/27 10:19:30 INFO util.Utils: Successfully started service 'HTTP file server' on port
44656.

15/11/27 10:19:30 INFO spark.SparkEnv: Registering OutputCommitCoordinator

15/11/27 10:19:30 INFO server.Server: jetty-8.y.z-SNAPSHOT

15/11/27 10:19:30 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040

15/11/27 10:19:30 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.

15/11/27 10:19:30 INFO ui.SparkUI: Started SparkUI at http://50.140.197.217:4040

15/11/27 10:19:30 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.

15/11/27 10:19:30 INFO client.AppClient$ClientEndpoint: Connecting to master spark://rhes564:7077...

15/11/27 10:19:31 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with
app ID app-20151127101931-0001

15/11/27 10:19:31 INFO client.AppClient$ClientEndpoint: Executor added: app-20151127101931-0001/0 on worker-20151127100137-50.140.197.217-38428 (50.140.197.217:38428) with 12 cores

15/11/27 10:19:31 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20151127101931-0001/0 on hostPort 50.140.197.217:38428 with 12 cores, 1024.0 MB RAM

15/11/27 10:19:31 INFO client.AppClient$ClientEndpoint: Executor updated: app-20151127101931-0001/0
is now LOADING

15/11/27 10:19:31 INFO client.AppClient$ClientEndpoint: Executor updated: app-20151127101931-0001/0
is now RUNNING

15/11/27 10:19:31 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService'
on port 19761.

15/11/27 10:19:31 INFO netty.NettyBlockTransferService: Server created on 19761

15/11/27 10:19:31 INFO storage.BlockManagerMaster: Trying to register BlockManager

15/11/27 10:19:31 INFO storage.BlockManagerMasterEndpoint: Registering block manager 50.140.197.217:19761 with 529.9 MB RAM, BlockManagerId(driver, 50.140.197.217, 19761)

15/11/27 10:19:31 INFO storage.BlockManagerMaster: Registered BlockManager

15/11/27 10:19:31 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for
scheduling beginning after reached minRegisteredResourcesRatio: 0.0

15/11/27 10:19:31 INFO repl.SparkILoop: Created spark context..

Spark context available as sc.

15/11/27 10:19:31 INFO hive.HiveContext: Initializing execution hive, version 1.2.1

15/11/27 10:19:31 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0

15/11/27 10:19:31 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims
for Hadoop version 2.6.0

15/11/27 10:19:32 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083

15/11/27 10:19:32 INFO hive.metastore: Connected to metastore.

15/11/27 10:19:32 INFO session.SessionState: Created local directory: /tmp/hive/b8bba1a1-646b-4734-bad3-4c1d6cb9344d_resources

15/11/27 10:19:32 INFO session.SessionState: Created HDFS directory: /tmp/hive/hduser/b8bba1a1-646b-4734-bad3-4c1d6cb9344d

15/11/27 10:19:32 INFO session.SessionState: Created local directory: /tmp/hive/b8bba1a1-646b-4734-bad3-4c1d6cb9344d

15/11/27 10:19:32 INFO session.SessionState: Created HDFS directory: /tmp/hive/hduser/b8bba1a1-646b-4734-bad3-4c1d6cb9344d/_tmp_space.db

15/11/27 10:19:32 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse

15/11/27 10:19:32 INFO hive.HiveContext: Initializing HiveMetastoreConnection version 1.2.1
using Spark classes.

15/11/27 10:19:32 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0

15/11/27 10:19:33 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims
for Hadoop version 2.6.0

15/11/27 10:19:33 INFO cluster.SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@50.140.197.217:55017/user/Executor#1724631850]) with ID 0

15/11/27 10:19:33 INFO storage.BlockManagerMasterEndpoint: Registering block manager 50.140.197.217:25122 with 529.9 MB RAM, BlockManagerId(0, 50.140.197.217, 25122)

15/11/27 10:19:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable

15/11/27 10:19:33 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083

15/11/27 10:19:33 INFO hive.metastore: Connected to metastore.

15/11/27 10:19:34 INFO session.SessionState: Created local directory: /tmp/hive/d2b5c2bd-3989-4a72-a99f-885356f02f8b_resources

15/11/27 10:19:34 INFO session.SessionState: Created HDFS directory: /tmp/hive/hduser/d2b5c2bd-3989-4a72-a99f-885356f02f8b

15/11/27 10:19:34 INFO session.SessionState: Created local directory: /tmp/hive/d2b5c2bd-3989-4a72-a99f-885356f02f8b

15/11/27 10:19:34 INFO session.SessionState: Created HDFS directory: /tmp/hive/hduser/d2b5c2bd-3989-4a72-a99f-885356f02f8b/_tmp_space.db

15/11/27 10:19:34 INFO repl.SparkILoop: Created sql context (with Hive support)..

SQL context available as sqlContext.

 

scala>

 

However, that is of little use to me because I want to use Spark as the Hive engine for faster performance
compared to the MapReduce engine. Spark as a fully built application does not work as an engine
alone! For that I need to build Spark WITHOUT Hive jars and use it as the engine, as opposed to
a standalone application.
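
From what I recall of the Hive on Spark wiki, that means building a Spark distribution with the Hive profile left out, roughly along these lines from the Spark source root (the exact profile names vary by Spark version, so treat this as a sketch rather than a recipe):

./make-distribution.sh --name hadoop2-without-hive --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"

and then getting Hive to pick that build up (the wiki, if I remember correctly, has you put the resulting spark-assembly jar on Hive's classpath).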

 

HTH

 

 

Mich Talebzadeh

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.


co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This message is for the
designated recipient only, if you are not the intended recipient, you should destroy it immediately.
Any information in this message shall not be understood as given or endorsed by Peridale Technology
Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility
of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd,
its subsidiaries nor their employees accept any responsibility.

 

From: Dasun Hegoda [mailto:dasunhegoda@gmail.com]
Sent: 27 November 2015 05:11
To: user@hive.apache.org
Subject: Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

 

This works fine for me

 

spark-shell --master yarn-client

 

On Tue, Nov 24, 2015 at 11:43 AM, Dasun Hegoda <dasunhegoda@gmail.com> wrote:

Hey folks,

 

Any updates?

 

On Mon, Nov 23, 2015 at 5:15 PM, Dasun Hegoda <dasunhegoda@gmail.com> wrote:

Do you have any clue how to get this fixed?

 

On Mon, Nov 23, 2015 at 4:27 PM, Dasun Hegoda <dasunhegoda@gmail.com> wrote:

I get this now. It's different from what you get:

 

hduser@master:~/spark-1.5.1-bin-hadoop2.6/bin$ ./spark-shell

15/11/23 05:56:13 INFO spark.SecurityManager: Changing view acls to: hduser

15/11/23 05:56:13 INFO spark.SecurityManager: Changing modify acls to: hduser

15/11/23 05:56:13 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui
acls disabled; users with view permissions: Set(hduser); users with modify permissions: Set(hduser)

15/11/23 05:56:13 INFO spark.HttpServer: Starting HTTP Server

15/11/23 05:56:13 INFO server.Server: jetty-8.y.z-SNAPSHOT

15/11/23 05:56:13 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:34334

15/11/23 05:56:13 INFO util.Utils: Successfully started service 'HTTP class server' on port
34334.

Welcome to

      ____              __

     / __/__  ___ _____/ /__

    _\ \/ _ \/ _ `/ __/  '_/

   /___/ .__/\_,_/_/ /_/\_\   version 1.5.1

      /_/

 

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55)

Type in expressions to have them evaluated.

Type :help for more information.

15/11/23 05:56:17 INFO spark.SparkContext: Running Spark version 1.5.1

15/11/23 05:56:17 WARN spark.SparkConf: 

SPARK_JAVA_OPTS was detected (set to '-Dspark.driver.port=53411').

This is deprecated in Spark 1.0+.

 

Please instead use:

 - ./spark-submit with conf/spark-defaults.conf to set defaults for an application

 - ./spark-submit with --driver-java-options to set -X options for a driver

 - spark.executor.extraJavaOptions to set -X options for executors

 - SPARK_DAEMON_JAVA_OPTS to set java options for standalone daemons (master or worker)

        

15/11/23 05:56:17 WARN spark.SparkConf: Setting 'spark.executor.extraJavaOptions' to '-Dspark.driver.port=53411'
as a work-around.

15/11/23 05:56:17 WARN spark.SparkConf: Setting 'spark.driver.extraJavaOptions' to '-Dspark.driver.port=53411'
as a work-around.

15/11/23 05:56:17 INFO spark.SecurityManager: Changing view acls to: hduser

15/11/23 05:56:17 INFO spark.SecurityManager: Changing modify acls to: hduser

15/11/23 05:56:17 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui
acls disabled; users with view permissions: Set(hduser); users with modify permissions: Set(hduser)

15/11/23 05:56:18 INFO slf4j.Slf4jLogger: Slf4jLogger started

15/11/23 05:56:18 INFO Remoting: Starting remoting

15/11/23 05:56:18 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.7.87:53411]

15/11/23 05:56:18 INFO util.Utils: Successfully started service 'sparkDriver' on port 53411.

15/11/23 05:56:18 INFO spark.SparkEnv: Registering MapOutputTracker

15/11/23 05:56:18 INFO spark.SparkEnv: Registering BlockManagerMaster

15/11/23 05:56:18 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-0232975c-c76b-444d-b7f7-1ef2f28e388c

15/11/23 05:56:18 INFO storage.MemoryStore: MemoryStore started with capacity 530.3 MB

15/11/23 05:56:18 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593/httpd-311975ea-ac22-493d-8fd5-0f48b562a9a5

15/11/23 05:56:18 INFO spark.HttpServer: Starting HTTP Server

15/11/23 05:56:18 INFO server.Server: jetty-8.y.z-SNAPSHOT

15/11/23 05:56:18 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:60477

15/11/23 05:56:18 INFO util.Utils: Successfully started service 'HTTP file server' on port
60477.

15/11/23 05:56:18 INFO spark.SparkEnv: Registering OutputCommitCoordinator

15/11/23 05:56:18 INFO server.Server: jetty-8.y.z-SNAPSHOT

15/11/23 05:56:18 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040

15/11/23 05:56:18 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.

15/11/23 05:56:18 INFO ui.SparkUI: Started SparkUI at http://192.168.7.87:4040

15/11/23 05:56:18 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.

15/11/23 05:56:18 INFO client.AppClient$ClientEndpoint: Connecting to master spark://master:7077...

15/11/23 05:56:38 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]

java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@236f0e3a rejected from java.util.concurrent.ThreadPoolExecutor@500f1402[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]

at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)

at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)

at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)

at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)

at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:96)

at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:95)

at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)

at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)

at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)

at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)

at org.apache.spark.deploy.client.AppClient$ClientEndpoint.tryRegisterAllMasters(AppClient.scala:95)

at org.apache.spark.deploy.client.AppClient$ClientEndpoint.org$apache$spark$deploy$client$AppClient$ClientEndpoint$$registerWithMaster(AppClient.scala:121)

at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:132)

at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1119)

at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:124)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)

15/11/23 05:56:38 INFO storage.DiskBlockManager: Shutdown hook called

15/11/23 05:56:38 INFO util.ShutdownHookManager: Shutdown hook called

15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593/httpd-311975ea-ac22-493d-8fd5-0f48b562a9a5

15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-8fefb39a-09b5-443c-b7b4-9c54bce6e245

15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593/userFiles-b593fc93-c23a-4a9e-aede-ed051f149fcb

15/11/23 05:56:38 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2413b536-c845-4964-a96d-973e5ec02593

 

On Mon, Nov 23, 2015 at 4:19 PM, Mich Talebzadeh <mich@peridale.co.uk> wrote:

As the example shows, these are all set in hive-site.xml

 

<property>

    <name>hive.execution.engine</name>

    <value>spark</value>

    <description>

      Expects one of [mr, tez, spark].

      Chooses execution engine. Options are: mr (Map reduce, default) or tez (hadoop 2 only)

    </description>

  </property>

 

<property>

    <name>spark.eventLog.enabled</name>

    <value>true</value>

    <description>

           Spark event log setting

    </description>

  </property>

 

 

Mich Talebzadeh

 


 

From: Dasun Hegoda [mailto:dasunhegoda@gmail.com]
Sent: 23 November 2015 10:40
To: user@hive.apache.org
Subject: Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

 

Thank you very much. This is very informative. Do you know how to set these in hive-site.xml?

 

hive> set spark.master=<Spark Master URL>

hive> set spark.eventLog.enabled=true;

hive> set spark.eventLog.dir=<Spark event log folder (must exist)>

hive> set spark.executor.memory=512m;             

hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;

 

If we set these in hive-site.xml I think we will be able to get through.
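
Something like this is what I have in mind, mirroring the hive.execution.engine property above, with placeholder values (the event log directory has to exist already):

<property>
    <name>spark.master</name>
    <value>spark://master:7077</value>
</property>
<property>
    <name>spark.eventLog.enabled</name>
    <value>true</value>
</property>
<property>
    <name>spark.eventLog.dir</name>
    <value>hdfs://master:8020/spark-logs</value>
</property>
<property>
    <name>spark.executor.memory</name>
    <value>512m</value>
</property>
<property>
    <name>spark.serializer</name>
    <value>org.apache.spark.serializer.KryoSerializer</value>
</property>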

 

On Mon, Nov 23, 2015 at 3:05 PM, Mich Talebzadeh <mich@peridale.co.uk> wrote:

Hi,

 

I am looking at the set up here

 

https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started.

 

First, this is about the configuration of Hive to work with Spark. This is my understanding:

 

1.    Hive uses Yarn as its resource manager regardless

2.    Hive uses MapReduce as its execution engine by default

3.    Changing the execution engine to Spark is done at the configuration level. If you look
at the Hive configuration file ->  $HIVE_HOME/conf/hive-site.xml, you will see that the default
is mr (MapReduce):

<property>

    <name>hive.execution.engine</name>

    <value>mr</value>

    <description>

      Expects one of [mr, tez].

      Chooses execution engine. Options are: mr (Map reduce, default) or tez (hadoop 2 only)

    </description>

  </property>

 

4.    If you change that to spark and restart Hive, you will force Hive to use Spark as its
engine. So the choice is either to do it at the configuration level or at session level (i.e.
set hive.execution.engine=spark;). For the rest of the parameters you can do the same, i.e. in
hive-site.xml or at session level. Personally I would still want Hive to use the MR engine, so
I will create spark-defaults.conf as mentioned.

5.    I then start Spark standalone, which works fine

hduser@rhes564::/usr/lib/spark> ./sbin/start-master.sh

starting org.apache.spark.deploy.master.Master, logging to /usr/lib/spark/sbin/../logs/spark-hduser-org.apache.spark.deploy.master.Master-1-rhes564.out

hduser@rhes564::/usr/lib/spark> more  /usr/lib/spark/sbin/../logs/spark-hduser-org.apache.spark.deploy.master.Master-1-rhes564.out

Spark Command: /usr/java/latest/bin/java -cp /usr/lib/spark/sbin/../conf/:/usr/lib/spark/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/usr/lib/spark/lib/datanucleus-core-3.2.10.jar:/usr/lib/spark/lib/datanucleus-ap

i-jdo-3.2.6.jar:/usr/lib/spark/lib/datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g -XX:MaxPermSize=256m
org.apache.spark.deploy.master.Master --ip rhes564 --port 7077 --webui-port 8080

========================================

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

15/11/21 21:41:58 INFO Master: Registered signal handlers for [TERM, HUP, INT]

15/11/21 21:41:58 WARN Utils: Your hostname, rhes564 resolves to a loopback address: 127.0.0.1;
using 50.140.197.217 instead (on interface eth0)

15/11/21 21:41:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address

15/11/21 21:41:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform...
using builtin-java classes where applicable

15/11/21 21:41:59 INFO SecurityManager: Changing view acls to: hduser

15/11/21 21:41:59 INFO SecurityManager: Changing modify acls to: hduser

15/11/21 21:41:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls
disabled; users with view permissions: Set(hduser); users with modify permissions: Set(hduser)

15/11/21 21:41:59 INFO Slf4jLogger: Slf4jLogger started

15/11/21 21:42:00 INFO Remoting: Starting remoting

15/11/21 21:42:00 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@rhes564:7077]

15/11/21 21:42:00 INFO Utils: Successfully started service 'sparkMaster' on port 7077.

15/11/21 21:42:00 INFO Master: Starting Spark master at spark://rhes564:7077

15/11/21 21:42:00 INFO Master: Running Spark version 1.5.2

15/11/21 21:42:00 INFO Utils: Successfully started service 'MasterUI' on port 8080.

15/11/21 21:42:00 INFO MasterWebUI: Started MasterWebUI at http://50.140.197.217:8080

15/11/21 21:42:00 INFO Utils: Successfully started service on port 6066.

15/11/21 21:42:00 INFO StandaloneRestServer: Started REST server for submitting applications
on port 6066

15/11/21 21:42:00 INFO Master: I have been elected leader! New state: ALIVE

6.    Then I try to start an interactive spark-shell and it fails with the error that I reported
before

hduser@rhes564::/usr/lib/spark/bin> ./spark-shell --master spark://rhes564:7077

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).

log4j:WARN Please initialize the log4j system properly.

log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties

To adjust logging level use sc.setLogLevel("INFO")

Welcome to

      ____              __

     / __/__  ___ _____/ /__

    _\ \/ _ \/ _ `/ __/  '_/

   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2

      /_/

 

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_25)

Type in expressions to have them evaluated.

Type :help for more information.

15/11/23 09:33:56 WARN Utils: Your hostname, rhes564 resolves to a loopback address: 127.0.0.1;
using 50.140.197.217 instead (on interface eth0)

15/11/23 09:33:56 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address

15/11/23 09:33:57 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.

Spark context available as sc.

15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.server2.thrift.http.min.worker.threads
does not exist

15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.mapjoin.optimized.keys does not exist

15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.mapjoin.lazy.hashtable does not exist

15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.server2.thrift.http.max.worker.threads
does not exist

15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.server2.logging.operation.verbose does
not exist

15/11/23 09:34:00 WARN HiveConf: HiveConf of name hive.optimize.multigroupby.common.distincts
does not exist

java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on
HDFS should be writable. Current permissions are: rwx------

 

That is where I am now. I have reported this to the Spark user group but no luck yet.
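
Presumably that last error is just the HDFS scratch directory permissions; something like the following, run as a user with rights on HDFS (and only as permissive as your security policy allows), should clear that particular hurdle, though I have not confirmed it gets me any further:

hdfs dfs -chmod -R 777 /tmp/hive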

 

 

Mich Talebzadeh

 


 

From: Dasun Hegoda [mailto:dasunhegoda@gmail.com]
Sent: 23 November 2015 07:05
To: user@hive.apache.org
Subject: Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu

 

Anyone????

 

On Sat, Nov 21, 2015 at 1:32 PM, Dasun Hegoda <dasunhegoda@gmail.com> wrote:

Thank you very much, but I would like to do the integration of these components myself rather
than use a packaged distribution. I think I have come to the right place. Can you please kindly
tell me the configuration steps to run Hive on Spark?

 

Could someone at least elaborate on these steps?

https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started.

 

Because in the latter part of the guide, configurations are set in the Hive runtime shell, which
is not permanent as far as I know.

 

Please help me to get this done. I'm also planning to write a detailed guide with the configuration
steps to run Hive on Spark, so others can benefit from it and not be troubled like me.

 

Can someone please kindly tell me the configuration steps to run Hive on Spark?

 

 

On Sat, Nov 21, 2015 at 12:28 PM, Sai Gopalakrishnan <sai.gopalakrishnan@aspiresys.com> wrote:

Hi everyone,

 

Thank you for your responses. I think Mich's suggestion is a great one; I will go with it. As
Alan suggested, using the compactor in Hive should help with managing the delta files.

 

@Dasun, pardon me for deviating from the topic. Regarding configuration, you could try a packaged
distribution (Hortonworks, Cloudera or MapR) as Jörn Franke said. I use Hortonworks; it is
open-source, compatible with Linux and Windows, provides detailed documentation for installation,
and can be installed in less than a day provided you are all set with the hardware.
http://hortonworks.com/hdp/downloads/



 

 

Regards,

Sai

 




From: Dasun Hegoda <dasunhegoda@gmail.com>
Sent: Saturday, November 21, 2015 8:00 AM
To: user@hive.apache.org
Subject: Re: Hive on Spark - Hadoop 2 - Installation - Ubuntu 

 

Hi Mich, Hi Sai, Hi Jorn,

Thank you very much for the information. I think we are deviating from the original question:
Hive on Spark on Ubuntu. Can you please kindly tell me the configuration steps?

 

 

 

On Fri, Nov 20, 2015 at 11:10 PM, Jörn Franke <jornfranke@gmail.com> wrote:

I think the most recent versions of Cloudera or Hortonworks should include all these components
- try their sandboxes.


On 20 Nov 2015, at 12:54, Dasun Hegoda <dasunhegoda@gmail.com> wrote:

Where can I get a Hadoop distribution containing these technologies? Link?

 

On Fri, Nov 20, 2015 at 5:22 PM, Jörn Franke <jornfranke@gmail.com> wrote:

I recommend using a Hadoop distribution containing these technologies. I think you also get
other useful tools for your scenario, such as auditing using Sentry or Ranger.


On 20 Nov 2015, at 10:48, Mich Talebzadeh <mich@peridale.co.uk> wrote:

Well

 

“I'm planning to deploy Hive on Spark but I can't find the installation steps. I tried to
read the official '[Hive on Spark][1]' guide but it has problems. As an example, it says under
'Configuring Yarn' `yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler`
but does not say where I should set it. Also, as per the guide, configurations are set in the
Hive runtime shell, which is not permanent as far as I know.”

 

You can do that in the yarn-site.xml file, which is normally under $HADOOP_HOME/etc/hadoop.
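
i.e. a property block along these lines (this is the exact property quoted above; the file location may differ on a packaged distribution):

<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>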

 

 

HTH

 

 

 

Mich Talebzadeh

 


 

From: Dasun Hegoda [mailto:dasunhegoda@gmail.com] 
Sent: 20 November 2015 09:36
To: user@hive.apache.org
Subject: Hive on Spark - Hadoop 2 - Installation - Ubuntu

 

Hi,

 

What I'm planning to do is develop a reporting platform using existing data. I have an existing
RDBMS which has a large number of records, so I'm using the following stack (http://stackoverflow.com/questions/33635234/hadoop-2-7-spark-hive-jasperreports-scoop-architecuture):

 

 - Sqoop - Extract data from the RDBMS to Hadoop (a typical import command is sketched after this list)

 - Hadoop - Storage platform -> *Deployment Completed*

 - Hive - Data warehouse

 - Spark - Real-time processing -> *Deployment Completed*
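
For the Sqoop step I have in mind something along these lines (the connection string, credentials and table names are only placeholders):

sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username etl_user -P \
  --table orders \
  --hive-import --hive-table orders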

 

I'm planning to deploy Hive on Spark but I can't find the installation steps. I tried to read
the official '[Hive on Spark][1]' guide but it has problems. As an example, it says under 'Configuring
Yarn' `yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler`
but does not say where I should set it. Also, as per the guide, configurations are set in the
Hive runtime shell, which is not permanent as far as I know.

 

I also read [this][2], but it does not have any steps.

 

Could you please provide me the steps to run Hive on Spark on Ubuntu as a production system?

 

 

  [1]: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

  [2]: http://stackoverflow.com/questions/26018306/how-to-configure-hive-to-use-spark

 

-- 

Regards,

Dasun Hegoda, Software Engineer  
www.dasunhegoda.com | dasunhegoda@gmail.com






 
