hive-dev mailing list archives

From Xuefu Zhang <xu...@uber.com>
Subject Re: Error in Hive on Spark
Date Thu, 10 Mar 2016 15:51:53 GMT
You can probably avoid the problem by setting the environment variable
SPARK_HOME, or the JVM property spark.home, to point to your Spark installation.
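
As a rough illustration of that suggestion, the sketch below checks which Spark home a client JVM would see and whether it contains bin/spark-submit (the launcher path that appears in the quoted log). The class name SparkHomeCheck and the lookup order (JVM property first, then environment variable) are assumptions made for this example, not a description of Hive's own lookup.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hive or Spark.
public class SparkHomeCheck {

    // Returns the JVM property spark.home if set, else the SPARK_HOME
    // environment variable, else null. The order is an assumption.
    public static String resolveSparkHome() {
        String home = System.getProperty("spark.home");
        if (home == null || home.isEmpty()) {
            home = System.getenv("SPARK_HOME");
        }
        return (home == null || home.isEmpty()) ? null : home;
    }

    // True if <sparkHome>/bin/spark-submit exists as a regular file.
    public static boolean hasSparkSubmit(String sparkHome) {
        if (sparkHome == null) {
            return false;
        }
        Path launcher = Paths.get(sparkHome, "bin", "spark-submit");
        return Files.isRegularFile(launcher);
    }

    public static void main(String[] args) {
        // An optional first argument overrides the property for this run.
        if (args.length > 0) {
            System.setProperty("spark.home", args[0]);
        }
        String home = resolveSparkHome();
        System.out.println("Spark home resolved to: " + home);
        System.out.println("bin/spark-submit present: " + hasSparkSubmit(home));
    }
}
```

The same property can be passed on the command line with -Dspark.home=<path> instead of calling System.setProperty in code.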

--Xuefu

On Thu, Mar 10, 2016 at 3:11 AM, Stana <stana@is-land.com.tw> wrote:

> I am trying out Hive on Spark with Hive 2.0.0 and Spark 1.4.1, and
> executing org.apache.hadoop.hive.ql.Driver from a Java application.
>
> My setup is as follows:
> 1. Built the Spark 1.4.1 assembly jar without Hive.
> 2. Uploaded the Spark assembly jar to the Hadoop cluster.
> 3. Ran the Java application from the Eclipse IDE on my client computer.
>
> The application worked and submitted an MR job to the YARN cluster
> successfully when using hiveConf.set("hive.execution.engine", "mr"),
> but it threw exceptions with the Spark engine.
>
> Finally, I traced the Hive source code and came to this conclusion:
>
> In my case, the SparkClientImpl class generates a spark-submit shell
> command and executes it.
> The command sets --class to RemoteDriver.class.getName() and the jar
> to SparkContext.jarOfClass(this.getClass()).get(), which is why my
> application threw the exception.
>
> Is that right? And what can I do to run the application with the
> Spark engine successfully from my client computer? Thanks a lot!
>
>
> Java application code:
>
> import org.apache.hadoop.hive.cli.CliSessionState;
> import org.apache.hadoop.hive.conf.HiveConf;
> import org.apache.hadoop.hive.ql.CommandNeedRetryException;
> import org.apache.hadoop.hive.ql.Driver;
> import org.apache.hadoop.hive.ql.processors.CommandProcessorResponse;
> import org.apache.hadoop.hive.ql.session.SessionState;
>
> public class TestHiveDriver {
>
>     private static HiveConf hiveConf;
>     private static Driver driver;
>     private static CliSessionState ss;
>
>     public static void main(String[] args) {
>         // Self-join used as the test query
>         String sql = "select * from hadoop0263_0 as a join hadoop0263_0 as b"
>                 + " on (a.key = b.key)";
>         ss = new CliSessionState(new HiveConf(SessionState.class));
>         hiveConf = new HiveConf(Driver.class);
>
>         // HDFS and YARN endpoints
>         hiveConf.set("fs.default.name", "hdfs://storm0:9000");
>         hiveConf.set("yarn.resourcemanager.address", "storm0:8032");
>         hiveConf.set("yarn.resourcemanager.scheduler.address", "storm0:8030");
>         hiveConf.set("yarn.resourcemanager.resource-tracker.address", "storm0:8031");
>         hiveConf.set("yarn.resourcemanager.admin.address", "storm0:8033");
>         hiveConf.set("mapreduce.framework.name", "yarn");
>         hiveConf.set("mapreduce.jobhistory.address", "storm0:10020");
>
>         // Metastore connection
>         hiveConf.set("javax.jdo.option.ConnectionURL", "jdbc:mysql://storm0:3306/stana_metastore");
>         hiveConf.set("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver");
>         hiveConf.set("javax.jdo.option.ConnectionUserName", "root");
>         hiveConf.set("javax.jdo.option.ConnectionPassword", "123456");
>
>         // Hive on Spark settings
>         hiveConf.setBoolean("hive.auto.convert.join", false);
>         hiveConf.set("spark.yarn.jar", "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar");
>         hiveConf.set("spark.home", "target/spark");
>         hiveConf.set("hive.execution.engine", "spark");
>         hiveConf.set("hive.dbname", "default");
>
>         driver = new Driver(hiveConf);
>         SessionState.start(hiveConf);
>
>         CommandProcessorResponse res = null;
>         try {
>             res = driver.run(sql);
>         } catch (CommandNeedRetryException e) {
>             e.printStackTrace();
>             return; // no response to report
>         }
>
>         System.out.println("Response Code:" + res.getResponseCode());
>         System.out.println("Error Message:" + res.getErrorMessage());
>         System.out.println("SQL State:" + res.getSQLState());
>     }
> }
>
>
>
>
> Exception of spark-engine:
>
> 16/03/10 18:32:58 INFO SparkClientImpl: Running client driver with argv: /Volumes/Sdhd/Documents/project/island/java/apache/hive-200-test/hive-release-2.0.0/itests/hive-unit/target/spark/bin/spark-submit --properties-file /var/folders/vt/cjcdhms903x7brn1kbh558s40000gn/T/spark-submit.7697089826296920539.properties --class org.apache.hive.spark.client.RemoteDriver /Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar --remote-host MacBook-Pro.local --remote-port 51331 --conf hive.spark.client.connect.timeout=1000 --conf hive.spark.client.server.connect.timeout=90000 --conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256
> 16/03/10 18:33:09 INFO SparkClientImpl: 16/03/10 18:33:09 INFO Client:
> 16/03/10 18:33:09 INFO SparkClientImpl:          client token: N/A
> 16/03/10 18:33:09 INFO SparkClientImpl:          diagnostics: N/A
> 16/03/10 18:33:09 INFO SparkClientImpl:          ApplicationMaster host:
> N/A
> 16/03/10 18:33:09 INFO SparkClientImpl:          ApplicationMaster RPC
> port: -1
> 16/03/10 18:33:09 INFO SparkClientImpl:          queue: default
> 16/03/10 18:33:09 INFO SparkClientImpl:          start time: 1457180833494
> 16/03/10 18:33:09 INFO SparkClientImpl:          final status: UNDEFINED
> 16/03/10 18:33:09 INFO SparkClientImpl:          tracking URL:
> http://storm0:8088/proxy/application_1457002628102_0043/
> 16/03/10 18:33:09 INFO SparkClientImpl:          user: stana
> 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO Client:
> Application report for application_1457002628102_0043 (state: FAILED)
> 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO Client:
> 16/03/10 18:33:10 INFO SparkClientImpl:          client token: N/A
> 16/03/10 18:33:10 INFO SparkClientImpl:          diagnostics: Application
> application_1457002628102_0043 failed 1 times due to AM Container for
> appattempt_1457002628102_0043_000001 exited with  exitCode: -1000
> 16/03/10 18:33:10 INFO SparkClientImpl: For more detailed output,
> check application tracking
> page:http://storm0:8088/proxy/application_1457002628102_0043/Then,
> click on links to logs of each attempt.
> 16/03/10 18:33:10 INFO SparkClientImpl: Diagnostics: java.io.FileNotFoundException: File file:/Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar does not exist
> 16/03/10 18:33:10 INFO SparkClientImpl: Failing this attempt. Failing
> the application.
> 16/03/10 18:33:10 INFO SparkClientImpl:          ApplicationMaster host:
> N/A
> 16/03/10 18:33:10 INFO SparkClientImpl:          ApplicationMaster RPC
> port: -1
> 16/03/10 18:33:10 INFO SparkClientImpl:          queue: default
> 16/03/10 18:33:10 INFO SparkClientImpl:          start time: 1457180833494
> 16/03/10 18:33:10 INFO SparkClientImpl:          final status: FAILED
> 16/03/10 18:33:10 INFO SparkClientImpl:          tracking URL:
> http://storm0:8088/cluster/app/application_1457002628102_0043
> 16/03/10 18:33:10 INFO SparkClientImpl:          user: stana
> 16/03/10 18:33:10 INFO SparkClientImpl: Exception in thread "main"
> org.apache.spark.SparkException: Application
> application_1457002628102_0043 finished with failed status
> 16/03/10 18:33:10 INFO SparkClientImpl:         at
> org.apache.spark.deploy.yarn.Client.run(Client.scala:920)
> 16/03/10 18:33:10 INFO SparkClientImpl:         at
> org.apache.spark.deploy.yarn.Client$.main(Client.scala:966)
> 16/03/10 18:33:10 INFO SparkClientImpl:         at
> org.apache.spark.deploy.yarn.Client.main(Client.scala)
> 16/03/10 18:33:10 INFO SparkClientImpl:         at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 16/03/10 18:33:10 INFO SparkClientImpl:         at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 16/03/10 18:33:10 INFO SparkClientImpl:         at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 16/03/10 18:33:10 INFO SparkClientImpl:         at
> java.lang.reflect.Method.invoke(Method.java:606)
> 16/03/10 18:33:10 INFO SparkClientImpl:         at
>
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
> 16/03/10 18:33:10 INFO SparkClientImpl:         at
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> 16/03/10 18:33:10 INFO SparkClientImpl:         at
> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
> 16/03/10 18:33:10 INFO SparkClientImpl:         at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
> 16/03/10 18:33:10 INFO SparkClientImpl:         at
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO
> ShutdownHookManager: Shutdown hook called
> 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO
> ShutdownHookManager: Deleting directory
>
> /private/var/folders/vt/cjcdhms903x7brn1kbh558s40000gn/T/spark-5b92ce20-b6f8-4832-8b15-5e98bd0e0705
> 16/03/10 18:33:10 WARN SparkClientImpl: Error while waiting for client
> to connect.
> java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> Cancel client '5bda93c0-865b-48a8-b368-c2fcc30e81e8'. Error: Child
> process exited before connecting back
>         at
> io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>         at
> org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:101)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:98)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:94)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:63)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:117)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:181)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:119)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10195)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:229)
> [hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
> [hive-exec-2.0.0.jar:2.0.0]
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479)
> [hive-exec-2.0.0.jar:?]
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319)
> [hive-exec-2.0.0.jar:?]
>         at
> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1255)
> [hive-exec-2.0.0.jar:?]
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1301)
> [hive-exec-2.0.0.jar:?]
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184)
> [hive-exec-2.0.0.jar:?]
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172)
> [hive-exec-2.0.0.jar:?]
>         at
> org.apache.hadoop.hive.ql.TestHiveDriver.main(TestHiveDriver.java:41)
> [test-classes/:?]
> Caused by: java.lang.RuntimeException: Cancel client
> '5bda93c0-865b-48a8-b368-c2fcc30e81e8'. Error: Child process exited
> before connecting back
>         at
> org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:179)
> ~[hive-exec-2.0.0.jar:2.0.0]
>         at
> org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:450)
> ~[hive-exec-2.0.0.jar:2.0.0]
>         at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_67]
> 16/03/10 18:33:10 WARN SparkClientImpl: Child process exited with code 1.
> FAILED: SemanticException Failed to get a spark session:
> org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create
> spark client.
> 16/03/10 18:33:10 ERROR Driver: FAILED: SemanticException Failed to
> get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException:
> Failed to create spark client.
> org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a
> spark session: org.apache.hadoop.hive.ql.metadata.HiveException:
> Failed to create spark client.
>         at
> org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:121)
>         at
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>         at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>         at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>         at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>         at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>         at
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:181)
>         at
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:119)
>         at
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
>         at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10195)
>         at
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:229)
>         at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319)
>         at
> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1255)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1301)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172)
>         at
> org.apache.hadoop.hive.ql.TestHiveDriver.main(TestHiveDriver.java:41)
>
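
The quoted analysis says the jar handed to spark-submit comes from SparkContext.jarOfClass(this.getClass()).get(), and the diagnostics show a file: path inside the local Maven repository. To see what such a lookup returns on a client machine, a comparable check in plain Java could look like the sketch below. It uses the class's code source rather than Spark's own implementation, and the class name JarLocator is hypothetical.

```java
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical helper, not part of Hive or Spark.
public class JarLocator {

    // Returns the location (jar or class directory) a class was loaded from,
    // or an empty Optional when the JVM reports no code source for it.
    public static Optional<String> locate(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return Optional.empty();
        }
        return Optional.of(source.getLocation().toString());
    }

    public static void main(String[] args) {
        // A class from the application classpath usually reports a file: URL.
        System.out.println(locate(JarLocator.class).orElse("(no code source)"));
        // Core JDK classes report no code source.
        System.out.println(locate(String.class).orElse("(no code source)"));
    }
}
```

Running this against a class from hive-exec on the client would be expected to print a path on the client's local disk, which is the kind of path the YARN diagnostics above report as missing.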
