hive-dev mailing list archives

From "Szehon Ho" <sze...@cloudera.com>
Subject Re: Review Request 30055: HIVE-9337 : Move more hive.spark.* configurations to HiveConf
Date Tue, 20 Jan 2015 23:34:58 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30055/
-----------------------------------------------------------

(Updated Jan. 20, 2015, 11:34 p.m.)


Review request for hive and chengxiang li.


Changes
-------

Address review comments from Lefty and Brock.  Also refer to 'Hive' as the client in the descriptions to make them clearer.

Took another look at Chengxiang's suggestion to use --conf to pass the values down to the remote Spark driver.  My original attempt must have had a bug; after fixing it, a few basic tests passed and the approach seems to work.


Bugs: HIVE-9337
    https://issues.apache.org/jira/browse/HIVE-9337


Repository: hive-git


Description
-------

This change allows the Remote Spark Driver's properties to be set dynamically via Hive configuration (i.e., set commands).
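
For example, assuming the names below are among the hive.spark.* properties added here (shown for illustration only; the exact names and defaults are whatever ends up in HiveConf), a user could now tune the client/driver settings per session:

    set hive.spark.client.connect.timeout=30000;
    set hive.spark.client.rpc.max.size=52428800;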

Went through the Remote Spark Driver's properties and added them to HiveConf, rewording the descriptions so they are clearer in a global context alongside the other Hive properties.  Also fixed a description bug that stated the default max message size is 10MB; it should read 50MB.  One open question: I did not move 'hive.spark.log.dir', as I could not find where it is read and do not know whether it is still used anywhere.
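
To make the pattern concrete, here is a minimal sketch of keeping each property's name, default, and description in one place (this is not the actual HiveConf.ConfVars enum; constant names and constructors there may differ — only the 50MB default is taken from the note above):

    // Hypothetical illustration of the name/default/description pattern.
    public enum SparkClientVars {
      // 50MB default, matching the corrected description above.
      RPC_MAX_MESSAGE_SIZE("hive.spark.client.rpc.max.size", 50 * 1024 * 1024,
          "Maximum message size in bytes for communication between Hive and the remote Spark driver."),
      RPC_THREADS("hive.spark.client.rpc.threads", 8,
          "Maximum number of threads for the remote Spark driver's RPC event loop.");

      public final String varname;
      public final int defaultIntVal;
      public final String description;

      SparkClientVars(String varname, int defaultIntVal, String description) {
        this.varname = varname;
        this.defaultIntVal = defaultIntVal;
        this.description = description;
      }
    }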

These properties are passed between the client (Hive) and the RemoteSparkDriver via the properties file.  One note: the keys have to be prefixed with 'spark.', as SparkConf only accepts such properties.  I spent a long time trying to pass them via 'conf' but found that it won't work (see SparkSubmitArguments.scala).  It may be possible to pass each one as a separate argument (like --hive.spark.XXX=YYY), but I think the properties file is more scalable.
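
A rough sketch of the idea, assuming a simple 'spark.' prefixing scheme (the actual key translation in SparkClientImpl/HiveSparkClientFactory may differ): collect the hive.spark.* settings, prefix them so SparkConf will accept them, and hand them to spark-submit via --properties-file.

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.Map;
    import java.util.Properties;

    public class SparkPropsFileSketch {
      // Writes hive.spark.* entries into a temp properties file, prefixed with
      // "spark." so that SparkConf/spark-submit will pick them up. Illustrative only.
      public static File writePropertiesFile(Map<String, String> hiveSettings) throws IOException {
        Properties props = new Properties();
        for (Map.Entry<String, String> e : hiveSettings.entrySet()) {
          String key = e.getKey();
          if (key.startsWith("hive.spark.")) {
            props.put("spark." + key, e.getValue());   // prefixed so SparkConf accepts it
          } else {
            props.put(key, e.getValue());              // already a spark.* key, pass through
          }
        }
        File f = File.createTempFile("spark-submit", ".properties");
        f.deleteOnExit();
        try (OutputStream out = new FileOutputStream(f)) {
          props.store(out, "Generated by Hive for the remote Spark driver");
        }
        // The file path would then be passed to spark-submit as: --properties-file <path>
        return f;
      }
    }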

On the Remote Spark Driver side, I kept the defensive logic that provides a default value in case the conf object doesn't contain the property, which can happen if a property is unset.  For this, I had to instantiate a HiveConf in that process to obtain the default, since some of the timeout properties need a HiveConf instance for their calculation.
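
A minimal sketch of that defensive lookup, assuming the driver receives its settings as a plain map (in the real RemoteDriver/RpcConfiguration code the default may instead come from the HiveConf instantiated inside the driver process):

    import java.util.HashMap;
    import java.util.Map;

    public class DriverConfDefaults {
      // Returns the configured value if present, otherwise falls back to a default.
      // In the actual code the default for some timeout properties is derived from
      // a HiveConf constructed in the driver process.
      static long getLongWithDefault(Map<String, String> conf, String key, long defaultValue) {
        String value = conf.get(key);
        if (value == null) {
          return defaultValue;   // property was unset on the client side
        }
        return Long.parseLong(value.trim());
      }

      public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // "hive.spark.client.connect.timeout" not set -> default is used (value is illustrative).
        long timeoutMs = getLongWithDefault(conf, "hive.spark.client.connect.timeout", 1000L);
        System.out.println("connect timeout = " + timeoutMs + " ms");
      }
    }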


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9a830d2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 334c191 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 044f189 
  spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java dab92f6 
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java 5e3777a 
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 851e937 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/Rpc.java ac71ae9 
  spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java 5a826ba 
  spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java def4907 
  spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java a2dd3e6 

Diff: https://reviews.apache.org/r/30055/diff/


Testing
-------


Thanks,

Szehon Ho

