hive-dev mailing list archives

From "Chengxiang Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7436) Load Spark configuration into Hive driver
Date Tue, 22 Jul 2014 03:02:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069745#comment-14069745 ]

Chengxiang Li commented on HIVE-7436:
-------------------------------------

Thanks, [~xuefuz]
{quote}
1. We don't want to read Spark config from a spark-site.xml file. We want expose a few basic
Hive configuration for those, such as hive.server2.spark.masterurl for spark master. The reason
for this is that we don't want to require the availability of such configuration file, and
everything can be done in Hive itself. We don't necessarily want to get every configuration
from spark-site.xml. This is also how tez is done, by the way.
{quote}
I suppose the spark-site.xml you mentioned means spark-defaults.conf here. spark-defaults.conf
should be a comfortable place to configure Spark for anyone who is familiar with Spark, so
I think we should keep this configuration option; similarly, Hive on Tez supports configuring
Tez in tez-site.xml as well. At the same time, I agree that Hive on Spark should not depend
on the availability of spark-defaults.conf to configure Spark; we need to support Spark
configuration in hive-site.xml. My only question is: do we need to introduce a new configuration
name such as 'hive.server2.spark.masterurl' instead of the original 'spark.master', or can we
just support Spark configurations in hive-site.xml the way Tez does?
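If we go the pass-through route, a minimal sketch could copy every "spark.*" key from the
Hive configuration into a SparkConf, the way Tez forwards tez-site.xml properties. The class
name SparkConfExtractor is my placeholder here, not from any attached patch:
{code:java}
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;

public class SparkConfExtractor {
  // Copy every property whose key starts with "spark." from the Hive/Hadoop
  // configuration (hive-site.xml etc.) into a SparkConf, unchanged.
  public static SparkConf extract(Configuration hiveConf) {
    SparkConf sparkConf = new SparkConf(false); // don't pull in JVM-wide defaults here
    for (Map.Entry<String, String> entry : hiveConf) {
      if (entry.getKey().startsWith("spark.")) {
        sparkConf.set(entry.getKey(), entry.getValue());
      }
    }
    return sparkConf;
  }
}
{code}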
{quote}
2. I think the SparkContext should be per user session. A singleton available to everyone
is not acceptable because we like to have a separation between users so that user doesn't
accidentally share with other users such as custom UDFs. We don't like to do it per query
due to the startup cost. I think per user session is a reasonable compromise. When user session
expires, the resources will be released, so that they become available to other users.
{quote}
You're right, the SparkContext should be per user session, since Hive supports multi-tenancy.
This also means that we cannot support changing Spark configurations through the Hive CLI set
command once the session's context has been created.
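As a hypothetical illustration of that compromise (class and method names are illustrative
only, not from a patch), the driver could keep one SparkContext per session and tear it down
when the session expires:
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class SessionSparkContexts {
  private static final ConcurrentMap<String, SparkContext> CONTEXTS =
      new ConcurrentHashMap<String, SparkContext>();

  // One SparkContext per user session. The SparkConf is frozen when the context
  // is created, which is why a later CLI "set spark.*" cannot take effect.
  public static SparkContext forSession(String sessionId, SparkConf conf) {
    SparkContext ctx = CONTEXTS.get(sessionId);
    if (ctx == null) {
      synchronized (CONTEXTS) {
        ctx = CONTEXTS.get(sessionId);
        if (ctx == null) {
          ctx = new SparkContext(conf);
          CONTEXTS.put(sessionId, ctx);
        }
      }
    }
    return ctx;
  }

  // Called when the session expires; stopping the context releases its
  // cluster resources for other users.
  public static void closeSession(String sessionId) {
    SparkContext ctx = CONTEXTS.remove(sessionId);
    if (ctx != null) {
      ctx.stop();
    }
  }
}
{code}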

> Load Spark configuration into Hive driver
> -----------------------------------------
>
>                 Key: HIVE-7436
>                 URL: https://issues.apache.org/jira/browse/HIVE-7436
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>         Attachments: HIVE-7436-Spark.1.patch, HIVE-7436-Spark.2.patch
>
>
> Load Spark configuration into the Hive driver. There are 3 ways to set up Spark configurations:
> #  Configure properties in the Spark configuration file (spark-defaults.conf).
> #  Java properties.
> #  System environment variables.
> Spark supports configuration through system environment variables only for compatibility with
> earlier scripts; we won't support that in Hive on Spark. Hive on Spark loads defaults from Java
> properties, then loads properties from the configuration file, overriding existing properties.
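> As a rough illustration of that precedence (the helper class below is hypothetical, not part
> of the attached patches): defaults come from Java system properties, and anything found in
> spark-defaults.conf overrides them.
> {code:java}
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.util.Properties;
>
> public class SparkPropertyLoader {
>   // Load-order sketch: JVM system properties first, then spark-defaults.conf overrides.
>   public static Properties load(String sparkDefaultsPath) throws IOException {
>     Properties props = new Properties();
>     // Step 1: defaults from Java properties (e.g. -Dspark.master=... on the JVM command line).
>     for (String name : System.getProperties().stringPropertyNames()) {
>       if (name.startsWith("spark.")) {
>         props.setProperty(name, System.getProperty(name));
>       }
>     }
>     // Step 2: spark-defaults.conf ("key value" pairs, which Properties.load accepts)
>     // overrides any existing property.
>     Properties fileProps = new Properties();
>     try (FileInputStream in = new FileInputStream(sparkDefaultsPath)) {
>       fileProps.load(in);
>     }
>     for (String name : fileProps.stringPropertyNames()) {
>       props.setProperty(name, fileProps.getProperty(name));
>     }
>     return props;
>   }
> }
> {code}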
> Configuration steps:
> # Create spark-defaults.conf, and place it in the /etc/spark/conf configuration directory.
>     Please refer to [http://spark.apache.org/docs/latest/configuration.html] for the configuration
>     of spark-defaults.conf.
> # Create the $SPARK_CONF_DIR environment variable and set it to the location of spark-defaults.conf.
>     export SPARK_CONF_DIR=/etc/spark/conf
> # Add $SPARK_CONF_DIR to the $HADOOP_CLASSPATH environment variable.
>     export HADOOP_CLASSPATH=$SPARK_CONF_DIR:$HADOOP_CLASSPATH
> NO PRECOMMIT TESTS. This is for spark-branch only.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
