mahout-dev mailing list archives

From "Pat Ferrel (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (MAHOUT-1762) Pick up $SPARK_HOME/conf/spark-defaults.conf on startup
Date Thu, 17 Mar 2016 15:22:33 GMT

     [ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pat Ferrel resolved MAHOUT-1762.
--------------------------------
    Resolution: Won't Fix

We don't know of anything this blocks, and moving to spark-submit was voted down; that change
would only apply to the Mahout CLI drivers anyway. All CLI drivers support pass-through of
arbitrary key=value pairs, which go into the SparkConf, and when using Mahout as a library
you can create any arbitrary SparkConf.

Will not fix unless someone can explain the need. 
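
For reference, using Mahout as a library lets you build whatever SparkConf you need before
creating the distributed context. A minimal sketch, assuming the mahoutSparkContext helper in
org.apache.mahout.sparkbindings with its sparkConf parameter; the masterUrl and appName values
are placeholders:

{code}
import org.apache.spark.SparkConf
import org.apache.mahout.sparkbindings._

// Build an arbitrary SparkConf; note that a default-constructed SparkConf
// also picks up any spark.* JVM system properties.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions", "-Dhdp.version=2.2.0.0-2041")
  .set("spark.yarn.am.extraJavaOptions", "-Dhdp.version=2.2.0.0-2041")

// Wrap the resulting SparkContext in a Mahout distributed context.
implicit val sdc = mahoutSparkContext(
  masterUrl = "yarn-client",
  appName = "my-app",
  sparkConf = conf)
{code}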

> Pick up $SPARK_HOME/conf/spark-defaults.conf on startup
> -------------------------------------------------------
>
>                 Key: MAHOUT-1762
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1762
>             Project: Mahout
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Sergey Tryuber
>            Assignee: Pat Ferrel
>             Fix For: 1.0.0
>
>
> [spark-defaults.conf|http://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties]
is intended to hold the global configuration for a Spark cluster. For example, in our HDP 2.2
environment it contains:
> {noformat}
> spark.driver.extraJavaOptions      -Dhdp.version=2.2.0.0-2041
> spark.yarn.am.extraJavaOptions     -Dhdp.version=2.2.0.0-2041
> {noformat}
> and many other useful settings. A user naturally expects the Spark shell to pick these
up and just work. Unfortunately this does not happen with the Mahout Spark shell: it ignores
the Spark configuration, so the user has to copy-paste lots of options into
_MAHOUT_OPTS_.
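> For illustration, the workaround amounts to something like this (a hedged example; it relies
on a default-constructed SparkConf reading spark.* JVM system properties):
> {noformat}
> export MAHOUT_OPTS="-Dspark.driver.extraJavaOptions=-Dhdp.version=2.2.0.0-2041 -Dspark.yarn.am.extraJavaOptions=-Dhdp.version=2.2.0.0-2041"
> {noformat}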
> This happens because [org.apache.mahout.sparkbindings.shell.Main|https://github.com/apache/mahout/blob/master/spark-shell/src/main/scala/org/apache/mahout/sparkbindings/shell/Main.scala]
is executed directly by the [initialization script|https://github.com/apache/mahout/blob/master/bin/mahout]:
> {code}
> "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" "org.apache.mahout.sparkbindings.shell.Main" $@
> {code}
> In contrast, the Spark shell is invoked indirectly through spark-submit in the [spark-shell|https://github.com/apache/spark/blob/master/bin/spark-shell]
script:
> {code}
> "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main "$@"
> {code}
> [SparkSubmit|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]
contains an additional initialization layer that loads the properties file (see the
SparkSubmitArguments#mergeDefaultSparkProperties method).
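> In essence, that merge step reads $SPARK_HOME/conf/spark-defaults.conf and fills in any
spark.* keys the user did not set explicitly. A rough sketch of the idea, not the actual
SparkSubmit code (java.util.Properties happens to accept the whitespace-separated format):
> {code}
> import java.io.FileInputStream
> import java.util.Properties
> import scala.collection.JavaConverters._
>
> // Load spark-defaults.conf and merge it under the explicitly set options.
> def mergeDefaults(explicit: Map[String, String], sparkHome: String): Map[String, String] = {
>   val props = new Properties()
>   val in = new FileInputStream(s"$sparkHome/conf/spark-defaults.conf")
>   try props.load(in) finally in.close()
>   val defaults = props.asScala.filter { case (k, _) => k.startsWith("spark.") }.toMap
>   defaults ++ explicit // explicit settings win over the defaults
> }
> {code}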
> So there are two possible solutions:
> * use proper Spark-like initialization logic (a sketch follows this list)
> * use a thin wrapper script, as H2O Sparkling Water does ([sparkling-shell|https://github.com/h2oai/sparkling-water/blob/master/bin/sparkling-shell])
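> For the first option, a minimal sketch of what the Mahout shell's startup could do before
constructing its SparkConf (hypothetical code, not the existing Main; the helper name
loadSparkDefaults is made up):
> {code}
> import scala.io.Source
> import org.apache.spark.SparkConf
>
> // Export each setting from $SPARK_HOME/conf/spark-defaults.conf as a
> // spark.* system property, so that `new SparkConf()` picks it up.
> def loadSparkDefaults(): Unit =
>   for {
>     sparkHome <- sys.env.get("SPARK_HOME")
>     file = new java.io.File(s"$sparkHome/conf/spark-defaults.conf")
>     if file.exists()
>     line <- Source.fromFile(file).getLines()
>     trimmed = line.trim
>     if trimmed.nonEmpty && !trimmed.startsWith("#")
>   } trimmed.split("\\s+", 2) match {
>     case Array(k, v) if k.startsWith("spark.") && !sys.props.contains(k) =>
>       sys.props(k) = v
>     case _ => // ignore malformed lines
>   }
>
> loadSparkDefaults()
> val conf = new SparkConf() // now sees the cluster-wide defaults
> {code}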



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
