spark-issues mailing list archives

From "Charlie Tsai (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-19307) SPARK-17387 caused ignorance of conf object passed to SparkContext:
Date Tue, 29 Aug 2017 22:44:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-19307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146280#comment-16146280 ]

Charlie Tsai edited comment on SPARK-19307 at 8/29/17 10:43 PM:
----------------------------------------------------------------

Hi,

I am using 2.2.0 but find that command line {{--conf}} arguments are still not available when
the {{SparkConf()}} object is instantiated. As a result, my driver cannot check what has already
been set through {{--conf}} and then add further configuration using {{setIfMissing}}. Because
the freshly created conf looks empty, {{setIfMissing}} effectively overwrites whatever is passed
in through the CLI.

For example, if my job is:
{code}
# debug.py

import pyspark

if __name__ == '__main__':
    print(pyspark.SparkConf()._jconf)    # is `None` but should include `--conf` arguments

    default_conf = {
        "spark.dynamicAllocation.maxExecutors": "36",
        "spark.yarn.executor.memoryOverhead": "1500",
    }

    # these are supposed to be set only if not provided by the CLI args
    spark_conf = pyspark.SparkConf()
    for (k, v) in default_conf.items():
        spark_conf.setIfMissing(k, v)
{code}

and I run it with:
{code}
spark-submit \
    --master yarn \
    --deploy-mode client \
    --conf spark.yarn.executor.memoryOverhead=2500 \
    --conf spark.dynamicAllocation.maxExecutors=128 \
    debug.py
{code}

In 1.6.2 the CLI args take precedence, whereas in 2.2.0 {{SparkConf().getAll()}} appears empty
even though the {{--conf}} args were already passed in.
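
To make the difference concrete, here is a minimal sketch that models the two behaviours without pyspark. {{ToyConf}}, {{set_if_missing}}, {{get_all}} and {{apply_defaults}} are hypothetical names made up for this illustration; they only mimic the semantics I expect from {{setIfMissing}} and are not pyspark's implementation.

```python
# Toy stand-in for SparkConf (an assumption for illustration, NOT pyspark code).
class ToyConf:
    def __init__(self, preloaded=None):
        # `preloaded` models settings already visible when the conf is created.
        self._settings = dict(preloaded or {})

    def set_if_missing(self, key, value):
        # Only stores `value` when `key` has no value yet.
        self._settings.setdefault(key, value)
        return self

    def get_all(self):
        return sorted(self._settings.items())


# Values from the spark-submit command line above.
CLI_ARGS = {
    "spark.yarn.executor.memoryOverhead": "2500",
    "spark.dynamicAllocation.maxExecutors": "128",
}

# Driver-side defaults from debug.py.
DEFAULTS = {
    "spark.dynamicAllocation.maxExecutors": "36",
    "spark.yarn.executor.memoryOverhead": "1500",
}


def apply_defaults(conf):
    """Apply the driver defaults with set-if-missing semantics."""
    for key, value in DEFAULTS.items():
        conf.set_if_missing(key, value)
    return dict(conf.get_all())


# Behaviour I expect: the conf already contains the CLI values, so they win.
expected = apply_defaults(ToyConf(preloaded=CLI_ARGS))

# Behaviour I observe: the conf starts out empty, so the defaults are stored.
observed = apply_defaults(ToyConf())

print("expected:", expected)
print("observed:", observed)
```

In the second case nothing is technically overwritten inside the conf object itself; the defaults simply land in an empty conf, which is why they end up looking like they replaced the CLI values.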


> SPARK-17387 caused ignorance of conf object passed to SparkContext:
> -------------------------------------------------------------------
>
>                 Key: SPARK-19307
>                 URL: https://issues.apache.org/jira/browse/SPARK-19307
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: yuriy_hupalo
>            Assignee: Marcelo Vanzin
>             Fix For: 2.1.1, 2.2.0
>
>         Attachments: SPARK-19307.patch
>
>
> After the patch for SPARK-17387 was applied, the SparkConf object is ignored when launching
> SparkContext programmatically via Python from spark-submit:
> https://github.com/apache/spark/blob/master/python/pyspark/context.py#L128
> When we run SparkContext(conf=xxx) in Python from spark-submit:
>     conf is set, but conf._jconf is None
>     the conf object passed as an argument is ignored (it is used only when launching java_gateway).
> How to fix:
> python/pyspark/context.py:132
> {code:title=python/pyspark/context.py:132}
>         if conf is not None and conf._jconf is not None:
>             # conf has been initialized in JVM properly, so use conf directly. This represent the
>             # scenario that JVM has been launched before SparkConf is created (e.g. SparkContext is
>             # created and then stopped, and we create a new SparkConf and new SparkContext again)
>             self._conf = conf
>         else:
>             self._conf = SparkConf(_jvm=SparkContext._jvm)
> +             if conf:
> +                 for key, value in conf.getAll():
> +                     self._conf.set(key,value)
> +                     print(key,value)
> {code}
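
The merge step in the proposed patch can be sketched in plain Python as follows. This is only a model under stated assumptions: {{FakeConf}}, its {{jvm_backed}} flag, {{resolve_conf}} and the {{jvm_defaults}} argument are hypothetical stand-ins for a SparkConf with or without {{_jconf}}, and the sketch checks {{conf is not None}} where the patch uses {{if conf:}}.

```python
# Toy model of the conf resolution in the patch (NOT the real pyspark code).
class FakeConf:
    def __init__(self, jvm_backed=False, settings=None):
        # `jvm_backed` models "conf._jconf is not None".
        self.jvm_backed = jvm_backed
        self._settings = dict(settings or {})

    def set(self, key, value):
        self._settings[key] = value
        return self

    def get(self, key, default=None):
        return self._settings.get(key, default)

    def get_all(self):
        return list(self._settings.items())


def resolve_conf(conf, jvm_defaults):
    """Return the conf a context would use, mirroring the patched branch."""
    if conf is not None and conf.jvm_backed:
        # The conf was created after the JVM started, so use it directly.
        return conf
    # Otherwise build a JVM-backed conf; `jvm_defaults` models what it loads.
    resolved = FakeConf(jvm_backed=True, settings=jvm_defaults)
    if conf is not None:
        # The added lines: copy the user's entries instead of dropping them.
        for key, value in conf.get_all():
            resolved.set(key, value)
    return resolved


user_conf = FakeConf(settings={"spark.app.name": "demo"})
resolved = resolve_conf(user_conf, jvm_defaults={"spark.master": "yarn"})
print(sorted(resolved.get_all()))
```

Because the copy uses {{set}}, entries from the user-supplied conf replace any value the JVM-backed conf loaded for the same key.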



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
