Date: Fri, 15 Sep 2017 14:38:00 +0000 (UTC)
From: "Franz Wimmer (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Created] (SPARK-22028) spark-submit trips over environment variables

Franz Wimmer created SPARK-22028:
------------------------------------

             Summary: spark-submit trips over environment variables
                 Key: SPARK-22028
                 URL: https://issues.apache.org/jira/browse/SPARK-22028
             Project: Spark
          Issue Type: Bug
          Components: Deploy
    Affects Versions: 2.1.1
         Environment: Operating System: Windows 10
                      Shell: CMD or bash.exe, both with the same result
            Reporter: Franz Wimmer


I have a strange environment variable on my Windows system:

{code:none}
C:\Path>set ""
=::=::\
{code}

According to [this answer on Stack Exchange|https://unix.stackexchange.com/a/251215/251326], this is some sort of old MS-DOS relic that interacts with Cygwin shells.
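For reference, here is a minimal standalone sketch (illustrative only, not Spark code; the object name EnvNameCheck is made up) of why such a name is a problem for a JVM that launches child processes: on a Linux/Unix JVM, the environment map of a ProcessBuilder rejects any variable name containing "=", which is exactly the validation that fails in the worker-side stack trace further down. As far as I can tell, the Windows JVM special-cases a single leading "=" as one of these "magic" names and accepts it, which would explain why the variable survives on the client and only blows up on the cluster side.

{code:scala}
// Standalone repro sketch (illustrative only, not Spark code).
// A Unix JVM validates environment variable names when they are copied into a
// ProcessBuilder and rejects any name containing '=' (or NUL) -- the same
// check that fails in the DriverRunner stack trace below.
object EnvNameCheck {
  def main(args: Array[String]): Unit = {
    val pb = new ProcessBuilder("echo", "ok") // the command is never actually started here
    try {
      // Mimics copying the odd Windows entry verbatim into a child environment.
      pb.environment().put("=::", "::\\")
      println("accepted (e.g. on a Windows JVM, where a leading '=' is special-cased)")
    } catch {
      case e: IllegalArgumentException =>
        // On Linux this prints: Invalid environment variable name: "=::"
        println(e.getMessage)
    }
  }
}
{code}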
Leaving that aside for a moment: Spark reads the environment variables on submit and trips over this one:

{code:none}
./spark-submit.cmd
Running Spark using the REST application submission protocol.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/09/15 15:57:51 INFO RestSubmissionClient: Submitting a request to launch an application in spark://********:31824.
17/09/15 15:58:01 WARN RestSubmissionClient: Unable to connect to server spark://*******:31824.
Warning: Master endpoint spark://********:31824 was not a REST server. Falling back to legacy submission gateway instead.
17/09/15 15:58:02 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2391)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2391)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2391)
    at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:221)
    at org.apache.spark.deploy.Client$.main(Client.scala:230)
    at org.apache.spark.deploy.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/09/15 15:58:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/09/15 15:58:08 ERROR ClientEndpoint: Exception from cluster was: java.lang.IllegalArgumentException: Invalid environment variable name: "=::"
java.lang.IllegalArgumentException: Invalid environment variable name: "=::"
    at java.lang.ProcessEnvironment.validateVariable(ProcessEnvironment.java:114)
    at java.lang.ProcessEnvironment.access$200(ProcessEnvironment.java:61)
    at java.lang.ProcessEnvironment$Variable.valueOf(ProcessEnvironment.java:170)
    at java.lang.ProcessEnvironment$StringEnvironment.put(ProcessEnvironment.java:242)
    at java.lang.ProcessEnvironment$StringEnvironment.put(ProcessEnvironment.java:221)
    at org.apache.spark.deploy.worker.CommandUtils$$anonfun$buildProcessBuilder$2.apply(CommandUtils.scala:55)
    at org.apache.spark.deploy.worker.CommandUtils$$anonfun$buildProcessBuilder$2.apply(CommandUtils.scala:54)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
    at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
    at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at org.apache.spark.deploy.worker.CommandUtils$.buildProcessBuilder(CommandUtils.scala:54)
    at org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:181)
    at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:91)
{code}

Please note that _spark-submit.cmd_ in this case is my own script, which calls the _spark-submit.cmd_ from the Spark distribution.

I don't think this should happen: Spark should handle such a malformed environment variable gracefully.
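One possible way to handle it gracefully (just a sketch of the idea, not a patch against the actual CommandUtils.buildProcessBuilder; the helper names below are made up): skip, or log and skip, any environment entry whose name would fail the JVM's validation before copying it into the child ProcessBuilder.

{code:scala}
// Hypothetical mitigation sketch (not Spark's actual code): filter out entries
// whose names the JVM would reject before populating the child environment.
object SafeProcessEnv {
  // Conservatively accept only non-empty names that contain neither '=' nor NUL,
  // the characters java.lang.ProcessEnvironment rejects on Unix.
  private def isValidName(name: String): Boolean =
    name.nonEmpty && name.indexOf('=') < 0 && name.indexOf('\u0000') < 0

  def buildProcessBuilder(command: Seq[String], env: Map[String, String]): ProcessBuilder = {
    val pb = new ProcessBuilder(command: _*)
    val childEnv = pb.environment()
    for ((name, value) <- env) {
      if (isValidName(name)) {
        childEnv.put(name, value)
      } else {
        // Skip malformed entries such as "=::" instead of failing the driver launch.
        System.err.println(s"Ignoring malformed environment variable name: $name")
      }
    }
    pb
  }
}
{code}

With something along these lines, the worker would still launch the driver with every well-formed variable and merely drop (or warn about) the broken one.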