hadoop-yarn-dev mailing list archives

From "Yun Tang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-6745) Cannot parse correct Spark 2.x jars classpath in YARN on Windows
Date Wed, 28 Jun 2017 11:27:00 GMT
Yun Tang created YARN-6745:
------------------------------

             Summary: Cannot parse correct Spark 2.x jars classpath in YARN on Windows
                 Key: YARN-6745
                 URL: https://issues.apache.org/jira/browse/YARN-6745
             Project: Hadoop YARN
          Issue Type: Bug
          Components: applications
    Affects Versions: 2.7.2
         Environment: Windows cluster, Yarn-2.7.2
            Reporter: Yun Tang


When submitting Spark 2.x applications to a YARN cluster on Windows, we found two errors:
# If [dynamic resource allocation|https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation] is enabled for Spark, we get exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.network.util.JavaUtils.byteStringAs(Ljava/lang/String;Lorg/apache/spark/network/util/ByteUnit)
# We cannot open the running Spark application's web UI

Both errors stem from YARN failing to parse the correct Spark 2.x jars wildcard classpath
on Windows. I checked the latest code from hadoop-3.x; this part of the code appears unchanged
and would trigger the same error.

A typical appcache folder used to run a Spark executor/driver on our Windows YARN cluster looks like below:
!http://wx1.sinaimg.cn/large/62eae5a9gy1fh14j38zvbj20bb0990tm.jpg!
The link folder ‘__spark_libs__’ points to a filecache folder holding the jars Spark 2+ needs.
The classpath-xxx.jar contains a manifest file listing the runtime classpath; it works around
the 8K maximum command-line length limit on Windows (https://issues.apache.org/jira/browse/YARN-358).
The ‘launch_container.cmd’ script starts the YARN container. Please note that the shortcuts
‘__spark_conf__’, ‘__spark_libs__’ and ‘__app__.jar’ are only created after
launch_container.cmd runs.


=================================================
The typical CLASSPATH of hadoop-2.7.2 in launch_container.cmd looks like below:
!http://wx4.sinaimg.cn/large/62eae5a9gy1fh14j2c801j20sh023weh.jpg!
The ‘classpath-3177336218981224920.jar’ contains a manifest file listing all the hadoop
runtime jars, among which we can find spark-1.6.2-nao-yarn-shuffle.jar and servlet-api-2.5.jar.
Both problems arise because the Java runtime loads classes from those two old jars first:
the Spark 1.x external shuffle service is not compatible with Spark 2.x, and servlet-api-2.x is
not compatible with servlet-api-3.x (used in Spark 2).
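The class-resolution behavior behind both failures can be modeled simply: the application class loader returns the first matching entry on the classpath, so an old servlet-api-2.5 or Spark 1.x shuffle class shadows the newer one. Below is an illustrative model of that "first entry wins" rule (hypothetical names, not actual class-loader code):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model of "first classpath entry wins": scan entries in
// order and return the first jar that contains the requested class.
// Later jars holding the same class are shadowed, which is exactly
// what happens to the Spark 2.x jars in __spark_libs__ here.
public class FirstMatchWins {
    static String resolve(String className, Map<String, List<String>> jarContents) {
        for (Map.Entry<String, List<String>> jar : jarContents.entrySet()) {
            if (jar.getValue().contains(className)) {
                return jar.getKey();             // first match shadows later jars
            }
        }
        return null;                             // class not found on classpath
    }
}
```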

That is to say, the “xxx/__spark_libs__/*” entry should be placed before the classpath-jar.
Now let’s see what the CLASSPATH looks like on Linux.

=================================================
The classpath in launch_container.sh looks like:
!http://wx2.sinaimg.cn/large/62eae5a9gy1fh14ivycpxj20um01tjre.jpg!
We can see “xxx/__spark_libs__/*” placed before the hadoop jars, which is why problems #1
and #2 do not occur in a Linux environment.

*Root cause*:
The whole process has two steps.
1. {color:blue}org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch{color}
transforms the original CLASSPATH into the classpath-jar in its ‘sanitizeEnv’ method. The
CLASSPATH is:
{code:java}
%PWD%;%PWD%/__spark_conf__;%PWD%/__app__.jar;%PWD%/__spark_libs__/*;%HADOOP_CONF_DIR%;%HADOOP_COMMON_HOME%/share/hadoop/common/*;%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*;%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\*;%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\lib\*;
{code}

Within this method, it calls the ‘createJarWithClassPath’ method of {color:blue}org.apache.hadoop.fs.FileUtil{color}.

2. For a wildcard path, {color:blue}org.apache.hadoop.fs.FileUtil{color} looks for files
in that folder with a suffix of ‘jar’ or ‘JAR’. The previous %PWD%/__spark_libs__/* is
transformed to
{code:java}
D:/Data/Yarn/nm-local-dir/usercache/xxx/appcache/application_1494151518127_0073/container_e3752_1494151518127_0073_01_000001/__spark_libs__/*
{code}

However, this folder does not exist when the classpath-jar is generated; only after
‘launch_container.cmd’ runs does the ‘__spark_libs__’ folder appear in the current directory.
As a result, YARN puts the “xxx/__spark_libs__/*” entry into unexpandedWildcardClasspath,
and unexpandedWildcardClasspath is placed after the classpath-jar in CLASSPATH. That is
why we see “xxx/__spark_libs__/*” at the end.
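The wildcard handling described above can be sketched as follows. This is a simplified model of what createJarWithClassPath does with a wildcard entry when the target directory is absent at launch-preparation time (hypothetical helper names, not the actual Hadoop implementation):

```java
import java.io.File;
import java.util.List;

// Simplified model of wildcard classpath handling: jars found now go
// into the classpath-jar manifest, while wildcards over directories
// that do not exist yet (like __spark_libs__ before launch_container.cmd
// runs) are deferred to unexpandedWildcardClasspath.
public class WildcardExpansionSketch {
    static void expandEntry(String entry,
                            List<String> manifestEntries,
                            List<String> unexpandedWildcardClasspath) {
        if (entry.endsWith("/*")) {
            File dir = new File(entry.substring(0, entry.length() - 2));
            File[] files = dir.listFiles();
            if (files == null) {                 // directory absent at launch prep time
                unexpandedWildcardClasspath.add(entry);
                return;
            }
            for (File f : files) {
                String name = f.getName();
                if (name.endsWith(".jar") || name.endsWith(".JAR")) {
                    manifestEntries.add(f.getPath());
                }
            }
        } else {
            manifestEntries.add(entry);          // plain entry, copied as-is
        }
    }
}
```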

In other words, the fix is either to place “xxx/__spark_libs__/*” before the classpath-jar,
as in the Linux case, or to expand the “xxx/__spark_libs__/xxx.jar” entries into the
classpath-jar; changing the current wrong order would satisfy the original design.
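The first fix option could be expressed as an ordering rule like the one below. This is only a sketch with illustrative names, not the actual patch to ContainerLaunch/FileUtil:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed ordering: unexpanded wildcard entries (e.g.
// %PWD%/__spark_libs__/*) come BEFORE the generated classpath-jar,
// matching the Linux launch_container.sh behavior. Names are
// illustrative, not the actual Hadoop API.
public class ClasspathOrderSketch {
    static String buildClasspath(List<String> unexpandedWildcards,
                                 String classpathJar) {
        List<String> parts = new ArrayList<>(unexpandedWildcards);
        parts.add(classpathJar);                 // classpath-jar goes last
        return String.join(";", parts);          // Windows path separator
    }
}
```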





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-dev-help@hadoop.apache.org

