spark-issues mailing list archives

From "Andrew Or (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-5655) YARN Auxiliary Shuffle service can't access shuffle files on Hadoop cluster configured in secure mode
Date Wed, 11 Feb 2015 16:26:14 GMT

     [ https://issues.apache.org/jira/browse/SPARK-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-5655:
-----------------------------
    Affects Version/s: 1.3.0

> YARN Auxiliary Shuffle service can't access shuffle files on Hadoop cluster configured
in secure mode
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-5655
>                 URL: https://issues.apache.org/jira/browse/SPARK-5655
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.3.0, 1.2.1
>         Environment: Both CDH5.3.0 and CDH5.1.3, latest build on branch-1.2
>            Reporter: Andrew Rowson
>            Priority: Critical
>              Labels: hadoop
>
> When a Spark job runs on a YARN cluster that does not run containers as the same user
as the nodemanager, and the YARN auxiliary shuffle service is in use, jobs fail with an
error similar to:
> {code:java}
> java.io.FileNotFoundException: /data/9/yarn/nm/usercache/username/appcache/application_1423069181231_0032/spark-c434a703-7368-4a05-9e99-41e77e564d1d/3e/shuffle_0_0_0.index (Permission denied)
> {code}
> The root cause of this is here: https://github.com/apache/spark/blob/branch-1.2/core/src/main/scala/org/apache/spark/util/Utils.scala#L287
> Spark attempts to chmod 700 any application directories it creates during the job,
including those created under the nodemanager's usercache directory. These files are owned
by the container UID: on a secure cluster this is the user who submitted the job, and on a
nonsecure cluster with yarn.nodemanager.container-executor.class configured it is the value
of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user.
> The problem is that the auxiliary shuffle service runs as part of the nodemanager,
which typically runs as the user 'yarn'. That user cannot access files that are readable
only by their owner.
> YARN already secures files created under appcache while keeping them readable by the
nodemanager: it sets the group of the appcache directory to 'yarn' and sets the setgid
flag, so files and directories created beneath it also get the 'yarn' group. Normally this
lets the nodemanager read these files, but Spark's chmod 700 removes the group permissions.
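To illustrate the permission arithmetic described above, here is a minimal Python sketch (not Spark's code). It assumes a POSIX filesystem, and the directory name spark-local is hypothetical. It shows that a mode of 750 keeps the group-read bit the nodemanager relies on, while 700 clears it.

```python
import os
import stat
import tempfile


def group_can_read(path):
    """Return True if the group-read permission bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IRGRP)


with tempfile.TemporaryDirectory() as base:
    # Hypothetical stand-in for a Spark local directory under appcache.
    shuffle_dir = os.path.join(base, "spark-local")
    os.mkdir(shuffle_dir)

    # Owner rwx, group r-x: a process in the directory's group can still read.
    os.chmod(shuffle_dir, 0o750)
    readable_before = group_can_read(shuffle_dir)

    # Owner rwx only: the group bits are cleared, which locks out the group.
    os.chmod(shuffle_dir, 0o700)
    readable_after = group_can_read(shuffle_dir)

print("group-readable at 750:", readable_before)
print("group-readable at 700:", readable_after)
```

The setgid flag only controls which group new files inherit; it does not help once the group permission bits themselves are removed.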
> I'm not sure what the right approach is here. Commenting out the chmod 700 call makes
this work on YARN, and still leaves the application files readable only by the owner and
the group:
> {code}
> /data/1/yarn/nm/usercache/username/appcache/application_1423247249655_0001/spark-c7a6fc0f-e5df-49cf-a8f5-e51a1ca087df/0c # ls -lah
> total 206M
> drwxr-s---  2 nobody yarn 4.0K Feb  6 18:30 .
> drwxr-s--- 12 nobody yarn 4.0K Feb  6 18:30 ..
> -rw-r-----  1 nobody yarn 206M Feb  6 18:30 shuffle_0_0_0.data
> {code}
> But this may not be the right approach on non-YARN deployments. Perhaps an additional
check is required to decide whether the chmod 700 step is necessary (i.e. only when not
running on YARN). Sadly, I don't have a non-YARN environment to test in, otherwise I'd be
able to suggest a patch.
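One possible shape for the conditional step suggested above, sketched in Python under stated assumptions. The helper name secure_local_dir and the managed_by_yarn flag are hypothetical; how Spark would actually detect a YARN-managed directory is not specified in this report, and this is not a proposed patch.

```python
import os
import tempfile


def perm_bits(path):
    """Return only the rwx permission bits (owner/group/other) of path."""
    return os.stat(path).st_mode & 0o777


def secure_local_dir(path, managed_by_yarn):
    """Hypothetical helper: restrict path to its owner unless YARN manages it.

    Returns True if the permissions were changed, False if left untouched.
    """
    if managed_by_yarn:
        # Assumption: YARN's group ownership and setgid setup already
        # protects the directory, so the group bits are left alone.
        return False
    os.chmod(path, 0o700)
    return True


with tempfile.TemporaryDirectory() as base:
    os.chmod(base, 0o750)  # start from an owner + group readable directory

    changed_on_yarn = secure_local_dir(base, managed_by_yarn=True)
    mode_on_yarn = perm_bits(base)

    changed_elsewhere = secure_local_dir(base, managed_by_yarn=False)
    mode_elsewhere = perm_bits(base)

print("YARN:", changed_on_yarn, oct(mode_on_yarn))
print("non-YARN:", changed_elsewhere, oct(mode_elsewhere))
```

Passing the decision in as a flag keeps the sketch independent of any particular way of detecting the cluster manager.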
> I believe this is a related issue in the MapReduce framework: https://issues.apache.org/jira/browse/MAPREDUCE-3728



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
