spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-3967) Spark applications fail in yarn-cluster mode when the directories configured in yarn.nodemanager.local-dirs are located on different disks/partitions
Date Sat, 18 Oct 2014 00:33:33 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175748#comment-14175748 ]

Sean Owen commented on SPARK-3967:
----------------------------------

You guys should make PRs for these. I am also not sure it's necessary to download the
file into a temp directory and then move it... the move may turn into a copy instead of a
rename, and in fact does here, so the file does not appear in the target dir atomically
anyway. I'm not sure the code here cleans up the partially downloaded file in case of an
error, and that could leave a broken file in the target dir instead of just in a temp dir.
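
To illustrate the concern, here is a minimal sketch in Python (not Spark's Scala, and not
the code from the attached patches). It assumes a hypothetical helper, fetch_atomically,
and a hypothetical read_chunks callable that yields the downloaded bytes. The temp file is
created in the target's own directory, so the final rename stays on one filesystem, and
the partial file is removed if anything fails.

```python
import os
import tempfile


def fetch_atomically(read_chunks, target_path):
    """Write chunks to a temp file beside target_path, then rename it into place.

    read_chunks is a hypothetical callable returning an iterable of bytes.
    """
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # The temp file lives in the target's own directory, so the final rename
    # never crosses a partition boundary and cannot degrade into a copy.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out:
            for chunk in read_chunks():
                out.write(chunk)
        # Rename within one filesystem; atomic on POSIX systems.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Remove the partial download so no broken file is left behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this layout a failed download leaves the target dir as it was, and a reader of the
target dir sees either no file or the complete one.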

The change to not copy the file when it is identical looks sound; I bet you can avoid
checking twice whether it exists.
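
One way to fold the existence check into the comparison, sketched in Python under stated
assumptions: copy_unless_identical is a hypothetical name, and overwriting a target whose
contents differ is a choice made for this example, not necessarily what the patch does.

```python
import filecmp
import shutil


def copy_unless_identical(source_path, target_path):
    """Copy source to target; return False if an identical target already exists."""
    try:
        # A single call both detects a missing target (FileNotFoundError)
        # and compares the two files byte by byte.
        same = filecmp.cmp(source_path, target_path, shallow=False)
    except FileNotFoundError:
        same = False
    if same:
        return False
    # Assumption for this sketch: a differing target is simply overwritten.
    shutil.copyfile(source_path, target_path)
    return True
```

A missing target and a differing target take the same path here, so there is no separate
exists() call before the comparison.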

> Spark applications fail in yarn-cluster mode when the directories configured in yarn.nodemanager.local-dirs
> are located on different disks/partitions
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3967
>                 URL: https://issues.apache.org/jira/browse/SPARK-3967
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Christophe PRÉAUD
>         Attachments: spark-1.1.0-utils-fetch.patch, spark-1.1.0-yarn_cluster_tmpdir.patch
>
>
> Spark applications fail from time to time in yarn-cluster mode (but not in yarn-client
> mode) when yarn.nodemanager.local-dirs (Hadoop YARN config) is set to a comma-separated list
> of directories which are located on different disks/partitions.
> Steps to reproduce:
> 1. Set yarn.nodemanager.local-dirs (in yarn-site.xml) to a list of directories located
> on different partitions (the more you set, the more likely you are to reproduce the bug):
> (...)
> <property>
>   <name>yarn.nodemanager.local-dirs</name>
>   <value>file:/d1/yarn/local/nm-local-dir,file:/d2/yarn/local/nm-local-dir,file:/d3/yarn/local/nm-local-dir,file:/d4/yarn/local/nm-local-dir,file:/d5/yarn/local/nm-local-dir,file:/d6/yarn/local/nm-local-dir,file:/d7/yarn/local/nm-local-dir</value>
> </property>
> (...)
> 2. Launch an application in yarn-cluster mode several times; it will fail from time to
> time, apparently at random.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

