spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-19812) YARN shuffle service fails to relocate recovery DB across NFS directories
Date Mon, 24 Apr 2017 18:03:04 GMT

     [ https://issues.apache.org/jira/browse/SPARK-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-19812:
------------------------------------

    Assignee: Thomas Graves  (was: Apache Spark)

> YARN shuffle service fails to relocate recovery DB across NFS directories
> -------------------------------------------------------------------------
>
>                 Key: SPARK-19812
>                 URL: https://issues.apache.org/jira/browse/SPARK-19812
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.0.1
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>
> The YARN shuffle service tries to switch from the YARN local directories to the real
> recovery directory, but it can fail to move the existing recovery DBs. The move fails because
> Files.move cannot move a non-empty directory when doing so requires moving the entries inside it.
> 2017-03-03 14:57:19,558 [main] ERROR yarn.YarnShuffleService: Failed to move recovery
file sparkShuffleRecovery.ldb to the path /mapred/yarn-nodemanager/nm-aux-services/spark_shuffle
> java.nio.file.DirectoryNotEmptyException:/yarn-local/sparkShuffleRecovery.ldb
>         at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:498)
>         at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
>         at java.nio.file.Files.move(Files.java:1395)
>         at org.apache.spark.network.yarn.YarnShuffleService.initRecoveryDb(YarnShuffleService.java:369)
>         at org.apache.spark.network.yarn.YarnShuffleService.createSecretManager(YarnShuffleService.java:200)
>         at org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:174)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:143)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:262)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:357)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:636)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:684)
> This code used to call f.renameTo; we switched to Files.move in the PR in response to review
> comments, and it looks like we did not do a final real test. The tests use files rather than
> directories, so they did not catch the failure. We need to fix the tests as well.
> history: https://github.com/apache/spark/pull/14999/commits/65de8531ccb91287f5a8a749c7819e99533b9440
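
The failure mode described above can be worked around in several ways. The following is a minimal sketch of one of them, not the change that was made in Spark: it tries Files.move first and, if that throws DirectoryNotEmptyException, falls back to copying the directory tree and then deleting the source. The class name RecoveryDbMover and its methods are hypothetical names introduced only for this illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not part of Spark.
final class RecoveryDbMover {
    private RecoveryDbMover() {}

    // Moves a directory (for example a LevelDB directory) to a new location.
    static void moveDirectory(Path source, Path target) throws IOException {
        try {
            // A plain rename works when no directory entries have to be moved.
            Files.move(source, target);
        } catch (DirectoryNotEmptyException e) {
            // Files.move refuses to move the entries of a non-empty directory,
            // so copy the tree and remove the original afterwards.
            copyTree(source, target);
            deleteTree(source);
        }
    }

    // Recursively copies source into target, creating directories as needed.
    static void copyTree(Path source, Path target) throws IOException {
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                Files.createDirectories(target.resolve(source.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.copy(file, target.resolve(source.relativize(file)),
                        StandardCopyOption.COPY_ATTRIBUTES);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    // Recursively deletes a directory tree, children before parents.
    static void deleteTree(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

A test for this kind of code would need to move a directory that contains at least one file, since moving a single file does not exercise the failing path. Note that the copy fallback is not atomic: if it is interrupted, a partial copy can be left at the target.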



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

