spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-4896) Don't redundantly copy executor dependencies in Utils.fetchFile
Date Fri, 19 Dec 2014 18:48:14 GMT

    [ https://issues.apache.org/jira/browse/SPARK-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253779#comment-14253779 ]

Apache Spark commented on SPARK-4896:
-------------------------------------

User 'ryan-williams' has created a pull request for this issue:
https://github.com/apache/spark/pull/2848

> Don't redundantly copy executor dependencies in Utils.fetchFile
> ---------------------------------------------------------------
>
>                 Key: SPARK-4896
>                 URL: https://issues.apache.org/jira/browse/SPARK-4896
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Josh Rosen
>
> This JIRA is spun off from a comment by [~rdub] on SPARK-3967, quoted here:
> {quote}
> I've been debugging this issue as well and I think I've found an issue in {{org.apache.spark.util.Utils}}
> that is contributing to / causing the problem:
> {{Files.move}} on [line 390|https://github.com/apache/spark/blob/v1.1.0/core/src/main/scala/org/apache/spark/util/Utils.scala#L390]
> is called even if {{targetFile}} exists and {{tempFile}} and {{targetFile}} are equal.
> The check on [line 379|https://github.com/apache/spark/blob/v1.1.0/core/src/main/scala/org/apache/spark/util/Utils.scala#L379]
> seems to imply the desire to skip a redundant overwrite if the file is already there and has
> the contents that it should have.
> Gating the {{Files.move}} call on a further {{if (!targetFile.exists)}} fixes the issue
> for me; attached is a patch of the change.
> In practice all of my executors that hit this code path are finding every dependency
> JAR to already exist and be exactly equal to what they need it to be, meaning they were all
> needlessly overwriting all of their dependency JARs, and now are all basically no-op-ing in
> {{Utils.fetchFile}}; I've not determined who/what is putting the JARs there, why the issue
> only crops up in {{yarn-cluster}} mode (or {{--master yarn --deploy-mode cluster}}), etc.,
> but it seems like either way this patch is probably desirable.
> {quote}
> I'm spinning this off into its own JIRA so that we can track the merging of https://github.com/apache/spark/pull/2848
> separately (since we have multiple PRs that contribute to fixing the original issue).
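
The gating described in the quoted comment can be sketched roughly as follows. This is a minimal illustration in Java using only java.nio.file, not the code from the pull request: the class FetchFileSketch and the methods moveUnlessPresent and sameContents are hypothetical names, and the real Utils.fetchFile is written in Scala and may use a different Files helper. The sketch assumes that "equal" in the comment means equal file contents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch of "only move the fetched file if the target is absent".
final class FetchFileSketch {
    private FetchFileSketch() {}

    /**
     * Moves tempFile to targetFile unless targetFile already exists.
     * Returns true if a move happened, false if it was skipped because
     * an identical target was already in place.
     */
    static boolean moveUnlessPresent(Path tempFile, Path targetFile) throws IOException {
        if (Files.exists(targetFile)) {
            // Assumption: an existing target with different contents is an error.
            if (!sameContents(tempFile, targetFile)) {
                throw new IOException(
                    "File " + targetFile + " exists and does not match contents of " + tempFile);
            }
            // Target is already there with the right contents: skip the redundant overwrite.
            return false;
        }
        Files.move(tempFile, targetFile);
        return true;
    }

    // Byte-for-byte comparison; reads both files fully into memory, which is
    // acceptable for a sketch but not for very large files.
    static boolean sameContents(Path a, Path b) throws IOException {
        return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }
}
```

With this shape, an executor that finds every dependency JAR already present and identical performs no file moves at all, which matches the no-op behavior the comment reports after the patch. The sketch leaves the temporary file in place when the move is skipped; whether to delete it is a separate decision not covered here.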



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

