spark-issues mailing list archives

From "Tathagata Das (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (SPARK-5836) Highlight in Spark documentation that by default Spark does not delete its temporary files
Date Fri, 21 Aug 2015 20:51:29 GMT

     [ https://issues.apache.org/jira/browse/SPARK-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tathagata Das reopened SPARK-5836:
----------------------------------

> Highlight in Spark documentation that by default Spark does not delete its temporary files
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-5836
>                 URL: https://issues.apache.org/jira/browse/SPARK-5836
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Tomasz Dudziak
>            Assignee: Ilya Ganelin
>            Priority: Minor
>             Fix For: 1.3.1, 1.4.0
>
>
> We recently learnt the hard way (in a prod system) that Spark by default does not delete
> its temporary files until it is stopped. Within a relatively short time span of heavy Spark
> use, the disk of our prod machine filled up completely because of multiple shuffle files
> written to it. We think there should be better documentation around the fact that after a
> job is finished it leaves a lot of rubbish behind, so that this does not come as a surprise.
> Probably a good place to highlight that fact would be the documentation of the
> {{spark.local.dir}} property, which controls where Spark temporary files are written.
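
For illustration, a minimal sketch of how {{spark.local.dir}} can be pointed at a dedicated scratch volume so that accumulating shuffle files cannot fill the root disk. The path {{/mnt/spark-scratch}} and the job jar name are placeholders, not taken from this issue.

```shell
# Redirect Spark's temporary/shuffle files to a dedicated volume.
# /mnt/spark-scratch and my_job.jar are placeholder names.

# Option 1: per job, on the command line
spark-submit --conf spark.local.dir=/mnt/spark-scratch my_job.jar

# Option 2: cluster-wide default, in conf/spark-defaults.conf
echo "spark.local.dir  /mnt/spark-scratch" >> conf/spark-defaults.conf
```

Note that on YARN and some other cluster managers this property is overridden by the cluster's own local-directory settings, so the effective location should be verified per deployment.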



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

