spark-issues mailing list archives

From "Ruslan Shestopalyuk (JIRA)" <>
Subject [jira] [Commented] (SPARK-21942) DiskBlockManager crashing when a root local folder has been externally deleted by OS
Date Fri, 08 Sep 2017 08:37:00 GMT


Ruslan Shestopalyuk commented on SPARK-21942:

[~jerryshao] I believe the only objective reason here would be to make the Spark code more robust to this kind of external interference.

Regarding the rest - I agree it's not a valid issue, since if a problem like this happens, one
can always spend some time debugging the Spark code and figure out what a workaround could be.
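
For the record, the obvious workaround is to point the scratch space away from _/tmp_ entirely. A minimal sketch (the path here is just a placeholder - any persistent, writable location works):

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Point spark.local.dir at a directory the OS does not clean up automatically.
// "/var/lib/my-service/spark-scratch" is a placeholder path.
val conf = new SparkConf()
  .setAppName("long-running-service")
  .set("spark.local.dir", "/var/lib/my-service/spark-scratch")

val sc = new SparkContext(conf)
{code}

(Note that in cluster deployments the resource manager may override _spark.local.dir_, so this mainly applies to local/standalone setups.)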

Also, hopefully this very page gets indexed by search engines, so maybe even that won't
be needed :)

> DiskBlockManager crashing when a root local folder has been externally deleted by OS
> ------------------------------------------------------------------------------------
>                 Key: SPARK-21942
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1, 1.6.2, 1.6.3, 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.2.0, 2.2.1,
2.3.0, 3.0.0
>            Reporter: Ruslan Shestopalyuk
>            Priority: Minor
>              Labels: storage
>             Fix For: 2.3.0
> _DiskBlockManager_ has a notion of "scratch" local folders, which can be configured
via the _spark.local.dir_ option and which default to the system's _/tmp_. The hierarchy is
two-level, e.g. _/blockmgr-XXX.../YY_, where the _YY_ part is a hash bit, used to spread files
evenly across subfolders.
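> To make the layout concrete, here is a rough, self-contained sketch of how a block file name maps onto that two-level hierarchy (the hashing mirrors Spark's approach; the directory list and subfolder count are simplified stand-ins, 64 being the default of _spark.diskStore.subDirectories_):
> {code}
> import java.io.File
>
> // Simplified stand-ins for DiskBlockManager's configured state
> val localDirs: Array[File] = Array(new File("/tmp/blockmgr-XXX"))
> val subDirsPerLocalDir: Int = 64
>
> def nonNegativeHash(s: String): Int = {
>   val h = s.hashCode
>   if (h != Int.MinValue) math.abs(h) else 0
> }
>
> // A block file name hashes to a root dir and a subfolder ("YY") under it
> def pathFor(filename: String): File = {
>   val hash = nonNegativeHash(filename)
>   val dirId = hash % localDirs.length
>   val subDirId = (hash / localDirs.length) % subDirsPerLocalDir
>   new File(new File(localDirs(dirId), "%02x".format(subDirId)), filename)
> }
>
> println(pathFor("shuffle_0_0_0.data"))  // e.g. /tmp/blockmgr-XXX/1f/shuffle_0_0_0.data
> {code}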
> The function _DiskBlockManager.getFile_ expects the top-level directories (_blockmgr-XXX..._)
to always exist (they get created once, when the Spark context is first created); otherwise
it fails with a message like:
> {code}
> ... Failed to create local dir in /tmp/blockmgr-XXX.../YY
> {code}
> However, this may not always be the case.
> In particular, *if it's the default _/tmp_ folder*, there can be different strategies
for automatically removing files from it, depending on the OS (see the example after this list):
> * at boot time
> * on a regular basis (e.g. once per day, via a system cron job)
> * based on file age
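> For example, on systemd-based distributions the age-based cleanup is typically declared via _tmpfiles.d_; the exact file and ages vary by distribution, but it looks something like:
> {code}
> # /usr/lib/tmpfiles.d/tmp.conf - distribution defaults (ages vary)
> # files in /tmp untouched for 10 days are removed automatically
> q /tmp      1777 root root 10d
> q /var/tmp  1777 root root 30d
> {code}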
> The symptom is that after the process (in our case, a service) using Spark has been running
for a while (a few days), it may not be able to load files anymore, since the top-level scratch
directories are no longer there and _DiskBlockManager.getFile_ crashes.
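> To make the failure mode concrete, here is a simplified sketch (not the literal Spark source) of the kind of check _getFile_ performs: a plain _mkdir_ cannot recreate the missing _blockmgr-XXX..._ parent, whereas a defensive _mkdirs_ could:
> {code}
> import java.io.{File, IOException}
>
> def ensureSubDir(localDir: File, subDirId: Int): File = {
>   val subDir = new File(localDir, "%02x".format(subDirId))
>   // mkdir() only creates the last path element, so it fails if
>   // localDir itself was removed by the OS...
>   if (!subDir.exists() && !subDir.mkdir()) {
>     throw new IOException(s"Failed to create local dir in $subDir")
>   }
>   // ...whereas subDir.mkdirs() would also recreate the missing parent
>   subDir
> }
> {code}
> (Whether silently recreating the root is the right fix is debatable, since any blocks previously written there are gone anyway.)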
> Please note that this is different from people arbitrarily removing files manually.
> We have both facts here: _/tmp_ is the default in the Spark config, and the system
has the right to tamper with its contents - which it will do, with high probability, after some
period of time.

This message was sent by Atlassian JIRA
