spark-issues mailing list archives

From "Kingsley Jones (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
Date Sun, 01 Apr 2018 00:22:00 GMT

[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421530#comment-16421530 ]

Kingsley Jones edited comment on SPARK-12216 at 4/1/18 12:21 AM:
-----------------------------------------------------------------

Same issue under Windows 10 and Windows Server 2016 using Java 1.8, Spark 2.2.1, Hadoop 2.7

My tests support the contention of [~IgorBabalich] ... it seems that classloaders instantiated
by the code are never closed. On *nix this is not a problem, since the files are not locked.
However, on Windows the files are locked.

In addition to the resources mentioned by Igor, this Oracle note on the Java 7 fix seems relevant:

[https://docs.oracle.com/javase/7/docs/technotes/guides/net/ClassLoader.html]

A new close() method was introduced to address the problem, which shows up on Windows because
of the differing treatment of file locks between the Windows and *nix file systems.
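
Purely for illustration, here is a minimal Scala sketch of the pattern that close() enables; the
jar path and class name below are placeholders I made up, not anything Spark actually uses:

import java.net.{URL, URLClassLoader}
import java.nio.file.{Files, Paths}

// Placeholder jar in a temp directory, standing in for the REPL output files.
val jarPath = Paths.get("C:\\Temp\\example\\classes.jar")

val loader = new URLClassLoader(Array[URL](jarPath.toUri.toURL), getClass.getClassLoader)
loader.loadClass("com.example.Dummy")   // loading opens the jar; on Windows that locks the file

// Without this call, the delete below fails on Windows while the JVM is still running.
loader.close()                          // Java 7+: releases the jar file handles

Files.delete(jarPath)                   // now succeeds on Windows as well as *nix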

I would point out that this is a generic Java issue which breaks the cross-platform intention
of the platform as a whole.

The Oracle blog also contains a post:

[https://blogs.oracle.com/corejavatechtips/closing-a-urlclassloader]

I have been searching the Apache Spark code base for classloader instances, looking for any
".close()" call. I could not find any, so I believe [~IgorBabalich] is correct - the issue
has to do with classloaders not being closed.

I would fix it myself, but thus far it is not clear to me *when* the classloader needs to be
closed. That is just ignorance on my part. The question is whether the classloader should be
closed while it is still available as a variable at the point where it was instantiated, or
later during the ShutdownHookManager cleanup. If the latter, it is not clear to me how to
actually get a list of open classloaders.
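
To make the second option concrete, here is a rough sketch only - it assumes a hypothetical
registry object that the loader-creating code would have to call into (it is not an existing
Spark class), and in a real fix closeAll() would have to run before the ShutdownHookManager
directory cleanup:

import java.net.URLClassLoader
import scala.collection.mutable

// Hypothetical tracker for every URLClassLoader the REPL/executor code creates.
object ClassLoaderRegistry {
  private val loaders = mutable.Set.empty[URLClassLoader]

  def register(loader: URLClassLoader): URLClassLoader = synchronized {
    loaders += loader
    loader
  }

  // Close everything so file handles are released before the temp dirs are deleted.
  def closeAll(): Unit = synchronized {
    loaders.foreach { l =>
      try l.close()
      catch { case _: java.io.IOException => () }  // best effort
    }
    loaders.clear()
  }
}

// e.g. during shutdown, before the recursive delete of the temp directories:
// ClassLoaderRegistry.closeAll()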

That is where I am at so far. I am prepared to put some work into this, but I need some help
from those who know the codebase to answer the above question - maybe with a well-isolated
test.

MY TESTS...

This issue has been around in one form or another for at least four years and shows up on
many threads.

The standard answer is that it is a "permissions issue" to do with Windows.

That assertion is objectively false.

There is a simple test to prove it.

At a Windows prompt, start spark-shell:

C:\spark\spark-shell   

then get the temp file directory:

scala> sc.getConf.get("spark.repl.class.outputDir")

it will be under the user's AppData\Local\Temp tree (i.e. %TEMP%), e.g.

C:\Users\kings\AppData\Local\Temp\spark-d67b262e-f6c8-43d7-8790-731308497f02\repl-4cc87dce-8608-4643-b869-b0287ac4571f

where the last directory name contains a GUID that changes on each run.

With the Spark session still open, go to the Temp directory and try to delete the given directory.

You won't be able to... there is a lock on it.
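
You can see the same thing from inside the shell. This snippet is just an illustration using
plain java.nio - it walks the output directory and tries to delete each entry; on Windows,
anything still held open by a classloader comes back with deleted=false while the session is
alive, whereas under Linux the deletes succeed:

scala> import java.nio.file.{Files, Paths}
scala> import scala.collection.JavaConverters._
scala> val outDir = Paths.get(sc.getConf.get("spark.repl.class.outputDir"))
scala> Files.walk(outDir).iterator().asScala.toSeq.reverse.foreach { p =>
     |   println(s"$p deleted=${p.toFile.delete()}")
     | }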

Now issue

scala> :quit

to quit the session.

The stack trace will show that ShutdownHookManager tried to delete the directory above but
could not.

If you now try to delete it through the file system, you can.

This is because the JVM actually cleans up the locks on exit.

So, it is not a permissions issue, but a consequence of how Windows treats file locks.

This is the *known issue* that was addressed in the Java 7 fix by having URLClassLoader implement
the Closeable interface and expose a close() method. It was fixed there because many enterprise
systems run on Windows.

Now... to further test the cause, I used the Windows Subsystem for Linux.

To access this (post-install) you run

C:> bash

from a command prompt.

To get this to work, I used the same Spark install, but had to install a fresh copy of the JDK
on Ubuntu within the Windows bash subsystem. This is standard Ubuntu stuff, but note that the
path to your Windows C: drive is /mnt/c

If I rerun the same test, the new output of 

scala> sc.getConf.get("spark.repl.class.outputDir")

will be a different folder location under Linux /tmp but with the same setup.

With the Spark session still active, it is possible to delete the Spark folders under /tmp.
This is the difference between Windows and Linux: even though bash is running Ubuntu on Windows,
it has the Linux file-locking behaviour, which means you can delete the Spark temp folders
while a session is running.

If you run through a new session with spark-shell at the Linux prompt and issue :quit, it shuts
down without any stack-trace error from ShutdownHookManager.

So, my conclusions are as follows:

1) this is not a permissions issue, contrary to the common assertion

2) it is a Windows-specific problem for *known* reasons - namely the difference in file locking
compared with Linux

3) it was considered a *bug* in the Java ecosystem and was fixed as such in Java 7 with the
.close() method

Further...

People who need to run Spark on Windows infrastructure (like me) can either run a Docker container
or use the Windows Subsystem for Linux to launch processes, so we do have a workaround.

However, it does concern me that this bug has been hanging around for four years or more when
it seems to come from a lax coding practice in the use of classloaders. That kind of breaks
the cross-platform promise of Java and Scala, which is why they were popular in the first
place :)

Linux is good.

Windows is good.

The addressable pool of Apache Spark developers *will* expand very significantly if Windows
developers are not shut out of the ecosystem by (apparently) fixable issues.

> Spark failed to delete temp directory 
> --------------------------------------
>
>                 Key: SPARK-12216
>                 URL: https://issues.apache.org/jira/browse/SPARK-12216
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>         Environment: windows 7 64 bit
> Spark 1.5.2
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>            Reporter: stefan
>            Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark temp dir:
C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
>         at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
>         at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
>         at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>         at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
>         at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>         at scala.util.Try$.apply(Try.scala:161)
>         at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
>         at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
