hadoop-common-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2815) Allowing processes to cleanup dfs on shutdown
Date Mon, 03 Mar 2008 09:00:52 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574385#action_12574385 ]

Hemanth Yamijala commented on HADOOP-2815:
------------------------------------------

bq. Another proposal would be to expose a "temporary" flag to the FileSystem.create() call.
This allows applications to create temporary HDFS files. This will be equivalent to the
File.createTempFile() API in Java. When an application dies or that file is closed, the HDFS
client will make every attempt to delete that file. This proposal is better than registering
shutdown hooks with HDFS.
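
A minimal sketch of how an application could use such a flag (the 'temporary' parameter is
hypothetical - note that the existing create(Path, boolean) overload means "overwrite", so a
real patch would need a distinct signature):

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TempFileSketch {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path tmp = new Path("/tmp/scratch/part-0"); // illustrative path

    // Hypothetical 'temporary' flag: the HDFS client would make a best
    // effort to delete the file once it is closed or the client dies.
    FSDataOutputStream out = fs.create(tmp, /* temporary = */ true);
    out.writeBytes("intermediate data");
    out.close(); // with the flag set, HDFS removes tmp at this point
  }
}
{code}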

This will work for the jobtracker if the flag can be set on directories as well, because what
the jobtracker creates is the directory pointed to by mapred.system.dir. So I guess we need
a similar flag for FileSystem.mkdirs. Devaraj pointed out that this might be ambiguous when
the path being created is "/a/b/c" and all of its components have to be created: which of
them is marked temporary - all of a, b and c, or only c? Since the closest existing
definitions (e.g. File.deleteOnExit) operate on the exact path they are given, we could say
it deletes only 'c'.
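
To illustrate those leaf-only semantics, a sketch with a hypothetical
mkdirs(Path, boolean temporary) overload (no such overload exists in FileSystem today):

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TempDirSketch {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());

    // Hypothetical overload: mkdirs(Path, boolean temporary). If /a and
    // /a/b are missing they are created as ordinary directories; only
    // the leaf /a/b/c is marked temporary, mirroring File.deleteOnExit,
    // which deletes exactly the path it was invoked on.
    fs.mkdirs(new Path("/a/b/c"), /* temporary = */ true);

    // The jobtracker would mark mapred.system.dir the same way.
  }
}
{code}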

Something like this should work. Devaraj?

> Allowing processes to cleanup dfs on shutdown
> ---------------------------------------------
>
>                 Key: HADOOP-2815
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2815
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Olga Natkovich
>            Assignee: dhruba borthakur
>
> Pig creates temp files that it wants removed at the end of processing. The code that
> removes the temp files is in a shutdown hook, so that they get cleaned up both on normal
> shutdown and when the process gets killed.
> The problem we are seeing is that by the time that code runs, the DFS might already be
> closed, so the delete fails and leaves the temp files behind. Since we have no control
> over the shutdown order, we have no way to make sure the files get removed.
> One way to solve this issue is to be able to mark the files as temp files so that hadoop
> can remove them during its own shutdown.
> The stack trace I am seeing is:
>         at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:158)
>         at org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:417)
>         at org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:144)
>         at org.apache.pig.backend.hadoop.datastorage.HPath.delete(HPath.java:96)
>         at org.apache.pig.impl.io.FileLocalizer$1.run(FileLocalizer.java:275)
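
For context, a condensed sketch of the cleanup pattern described above - a user-level
shutdown hook that races with the FileSystem client's own shutdown (path and hook body are
illustrative):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShutdownCleanup {
  public static void main(String[] args) throws Exception {
    final FileSystem fs = FileSystem.get(new Configuration());
    final Path tmp = new Path("/tmp/pig-scratch"); // illustrative path

    Runtime.getRuntime().addShutdownHook(new Thread() {
      public void run() {
        try {
          // If the DFS client was already closed by its own shutdown
          // hook, DFSClient.checkOpen() throws here and the temp files
          // are left behind - the failure in the trace above.
          fs.delete(tmp, true);
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    });

    // ... normal processing writes under tmp ...
  }
}
{code}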

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

