hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1292) dfs -copyToLocal should guarantee file is complete
Date Fri, 22 Jun 2007 05:47:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507129 ]

dhruba borthakur commented on HADOOP-1292:

The test creates files and directories in /test/copytolocal. This could cause the test to fail
if the test does not have write permission on that directory, for example. It is better to use
FsShell.TEST_ROOT_DIR as the temporary work directory for the test.

> dfs -copyToLocal should guarantee file is complete
> --------------------------------------------------
>                 Key: HADOOP-1292
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1292
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: eric baldeschwieler
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: HADOOP-1292_20070621c.patch
> We should copy to a temporary file, maybe _tmp.<realname>, and then rename the
> file when the copy is complete. Restarting a copy should reuse the _tmp file, just checksumming
> it. Then interrupting a copy with ^C will do the right thing.
> Original suggestion:
> On Apr 23, 2007, at 2:38 AM, Richard Kasperski wrote:
> I'd like to have a guarantee that a file copy is both complete and that the file is
> whole. In the past I've done this by copying the file to a temporary name tmp.<realname>
> and then moving it to <realname> once the file copy is complete. This has the
> following very nice properties: if <realname> exists, then the file copy is complete
> and I'm not looking at a partial copy of the file. I believe that the copy to the cluster
> has both of these properties, in that the file doesn't appear in a DFS directory until the
> whole file has been copied. The copy from the cluster to a local file system does not have
> these guarantees, and it would be very nice if it did. There are two scenarios in which I
> wish to use this. First, if I Ctrl-C 'hadoop dfs -copyToLocal', I know which parts
> are complete and which aren't. Second, I can run a background compressor to compress the
> files as they are copied.
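The copy-to-a-temporary-name-then-rename scheme described in the issue can be sketched as follows. This is a minimal illustration against the local filesystem using java.nio.file, not the actual FsShell change attached to HADOOP-1292; the class name TempThenRenameCopy, the helpers tempPathFor and copyAtomically, and the exact "_tmp." naming are hypothetical. The restart-and-checksum behavior suggested in the issue is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class TempThenRenameCopy {

    /** Hypothetical helper: the in-progress file is "_tmp.<realname>" in the same directory. */
    static Path tempPathFor(Path target) {
        return target.resolveSibling("_tmp." + target.getFileName());
    }

    /**
     * Copies {@code in} to {@code target} so that {@code target} only appears once complete:
     * bytes go to a temporary sibling first, which is then renamed into place.
     */
    static void copyAtomically(InputStream in, Path target) throws IOException {
        Path tmp = tempPathFor(target);
        // An interrupted copy (e.g. ^C) leaves only _tmp.<realname> behind, never <realname>.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        try {
            // Same-directory rename; atomic on filesystems that support ATOMIC_MOVE.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback may not be atomic on every filesystem.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("copytolocal");
        Path target = dir.resolve("part-00000");
        byte[] data = "hello, dfs".getBytes(StandardCharsets.UTF_8);
        copyAtomically(new ByteArrayInputStream(data), target);
        // Prints "true false": the real name exists and the temporary name is gone.
        System.out.println(Files.exists(target) + " " + Files.exists(tempPathFor(target)));
    }
}
```

Keeping the temporary file in the same directory as the final name matters: a rename within one filesystem is what makes "if <realname> exists, the copy is complete" hold, whereas a move across filesystems degrades into a copy that can itself be observed half-written.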

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
