hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1292) dfs -copyToLocal should guarantee file is complete
Date Wed, 20 Jun 2007 07:20:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506427 ]

dhruba borthakur commented on HADOOP-1292:

Looks good. A few comments:

1. I would rather not add a new method to FileSystem. Instead, I would use FileSystem.get(URI,
conf) to get the local file system wherever it is needed in FsShell.java.
2. The tmp file prefix or suffix could be "tmp.fsshell", which would help when debugging certain
scenarios. Most applications use "tmp" or some variation of it.
3. I am unable to understand the behaviour of the "another" file in FsShell.copyToLocal. I will
discuss this with you.
4. Maybe enhance TestDFSShell.java to cover this scenario. At the least, invoke FsShell.copyToLocal
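
For reference, the copy-to-tmp-then-rename idea behind comment 2 can be sketched with plain java.nio, outside Hadoop. This is only an illustration, not the code in the attached patch: the class name AtomicLocalCopy, the method copyThenRename, and the exact "tmp.fsshell." prefix are assumptions, and the sketch does not attempt the checksum-and-reuse step mentioned in the issue description.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class AtomicLocalCopy {
    // Hypothetical prefix, modelled on the "tmp.fsshell" suggestion above.
    static final String TMP_PREFIX = "tmp.fsshell.";

    /**
     * Writes the stream to a temporary sibling of dst, then renames it.
     * dst is only created once the whole stream has been written.
     */
    static void copyThenRename(InputStream src, Path dst) throws IOException {
        Path tmp = dst.resolveSibling(TMP_PREFIX + dst.getFileName());
        // A tmp file left behind by an interrupted run is simply overwritten.
        Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
        // Same-directory rename; whether an existing dst is replaced
        // depends on the platform when ATOMIC_MOVE is requested.
        Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

If the process is killed during Files.copy, only the tmp.fsshell.* file exists, so a reader that sees the real name can assume the copy finished.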

> dfs -copyToLocal should guarantee file is complete
> --------------------------------------------------
>                 Key: HADOOP-1292
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1292
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: eric baldeschwieler
>         Attachments: HADOOP-1292_20070619b.patch
> We should copy to a temporary file, maybe _tmp.<realname>, and then rename the
> file when the copy is complete.  Restarting a copy should reuse the _tmp file, just checksumming
> it.  Then ^Cing a copy will do the right thing.
> Original suggestion:
> On Apr 23, 2007, at 2:38 AM, Richard Kasperski wrote:
> I'd like to have a guarantee that a file copy is both completed and that the file is
> whole. In the past I've done this by copying the file to a temporary name tmp.<realname>
> and then moving it to <realname> once the file copy is complete. This has the
> following very nice properties: if <realname> exists, then the file copy is complete
> and I'm not looking at a partial copy of the file. I believe that the copy to the cluster
> has both of these properties, in that the file doesn't appear in a DFS directory until the
> whole file has been copied. The copy from the cluster to a local file system does not have
> these guarantees, and it would be very nice if it did. There are two scenarios in which I
> wish to use this. First, if I ctrl-c the 'hadoop dfs -copyToLocal', I know which parts
> are complete and which parts aren't. Second, I can run a background compressor to compress the
> files as they are copied.
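
The background-compressor scenario relies on a reader being able to tell finished files from in-progress ones by name alone. A minimal sketch of that reader side follows, assuming in-progress copies carry a "tmp." prefix as in the original suggestion; the class name CompletedFiles and the method list are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CompletedFiles {
    // Assumed naming convention for copies that are still in progress.
    static final String TMP_PREFIX = "tmp.";

    /** Returns the regular files in dir whose names lack the tmp prefix, sorted by path. */
    static List<Path> list(Path dir) throws IOException {
        List<Path> done = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                String name = p.getFileName().toString();
                if (Files.isRegularFile(p) && !name.startsWith(TMP_PREFIX)) {
                    done.add(p);
                }
            }
        }
        Collections.sort(done);
        return done;
    }
}
```

A compressor could poll this list and process each returned file, since under the rename scheme a file only gets its real name after the copy has completed.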

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
