hadoop-common-dev mailing list archives

From "Ravi Gummadi (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-5444) Add atomic move option
Date Wed, 17 Jun 2009 05:52:07 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Gummadi updated HADOOP-5444:

    Attachment: d_retries_atomic.patch

Here is a patch that supports atomic copies and atomic updates.

distcp -atomic <stagedir> src* dst

To avoid quota issues (if distcp picked its own staging directory somewhere) or access-permission
problems (if the staging directory were created as a sibling of the destination directory), the
staging directory is taken as an argument to the -atomic option.
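A rough sketch of how the `-atomic <stagedir>` argument could be picked up alongside `-update` and the positional `src* dst` paths. The class `AtomicArgs` and its fields are hypothetical illustrations, not the actual DistCp option parser; only the flag names and the usage line come from this message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the options discussed above; NOT the real DistCp parser. */
class AtomicArgs {
    String stageDir;                        // non-null only when -atomic <stagedir> is given
    boolean update;                         // true when -update is given
    List<String> srcs = new ArrayList<>();  // one or more source paths
    String dst;                             // destination path (last positional argument)

    static AtomicArgs parse(String[] args) {
        AtomicArgs a = new AtomicArgs();
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-atomic":
                    // The user supplies the staging directory, so they can choose a
                    // location with adequate quota and write permission.
                    if (i + 1 >= args.length) {
                        throw new IllegalArgumentException("-atomic requires <stagedir>");
                    }
                    a.stageDir = args[++i];
                    break;
                case "-update":
                    a.update = true;
                    break;
                default:
                    paths.add(args[i]);
            }
        }
        if (paths.size() < 2) {
            throw new IllegalArgumentException(
                "usage: distcp [-update] [-atomic <stagedir>] src* dst");
        }
        // Last positional argument is the destination; the rest are sources.
        a.dst = paths.get(paths.size() - 1);
        a.srcs = new ArrayList<>(paths.subList(0, paths.size() - 1));
        return a;
    }
}
```

For example, `AtomicArgs.parse(new String[]{"-atomic", "/stage", "/a", "/dst"})` would yield `stageDir = "/stage"`, `srcs = [/a]` and `dst = "/dst"`.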

For an atomic copy, the MapReduce job copies into stagedir, and distcp finally moves the contents
of stagedir to the destination directory. For an atomic update (-update together with
-atomic <stagedir>), the final move happens file by file, because some of the files/dirs may
already exist in the destination.
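A minimal sketch of the final move step, assuming a single local filesystem and using `java.nio.file` as a stand-in for HDFS; the patch itself works through distcp and Hadoop's filesystem layer, which this does not reproduce. `StageCommit`, `commitCopy` and `commitUpdate` are hypothetical names. An atomic copy publishes the whole staged tree with one rename; an atomic update moves staged files one at a time into a destination that may already hold some of them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch only: local-filesystem analogue of the stage-then-move commit. */
class StageCommit {
    /** Atomic copy: assumes dst does not exist yet, so one rename publishes everything. */
    static void commitCopy(Path stageDir, Path dst) throws IOException {
        // ATOMIC_MOVE fails instead of falling back to copy+delete, so readers
        // see either no dst at all or the complete tree.
        Files.move(stageDir, dst, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Atomic update: dst may already contain files, so move staged files one by one. */
    static void commitUpdate(Path stageDir, Path dst) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(stageDir)) {
            // Collect first so the tree is not modified while it is being walked.
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path src : files) {
            Path target = dst.resolve(stageDir.relativize(src));
            Files.createDirectories(target.getParent());
            // Each file is replaced atomically; the update as a whole is not.
            // With ATOMIC_MOVE, replacing an existing target is platform-specific
            // (POSIX rename(2) replaces it), so this assumes a POSIX system.
            Files.move(src, target, StandardCopyOption.ATOMIC_MOVE);
        }
    }
}
```

The per-file path trades whole-directory atomicity for the ability to merge into an existing destination, which matches the distinction drawn above between atomic copy and atomic update.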

This patch also includes the code changes for the -retries <num_tries> option (HADOOP-6060),
since the two sets of changes depend on each other.
With -retries <num_tries>, distcp launches at most num_tries jobs in case of transient
failures. Retries are run with the -update option enabled.
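One way the retry behaviour could look, as a sketch only: `RetryingDistCp`, `CopyJob` and `runWithRetries` are hypothetical names, and each `CopyJob.run` call stands in for launching one distcp MapReduce job. The first attempt uses the user's flags; every retry forces update mode so that files finished by an earlier attempt are not copied again.

```java
/** Sketch of "launch at most numTries jobs"; not the code from the patch. */
class RetryingDistCp {
    /** Hypothetical stand-in for a single distcp MapReduce job. */
    interface CopyJob {
        /** Runs one job; returns true on success, false on a transient failure. */
        boolean run(boolean update);
    }

    /** Returns the number of jobs launched on success, or -1 if all numTries attempts failed. */
    static int runWithRetries(CopyJob job, int numTries, boolean userUpdate) {
        for (int attempt = 1; attempt <= numTries; attempt++) {
            // Retries always run in update mode so already-copied files are skipped.
            boolean update = userUpdate || attempt > 1;
            if (job.run(update)) {
                return attempt;
            }
        }
        return -1;
    }
}
```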

The patch also contains a test case for atomic copy with job retries.

Please review and provide your comments.

> Add atomic move option
> ----------------------
>                 Key: HADOOP-5444
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5444
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: tools/distcp
>    Affects Versions: 0.18.3
>            Reporter: Richard Theige
>         Attachments: d_retries_atomic.patch
> Provide support for update to move directories/files atomically by copying the src directory
> to a tmp directory (with a random/unique name) and then moving the directory to its target
> destination name after all subdirs/files are copied and verified.
> example option ideas
>   hadoop ... distcp -update -move src dst
> or
>   hadoop ... distcp -update -atomic src dst
> To ensure file correctness at the destination, distcp should first compute a strong
> signature/checksum (e.g. MD4) of the files before it performs the 'move' at the end of the
> copy process.
> The need for this arises because applications may start processing data as soon as files are
> present, before the whole directory copy has completed, and so end up working against an
> incomplete data set.
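The checksum check suggested in the description could be sketched as comparing a digest of each source file with its staged copy before the final move. This is an illustration on local files only; `VerifyBeforeMove` is a hypothetical name, and SHA-256 is swapped in for MD4 because the JDK does not register an MD4 `MessageDigest` by default (MD4 is also no longer considered collision-resistant).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch: verify a staged copy against its source before the final move. */
class VerifyBeforeMove {
    /** Streams the file through SHA-256 (used here instead of MD4). */
    static byte[] digest(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }

    /** True when both files have the same digest, i.e. the staged copy looks intact. */
    static boolean sameContent(Path src, Path staged) throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(digest(src), digest(staged));
    }
}
```

Only after `sameContent` holds for every file would the staged tree be moved into place.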

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
