hadoop-mapreduce-issues mailing list archives

From "Ravi Gummadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-649) distcp should validate the data copied
Date Tue, 15 Sep 2009 10:29:57 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755445#action_12755445 ]

Ravi Gummadi commented on MAPREDUCE-649:

>> However, failure to match the checksum doesn't count as a failed copy, but as an attempt in a different retry mechanism.

After the specified number of retries, a checksum mismatch is considered a failure. The
config property is distcp.file.retries, with a default value of 3, so by default the copy
of a file is tried 2 more times after a checksum mismatch.
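
To illustrate the retry behaviour described above, here is a minimal sketch in plain Java. It is not the code from the attached patches: the class name, copyWithValidation, checksum, and the use of CRC32 over byte arrays are all assumptions made for illustration, standing in for a real file copy and a filesystem-provided checksum. The maxAttempts parameter plays the role of distcp.file.retries, read here as the total number of attempts (3 means one copy plus 2 more tries).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.UnaryOperator;
import java.util.zip.CRC32;

public class ChecksumRetrySketch {
    /** CRC32 of a byte array; a stand-in for a filesystem-provided file checksum. */
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /**
     * Copies src through a (hypothetical) copier and validates every attempt.
     * maxAttempts includes the first copy, so 3 means "1 copy + 2 more tries".
     * Returns the 1-based attempt that validated, or -1 if every attempt mismatched.
     */
    static int copyWithValidation(byte[] src, UnaryOperator<byte[]> copier, int maxAttempts) {
        long expected = checksum(src);
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            byte[] dst = copier.apply(src);
            if (checksum(dst) == expected) {
                return attempt; // checksums match: the copy is accepted
            }
            // Mismatch: fall through and copy again, up to maxAttempts.
        }
        return -1; // only now is the copy counted as a failure
    }

    public static void main(String[] args) {
        byte[] src = "hello distcp".getBytes(StandardCharsets.UTF_8);
        int[] calls = {0};
        // Simulated copier that corrupts the first two copies, then succeeds.
        UnaryOperator<byte[]> flaky = in -> {
            byte[] out = Arrays.copyOf(in, in.length);
            if (++calls[0] <= 2) {
                out[0] ^= 0x01; // flip one bit to force a checksum mismatch
            }
            return out;
        };
        System.out.println(copyWithValidation(src, flaky, 3)); // prints 3
    }
}
```

With a limit of 3, the simulated copier above succeeds on the last allowed attempt; a copier that always corrupts its output would make copyWithValidation return -1, which corresponds to the point where the copy is finally counted as failed.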

>> Are invalid/corrupt copies more common than other failures that throw exceptions (e.g. (re)moved source files, IOExceptions, etc.)?

I am not sure, but this feature would give users more confidence in the copy done by distcp.

>> as long as it's been tested then that's OK.

Yes. It has been tested with different cases.

> distcp should validate the data copied
> --------------------------------------
>                 Key: MAPREDUCE-649
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-649
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: distcp
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>         Attachments: d_verify.patch, d_verify649.patch
> distcp should validate the files copied by checking the checksums, if the filesystem supports checksums.

