hadoop-mapreduce-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-972) distcp can timeout during rename operation to s3
Date Sun, 18 Oct 2009 09:12:31 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-972:

    Status: Open  (was: Patch Available)

Really sorry to find this issue so late, but a progress thread that forbids tasks from timing
out is not a good solution, particularly for distcp, where task timeouts are both legal and
useful. If s3 requires a more elaborate rename mechanism, is there a way to push this into
its implementation? While distcp may be a heavier user than most user jobs, the latter would
also appreciate a more robust solution.

Starting a thread and waiting on it for every rename is also not an ideal design; the current
patch polls {{isComplete}} only every three seconds, slowing all the renames.
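
To make both concerns concrete, here is a minimal sketch (not taken from any of the attached
patches) of one way to wait on a slow rename without a fixed polling delay and without
suppressing timeouts indefinitely. All names in it are hypothetical: {{RenameWait}},
{{awaitWithProgress}}, and the {{Progress}} interface, which only stands in for whatever
progress callback the task would actually use. It assumes the caller supplies a shared
executor, so no new thread is started per rename, and an upper bound on the total wait, so a
hung operation can still fail.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class RenameWait {
    /** Hypothetical stand-in for the task's progress callback. */
    interface Progress {
        void progress();
    }

    /**
     * Runs op on the supplied (shared) executor and waits for its result.
     * Reports progress once per heartbeat while waiting, returns as soon as
     * op finishes, and gives up once maxWaitMillis has elapsed.
     */
    static <T> T awaitWithProgress(ExecutorService pool, Callable<T> op,
                                   Progress reporter,
                                   long heartbeatMillis, long maxWaitMillis)
            throws Exception {
        Future<T> result = pool.submit(op);
        long deadline = System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        while (true) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                // Stop reporting progress so a stuck operation can still fail.
                result.cancel(true);
                throw new TimeoutException("operation exceeded "
                        + maxWaitMillis + " ms");
            }
            long wait = Math.min(
                    TimeUnit.MILLISECONDS.toNanos(heartbeatMillis), remaining);
            try {
                // Blocks until completion or the heartbeat interval, whichever
                // comes first; there is no fixed sleep between checks.
                return result.get(wait, TimeUnit.NANOSECONDS);
            } catch (TimeoutException stillRunning) {
                reporter.progress();
            }
        }
    }
}
```

With a helper along these lines, a fast rename returns immediately rather than waiting out a
three-second poll, and the bound on total wait keeps task timeouts meaningful. Whether this
logic belongs in distcp at all, rather than in the s3 filesystem implementation, is a separate
question.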

> distcp can timeout during rename operation to s3
> ------------------------------------------------
>                 Key: MAPREDUCE-972
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 0.20.1
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-972.2.patch, MAPREDUCE-972.3.patch, MAPREDUCE-972.4.patch, MAPREDUCE-972.5.patch, MAPREDUCE-972.patch
>
> rename() in S3 is implemented as copy + delete. The S3 copy operation can perform very slowly, which may cause task timeout.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.