hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-972) distcp can timeout during rename operation to s3
Date Fri, 11 Sep 2009 01:36:00 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-972:
------------------------------------

    Attachment: MAPREDUCE-972.patch

Attaching a patch that starts a background thread to report mapper progress while the rename
operation is running.
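A minimal sketch of that pattern, written in Java because Hadoop is. It is not the code in MAPREDUCE-972.patch: `ProgressReporter` and `runWithProgress` are hypothetical names, and in a real mapper the callback would be something like Hadoop's `Reporter.progress()`. A daemon thread calls `progress()` at a fixed interval while the slow operation runs on the calling thread, and is interrupted and joined as soon as the operation returns or throws.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ProgressDuringRename {

    /** Hypothetical stand-in for a progress callback such as Hadoop's Reporter.progress(). */
    interface ProgressReporter {
        void progress();
    }

    /**
     * Runs operation on the calling thread while a daemon thread calls
     * reporter.progress() every intervalMillis; the heartbeat thread is
     * interrupted and joined once the operation returns or throws.
     */
    static void runWithProgress(Runnable operation, ProgressReporter reporter,
                                long intervalMillis) {
        Thread heartbeat = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    reporter.progress();
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                // Interrupted by runWithProgress: the operation is done, so exit.
            }
        }, "rename-progress-heartbeat");
        heartbeat.setDaemon(true); // never keep the JVM alive just for the heartbeat
        heartbeat.start();
        try {
            operation.run();
        } finally {
            heartbeat.interrupt();
            try {
                heartbeat.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulate a slow rename with a 250 ms sleep, reporting progress every 50 ms.
        runWithProgress(() -> {
            try {
                Thread.sleep(250);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, calls::incrementAndGet, 50);
        System.out.println("progress() calls during rename: " + calls.get());
    }
}
```

Joining the heartbeat in a `finally` block means the thread is torn down even if the rename throws, which mirrors the patch's description of creating and destroying the progress thread around the rename.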

We benchmarked S3 copy throughput at ~4 MB/sec, so renaming a file in the 3-5 GB range into
its final location can take long enough for the task to time out. This patch fixes that
issue.
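A rough check of that arithmetic: copy time is size divided by throughput. The sketch below assumes the 0.20-era default `mapred.task.timeout` of 600,000 ms (10 minutes) and 1 GB = 1024 MB; both are assumptions to check against your own configuration. At 4 MB/sec a 2 GB file copies in about 512 s, while 3 GB takes about 768 s and 5 GB about 1280 s, past the 600 s limit.

```java
public class RenameTimeoutEstimate {

    // Assumption: 0.20-era default for mapred.task.timeout, 600,000 ms (10 minutes).
    static final double TASK_TIMEOUT_SECONDS = 600.0;

    // Benchmarked S3 copy throughput from the comment above, in MB/sec.
    static final double S3_COPY_MB_PER_SEC = 4.0;

    /** Estimated seconds to copy a file of sizeGb gigabytes (1 GB = 1024 MB). */
    static double copySeconds(double sizeGb) {
        return sizeGb * 1024.0 / S3_COPY_MB_PER_SEC;
    }

    public static void main(String[] args) {
        for (double gb : new double[] {1, 2, 3, 4, 5}) {
            double secs = copySeconds(gb);
            System.out.printf("%.0f GB -> ~%.0f s (%s)%n", gb, secs,
                    secs > TASK_TIMEOUT_SECONDS ? "exceeds timeout" : "within timeout");
        }
    }
}
```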

This patch was tested manually by running distcp to upload data to s3n and verifying that
renames still worked as expected and that log messages confirmed the creation and destruction
of the background progress thread.

> distcp can timeout during rename operation to s3
> ------------------------------------------------
>
>                 Key: MAPREDUCE-972
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 0.20.1
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-972.patch
>
>
> rename() in S3 is implemented as copy + delete. The S3 copy operation can be very slow,
> which may cause the task to time out.
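To make "rename() is copy + delete" concrete, here is a toy in-memory store with no native rename. `ObjectStore` and its methods are hypothetical and are not Hadoop's s3n `FileSystem` code; refusing a missing source or an existing target is a choice made for this sketch. The point it illustrates is that the rename cannot complete until the full copy has finished, and on S3 the copy is the step reported as slow.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class CopyDeleteRename {

    /** Hypothetical in-memory object store with no native rename, only copy and delete. */
    static class ObjectStore {
        private final Map<String, byte[]> objects = new HashMap<>();

        void put(String key, byte[] data) {
            objects.put(key, data.clone());
        }

        boolean exists(String key) {
            return objects.containsKey(key);
        }

        byte[] get(String key) {
            byte[] data = objects.get(key);
            return data == null ? null : data.clone();
        }

        /** Duplicates every byte of src under dst; on S3 this is the slow step. */
        void copy(String src, String dst) {
            objects.put(dst, objects.get(src).clone());
        }

        void delete(String key) {
            objects.remove(key);
        }

        /** Emulates rename as copy + delete; fails on a missing source or existing target. */
        boolean rename(String src, String dst) {
            if (!exists(src) || exists(dst)) {
                return false;
            }
            copy(src, dst); // must finish before the source can be removed
            delete(src);
            return true;
        }
    }

    public static void main(String[] args) {
        ObjectStore store = new ObjectStore();
        store.put("tmp/part-00000", "hello".getBytes());
        boolean renamed = store.rename("tmp/part-00000", "final/part-00000");
        System.out.println(renamed + " " + store.exists("tmp/part-00000") + " "
                + Arrays.toString(store.get("final/part-00000")));
    }
}
```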

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

