hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-972) distcp can timeout during rename operation to s3
Date Mon, 19 Oct 2009 00:01:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767166#action_12767166 ]

Aaron Kimball commented on MAPREDUCE-972:

My proposal is slightly different from that:

The progress thread is in one of three states:

1) {{inRename == true && isComplete == false}}
2) {{inRename == false && isComplete == false}}
3) {{isComplete == true}}

When inRename is set to true, the progress thread calls {{progress()}} every few seconds,
for at most {{distcp.rename.timeout}} seconds. If it is still in this state after {{distcp.rename.timeout}}
seconds have elapsed since the state began, it sets inRename to false.

When inRename is false, the thread just waits for another rename operation to start.
It sleeps and occasionally polls for a state change on inRename or isComplete. Setting inRename
back to true returns the thread to the first state; {{distcp.rename.timeout}}
starts anew from that point in time.

If isComplete is true, the thread exits immediately. The {{Mapper.close()}} method will set
isComplete to true to ensure that the thread shuts down. (As the thread is {{setDaemon(true)}},
the JVM will exit even without this detail, but it is good hygiene to do so anyway.)
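
The three states above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the attached patches: the class name {{RenameProgressReporter}}, the {{ProgressCallback}} interface (a stand-in for the task's {{progress()}} call), and the {{beginRename()}} / {{endRename()}} / {{complete()}} methods are all hypothetical. The timeout is passed in milliseconds here for convenience, whereas {{distcp.rename.timeout}} is described in seconds, and a timed {{wait()}} stands in for the sleep-and-poll loop.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; class and method names are illustrative, not taken from the patch. */
class RenameProgressReporter implements Runnable {
    /** Stand-in for the task's progress() call. */
    interface ProgressCallback {
        void progress();
    }

    private final Object lock = new Object();
    private final ProgressCallback callback;
    private final long renameTimeoutMillis; // would be derived from distcp.rename.timeout
    private final long pollMillis;          // "every few seconds" in a real task

    private boolean inRename = false;
    private boolean isComplete = false;
    private long renameStartNanos;

    RenameProgressReporter(ProgressCallback callback, long renameTimeoutMillis, long pollMillis) {
        this.callback = callback;
        this.renameTimeoutMillis = renameTimeoutMillis;
        this.pollMillis = pollMillis;
    }

    /** Enter state 1; the timeout window restarts from now. */
    void beginRename() {
        synchronized (lock) {
            inRename = true;
            renameStartNanos = System.nanoTime();
            lock.notifyAll();
        }
    }

    /** Enter state 2 once the rename has returned. */
    void endRename() {
        synchronized (lock) {
            inRename = false;
            lock.notifyAll();
        }
    }

    /** Enter state 3; would be called from Mapper.close(). */
    void complete() {
        synchronized (lock) {
            isComplete = true;
            lock.notifyAll();
        }
    }

    boolean isInRename() {
        synchronized (lock) {
            return inRename;
        }
    }

    @Override
    public void run() {
        synchronized (lock) {
            while (!isComplete) {
                if (inRename) {
                    long elapsedMillis =
                        TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - renameStartNanos);
                    if (elapsedMillis >= renameTimeoutMillis) {
                        inRename = false;    // state 1 -> state 2: stop reporting progress
                    } else {
                        callback.progress(); // still within the rename window
                    }
                }
                try {
                    lock.wait(pollMillis);   // sleep, but wake early on a state change
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }
}
```

In this sketch the reporter would be wrapped in a {{Thread}}, marked {{setDaemon(true)}}, and started once per task. Once the timeout has elapsed, no further {{progress()}} calls are made until the next {{beginRename()}}, so the ordinary task timeout takes effect again.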

It is not sufficient to simply call progress() right before rename(). Experience has shown
that when uploading large files to S3, the rename() operation itself can take in excess of
10 minutes. rename() in S3 is implemented as copy-and-delete. For multi-GB files, this can
take a long time.
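
Because the rename itself is the slow step, the flag has to stay set for the whole call rather than being signalled once beforehand. A minimal sketch of how the mapper side might bracket the call, assuming hypothetical {{RenameState}} hooks that flip inRename and a {{RenameOp}} wrapper around the actual filesystem {{rename()}} (neither name comes from the patch):

```java
import java.io.IOException;

/** Hypothetical helper; the interfaces below are illustrative stand-ins. */
class GuardedRename {
    /** Hooks that set and clear the inRename flag read by the progress thread. */
    interface RenameState {
        void beginRename();
        void endRename();
    }

    /** Wraps the actual rename call, e.g. a filesystem rename(src, dst). */
    interface RenameOp {
        boolean run() throws IOException;
    }

    static boolean renameWithProgress(RenameState state, RenameOp op) throws IOException {
        state.beginRename();       // progress thread starts reporting
        try {
            return op.run();       // may block for many minutes on S3
        } finally {
            state.endRename();     // always clear the flag, even on failure
        }
    }
}
```

The {{finally}} block clears the flag even if the rename throws, so a failed rename does not leave the progress thread reporting on its behalf.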

If we just tell people to set their global task timeout to 30 minutes, then this will delay
task restarts under other conditions where the timeout value is expected to be considerably
shorter (e.g., an individual file {{write()}} operation). This can adversely affect distcp
performance in the general case.

> distcp can timeout during rename operation to s3
> ------------------------------------------------
>                 Key: MAPREDUCE-972
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 0.20.1
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-972.2.patch, MAPREDUCE-972.3.patch, MAPREDUCE-972.4.patch, MAPREDUCE-972.5.patch, MAPREDUCE-972.patch
>
> rename() in S3 is implemented as copy + delete. The S3 copy operation can perform very slowly, which may cause task timeout.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
