hadoop-mapreduce-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-972) distcp can timeout during rename operation to s3
Date Mon, 19 Oct 2009 22:21:59 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767579#action_12767579 ]

Chris Douglas commented on MAPREDUCE-972:

bq. Isn't the FileSystem API mid-rewrite right now in HADOOP-6223? So now might actually be
the rare time to consider something like this. It's unfortunate that Rename.Options is an
Enum, so it'd be hard to add a progress function there without changing that. Perhaps Rename.Options.OVERWRITE
could still be a constant, but Rename.Options#createProgress(Progressible) could return a
subclass of Rename.Options that wraps a Progressible or somesuch. I don't mean to push this
approach, rather just to question whether it should be ruled out completely. If it seems reasonable
for file rename implementations to take a long time, then adding a progress callback might
be a reasonable approach.

I agree. Long-running renames should be considered/debated in the design of FileContext/AFS,
particularly since we often use rename to promote output. Adding a cstr for FileContext that
takes a Progressable, then adding a Progressable to all the AFS APIs would probably work.
The Util inner class could also be created with it, so listStatus and copy could also update
progress. Most implementations can ignore it, but that would at least push the workaround
for S3 into the right layer.
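A minimal sketch of that callback idea, under loudly-labeled assumptions: ProgressAwareFs and its chunked rename body are illustrative names invented here, and Progressable is re-declared locally so the snippet stands alone rather than using Hadoop's org.apache.hadoop.util.Progressable.

```java
// Hypothetical sketch: a FileContext-style class constructed with a
// Progressable, which it threads into rename() so that slow
// implementations (e.g. S3's copy + delete) can report liveness.
// All names here are illustrative, not Hadoop's real API.
interface Progressable {
    void progress();
}

class ProgressAwareFs {
    private final Progressable progress;

    ProgressAwareFs(Progressable progress) {
        this.progress = progress;
    }

    // A slow rename calls progress() between chunks of work; fast
    // implementations are free to ignore the callback entirely.
    boolean rename(String src, String dst) {
        for (int chunk = 0; chunk < 3; chunk++) {
            // ... copy one chunk of src to dst ...
            progress.progress();   // keep the calling task alive
        }
        // ... delete src ...
        return true;
    }
}

public class RenameProgressDemo {
    public static void main(String[] args) {
        final int[] ticks = {0};
        ProgressAwareFs fs = new ProgressAwareFs(() -> ticks[0]++);
        boolean ok = fs.rename("s3://bucket/a", "s3://bucket/b");
        System.out.println(ok + " " + ticks[0]);  // prints "true 3"
    }
}
```

Implementations that finish quickly can ignore the callback, which is what lets the S3 workaround live in the filesystem layer instead of in every client.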

However, for this issue, patching either the old or new API is a non-starter. DistCp uses
old APIs, and I'd much rather upgrade it and address the more general Progressable questions
in other issues. Approaching all that in a single issue, particularly one devoted to a timeout
in S3, imports a lot of baggage. Interesting, important baggage, but this is only a use case
in that broader context.

bq. Using a single timeout value for all operations makes program execution overall considerably
less efficient than it should be. Writes and renames in distcp can expect different running
times; we should treat them this way.

Every file is copied twice. I'm not sure a long task timeout leaves too much performance on
the table. Your point about tuning timeouts for particular operations is taken, but the payoff
is too low for the complexity this adds. Both this and raising the task timeout for the job
are hacks, but as Doug points out: we're going to have to solve this in general, too. The
task timeout is a hack we have and know.
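As a concrete config fragment for that known hack (assuming the 0.20-era property name mapred.task.timeout and an s3n destination; both are illustrative and depend on the cluster's version):

```shell
# Raise the task timeout for this job only, e.g. to 30 minutes
# (1800000 ms), so a slow S3 rename is not killed mid-copy.
# mapred.task.timeout is the 0.20-era name for the setting.
hadoop distcp -D mapred.task.timeout=1800000 \
    hdfs://namenode/src s3n://bucket/dst
```

Because DistCp implements Tool, the generic -D option applies the override to this job alone, leaving the cluster-wide default untouched.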

bq. Looking at FilterFileSystem, I think that's the most general and non-invasive solution.

That would be the cleanest place to add a thread, but it's still not much of a win over bumping
the task timeout for the job. Updating the DistCp guide with notes for S3 users is an unambiguous improvement.
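The FilterFileSystem idea could look something like the following sketch; RawFs and HeartbeatFs are hypothetical stand-ins for Hadoop's FileSystem/FilterFileSystem, kept local so the snippet is self-contained.

```java
// Hypothetical sketch of the FilterFileSystem approach: wrap a slow
// filesystem and run a heartbeat thread only while rename() is in
// flight, so the task reports progress during S3's copy + delete.
// RawFs is a stand-in for the real Hadoop API.
interface RawFs {
    boolean rename(String src, String dst);
}

class HeartbeatFs implements RawFs {
    private final RawFs inner;
    private final Runnable heartbeat;  // e.g. a task's progress reporter

    HeartbeatFs(RawFs inner, Runnable heartbeat) {
        this.inner = inner;
        this.heartbeat = heartbeat;
    }

    @Override
    public boolean rename(String src, String dst) {
        Thread beater = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                heartbeat.run();            // report liveness
                try {
                    Thread.sleep(100);      // then wait 100 ms
                } catch (InterruptedException e) {
                    return;                 // rename finished; stop beating
                }
            }
        });
        beater.setDaemon(true);
        beater.start();
        try {
            return inner.rename(src, dst);  // the slow copy + delete
        } finally {
            beater.interrupt();
            try {
                beater.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Wrapping only the S3 filesystem this way keeps the hack out of DistCp itself, though as argued above it buys little over raising the job's task timeout.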

> distcp can timeout during rename operation to s3
> ------------------------------------------------
>                 Key: MAPREDUCE-972
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 0.20.1
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-972.2.patch, MAPREDUCE-972.3.patch, MAPREDUCE-972.4.patch, MAPREDUCE-972.5.patch, MAPREDUCE-972.patch
> rename() in S3 is implemented as copy + delete. The S3 copy operation can perform very slowly, which may cause task timeout.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
