hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matt Ahrens (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6345) Refactor Trash::moveToTrash() and its accompanying poor tests
Date Wed, 23 Dec 2009 16:04:29 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794088#action_12794088

Matt Ahrens commented on HADOOP-6345:

Regarding the two general concerns, here are my comments:

"It seems like the essential, convoluted logic of the moveToTrash method remains. I had thought
that this would be a good chance to clean up some of the nested try-catch blocks. And the
existing for loops seems particularly confusing."

The convoluted logic exists because of the potential race condition that can occur between
the mkdir and the rename.  If a checkpoint() call gets executed between the mkdir and rename
call, it will have moved the newly made dir and the rename will fail.  So the author must
have handled it by running the commands twice, figuring the odds of two checkpoints happening
that closely together is very small.  One possible alternative (to eliminate the race condition)
would be to move the file to a temporary workspace before moving it to the trash to avoid
conflicts with the checkpoint.  For example:

CURRENT BEHAVIOR: file foo/bar gets moved to trash directly 
a) File foo/bar is designated to be moved to  trash
b) moveToTrash() creates .Trash/Current/foo
c) moveToTrash() renames foo/bar to .Trash/Current/foo
d) if rename fails (because of checkpoint() race condition), repeat (b) and (c) once
e) result is .Trash/Current/foo/bar

PROPOSED BEHAVIOR (to eliminate race condition):
a) File foo/bar is designated to be moved to  trash
b) moveToTrash() creates .Trash/.tmpspace/foo
c) moveToTrash() renames foo/bar to .Trash/.tmpspace/foo/bar
d) moveToTrash() renames .Trash/.tmpspace/foo/bar to ./Trash/Current/foo/bar
e) if (d) rename fails because of dir name conflict, rename .Trash/.tmpspace/foo/bar to .Trash/.tmpspace/foo-<timestamp>/bar
and repeat (d)
f) e) result is .Trash/Current/foo/bar (or .Trash/Current/foo-<timestamp>/bar if name
conflict on top-level dir)

So the proposed behavior definitely would be possible, but it would be a change to a function
that has been stable for while (even though is has convoluted logic).  Any thoughts???

"Of slight concern to me is the change that prevents the user from deleting a file that's
in the trash already from the trash. I guess now that we have -skipTrash there is a method
for a use to selectively purge items from the trash, but this will still be an incompatible

Yes, I understand that the incompatible change may not be necessary.  I am not again leaving
it such that a user can delete a file from the trash to leave compatibility there.  So it
definitely is possible to leave Path 3 (as referenced above) to just return false and not
throw an exception as my patch does.

> Refactor Trash::moveToTrash() and its accompanying poor tests
> -------------------------------------------------------------
>                 Key: HADOOP-6345
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6345
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.20.1
>            Reporter: Jakob Homan
>            Assignee: Matt Ahrens
>         Attachments: HADOOP-6345.2.patch, HADOOP-6345.patch
> We've had several issues relating to the Trash and its access via the shell, none of
which were picked up by the unit tests.  The current moveToTrash method has 8 different ways
it can terminate and sometimes uses a false value and sometimes uses an exception to indicate
failure.  This method should be refactored to improve readability and testability, and new
tests written to exercise all possible code paths.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message