hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mingliang Liu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-14236) S3Guard: S3AFileSystem::rename() should move non-listed sub-directory entries in metadata store
Date Wed, 29 Mar 2017 22:32:41 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948009#comment-15948009
] 

Mingliang Liu edited comment on HADOOP-14236 at 3/29/17 10:32 PM:
------------------------------------------------------------------

[~fabbri], thanks for your comment and review. You get the point precisely, and even better.

For the inferred ancestor directories of nested directory, I was thinking that they're listed
explicitly in S3A {{listObjects()}}. I got that impression because when we mkdir(), unlike
creating new file, we don't delete its fake parent directory objects. There is a long-pending
TODO for {{mkdir()}}:
{quote}
// TODO: If we have created an empty file at /foo/bar and we then call
// mkdirs for /foo/bar/baz/roo what happens to the empty file /foo/bar/?
private boolean innerMkdirs(Path p, FsPermission permission)
{quote}
However, we can see that we still need to take care of its ancestors, as all ancestors are
not listed in later {{listObjects()}} call. For example (using your example), {{/a/source}}
is empty directory and after making subdirectory {{/a/source/c/d}}, later {{listObjects("/a/source")}}
will get {{/a/source}} and {{/a/source/c/d}}: w/o {{/a/source/c}} but w/ {{/a/source}}. I
filed [HADOOP-14255] to address this by deleting fake directory objects when {{mkdir()}}:
that will make the senmatic easier. Thanks [~stevel@apache.org] for offline discussion.

The new patch (tested against us-west-1):
# takes care of its ancestors for both a nested file and a nested directory.
# makes test code FileSystem-only in the new patch. They can not pass w/o this patch while
can pass w/ this patch.
# moves the "add inferred directories" logic into a helper function

For asserting MetadataStore (and {{AbstractMSContract#allowMissing()}} check), I found the
file system object in {{MetadataStoreTestBase}} is mocked. So testing integration code between
S3AFileSystem and S3Guard seems challenging there. If that part of code is MS dependent, perhaps
we can skip testing that part as long as the FS contract is verified carefully.


was (Author: liuml07):
[~fabbri], thanks for your comment and review. You get the point precisely, and even better.

For the inferred ancestor directories of nested directory, I was thinking that they're listed
explicitly in S3A {{listObjects()}}. That's because when we mkdir(), unlike creating new file,
we don't delete it's fake parent directory objects. There is a long-pending TODO for {{mkdir()}}:
{quote}
// TODO: If we have created an empty file at /foo/bar and we then call
// mkdirs for /foo/bar/baz/roo what happens to the empty file /foo/bar/?
private boolean innerMkdirs(Path p, FsPermission permission)
{quote}
However, we can see that we still need to take care of its ancestors, as all ancestors are
not listed in later {{listObjects()}} call. For example (using your example), {{/a/source}}
is empty directory and after making subdirectory {{/a/source/c/d}}, later {{listObjects("/a/source")}}
will get {{/a/source}} and {{/a/source/c/d}}: w/o {{/a/source/c}} but w/ {{/a/source}}. I
filed [HADOOP-14255] to address this by deleting fake directory objects when {{mkdir()}}:
that will make the senmatic easier.

The new patch (tested against us-west-1):
# takes care of its ancestors for both a nested file and a nested directory.
# makes test code FileSystem-only in the new patch. They can not pass w/o this patch while
can pass w/ this patch.
# moves the "add inferred directories" logic into a helper function

For asserting MetadataStore (and {{AbstractMSContract#allowMissing()}} check), I found the
file system object in {{MetadataStoreTestBase}} is mocked. So testing integration code between
S3AFileSystem and S3Guard seems challenging there. If that part of code is MS dependent, perhaps
we can skip testing that part as long as the FS contract is verified carefully.

> S3Guard: S3AFileSystem::rename() should move non-listed sub-directory entries in metadata
store
> -----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-14236
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14236
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>            Priority: Critical
>         Attachments: HADOOP-14236-HADOOP-13345.000.patch, HADOOP-14236-HADOOP-13345.001.patch,
HADOOP-14236-HADOOP-13345.002.patch, HADOOP-14236-HADOOP-13345.003.patch
>
>
> After running integration test {{ITestS3AFileSystemContract}}, I found the following
items are not cleaned up in DynamoDB:
> {code}
> parent=/mliu-s3guard/user/mliu/s3afilesystemcontract/testRenameDirectoryAsExisting/dir,
child=subdir
> parent=/mliu-s3guard/user/mliu/s3afilesystemcontract/testRenameDirectoryAsExistingNew/newdir/subdir,
child=file2
> {code}
> At first I thought it’s similar to [HADOOP-14226] or [HADOOP-14227], and we need to
be careful when cleaning up test data.
> Then I found it’s a bug(?) in the code of integrating S3Guard with S3AFileSystem: for
rename we miss sub-directory items to put (dest) and delete (src). The reason is that in S3A,
we delete those fake directory objects if they are not necessary, e.g. non-empty. So when
we list the objects to rename, the object summaries will only return _file_ objects. This
has two consequences after rename:
> #  there will be left items for src path in metadata store - left-overs will confuse
{{get(Path)}} which should return null
> # we are not persisting the whole subtree for dest path to metadata store - this will
break the DynamoDBMetadataStore invariant: _if a path exists, all its ancestors will also
exist in the table_.
> UPDATE: modified test case {{ITestS3AFileSystemContract:: testRenameDirectoryAsExistingDirectory()}}
will fail w/o this patch; it passes w/ this patch. If the test case makes sense, the proposal
follows.
> Existing tests are not complaining about this though. If this is a real bug, let’s
address it here.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message