hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13914) s3guard: improve S3AFileStatus#isEmptyDirectory handling
Date Fri, 03 Mar 2017 00:05:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893332#comment-15893332

Aaron Fabbri commented on HADOOP-13914:

Thanks for the review [~mackrorysd].

I'm a bit concerned about all the S3AFileStatus -> FileStatus changes (although I could've
sworn that we already removed that assertion in PathMetadataTranslation...). I think it's
definitely a change for the better, but I'm a little worried there is some application out
there that this change will break, despite being @Private and @Evolving... 

Yes, this is a fundamental challenge for this patch.  I explicitly want to remove isEmptyDirectory
from the public API.  It should only be used internally because we don't want the cost of
always computing it, when it is usually never used.

Along similar lines, I wondered if a `public S3AFileStatus getFileStatus(Path, bool)` function
might be in order in S3AFileSystem? No idea how much it'll get used, but if anyone IS depending
on the current S3AFileStatus return type and wants that work done it'd be useful.

My understanding is that having isEmptyDirectory() on the S3AFileStatus is an internal optimization
to save a round trip for delete and rename.  Without a compelling need for this shortcut in
the public API, I would suggest people just use {{listStatus(dir)}} to determine emptiness.

There's an assertion in TestDynamoDBMetadataStore.java that isEmptyDirectory() is what we
expect. Why is that removed? Is it a Boolean -> Tristate issue? If so, shouldn't we modify
the logic to accept either the expected one or UNKNOWN?

After this patch, MetadataStore implementations are not required to return S3AFileStatus with
isEmptyDirectory set properly (which was broken anyways for DDB).  So that assertion no longer
makes sense.  I should probably:

- Replace that assert with one that checks PathMetadata#isEmptyDirectory() does not conflict
with expected value (i.e. UNKNOWN is always correct, but false versus true would be a failure).
- We should probably have some new MetadataStore contract test cases around {{PathMetadata#isEmptyDirectory}}.
 I can add one in the next patch. 

> s3guard: improve S3AFileStatus#isEmptyDirectory handling
> --------------------------------------------------------
>                 Key: HADOOP-13914
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13914
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>         Attachments: HADOOP-13914-HADOOP-13345.000.patch, HADOOP-13914-HADOOP-13345.002.patch,
HADOOP-13914-HADOOP-13345.003.patch, s3guard-empty-dirs.md, test-only-HADOOP-13914.patch
> As discussed in HADOOP-13449, proper support for the isEmptyDirectory() flag stored in
S3AFileStatus is missing from DynamoDBMetadataStore.
> The approach taken by LocalMetadataStore is not suitable for the DynamoDB implementation,
and also sacrifices good code separation to minimize S3AFileSystem changes pre-merge to trunk.
> I will attach a design doc that attempts to clearly explain the problem and preferred
solution.  I suggest we do this work after merging the HADOOP-13345 branch to trunk, but am
open to suggestions.
> I can also attach a patch of a integration test that exercises the missing case and demonstrates
a failure with DynamoDBMetadataStore.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

View raw message