hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13651) S3Guard: S3AFileSystem Integration with MetadataStore
Date Mon, 31 Oct 2016 19:16:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623076#comment-15623076

Aaron Fabbri commented on HADOOP-13651:

Great comments, thank you. 

Wrap with a LOG.isDebugEnabled()
Good call, will do.  I'll roll another patch later today with this.

The is empty dir logic here is getting ugly already. I think it may be best to revisit that
entire empty-dir logic 

Totally agree.  Having a FileStatus which is cacheable (doesn't change based on activity in
other files)  is sort of a prerequisite for playing nicely with the MetadataStore division
of labor.  I can reason about the logic here, but the isS3A flag in LocalMetadataStore is
a layering violation, IMO.

My thought was to work on isEmptyDirectory as a separate effort after S3Guard v1 is merged.
 I kept the isS3A stuff separated for that reason.  I think we may want to revisit the invariants
around the empty directory blobs as well.  E.g. instead of "exists(empty directory blob) iff
directory is empty" the condition would be "exists(empty directory blob) implies there is
a directory at that path", which is only necessary if there are no other keys with a matching
prefix.  I wonder if being lazier about cleaning up those blobs would improve s3a perf, etc.

 If, instead of asking of the s3a status, it was just something which could be queried off
the metadata store, then it gets to implement the logic behind S3Guard.isEmptyDirectory(metadatastore,
MetadataStore may give one of three answers to this question:

(1) Yes, that path is an empty directory
(2) No, that path is not an empty directory
(3) I do not have full state for that directory, I cannot answer.

This could be used to do the right thing in the client, but it may take some refactoring.
 Let me know how you want to tackle this part.  I'd vote to defer work on this part to a separate
jira because we want to keep other parts of the project moving/integrated, and i think this
will be tricky enough to benefit from a separate code review and discussion.  The existing
solution is bad layering but keeps the semantics of isEmptyDirectory as is for S3A. 

Also, on TODOs in code in the feature branch. Intent is to commit frequently so folks can
keep working in parallel, and that all TODOs will be addressed before merging the feature

> S3Guard: S3AFileSystem Integration with MetadataStore
> -----------------------------------------------------
>                 Key: HADOOP-13651
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13651
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>         Attachments: HADOOP-13651-HADOOP-13345.001.patch, HADOOP-13651-HADOOP-13345.002.patch,
> Modify S3AFileSystem et al. to optionally use a MetadataStore for metadata consistency
and caching.
> Implementation should have minimal overhead when no MetadataStore is configured.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

View raw message