hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13926) S3Guard: Improve listLocatedStatus and listFiles
Date Fri, 31 Mar 2017 07:06:41 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950435#comment-15950435 ]

Aaron Fabbri commented on HADOOP-13926:
---------------------------------------

This is a good start, thank you for rebasing. I think this still needs:

1. To handle {{listFiles(recursive=true)}}.

2. To merge S3 and MetadataStore results (as {{listStatus()}} does) for the non-authoritative
case (i.e. "not all directory contents are in MetadataStore").

For the {{recursive=false}} (and {{listLocatedStatus()}}) case, this patch is almost there,
except it needs to handle the non-authoritative case, where we have to merge MetadataStore output
with the S3 iterator.  I can think of a simple algorithm for that case (until we add paging
for MetadataStore): make a {{Set}} which is a copy of the DirListingMetadata entries; as you return
S3 iterator results, remove those paths from the {{Set}}; when the S3 iterator is exhausted, return
the remaining entries in the {{Set}}.
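
A rough sketch of that Set-based merge, to make the idea concrete. This is not the patch's code: the class name {{MergedListingIterator}} is hypothetical, paths are plain Strings standing in for the real status entries, and it uses {{java.util.Iterator}} where the real code would presumably use Hadoop's {{RemoteIterator}}. It also ignores anything beyond the three steps described above.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the non-paged merge for the non-authoritative case.
 * Assumption: each listing entry is identified by its path, modelled as a String.
 */
final class MergedListingIterator implements Iterator<String> {
  private final Iterator<String> s3Iterator;
  // Copy of the MetadataStore listing; entries also seen in S3 are removed.
  private final Set<String> pending = new LinkedHashSet<>();
  // Created lazily once the S3 iterator is exhausted.
  private Iterator<String> leftovers;

  MergedListingIterator(Iterator<String> s3Iterator,
                        Iterable<String> metadataStoreEntries) {
    this.s3Iterator = s3Iterator;
    for (String path : metadataStoreEntries) {
      pending.add(path);
    }
  }

  @Override
  public boolean hasNext() {
    if (s3Iterator.hasNext()) {
      return true;
    }
    if (leftovers == null) {
      leftovers = pending.iterator();
    }
    return leftovers.hasNext();
  }

  @Override
  public String next() {
    if (s3Iterator.hasNext()) {
      String path = s3Iterator.next();
      // Already returned from S3, so it must not be returned again later.
      pending.remove(path);
      return path;
    }
    if (leftovers == null) {
      leftovers = pending.iterator();
    }
    // Throws NoSuchElementException when nothing is left, per the Iterator contract.
    return leftovers.next();
  }
}
```

The whole MetadataStore listing is held in memory for the lifetime of the iterator, which is why this only works as a stopgap until MetadataStore paging exists.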

For {{recursive=true}} it will be a little trickier.  I can think of another non-paged (non-scalable)
algorithm.  Later, when we have full directory entry paging for DirListingMetadata it will
get more interesting.  We may have to introduce some ordering to the S3 iterator to do it
efficiently.

To unblock the merge to trunk, how about adding the caveat that S3Guard list consistency does not
support {{listFiles()}} yet?  You would simply get S3 results without additional consistency
guarantees, and we'd implement {{listFiles()}} after the merge.

I will be available to work on this soon (I budgeted some time in a week or two) if that helps.


> S3Guard: Improve listLocatedStatus and listFiles
> ------------------------------------------------
>
>                 Key: HADOOP-13926
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13926
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Rajesh Balamohan
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13926-HADOOP-13345.001.patch, HADOOP-13926.wip.proto.branch-13345.1.patch
>
>
> Need to check if {{listLocatedStatus}} can make use of metastore's listChildren feature.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

