hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14000) s3guard metadata stores to support millions of children
Date Thu, 19 Jan 2017 13:50:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829961#comment-15829961 ]

Steve Loughran commented on HADOOP-14000:

The DynamoDB (DDB) docs say:

bq. The result set from a Query is limited to 1 MB per call. You can use the LastEvaluatedKey
from the query response to retrieve more results.

The maximum number of files returned per call is then limited by the parent path length and
the number of children: the longer the directory tree, the fewer children you get back in
each 1 MB response.
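
A rough way to see the effect, as a back-of-envelope sketch only: assume each child entry repeats the parent path, take 1 MB as 1,048,576 bytes, and treat everything else in the entry as a fixed size. The class name, the method and all the byte counts below are made up for illustration, not measured from any store.

```java
/** Back-of-envelope estimate; every size here is an assumption, not a measurement. */
final class PageCapacityEstimate {
    /** The 1 MB per-call Query limit quoted above, taken here as 1,048,576 bytes. */
    static final long QUERY_LIMIT_BYTES = 1024L * 1024L;

    /**
     * @param parentPathBytes    bytes used by the parent path repeated in each entry (assumed)
     * @param otherBytesPerChild hypothetical size of the rest of each entry
     * @return how many child entries fit in one response, rounded down
     */
    static long childrenPerCall(long parentPathBytes, long otherBytesPerChild) {
        return QUERY_LIMIT_BYTES / (parentPathBytes + otherBytesPerChild);
    }

    public static void main(String[] args) {
        // Short parent path: 20 + 80 = 100 bytes per entry.
        System.out.println(childrenPerCall(20, 80));   // prints 10485
        // Deep parent path: 420 + 80 = 500 bytes per entry.
        System.out.println(childrenPerCall(420, 80));  // prints 2097
    }
}
```

With these assumed sizes, a directory of a million children would need on the order of a hundred calls under a short path and several hundred under a deep one.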

As well as this {{LastEvaluatedKey}} marker, there's a paging mechanism that returns the
results as pages, which can then be iterated over.
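
The resume-key loop looks roughly like this. It is a sketch against an in-memory stand-in, not the AWS SDK: {{FakeStore}}, {{Page}} and {{listAll}} are hypothetical names, and the integer key merely plays the role of {{LastEvaluatedKey}}.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of LastEvaluatedKey-style paging against an in-memory stand-in for DDB. */
final class PagedQuerySketch {

    /** One response: some items plus a resume key, or null when the listing is complete. */
    static final class Page {
        final List<String> items;
        final Integer lastEvaluatedKey;

        Page(List<String> items, Integer lastEvaluatedKey) {
            this.items = items;
            this.lastEvaluatedKey = lastEvaluatedKey;
        }
    }

    /** Hypothetical store that returns at most pageSize children per call. */
    static final class FakeStore {
        private final List<String> children;
        private final int pageSize;
        int calls = 0; // counts round trips, for illustration

        FakeStore(List<String> children, int pageSize) {
            this.children = children;
            this.pageSize = pageSize;
        }

        /** startKey is null on the first call, else the key from the previous page. */
        Page query(Integer startKey) {
            calls++;
            int from = (startKey == null) ? 0 : startKey;
            int to = Math.min(from + pageSize, children.size());
            Integer next = (to < children.size()) ? Integer.valueOf(to) : null;
            return new Page(new ArrayList<>(children.subList(from, to)), next);
        }
    }

    /** Keep querying, feeding each resume key back in, until no key is returned. */
    static List<String> listAll(FakeStore store) {
        List<String> all = new ArrayList<>();
        Integer key = null;
        do {
            Page page = store.query(key);
            all.addAll(page.items);
            key = page.lastEvaluatedKey;
        } while (key != null);
        return all;
    }
}
```

Collecting everything into one list, as {{listAll}} does, is exactly what stops scaling at millions of children; it is shown only to make the resume-key flow visible.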

To scale, then:

# {{DirListingMetadata}} needs to move from a simple collection of children to an abstract
class offering an iterator over the children.
# The DDB store must return a special iterator here, with the same flow as {{org.apache.hadoop.fs.s3a.Listing}}.
Ideally, it should return {{RemoteIterator<LocatedFileStatus>}}, so that it can be directly
wired up to the listing mechanism of {{LocatedFileStatusIterator}}.
# The local store could still cache the values in its own subclass of {{DirListingMetadata}}.
# Testing!
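
One possible shape for points 1 to 3, as a sketch only. Nothing here is the actual S3Guard code: {{RemoteIter}} is a local stand-in that mirrors the shape of Hadoop's {{RemoteIterator}} so the example compiles without Hadoop on the classpath, children are plain strings rather than {{LocatedFileStatus}}, and {{AbstractDirListing}}, {{CachedDirListing}}, {{PagedDirListing}} and {{PageSource}} are hypothetical names.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Local stand-in with the same shape as Hadoop's RemoteIterator. */
interface RemoteIter<T> {
    boolean hasNext() throws IOException;
    T next() throws IOException;
}

/** Hypothetical abstract listing: callers only ever see an iterator over children. */
abstract class AbstractDirListing {
    abstract RemoteIter<String> children() throws IOException;
}

/** Local-store flavour: the children are already cached in memory. */
final class CachedDirListing extends AbstractDirListing {
    private final List<String> cached;

    CachedDirListing(List<String> cached) {
        this.cached = cached;
    }

    @Override
    RemoteIter<String> children() {
        final Iterator<String> it = cached.iterator();
        return new RemoteIter<String>() {
            public boolean hasNext() { return it.hasNext(); }
            public String next() { return it.next(); }
        };
    }
}

/** DDB-store flavour: fetches the next page only once the current one is drained. */
final class PagedDirListing extends AbstractDirListing {
    /** Hypothetical page fetcher: returns pages in order, then null when exhausted. */
    interface PageSource {
        List<String> nextPage() throws IOException;
    }

    private final PageSource source;

    PagedDirListing(PageSource source) {
        this.source = source;
    }

    @Override
    RemoteIter<String> children() {
        return new RemoteIter<String>() {
            private Iterator<String> current = null;
            private boolean done = false;

            public boolean hasNext() throws IOException {
                // Skip over empty pages; stop when the source reports no more pages.
                while (!done && (current == null || !current.hasNext())) {
                    List<String> page = source.nextPage();
                    if (page == null) {
                        done = true;
                    } else {
                        current = page.iterator();
                    }
                }
                return !done;
            }

            public String next() throws IOException {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }
}
```

Only one page is held in memory at a time, so the caller's cost stays flat however many children the directory has. The paged listing is single-pass in this sketch: a second call to {{children()}} would continue from wherever the page source had reached.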

> s3guard metadata stores to support millions of children
> -------------------------------------------------------
>                 Key: HADOOP-14000
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14000
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
> S3 repos can have millions of child entries.
> Currently {{DirListingMetadata}} can't, and {{MetadataStore.listChildren(Path path)}}
> won't be able to, handle directories that big, for listing, deleting or naming.
> We will need a paged response from the listing operation, something which can be iterated

