hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10987) Provide an iterator-based listing API for FileSystem
Date Fri, 10 Jun 2016 15:43:21 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324657#comment-15324657 ]

Steve Loughran commented on HADOOP-10987:

I've only just noticed this. It would have been really nice if someone had actually written
the section of the filesystem specification to cover this, and added a specific contract
test which could then be applied, consistently, to all filesystems, the way we do with every
bit of the FS API.

Instead, there is a method whose javadocs are incomplete. What are the preconditions? The
postconditions? What happens if the path is missing? What concurrency guarantees, if any, can
be made?

These are the kinds of things that people trying to use an API need to know, and that people
writing other implementations of an API need to understand. Even HDFS benefits from them, as
they help define how much of its behaviour is intentional.

I'm not going to address this as part of HADOOP-13207; instead I've filed a new JIRA, HADOOP-13256,
to cover the task of defining this API properly and producing the cross-FS tests to validate
its behaviour. Does anyone want to do this?

> Provide an iterator-based listing API for FileSystem
> ----------------------------------------------------
>                 Key: HADOOP-10987
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10987
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.7.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 2.7.0
>         Attachments: HADOOP-10987.patch, HADOOP-10987.v2.patch, HADOOP-10987.v3.patch,
> HADOOP-10987.v4.branch-2.patch, HADOOP-10987.v4.patch, HADOOP-10987.v4_with_comment_fix.branch-2.patch,
> Iterator-based listing methods already exist in {{FileContext}} for both simple listing
> and listing with locations. However, {{FileSystem}} lacks the former. From what I understand,
> it wasn't added to {{FileSystem}} because that class was believed to be phased out soon. Since
> {{FileSystem}} is very much alive today and new features are getting added frequently, I propose
> adding an iterator-based {{listStatus}} method. As for the name of the new method, we can use
> the same name used in {{FileContext}}: {{listStatusIterator()}}.
> It will be particularly useful when listing giant directories. Without this, the client
> has to build up a huge data structure and hold it in memory. We've seen client JVMs running
> out of memory because of this.
> Once this change is made, we can modify FsShell, etc. in follow-up JIRAs.
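
The memory argument in the issue description can be illustrated with a small, self-contained
sketch. It does not use Hadoop itself: RemoteIter is a local stand-in that only mirrors the
shape of an iterator whose hasNext()/next() may throw IOException, and PagedDirectory,
fetchPage, walkIterator and walkFullList are hypothetical names invented for this example.
The assumption is that a directory of `total` entries is served in pages of `pageSize`, so an
iterator-based listing needs to buffer only one page, while an array-style listing holds
everything at once.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

public class ListingSketch {

    /** Local stand-in (not a Hadoop class): an iterator whose calls may fail with IOException. */
    interface RemoteIter<E> {
        boolean hasNext() throws IOException;
        E next() throws IOException;
    }

    /** Hypothetical directory of `total` entries, served in pages of `pageSize`. */
    static final class PagedDirectory {
        private final int total;
        private final int pageSize;
        int maxBuffered = 0; // largest number of entries held in memory at once

        PagedDirectory(int total, int pageSize) {
            if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
            this.total = total;
            this.pageSize = pageSize;
        }

        /** Simulates one remote call returning entries [start, start + pageSize). */
        private List<String> fetchPage(int start) throws IOException {
            List<String> page = new ArrayList<>();
            for (int i = start; i < Math.min(start + pageSize, total); i++) {
                page.add("file-" + i);
            }
            return page;
        }

        /** Array-style listing: every entry is materialized before the caller sees any. */
        List<String> listAll() throws IOException {
            List<String> all = new ArrayList<>();
            for (int start = 0; start < total; start += pageSize) {
                all.addAll(fetchPage(start));
            }
            maxBuffered = Math.max(maxBuffered, all.size());
            return all;
        }

        /** Iterator-style listing: only the current page is buffered. */
        RemoteIter<String> listIterator() {
            return new RemoteIter<String>() {
                private List<String> page = new ArrayList<>();
                private int posInPage = 0;
                private int nextStart = 0;

                @Override
                public boolean hasNext() throws IOException {
                    if (posInPage < page.size()) return true;
                    if (nextStart >= total) return false;
                    page = fetchPage(nextStart); // replaces the previous page
                    nextStart += page.size();
                    posInPage = 0;
                    maxBuffered = Math.max(maxBuffered, page.size());
                    return !page.isEmpty();
                }

                @Override
                public String next() throws IOException {
                    if (!hasNext()) throw new NoSuchElementException();
                    return page.get(posInPage++);
                }
            };
        }
    }

    /** Walks the iterator; returns {entries seen, max entries buffered}. */
    public static int[] walkIterator(int total, int pageSize) {
        PagedDirectory dir = new PagedDirectory(total, pageSize);
        try {
            int seen = 0;
            RemoteIter<String> it = dir.listIterator();
            while (it.hasNext()) {
                it.next();
                seen++;
            }
            return new int[] {seen, dir.maxBuffered};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds the full list; returns {entries seen, max entries buffered}. */
    public static int[] walkFullList(int total, int pageSize) {
        PagedDirectory dir = new PagedDirectory(total, pageSize);
        try {
            int seen = dir.listAll().size();
            return new int[] {seen, dir.maxBuffered};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] viaIterator = walkIterator(1000, 100);
        int[] viaList = walkFullList(1000, 100);
        System.out.println("iterator:  seen=" + viaIterator[0] + " maxBuffered=" + viaIterator[1]);
        System.out.println("full list: seen=" + viaList[0] + " maxBuffered=" + viaList[1]);
    }
}
```

Both walks see the same entries; only the peak number of buffered entries differs. The sketch
says nothing about what a real implementation does for a missing path or under concurrent
modification, which are exactly the questions raised in the comment above.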

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
