hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14159) Add some Java-8 friendly way to work with RemoteIterable, especially listings
Date Tue, 05 Dec 2017 14:57:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278669#comment-16278669 ]

Steve Loughran commented on HADOOP-14159:
-----------------------------------------

There's some of this in S3AUtils with HADOOP-13786; we can just pull it up to FS or nearby.

> Add some Java-8 friendly way to work with RemoteIterable, especially listings
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-14159
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14159
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Steve Loughran
>
> There's a fair amount of Hadoop code which uses {{FileSystem.listStatus(path)}} just
> to get a {{FileStatus[]}} array which it can then iterate over in a {{for}} loop.
> This is inefficient and scales badly, as the entire listing is completed before any
> processing starts; it cannot handle directories with millions of entries.
> The {{listLocatedStatus()}} calls return a {{RemoteIterator}}, which can't be used in
> for-each loops because any {{hasNext()}}/{{next()}} call is allowed to throw an IOException.
> That doesn't matter, as we now have closures and simple stream operations.
> {code}
>  listLocatedStatus(path).filter(st -> st.getLen() > 0).apply(st -> fs.delete(st.getPath(), false))
> {code}
> See? We could do shiny new closure things. It wouldn't necessarily need changes to FileSystem
> either, just something which took {{RemoteIterator}} and let you chain some closures off it,
> similar to the Java 8 streams operations.
> Once implemented, we can move to using it in the Hadoop code wherever we use {{listFiles()}}
> today.
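
A minimal sketch of the kind of wrapper the description asks for, under stated assumptions: the RemoteIterator interface below is a local stand-in that mirrors the hasNext()/next() shape of org.apache.hadoop.fs.RemoteIterator so the example compiles without Hadoop on the classpath, and ChainedRemoteIterator, IOPredicate, IOConsumer and ListingDemo are hypothetical names invented for illustration, not existing Hadoop classes. The terminal operation is called forEach here rather than apply.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Demo driver; hypothetical, for illustration only. */
final class ListingDemo {
  /** Demo-only adapter: exposes an in-memory list through the stand-in interface. */
  static <T> RemoteIterator<T> fromList(List<T> items) {
    final Iterator<T> it = items.iterator();
    return new RemoteIterator<T>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      @Override public T next() { return it.next(); }
    };
  }

  public static void main(String[] args) throws IOException {
    // Integers stand in for file lengths; a real caller would pass FileStatus entries.
    long handled = ChainedRemoteIterator
        .of(fromList(Arrays.asList(0, 5, 0, 7)))
        .filter(len -> len > 0)
        .forEach(len -> System.out.println("would delete entry of length " + len));
    System.out.println("entries handled: " + handled);
  }
}

/** Local stand-in (assumption) with the same method shape as Hadoop's RemoteIterator. */
interface RemoteIterator<T> {
  boolean hasNext() throws IOException;
  T next() throws IOException;
}

/** Hypothetical closure types that are allowed to throw IOException. */
interface IOPredicate<T> { boolean test(T value) throws IOException; }
interface IOConsumer<T> { void accept(T value) throws IOException; }

/** Hypothetical wrapper that lets closures be chained off a RemoteIterator. */
final class ChainedRemoteIterator<T> {
  private final RemoteIterator<T> source;

  private ChainedRemoteIterator(RemoteIterator<T> source) { this.source = source; }

  static <T> ChainedRemoteIterator<T> of(RemoteIterator<T> source) {
    return new ChainedRemoteIterator<>(source);
  }

  /** Lazy filter: entries are pulled from the source only as the caller advances. */
  ChainedRemoteIterator<T> filter(IOPredicate<? super T> predicate) {
    final RemoteIterator<T> src = source;
    return new ChainedRemoteIterator<>(new RemoteIterator<T>() {
      private T pending;      // next matching entry, if one has been found
      private boolean ready;  // true when 'pending' holds an unconsumed entry

      @Override public boolean hasNext() throws IOException {
        while (!ready && src.hasNext()) {
          T candidate = src.next();
          if (predicate.test(candidate)) {
            pending = candidate;
            ready = true;
          }
        }
        return ready;
      }

      @Override public T next() throws IOException {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        ready = false;
        T result = pending;
        pending = null;
        return result;
      }
    });
  }

  /** Terminal operation: applies the action to each entry and returns the count. */
  long forEach(IOConsumer<? super T> action) throws IOException {
    long count = 0;
    while (source.hasNext()) {
      action.accept(source.next());
      count++;
    }
    return count;
  }
}
```

Because every closure type declares IOException, failures from the underlying listing or from the action propagate to the caller of forEach() instead of being wrapped, and the listing is consumed one entry at a time rather than collected into an array first.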



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


