hadoop-common-issues mailing list archives

From "Chen Liang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14477) FileSystem Simplify / Optimize listStatus Method
Date Wed, 07 Jun 2017 21:34:18 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041661#comment-16041661 ]

Chen Liang commented on HADOOP-14477:
-------------------------------------

Thanks [~belugabehr] for pointing this out! 

The v2 patch makes sense to me, but one thing concerns me a little. It seems that, unless
an empty array is returned, one ArrayList object is created for each call to {{filterListStatus}}.
As a result, the {{listStatus(Path[] files, PathFilter filter)}} code path appears to create
one extra ArrayList object for each element of the {{files}} array compared to the
original code. This might add GC overhead if the call happens frequently enough.
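
To make the concern concrete, here is a minimal, self-contained sketch. It is not the patch itself and does not use the Hadoop API: paths are plain strings, the filter is a {{java.util.function.Predicate}}, and the class, method, and counter names are hypothetical. It counts only ArrayList instances and ignores the arrays produced by {{toArray}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the two designs; not Hadoop code.
class ListStatusAllocationSketch {
    // Counts ArrayList instances created by this sketch (illustration only).
    static int listsCreated = 0;

    static final String[] EMPTY = new String[0];

    // Per-call style: every non-empty listing gets its own temporary list.
    static String[] filterListing(String[] listing, Predicate<String> filter) {
        if (listing.length == 0) {
            return EMPTY; // empty directory: no list is allocated
        }
        List<String> results = new ArrayList<>(listing.length);
        listsCreated++;
        for (String entry : listing) {
            if (filter.test(entry)) {
                results.add(entry);
            }
        }
        return results.toArray(new String[0]);
    }

    // Multi-directory listing built on the per-call helper above.
    static String[] listPerCall(String[][] dirs, Predicate<String> filter) {
        List<String> all = new ArrayList<>();
        listsCreated++;
        for (String[] dir : dirs) {
            for (String entry : filterListing(dir, filter)) {
                all.add(entry);
            }
        }
        return all.toArray(new String[0]);
    }

    // Shared-accumulator style: one list is reused across all directories.
    static String[] listShared(String[][] dirs, Predicate<String> filter) {
        List<String> all = new ArrayList<>();
        listsCreated++;
        for (String[] dir : dirs) {
            for (String entry : dir) {
                if (filter.test(entry)) {
                    all.add(entry);
                }
            }
        }
        return all.toArray(new String[0]);
    }
}
```

In this sketch, listing N non-empty directories creates N + 1 lists in the per-call style and one list in the shared style. Whether that difference matters in practice depends on how often the call is made and how short-lived the lists are.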

> FileSystem Simplify / Optimize listStatus Method
> ------------------------------------------------
>
>                 Key: HADOOP-14477
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14477
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.7.3, 3.0.0-alpha3
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Minor
>         Attachments: HADOOP-14477.1.patch, HADOOP-14477.2.patch
>
>
> {code:title=org.apache.hadoop.fs.FileSystem.listStatus(ArrayList<FileStatus>, Path, PathFilter)}
>   /*
>    * Filter files/directories in the given path using the user-supplied path
>    * filter. Results are added to the given array <code>results</code>.
>    */
>   private void listStatus(ArrayList<FileStatus> results, Path f,
>       PathFilter filter) throws FileNotFoundException, IOException {
>     FileStatus listing[] = listStatus(f);
>     if (listing == null) {
>       throw new IOException("Error accessing " + f);
>     }
>     for (int i = 0; i < listing.length; i++) {
>       if (filter.accept(listing[i].getPath())) {
>         results.add(listing[i]);
>       }
>     }
>   }
> {code}
> {code:title=org.apache.hadoop.fs.FileSystem.listStatus(Path, PathFilter)}
>   public FileStatus[] listStatus(Path f, PathFilter filter) 
>                                    throws FileNotFoundException, IOException {
>     ArrayList<FileStatus> results = new ArrayList<FileStatus>();
>     listStatus(results, f, filter);
>     return results.toArray(new FileStatus[results.size()]);
>   }
> {code}
> We can be smarter about this:
> # Use enhanced for-loops
> # Optimize for the case where there are zero files in a directory, save on object instantiation
> # More encapsulated design



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
