hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8598) Add and optimize for get LocatedFileStatus in DFSClient
Date Thu, 18 Jun 2015 16:12:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592038#comment-14592038 ]

Andrew Wang commented on HDFS-8598:

Sorry if I missed something, but doesn't DFS#listLocatedStatus already batch up calls via the
DirListingIterator? It shouldn't be issuing one RPC per file.

We also generally don't want listing APIs that aren't iterators. Building a large listing
in a single call can be very expensive on the NN (NameNode), which impacts other clients.
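
To illustrate the batching point above, here is a minimal, self-contained sketch of a paged listing iterator. It is not the actual DirListingIterator code: FakeNameNode, BatchedIterator, getListingPage, countRpcs, and the page size of 1000 are all hypothetical names and values chosen for illustration. The idea it shows is that the client asks for one bounded page of entries per simulated RPC, so the number of RPCs grows with (file count / page size) rather than with the file count, and the server never has to assemble the whole listing at once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class BatchedListingSketch {

    /** Hypothetical stand-in for the NameNode: serves a listing one bounded page per call. */
    static final class FakeNameNode {
        private final List<String> entries;
        final int pageSize;
        int rpcCount = 0;

        FakeNameNode(List<String> entries, int pageSize) {
            if (pageSize <= 0) {
                throw new IllegalArgumentException("pageSize must be positive");
            }
            this.entries = entries;
            this.pageSize = pageSize;
        }

        /** One simulated RPC: returns at most pageSize entries starting at index start. */
        List<String> getListingPage(int start) {
            rpcCount++;
            int end = Math.min(start + pageSize, entries.size());
            return new ArrayList<>(entries.subList(start, end));
        }
    }

    /** Hypothetical client-side iterator that fetches the next page only when needed. */
    static final class BatchedIterator implements Iterator<String> {
        private final FakeNameNode nn;
        private List<String> page = new ArrayList<>();
        private int posInPage = 0;
        private int fetched = 0;          // entries received so far across all pages
        private boolean exhausted = false; // set once a short (or empty) page arrives

        BatchedIterator(FakeNameNode nn) {
            this.nn = nn;
        }

        @Override
        public boolean hasNext() {
            while (posInPage >= page.size() && !exhausted) {
                page = nn.getListingPage(fetched);
                posInPage = 0;
                fetched += page.size();
                if (page.size() < nn.pageSize) {
                    exhausted = true;
                }
            }
            return posInPage < page.size();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return page.get(posInPage++);
        }
    }

    /** Lists fileCount simulated files and returns how many simulated RPCs were issued. */
    static int countRpcs(int fileCount, int pageSize) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            files.add("file-" + i);
        }
        FakeNameNode nn = new FakeNameNode(files, pageSize);
        Iterator<String> it = new BatchedIterator(nn);
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        if (seen != fileCount) {
            throw new IllegalStateException("expected " + fileCount + " entries, saw " + seen);
        }
        return nn.rpcCount;
    }

    public static void main(String[] args) {
        // 2500 simulated files with a page size of 1000
        System.out.println("rpcs=" + countRpcs(2500, 1000)); // prints "rpcs=3"
    }
}
```

In this sketch a caller that stops iterating early never triggers the remaining page fetches, which is one reason an iterator-style API is gentler on the server than a call that returns the full listing.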

> Add and optimize for get LocatedFileStatus  in DFSClient
> --------------------------------------------------------
>                 Key: HDFS-8598
>                 URL: https://issues.apache.org/jira/browse/HDFS-8598
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Yong Zhang
>            Assignee: Yong Zhang
>         Attachments: HDFS-8598.001.patch
> If we want to get the block locations of all files in a directory, we have to call getFileBlockLocations for each file, which takes a long time because of the large number of requests.
> LocatedFileStatus carries block locations, but we found that DFSClient still calls getFileBlockLocations for each file. This JIRA tries to optimize this so that only one RPC is needed.
