hadoop-hdfs-issues mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1658) A less expensive way to figure out directory size
Date Thu, 14 Apr 2011 20:39:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020021#comment-13020021 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1658:

- Here is an imaginary case unfavorable to option #1: a user program might first put {{FileStatus}} objects,
including directories and files, on a list.  Then, iterate over all the entries and sum up the

- For {{FileSystem}}, how about adding a new method {{FileStatus.getChildrenCount()}} instead
of overloading {{FileStatus.getLen()}}?

- For {{FileContext}}, how about removing {{FileStatus.getLen()}} and adding {{FileStatus.getBytes()}},
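
The concern in the first bullet can be illustrated with a small sketch. It assumes option #1 means reporting a directory's child count through the existing length field, as the issue description proposes. The {{Entry}} class is a hypothetical stand-in for {{FileStatus}}, not the real Hadoop class, and the paths and sizes are made up for illustration.

```java
import java.util.List;

public class LengthSumSketch {
    // Hypothetical stand-in for Hadoop's FileStatus; not the real class.
    static final class Entry {
        final String path;
        final boolean isDirectory;
        final long len;

        Entry(String path, boolean isDirectory, long len) {
            this.path = path;
            this.isDirectory = isDirectory;
            this.len = len;
        }
    }

    // Sums len over every entry, as a caller might do while directories report 0.
    static long naiveTotal(List<Entry> entries) {
        long total = 0;
        for (Entry e : entries) {
            total += e.len;
        }
        return total;
    }

    // Sums len over files only, so the result does not depend on what directories report.
    static long filesOnlyTotal(List<Entry> entries) {
        long total = 0;
        for (Entry e : entries) {
            if (!e.isDirectory) {
                total += e.len;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Current behavior: the directory's length is 0.
        List<Entry> current = List.of(
            new Entry("/data", true, 0),
            new Entry("/data/a", false, 100),
            new Entry("/data/b", false, 250));

        // Assumed option #1: the directory's length carries its child count (2).
        List<Entry> overloaded = List.of(
            new Entry("/data", true, 2),
            new Entry("/data/a", false, 100),
            new Entry("/data/b", false, 250));

        System.out.println(naiveTotal(current));        // 350 bytes
        System.out.println(naiveTotal(overloaded));     // 352: bytes mixed with a child count
        System.out.println(filesOnlyTotal(overloaded)); // 350 bytes
    }
}
```

A separate accessor such as the suggested {{FileStatus.getChildrenCount()}} would leave the naive sum unchanged, since the length field would keep its current meaning.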

> A less expensive way to figure out directory size
> -------------------------------------------------
>                 Key: HDFS-1658
>                 URL: https://issues.apache.org/jira/browse/HDFS-1658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
> Currently, to figure out a directory's size, we have to list the directory by calling the
> getListing RPC and count its children. This is an expensive operation, especially
> when a directory has many children, because it may require multiple RPCs.
> On the other hand, when fetching the status of a path (i.e. calling the getFileInfo RPC),
> the length field of FileStatus is set to 0 if the path is a directory.
> I am thinking of changing this field (FileStatus#length) to be the directory size when
> the path is a directory. Then we can call getFileInfo to get the directory size. This call is
> much less expensive and simpler than getListing.
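
The cost difference described above can be sketched with a toy model. {{FakeNameNode}} is a hypothetical in-memory stand-in, not a Hadoop class; its method names only echo the RPCs mentioned in the description, and the per-call page size of 1000 entries is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class DirSizeRpcSketch {
    // Hypothetical in-memory stand-in for a name node; it counts simulated RPCs.
    static final class FakeNameNode {
        private final List<String> children = new ArrayList<>();
        private final int pageSize; // assumed limit on entries returned per listing call
        int rpcCalls = 0;

        FakeNameNode(int childCount, int pageSize) {
            for (int i = 0; i < childCount; i++) {
                children.add("child-" + i);
            }
            this.pageSize = pageSize;
        }

        // Models a paged listing call: returns at most pageSize names from offset.
        List<String> getListing(int offset) {
            rpcCalls++;
            int end = Math.min(offset + pageSize, children.size());
            return children.subList(offset, end);
        }

        // Models a status call that reports the child count directly.
        int getChildrenCount() {
            rpcCalls++;
            return children.size();
        }
    }

    // Counts children by paging through the listing until a short page arrives.
    static int countByListing(FakeNameNode nn) {
        int count = 0;
        while (true) {
            List<String> page = nn.getListing(count);
            count += page.size();
            if (page.size() < nn.pageSize) {
                return count;
            }
        }
    }

    public static void main(String[] args) {
        FakeNameNode viaListing = new FakeNameNode(2500, 1000);
        System.out.println(countByListing(viaListing)); // 2500
        System.out.println(viaListing.rpcCalls);        // 3 simulated calls

        FakeNameNode viaStatus = new FakeNameNode(2500, 1000);
        System.out.println(viaStatus.getChildrenCount()); // 2500
        System.out.println(viaStatus.rpcCalls);           // 1 simulated call
    }
}
```

In this model the listing approach grows with the number of children, while the status call stays at one round trip regardless of directory size.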

