hadoop-hdfs-issues mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1658) A less expensive way to figure out directory size
Date Thu, 14 Apr 2011 20:39:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020021#comment-13020021 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1658:
----------------------------------------------

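- Here is an imaginary case unfavorable to option #1: a user program might first put {{FileStatus}}
objects, for both directories and files, on a list, and then iterate over all the entries and sum
up their lengths.  If {{getLen()}} were overloaded to return a directory size, that sum would
silently change.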

- For {{FileSystem}}, how about adding a new method {{FileStatus.getChildrenCount()}} instead
of overloading {{FileStatus.getLen()}}?

- For {{FileContext}}, how about removing {{FileStatus.getLen()}} and adding {{FileStatus.getBytes()}}
and {{FileStatus.getChildrenCount()}}?
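
To make the first point concrete, here is a minimal sketch of such a user program. It does not
use the real Hadoop classes: {{Entry}} is a hypothetical stand-in for {{FileStatus}} with only the
two fields needed, and {{DirLenSketch}}, {{sumLengths}} and {{listing}} are names made up for
illustration. It also assumes that "directory size" means the number of children, as in the issue
description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for FileStatus; not the Hadoop class.
final class Entry {
    final boolean isDir;
    final long len; // the value getLen() would return for this entry

    Entry(boolean isDir, long len) {
        this.isDir = isDir;
        this.len = len;
    }
}

final class DirLenSketch {
    // The user program from the first point: sum the length of every
    // entry on the list, directories included.
    static long sumLengths(List<Entry> entries) {
        long total = 0;
        for (Entry e : entries) {
            total += e.len;
        }
        return total;
    }

    // One directory plus its two children, files of 100 and 200 bytes.
    // dirLen is the value reported as the directory's own length.
    static List<Entry> listing(long dirLen) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry(true, dirLen));
        entries.add(new Entry(false, 100));
        entries.add(new Entry(false, 200));
        return entries;
    }

    public static void main(String[] args) {
        // Today: a directory reports length 0, so the sum is the file bytes.
        System.out.println(sumLengths(listing(0))); // prints 300
        // Option #1 (assumed: length = number of children, here 2).
        System.out.println(sumLengths(listing(2))); // prints 302
    }
}
```

With a separate {{getChildrenCount()}}-style accessor, the value of {{getLen()}} for a directory
would stay 0 and the existing sum would be unaffected.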

> A less expensive way to figure out directory size
> -------------------------------------------------
>
>                 Key: HDFS-1658
>                 URL: https://issues.apache.org/jira/browse/HDFS-1658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>
> Currently, in order to figure out a directory size, we have to list the directory by calling
> the getListing RPC and count its children. This is an expensive operation, especially
> when a directory has many children, because it may require multiple RPCs.
> On the other hand, when fetching the status of a path (i.e. calling the getFileInfo RPC),
> the length field of FileStatus is set to 0 if the path is a directory.
> I am thinking of changing this field (FileStatus#length) to be the directory size when
> the path is a directory, so that we can call getFileInfo to get the directory size. This call is
> much less expensive and simpler than getListing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
