hadoop-hdfs-issues mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1658) A less expensive way to figure out directory size
Date Sat, 26 Feb 2011 00:33:22 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999666#comment-12999666 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1658:

How would you define directory size then?

> A less expensive way to figure out directory size
> -------------------------------------------------
>                 Key: HDFS-1658
>                 URL: https://issues.apache.org/jira/browse/HDFS-1658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
> Currently, in order to figure out a directory size, we have to list a directory by calling
> RPC getListing and count its child size. This is an expensive operation if a directory is
> On the other hand, when fetching the status of a path (i.e. calling RPC getFileInfo),
> the length field of FileStatus is set to 0 if the path is a directory.
> I am thinking of changing this field (FileStatus#length) to be the directory size when
> the path is a directory. So we can call getFileInfo to get the directory size. This call is
> much less expensive and simpler than getListing.
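
The question of how "directory size" is defined can be made concrete with a small sketch. The code below is not HDFS code: DirSizeSketch, Node, and the three method names are hypothetical, and the in-memory tree only stands in for a namespace. It shows three candidate values that a directory's length field might plausibly carry, which is the ambiguity the comment asks about.

```java
import java.util.ArrayList;
import java.util.List;

public class DirSizeSketch {

    /** Hypothetical in-memory stand-in for a namespace entry; not an HDFS class. */
    static final class Node {
        final String name;
        final boolean isDir;
        final long length; // file length in bytes; 0 for directories
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean isDir, long length) {
            this.name = name;
            this.isDir = isDir;
            this.length = length;
        }

        /** Adds a child and returns this node so calls can be chained. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Candidate A: the number of immediate children. */
    static long childCount(Node dir) {
        return dir.children.size();
    }

    /** Candidate B: the bytes held by immediate child files only. */
    static long immediateBytes(Node dir) {
        long total = 0;
        for (Node child : dir.children) {
            if (!child.isDir) {
                total += child.length;
            }
        }
        return total;
    }

    /** Candidate C: the bytes held by every file beneath the node, recursively. */
    static long recursiveBytes(Node node) {
        if (!node.isDir) {
            return node.length;
        }
        long total = 0;
        for (Node child : node.children) {
            total += recursiveBytes(child);
        }
        return total;
    }

    public static void main(String[] args) {
        Node logs = new Node("logs", true, 0)
                .add(new Node("a.log", false, 100))
                .add(new Node("b.log", false, 50));
        Node root = new Node("data", true, 0)
                .add(new Node("part-0", false, 10))
                .add(logs);

        System.out.println("child count:     " + childCount(root));
        System.out.println("immediate bytes: " + immediateBytes(root));
        System.out.println("recursive bytes: " + recursiveBytes(root));
    }
}
```

For the sample tree the three candidates give 2, 10, and 160 respectively, so the choice of definition changes what a single status call would report.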

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

