hadoop-hdfs-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-1658) A less expensive way to figure out directory size
Date Fri, 25 Feb 2011 22:59:21 GMT
A less expensive way to figure out directory size
-------------------------------------------------

                 Key: HDFS-1658
                 URL: https://issues.apache.org/jira/browse/HDFS-1658
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Hairong Kuang
            Assignee: Hairong Kuang


Currently, to figure out a directory's size, we have to list the directory by calling
RPC getListing and compute the size from its children. This is an expensive operation if
the directory is huge.

On the other hand, when fetching the status of a path (i.e. calling RPC getFileInfo), the length
field of FileStatus is set to 0 if the path is a directory.

I propose changing this field (FileStatus#length) to hold the directory size when the path
is a directory. We could then call getFileInfo to get the directory size. This call is much
less expensive and simpler than getListing.
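
To illustrate the difference, here is a rough sketch using a toy in-memory model, not the
actual NameNode code. The classes Dir and Status and the methods getFileInfoCurrent,
getFileInfoProposed, sizeViaListing and sizeViaStatus are hypothetical names made up for
this example. It also assumes that "directory size" means the number of children, which
the description above does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory stand-in for a namespace directory; NOT the HDFS implementation.
class DirSizeSketch {

    /** Hypothetical analogue of FileStatus with only the fields this sketch needs. */
    static final class Status {
        final boolean isDir;
        final long length;

        Status(boolean isDir, long length) {
            this.isDir = isDir;
            this.length = length;
        }
    }

    /** Hypothetical directory node holding the names of its children. */
    static final class Dir {
        private final List<String> children = new ArrayList<>();

        void addChild(String name) {
            children.add(name);
        }

        // Analogue of getListing: copies every child entry, so the cost grows
        // with the number of entries in the directory.
        List<String> getListing() {
            return new ArrayList<>(children);
        }

        // Analogue of the current getFileInfo: length is 0 for a directory.
        Status getFileInfoCurrent() {
            return new Status(true, 0L);
        }

        // Analogue of the proposed getFileInfo: length carries the directory size.
        // Assumption: "directory size" is taken to be the number of children.
        Status getFileInfoProposed() {
            return new Status(true, children.size());
        }
    }

    // Current approach: fetch the whole listing just to learn its size.
    static long sizeViaListing(Dir d) {
        return d.getListing().size();
    }

    // Proposed approach: a single status lookup, no listing transferred.
    static long sizeViaStatus(Dir d) {
        return d.getFileInfoProposed().length;
    }

    public static void main(String[] args) {
        Dir d = new Dir();
        for (int i = 0; i < 1000; i++) {
            d.addChild("part-" + i);
        }
        System.out.println(sizeViaListing(d));
        System.out.println(sizeViaStatus(d));
        System.out.println(d.getFileInfoCurrent().length);
    }
}
```

Both paths report the same size in this model, but only the listing path has to materialize
every child entry first.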

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
