hadoop-common-dev mailing list archives

From Arkady Borkovsky <ark...@yahoo-inc.com>
Subject Re: [jira] Created: (HADOOP-713) dfs list operation is too expensive
Date Tue, 14 Nov 2006 01:07:14 GMT
When listing a directory, for directory entries it may be more useful 
to display the number of files in a directory, rather than the number 
of bytes used by all the files in the directory and its subdirectories.
This is a subjective opinion -- comments?

(Currently, the value displayed for a subdirectory is "0")

On Nov 13, 2006, at 3:25 PM, Hairong Kuang (JIRA) wrote:

> dfs list operation is too expensive
> -----------------------------------
>                  Key: HADOOP-713
>                  URL: http://issues.apache.org/jira/browse/HADOOP-713
>              Project: Hadoop
>           Issue Type: Improvement
>           Components: dfs
>     Affects Versions: 0.8.0
>             Reporter: Hairong Kuang
> A list request to dfs returns an array of DFSFileInfo. A DFSFileInfo 
> of a directory contains a field called contentsLen, indicating its 
> size, which gets computed at the namenode side by recursively going 
> through its subdirs. At the same time, the whole dfs directory tree is 
> locked.
> The list operation is used a lot by DFSClient for listing a directory, 
> getting a file's size and # of replicas, and getting the size of dfs. 
> Only the last operation needs the field contentsLen to be computed.
> To reduce its cost, we can add a flag to the list request. ContentsLen 
> is computed if the flag is set. By default, the flag is false.
> -- 
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the 
> administrators: 
> http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see: 
> http://www.atlassian.com/software/jira
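
For illustration only, here is a rough sketch of the two ideas in this thread: an opt-in flag that controls whether the recursive size is computed, and a cheap per-directory entry count as an alternative value to display. All class, field, and method names below (ListSketch, Node, FileInfo, childCount, list) are hypothetical and are not taken from the Hadoop source. The sketch also assumes that "number of files" means immediate children only, and it ignores locking entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a directory tree; NOT Hadoop's actual classes.
class ListSketch {

    static class Node {
        final String name;
        final long length;          // size in bytes for a file; unused for directories
        final List<Node> children;  // null for a plain file

        Node(String name, long length, List<Node> children) {
            this.name = name;
            this.length = length;
            this.children = children;
        }

        boolean isDir() {
            return children != null;
        }
    }

    // Hypothetical stand-in for the per-entry result of a list request.
    static class FileInfo {
        final String name;
        final boolean isDir;
        final long contentsLen;  // recursive size; 0 for directories when not requested
        final int childCount;    // immediate entries in a directory; 0 for files

        FileInfo(String name, boolean isDir, long contentsLen, int childCount) {
            this.name = name;
            this.isDir = isDir;
            this.contentsLen = contentsLen;
            this.childCount = childCount;
        }
    }

    // Expensive part: walks the whole subtree under n.
    static long contentsLen(Node n) {
        if (!n.isDir()) {
            return n.length;
        }
        long total = 0;
        for (Node child : n.children) {
            total += contentsLen(child);
        }
        return total;
    }

    // Lists the entries of dir. The subtree walk only happens for directories,
    // and only when the caller sets computeContentsLen.
    static List<FileInfo> list(Node dir, boolean computeContentsLen) {
        List<FileInfo> result = new ArrayList<>();
        for (Node child : dir.children) {
            long len;
            int count;
            if (child.isDir()) {
                len = computeContentsLen ? contentsLen(child) : 0L;
                count = child.children.size();  // cheap: no recursion
            } else {
                len = child.length;
                count = 0;
            }
            result.add(new FileInfo(child.name, true && child.isDir(), len, count));
        }
        return result;
    }
}
```

With the flag off, listing a directory touches only its immediate entries, so the cost no longer depends on the size of the subtrees beneath it. The entry count comes for free in this model because each directory already holds its list of children.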
