hadoop-common-dev mailing list archives

From "Hairong Kuang" <hair...@yahoo-inc.com>
Subject RE: [jira] Created: (HADOOP-713) dfs list operation is too expensive
Date Tue, 14 Nov 2006 18:55:24 GMT
Setting the size of a directory to be the # of files is a good idea. But the
problem is that the dfs name node has no idea of checksum files, so the number
of files it reports includes the checksum files. What's displayed at the client
side has the checksum files filtered out, so the # of files would not match
what's really displayed at the client side.
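
To make the mismatch concrete, here is a minimal sketch, not actual Hadoop code. It assumes checksum files are named ".<name>.crc"; the class and method names (ChecksumCountSketch, nameNodeCount, clientVisible) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChecksumCountSketch {
    /** Assumed naming convention for checksum files: ".<name>.crc". */
    static boolean isChecksumFile(String name) {
        return name.startsWith(".") && name.endsWith(".crc");
    }

    /** What the name node would count: every entry, checksum files included. */
    static int nameNodeCount(List<String> entries) {
        return entries.size();
    }

    /** What the client displays: the entries with checksum files filtered out. */
    static List<String> clientVisible(List<String> entries) {
        List<String> visible = new ArrayList<>();
        for (String name : entries) {
            if (!isChecksumFile(name)) {
                visible.add(name);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Hypothetical directory contents as stored in the namespace.
        List<String> entries = Arrays.asList(
                "part-00000", ".part-00000.crc",
                "part-00001", ".part-00001.crc");
        System.out.println("name node count: " + nameNodeCount(entries));
        System.out.println("client sees:     " + clientVisible(entries).size());
    }
}
```

With these four entries the name node would report 4 files while the client lists only 2.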


-----Original Message-----
From: Arkady Borkovsky [mailto:arkady@yahoo-inc.com] 
Sent: Monday, November 13, 2006 5:07 PM
To: hadoop-dev@lucene.apache.org
Subject: Re: [jira] Created: (HADOOP-713) dfs list operation is too

When listing a directory, for directory entries it may be more useful to
display the number of files in a directory, rather than the number of bytes
used by all the files in the directory and its subdirectories.
This is a subjective opinion -- comments?

(Currently, the value displayed for a subdirectory is "0".)

On Nov 13, 2006, at 3:25 PM, Hairong Kuang (JIRA) wrote:

> dfs list operation is too expensive
> -----------------------------------
>                  Key: HADOOP-713
>                  URL: http://issues.apache.org/jira/browse/HADOOP-713
>              Project: Hadoop
>           Issue Type: Improvement
>           Components: dfs
>     Affects Versions: 0.8.0
>             Reporter: Hairong Kuang
> A list request to dfs returns an array of DFSFileInfo. A DFSFileInfo 
> of a directory contains a field called contentsLen, indicating its 
> size, which gets computed at the namenode side by recursively going 
> through its subdirs. At the same time, the whole dfs directory tree is 
> locked.
> The list operation is used a lot by DFSClient for listing a directory, 
> getting a file's size and # of replicas, and getting the size of dfs.
> Only the last operation needs the field contentsLen to be computed.
> To reduce its cost, we can add a flag to the list request. ContentsLen 
> is computed if the flag is set. By default, the flag is false.
> --
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the
> administrators: 
> http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see: 
> http://www.atlassian.com/software/jira
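
The flag proposed in the issue above could look roughly like the sketch below. This is only an illustration under stated assumptions: Node and FileInfo are hypothetical stand-ins for the namenode's tree and for DFSFileInfo, the list method's signature is invented, and returning 0 for a directory when the flag is off is an assumption, not something the issue specifies.

```java
import java.util.ArrayList;
import java.util.List;

public class ListFlagSketch {
    /** Hypothetical stand-in for a namespace node; not the real namenode class. */
    static class Node {
        final String name;
        final boolean isDir;
        final long length; // file length in bytes; unused for directories
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean isDir, long length) {
            this.name = name;
            this.isDir = isDir;
            this.length = length;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        /** Recursive walk over the whole subtree: the expensive part. */
        long computeContentsLen() {
            if (!isDir) {
                return length;
            }
            long total = 0;
            for (Node c : children) {
                total += c.computeContentsLen();
            }
            return total;
        }
    }

    /** Hypothetical stand-in for DFSFileInfo. */
    static class FileInfo {
        final String name;
        final boolean isDir;
        final long contentsLen;

        FileInfo(String name, boolean isDir, long contentsLen) {
            this.name = name;
            this.isDir = isDir;
            this.contentsLen = contentsLen;
        }
    }

    /** Lists a directory; subtree sizes are computed only when the flag is set. */
    static List<FileInfo> list(Node dir, boolean computeContentsLen) {
        List<FileInfo> out = new ArrayList<>();
        for (Node c : dir.children) {
            long len;
            if (!c.isDir) {
                len = c.length;               // cheap: already known
            } else if (computeContentsLen) {
                len = c.computeContentsLen(); // expensive: recursive walk
            } else {
                len = 0;                      // assumption: placeholder when skipped
            }
            out.add(new FileInfo(c.name, c.isDir, len));
        }
        return out;
    }

    /** Small made-up tree used for the demonstration. */
    static Node sampleTree() {
        Node nested = new Node("nested", true, 0).add(new Node("c", false, 40));
        Node sub = new Node("sub", true, 0)
                .add(new Node("a", false, 20))
                .add(new Node("b", false, 30))
                .add(nested);
        return new Node("root", true, 0)
                .add(new Node("a.txt", false, 10))
                .add(sub);
    }

    public static void main(String[] args) {
        for (FileInfo f : list(sampleTree(), false)) {
            System.out.println(f.name + " " + f.contentsLen);
        }
        for (FileInfo f : list(sampleTree(), true)) {
            System.out.println(f.name + " " + f.contentsLen);
        }
    }
}
```

With the flag off, the listing never descends into "sub", so plain directory listings and file-size lookups avoid the recursive walk; only a caller that needs the size of dfs would pass true.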
