hadoop-common-dev mailing list archives

From "Yoram Arnon" <yar...@yahoo-inc.com>
Subject RE: [jira] Created: (HADOOP-713) dfs list operation is too expensive
Date Wed, 15 Nov 2006 20:19:38 GMT
I opt for displaying the size in bytes for now, since it's computed anyway,
is readily available for free, and improves the UI.
If/when we fix HADOOP-713 we can replace the computation of size with a
better value for #files.
Let's not prevent an improvement just because it might change in the future.
Yoram

> -----Original Message-----
> From: Eric Baldeschwieler [mailto:eric14@yahoo-inc.com] 
> Sent: Tuesday, November 14, 2006 7:10 PM
> To: hadoop-dev@lucene.apache.org
> Subject: Re: [jira] Created: (HADOOP-713) dfs list operation 
> is too expensive
> 
> So let's display nothing for now and revisit this once we have a  
> cleaner CRC story.
> 
> 
> On Nov 14, 2006, at 10:55 AM, Hairong Kuang wrote:
> 
> > Setting the size of a directory to be the # of files is a good idea.
> > But the problem is that the dfs name node has no idea of checksum
> > files, so the number of files includes the checksum files. What's
> > displayed at the client side has the checksum files filtered out, so
> > the # of files does not match what's really displayed at the client
> > side.
> >
> > Hairong
> >
> > -----Original Message-----
> > From: Arkady Borkovsky [mailto:arkady@yahoo-inc.com]
> > Sent: Monday, November 13, 2006 5:07 PM
> > To: hadoop-dev@lucene.apache.org
> > Subject: Re: [jira] Created: (HADOOP-713) dfs list operation is too
> > expensive
> >
> > When listing a directory, for directory entries it may be more useful
> > to display the number of files in a directory, rather than the number
> > of bytes used by all the files in the directory and its subdirectories.
> > This is a subjective opinion -- comments?
> >
> > (Currently, the value displayed for a subdirectory is "0")
> >
> > On Nov 13, 2006, at 3:25 PM, Hairong Kuang (JIRA) wrote:
> >
> >> dfs list operation is too expensive
> >> -----------------------------------
> >>
> >>                  Key: HADOOP-713
> >>                  URL: http://issues.apache.org/jira/browse/HADOOP-713
> >>              Project: Hadoop
> >>           Issue Type: Improvement
> >>           Components: dfs
> >>     Affects Versions: 0.8.0
> >>             Reporter: Hairong Kuang
> >>
> >>
> >> A list request to dfs returns an array of DFSFileInfo. A DFSFileInfo
> >> of a directory contains a field called contentsLen, indicating its
> >> size, which gets computed at the namenode side by recursively going
> >> through its subdirs. At the same time, the whole dfs directory tree
> >> is locked.
> >>
> >> The list operation is used a lot by DFSClient for listing a
> >> directory, getting a file's size and # of replicas, and getting the
> >> size of dfs. Only the last operation needs the field contentsLen to
> >> be computed.
> >>
> >> To reduce its cost, we can add a flag to the list request.
> >> contentsLen is computed if the flag is set. By default, the flag is
> >> false.
> >>
> >
> >
> 
> 
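
The flag proposed in HADOOP-713 can be sketched roughly as follows. This is a minimal illustration, not the actual namenode code: Node, ListingEntry, ListingSketch and the computeContentsLen parameter are hypothetical names, and the directory tree is a plain in-memory map. Only the field name contentsLen comes from the issue text. With the flag off, a directory entry reports 0 (the value Arkady mentions) and the recursive walk is skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for one entry of the namenode's tree.
class Node {
    final String name;
    final long length;                 // bytes for a file, 0 for a directory
    final Map<String, Node> children;  // null for a file

    Node(String name, long length, boolean isDir) {
        this.name = name;
        this.length = length;
        this.children = isDir ? new LinkedHashMap<String, Node>() : null;
    }

    boolean isDir() { return children != null; }

    Node add(Node child) { children.put(child.name, child); return this; }

    // Recursive walk over the whole subtree: the expensive part of a listing.
    long computeContentsLen() {
        if (!isDir()) return length;
        long total = 0;
        for (Node c : children.values()) total += c.computeContentsLen();
        return total;
    }
}

// Hypothetical listing entry; contentsLen mirrors the field named in the issue.
class ListingEntry {
    final String name;
    final boolean isDir;
    final long contentsLen;

    ListingEntry(String name, boolean isDir, long contentsLen) {
        this.name = name;
        this.isDir = isDir;
        this.contentsLen = contentsLen;
    }
}

class ListingSketch {
    // Lists the direct children of dir. The subtree walk for a directory
    // entry runs only when the caller sets the flag.
    static List<ListingEntry> list(Node dir, boolean computeContentsLen) {
        List<ListingEntry> out = new ArrayList<ListingEntry>();
        for (Node c : dir.children.values()) {
            long len;
            if (!c.isDir()) {
                len = c.length;               // a file's size is already known
            } else if (computeContentsLen) {
                len = c.computeContentsLen(); // opt-in recursive walk
            } else {
                len = 0;                      // cheap default
            }
            out.add(new ListingEntry(c.name, c.isDir(), len));
        }
        return out;
    }
}
```

In this sketch, callers that only need names, file sizes or replica counts would pass false, and only the "size of dfs" path would pass true.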

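Hairong's point about the file count can be illustrated the same way. The sketch below assumes checksum files follow a ".<name>.crc" naming pattern; that convention, the class ChecksumCountSketch and its methods are assumptions made for illustration and do not come from this thread. It shows why a count taken over raw entries (what the name node sees) differs from a count taken after the client filters out checksum files.

```java
import java.util.Arrays;
import java.util.List;

class ChecksumCountSketch {
    // Assumption: the checksum file for "name" is stored as ".name.crc".
    static boolean isChecksumFile(String name) {
        return name.startsWith(".") && name.endsWith(".crc") && name.length() > 5;
    }

    // Count as seen without any knowledge of checksum files.
    static int rawCount(List<String> names) {
        return names.size();
    }

    // Count as seen after checksum files are filtered out.
    static int visibleCount(List<String> names) {
        int n = 0;
        for (String s : names) {
            if (!isChecksumFile(s)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("part-0", ".part-0.crc", "part-1", ".part-1.crc");
        System.out.println("raw=" + rawCount(names) + " visible=" + visibleCount(names));
    }
}
```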
