hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HADOOP-79) listFiles optimization
Date Tue, 14 Mar 2006 20:57:45 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-79?page=all ]
     
Doug Cutting resolved HADOOP-79:
--------------------------------

    Fix Version: 0.1
     Resolution: Fixed
      Assign To: Konstantin Shvachko

This looks fine to me.  I simplified FSDirectory.isDir() a bit more and committed this.

Did you find this to be a bottleneck in benchmarks?  BTW, I have had some success
profiling Hadoop daemons using Sun's built-in sampling profiler.  I simply set
HADOOP_OPTS to '-agentlib:hprof=cpu=samples,interval=20' before starting a daemon.
Then, when I stop that daemon, it dumps profile data to a text file.

And, finally, yes, DFSFileInfo could re-use the length field for both purposes.  But
this class is only used for interchange, right?  So making it small will only serve to
make RPCs a bit faster and won't save a lot of memory.
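
To make the trade-off concrete, here is a minimal sketch of a single-field variant. CompactFileInfo and its wire layout are hypothetical, not the class from the patch, and the accessor semantics (a directory reports a file length of 0, a file reports its own length as its contents size) are an assumption. Serializing with java.io.DataOutputStream shows that dropping one long saves 8 bytes per listing entry on the wire.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical compact variant of DFSFileInfo; not the class from the patch.
final class CompactFileInfo {
    private final String path;
    private final long len;       // file length for a file, contents size for a directory
    private final boolean isDir;

    CompactFileInfo(String path, long len, boolean isDir) {
        this.path = path;
        this.len = len;
        this.isDir = isDir;
    }

    boolean isDir() { return isDir; }

    // Assumed semantics: a directory has no file length of its own.
    long getLen() { return isDir ? 0L : len; }

    // Assumed semantics: for a file, the contents size equals the file length.
    long getContentsLen() { return len; }

    // Illustrative wire layout (an assumption): UTF path, one long, one boolean.
    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(path);
            out.writeLong(len);
            out.writeBoolean(isDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}
```

With this layout a second long field would add exactly 8 bytes per entry, which is consistent with the point that the saving affects RPC size more than memory.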

> listFiles optimization
> ----------------------
>
>          Key: HADOOP-79
>          URL: http://issues.apache.org/jira/browse/HADOOP-79
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>     Reporter: Konstantin Shvachko
>     Assignee: Konstantin Shvachko
>      Fix For: 0.1
>  Attachments: DFSFileInfo.patch
>
> In FSDirectory.getListing(), consider the line
>     listing[i] = new DFSFileInfo(curName, cur.computeFileLength(),
>                                  cur.computeContentsLength(), isDir(curName));
> 1. computeContentsLength() itself calls computeFileLength(), so the file length
>    is calculated twice.
> 2. isDir() looks up the INode (starting from the rootDir) that was already obtained
>    just two lines above; note that the tree is locked by that time.
> I propose a simple optimization for this, see attachment.
> 3. A related question: why does DFSFileInfo need two separate fields, len for file
>    length and contentsLen for directory contents size? These fields look mutually
>    exclusive, so we could use just one and interpret it according to the value
>    of isDir.
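
Points 1 and 2 of the description can be sketched as follows. This is not the attached DFSFileInfo.patch; SketchINode, SketchFileInfo and ListingSketch are hypothetical, simplified stand-ins, and the one-argument computeContentsLength(long) is an assumption made for illustration. The idea is to compute the file length once per entry and to read the directory flag from the node already in hand, rather than looking it up again by name from the root.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the namenode's INode; only what the sketch needs.
class SketchINode {
    final String name;
    final long ownBytes;               // bytes stored in this node if it is a file
    final List<SketchINode> children;  // null for a file (assumption of this sketch)
    int lengthCalls = 0;               // counts computeFileLength() calls, for illustration

    SketchINode(String name, long ownBytes, boolean dir) {
        this.name = name;
        this.ownBytes = ownBytes;
        this.children = dir ? new ArrayList<>() : null;
    }

    boolean isDir() { return children != null; }

    long computeFileLength() { lengthCalls++; return ownBytes; }

    // Takes the already-computed file length instead of recomputing it.
    long computeContentsLength(long fileLength) {
        long total = fileLength;
        if (children != null) {
            for (SketchINode child : children) {
                total += child.computeContentsLength(child.computeFileLength());
            }
        }
        return total;
    }
}

// Hypothetical stand-in for DFSFileInfo, keeping both length fields.
class SketchFileInfo {
    final String name;
    final long len;
    final long contentsLen;
    final boolean isDir;

    SketchFileInfo(String name, long len, long contentsLen, boolean isDir) {
        this.name = name;
        this.len = len;
        this.contentsLen = contentsLen;
        this.isDir = isDir;
    }
}

final class ListingSketch {
    // Lists the children of dir, which is assumed to be a directory node.
    static SketchFileInfo[] getListing(SketchINode dir) {
        SketchFileInfo[] listing = new SketchFileInfo[dir.children.size()];
        for (int i = 0; i < listing.length; i++) {
            SketchINode cur = dir.children.get(i);
            long fileLen = cur.computeFileLength();                 // computed once
            long contentsLen = cur.computeContentsLength(fileLen);  // reuses fileLen
            listing[i] = new SketchFileInfo(cur.name, fileLen, contentsLen,
                                            cur.isDir());           // no lookup from the root
        }
        return listing;
    }
}
```

In this sketch each listed node has computeFileLength() called on it exactly once, and no path lookup happens inside the loop.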

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

