hadoop-common-dev mailing list archives

From "David Phillips (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-4339) Improve FsShell -du/-dus and FileSystem.getContentSummary efficiency
Date Fri, 03 Oct 2008 21:23:44 GMT
Improve FsShell -du/-dus and FileSystem.getContentSummary efficiency
--------------------------------------------------------------------

                 Key: HADOOP-4339
                 URL: https://issues.apache.org/jira/browse/HADOOP-4339
             Project: Hadoop Core
          Issue Type: Bug
          Components: fs
    Affects Versions: 0.18.1
            Reporter: David Phillips


FsShell.du has two inefficiencies:

* it calls getContentSummary twice for each top-level item rather than calling it once and
saving the result
* it calls getContentSummary for files rather than using the length it already has in FileStatus

getContentSummary has one:

* it calls itself recursively for files rather than using the length it already has in FileStatus

Every call to getContentSummary results in a call to getFileStatus, which may be expensive
(e.g. NativeS3FileSystem has both network latency and actual monetary cost).

The simple solution:

* FsShell.du calls getContentSummary once per item and saves the returned ContentSummary
* FsShell.du uses FileStatus.getLen for files
* getContentSummary only calls itself for directories

Another solution, rather than adding special casing to callers, is to add a getContentSummary
overload that takes a FileStatus, so that callers already holding one avoid the extra lookup.
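
The difference in lookup counts can be illustrated with a small, self-contained sketch. It does not use the Hadoop classes: Entry, ToyFs, naiveSize, dirsOnlySize, and sizeFromStatus are hypothetical stand-ins invented for this example, and the status-call counter only models the cost of getFileStatus under the assumption that a directory listing already returns each child's length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a FileStatus-like record (not the Hadoop class).
final class Entry {
    final String name;
    final boolean isDir;
    final long len;
    final List<Entry> children = new ArrayList<Entry>();

    Entry(String name, boolean isDir, long len) {
        this.name = name;
        this.isDir = isDir;
        this.len = len;
    }

    Entry add(Entry child) {
        children.add(child);
        return this;
    }
}

// Toy file system that counts status lookups, modelling the expensive call.
final class ToyFs {
    int statusCalls = 0;

    // Models a getFileStatus-style lookup: one counted call per invocation.
    Entry getStatus(Entry path) {
        statusCalls++;
        return path;
    }

    // Reported behaviour: recurse into every child, one lookup per file and directory.
    long naiveSize(Entry path) {
        Entry st = getStatus(path);
        if (!st.isDir) {
            return st.len;
        }
        long total = 0;
        for (Entry child : st.children) {
            total += naiveSize(child);
        }
        return total;
    }

    // "Simple solution": recurse only into directories, use the listed length for files.
    long dirsOnlySize(Entry path) {
        Entry st = getStatus(path);
        if (!st.isDir) {
            return st.len;
        }
        long total = 0;
        for (Entry child : st.children) {
            total += child.isDir ? dirsOnlySize(child) : child.len;
        }
        return total;
    }

    // "Another solution": accept a status the caller already holds, so no lookup is needed.
    long sizeFromStatus(Entry st) {
        if (!st.isDir) {
            return st.len;
        }
        long total = 0;
        for (Entry child : st.children) {
            total += sizeFromStatus(child);
        }
        return total;
    }

    // Sample tree: /root contains a (10 bytes), b (20 bytes), and sub/ containing c (5 bytes).
    static Entry sampleTree() {
        Entry sub = new Entry("sub", true, 0).add(new Entry("c", false, 5));
        return new Entry("root", true, 0)
                .add(new Entry("a", false, 10))
                .add(new Entry("b", false, 20))
                .add(sub);
    }
}

final class DuSketch {
    public static void main(String[] args) {
        Entry root = ToyFs.sampleTree();

        ToyFs naive = new ToyFs();
        System.out.println("naive:     " + naive.naiveSize(root) + " bytes, "
                + naive.statusCalls + " lookups");

        ToyFs dirsOnly = new ToyFs();
        System.out.println("dirs only: " + dirsOnly.dirsOnlySize(root) + " bytes, "
                + dirsOnly.statusCalls + " lookups");

        ToyFs fromStatus = new ToyFs();
        System.out.println("status:    " + fromStatus.sizeFromStatus(root) + " bytes, "
                + fromStatus.statusCalls + " lookups");
    }
}
```

In this toy tree all three variants return the same total, while the number of counted lookups drops from one per entry, to one per directory, to none when the caller passes in a status it already has. How closely this matches the real savings depends on what a given FileSystem implementation does per call.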
