hadoop-hdfs-issues mailing list archives

From "Yoram Arnon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-786) Implement getContentSummary(..) in HftpFileSystem
Date Wed, 25 Nov 2009 22:47:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782661#action_12782661 ]

Yoram Arnon commented on HDFS-786:

Adding some numbers:

Running the command locally, on a particular webmap cluster's namenode, takes about 3 seconds:
time hadoop/bin/hadoop fs -dus /.../atoms

real    0m2.916s
user    0m1.215s
sys     0m0.171s

Running the same command, still locally but over hftp, takes about 18 minutes:
time hadoop/bin/hadoop fs -dus hftp://.../atoms
real    18m11.154s
user    10m37.726s
sys     0m16.516s

Running the command remotely, from a client in a different datacenter, again over hftp, took
a little over 3 hours (sorry, no 'time' output for that run).

> Implement getContentSummary(..) in HftpFileSystem
> -------------------------------------------------
>                 Key: HDFS-786
>                 URL: https://issues.apache.org/jira/browse/HDFS-786
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
> HftpFileSystem does not override getContentSummary(..).  As a result, it uses FileSystem's
> default implementation, which computes content summary on the client side by calling listStatus(..)
> recursively.  In contrast, DistributedFileSystem has overridden getContentSummary(..) and
> does the computation on the NameNode.
> As a result, running "fs -dus" on hftp is much slower than running it on hdfs.
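
To make the difference described above concrete, here is a minimal sketch of the two strategies. It is a toy in-memory model, not Hadoop code: ContentSummarySketch, Node, ToyNameNode, contentLength and clientSideLength are all hypothetical names invented for illustration, and the real FileSystem and DistributedFileSystem implementations may differ in detail. The only point it tries to show is that a client-side recursive walk issues one listing request per directory, while a server-side computation needs a single request.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; not Hadoop code. Every name in this file is hypothetical. */
public class ContentSummarySketch {

    /** A node in an in-memory namespace: a file with a length, or a directory with children. */
    static final class Node {
        final boolean isDir;
        final long length;
        final List<Node> children = new ArrayList<>();

        Node(boolean isDir, long length) {
            this.isDir = isDir;
            this.length = length;
        }

        static Node file(long length) {
            return new Node(false, length);
        }

        static Node dir(Node... kids) {
            Node d = new Node(true, 0);
            for (Node k : kids) {
                d.children.add(k);
            }
            return d;
        }
    }

    /** Stand-in for the NameNode; counts how many requests the client sends to it. */
    static final class ToyNameNode {
        int requests = 0;

        /** One request per directory listing, loosely analogous to listStatus(..). */
        List<Node> listStatus(Node dir) {
            requests++;
            return dir.children;
        }

        /** One request in total; the tree walk happens on the server side. */
        long contentLength(Node root) {
            requests++;
            return sum(root);
        }

        private static long sum(Node n) {
            if (!n.isDir) {
                return n.length;
            }
            long total = 0;
            for (Node c : n.children) {
                total += sum(c);
            }
            return total;
        }
    }

    /** Client-side walk: the recursion lives in the client, so every directory costs a request. */
    static long clientSideLength(ToyNameNode nn, Node n) {
        if (!n.isDir) {
            return n.length;
        }
        long total = 0;
        for (Node c : nn.listStatus(n)) {
            total += clientSideLength(nn, c);
        }
        return total;
    }

    public static void main(String[] args) {
        // Four directories (including the root) and four files.
        Node root = Node.dir(
            Node.dir(Node.file(10), Node.file(20)),
            Node.dir(Node.dir(Node.file(5))),
            Node.file(7));

        ToyNameNode a = new ToyNameNode();
        long clientTotal = clientSideLength(a, root);

        ToyNameNode b = new ToyNameNode();
        long serverTotal = b.contentLength(root);

        System.out.println("client-side: " + clientTotal + " bytes, " + a.requests + " requests");
        System.out.println("server-side: " + serverTotal + " bytes, " + b.requests + " requests");
    }
}
```

In this model the request count of the client-side walk grows with the number of directories under the path, so the per-request cost is paid once per directory rather than once per command. That is one plausible reading of why the gap in the timings above widens when the client sits in a different datacenter.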
