hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-713) dfs list operation is too expensive
Date Wed, 14 Nov 2007 00:30:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542291 ]

Doug Cutting commented on HADOOP-713:
-------------------------------------

> The client computes the size of a directory by recursively traversing all nodes in the
> subtree.

I think it worked that way at one time in the past, and was found to put too much RPC load
on the namenode.  When someone wants to know the size of a directory (du -s) it is much more
efficient to do the recursion server-side on the namenode.  We should avoid doing it for every
directory listing, but we should still do it server-side.
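To make the RPC argument concrete, here is a minimal sketch that counts calls for the two approaches on a small tree. Everything in it (DuSketch, Node, NameNode, list, contentLength) is a hypothetical in-memory stand-in invented for illustration, not the actual Hadoop classes or protocol.

```java
import java.util.ArrayList;
import java.util.List;

public class DuSketch {
    /** Hypothetical in-memory stand-in for a namespace entry. */
    static final class Node {
        final boolean isDir;
        final long length;                       // bytes; 0 for directories
        final List<Node> children = new ArrayList<>();

        Node(boolean isDir, long length) {
            this.isDir = isDir;
            this.length = length;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Hypothetical namenode that counts the RPCs it receives. */
    static final class NameNode {
        int rpcCount = 0;

        // Plain listing: one RPC per directory listed.
        List<Node> list(Node dir) {
            rpcCount++;
            return dir.children;
        }

        // Server-side "du -s": one RPC, the recursion stays on the namenode.
        long contentLength(Node dir) {
            rpcCount++;
            return sum(dir);
        }

        private long sum(Node n) {
            if (!n.isDir) {
                return n.length;
            }
            long total = 0;
            for (Node c : n.children) {
                total += sum(c);
            }
            return total;
        }
    }

    /** Client-side recursion: issues one list() RPC per directory in the subtree. */
    static long clientSideDu(NameNode nn, Node dir) {
        long total = 0;
        for (Node c : nn.list(dir)) {
            total += c.isDir ? clientSideDu(nn, c) : c.length;
        }
        return total;
    }

    /** Three directories (root plus two subdirectories) and four files. */
    static Node sampleTree() {
        Node sub1 = new Node(true, 0).add(new Node(false, 10)).add(new Node(false, 20));
        Node sub2 = new Node(true, 0).add(new Node(false, 5));
        return new Node(true, 0).add(sub1).add(sub2).add(new Node(false, 1));
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        NameNode a = new NameNode();
        NameNode b = new NameNode();
        System.out.println("client-side: " + clientSideDu(a, root) + " bytes, RPCs=" + a.rpcCount);
        System.out.println("server-side: " + b.contentLength(root) + " bytes, RPCs=" + b.rpcCount);
    }
}
```

In this sketch the client-side walk costs one RPC per directory, so its load on the namenode grows with the size of the subtree, while the server-side call stays at a single RPC.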

> dfs list operation is too expensive
> -----------------------------------
>
>                 Key: HADOOP-713
>                 URL: https://issues.apache.org/jira/browse/HADOOP-713
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.8.0
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.1
>
>         Attachments: optimizeComputeContentLen.patch
>
>
> A list request to dfs returns an array of DFSFileInfo. A DFSFileInfo of a directory contains
> a field called contentsLen, indicating its size, which gets computed at the namenode side
> by recursively going through its subdirs. At the same time, the whole dfs directory tree is
> locked.
> The list operation is used a lot by DFSClient for listing a directory, getting a file's
> size and # of replicas, and getting the size of dfs. Only the last operation needs the field
> contentsLen to be computed.
> To reduce its cost, we can add a flag to the list request. ContentsLen is computed if
> the flag is set. By default, the flag is false.
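The flag proposed in the description could look roughly like the sketch below. The names (ListFlagSketch, Entry, FileInfo, computeContentsLen) and the -1 sentinel for "not computed" are assumptions made for illustration; they are not taken from the attached patch or from the real DFSFileInfo class.

```java
import java.util.ArrayList;
import java.util.List;

public class ListFlagSketch {
    /** Hypothetical namespace entry held by the namenode. */
    static final class Entry {
        final String name;
        final boolean isDir;
        final long length;                       // bytes; 0 for directories
        final List<Entry> children = new ArrayList<>();

        Entry(String name, boolean isDir, long length) {
            this.name = name;
            this.isDir = isDir;
            this.length = length;
        }

        Entry add(Entry child) {
            children.add(child);
            return this;
        }
    }

    /** Hypothetical listing result; contentsLen is -1 when it was not computed. */
    static final class FileInfo {
        final String name;
        final long contentsLen;

        FileInfo(String name, long contentsLen) {
            this.name = name;
            this.contentsLen = contentsLen;
        }
    }

    /** Counts recursive subtree walks so the cost of each call is visible. */
    static int subtreeWalks = 0;

    private static long subtreeLength(Entry e) {
        if (!e.isDir) {
            return e.length;
        }
        long total = 0;
        for (Entry c : e.children) {
            total += subtreeLength(c);
        }
        return total;
    }

    /** Lists dir; walks a child directory's subtree only when the flag is set. */
    static List<FileInfo> list(Entry dir, boolean computeContentsLen) {
        List<FileInfo> result = new ArrayList<>();
        for (Entry c : dir.children) {
            long len;
            if (!c.isDir) {
                len = c.length;                  // a file's length needs no recursion
            } else if (computeContentsLen) {
                subtreeWalks++;
                len = subtreeLength(c);
            } else {
                len = -1;                        // assumed sentinel: not computed
            }
            result.add(new FileInfo(c.name, len));
        }
        return result;
    }

    /** Default listing: the flag is false, so no subtree is walked. */
    static List<FileInfo> list(Entry dir) {
        return list(dir, false);
    }

    static Entry sampleTree() {
        Entry logs = new Entry("logs", true, 0)
                .add(new Entry("a.log", false, 10))
                .add(new Entry("b.log", false, 20));
        return new Entry("/", true, 0).add(logs).add(new Entry("readme", false, 5));
    }
}
```

With this shape, an ordinary listing or a lookup of a file's size never triggers the recursive walk; only a caller that explicitly asks for directory sizes pays for it.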

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

