hadoop-common-dev mailing list archives

From "Sameer Paranjpye (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-713) dfs list operation is too expensive
Date Wed, 14 Nov 2007 00:56:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542296 ]

Sameer Paranjpye commented on HADOOP-713:

> I think it worked that way at one time in the past, and was found to put too much RPC
> load on the namenode

True, but at the time I think we were making a getLength RPC for every file encountered. With
the new listStatus API we could get the sizes of all the files in a directory and then recursively
call listStatus for the subdirectories. This would put a significantly lower RPC load on the
namenode than an invocation per file.

One observation to take into account is that 'ls -r' recurses on the client side, makes a
call per directory in a tree and is a pretty frequent operation (certainly more frequent than
du). Since 'ls -r' doesn't appear to overburden the Namenode with RPCs, it feels like that
ought to be true for du as well.
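
To make the per-directory recursion concrete, here is a minimal sketch in Java. It does not use the real Hadoop classes: `FakeNamenode`, `Entry`, and the `du` helper are hypothetical stand-ins, and the only assumption carried over from the comment above is that one listStatus call returns the sizes of all files in a directory. The fake namenode counts calls so the RPC load is visible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClientSideDu {
    /** Hypothetical stand-in for one entry returned by a listStatus call. */
    static final class Entry {
        final String path;
        final boolean isDir;
        final long length;

        Entry(String path, boolean isDir, long length) {
            this.path = path;
            this.isDir = isDir;
            this.length = length;
        }
    }

    /** Hypothetical in-memory namenode that counts listStatus invocations. */
    static final class FakeNamenode {
        private final Map<String, List<Entry>> children = new HashMap<>();
        int rpcCount = 0;

        void addDir(String parent, String path) {
            children.computeIfAbsent(parent, k -> new ArrayList<>()).add(new Entry(path, true, 0L));
            children.computeIfAbsent(path, k -> new ArrayList<>());
        }

        void addFile(String parent, String path, long length) {
            children.computeIfAbsent(parent, k -> new ArrayList<>()).add(new Entry(path, false, length));
        }

        /** One simulated RPC: returns every entry of a directory, file sizes included. */
        List<Entry> listStatus(String dir) {
            rpcCount++;
            return children.getOrDefault(dir, new ArrayList<>());
        }
    }

    /** Client-side du: one listStatus call per directory, none per file. */
    static long du(FakeNamenode nn, String dir) {
        long total = 0;
        for (Entry e : nn.listStatus(dir)) {
            total += e.isDir ? du(nn, e.path) : e.length;
        }
        return total;
    }

    public static void main(String[] args) {
        FakeNamenode nn = new FakeNamenode();
        nn.addDir("/", "/a");
        nn.addFile("/", "/f1", 10);
        nn.addFile("/a", "/a/f2", 20);
        nn.addFile("/a", "/a/f3", 30);
        nn.addDir("/a", "/a/b");
        nn.addFile("/a/b", "/a/b/f4", 40);

        long total = du(nn, "/");
        // Three directories (/, /a, /a/b) and four files: 3 calls, 100 bytes.
        System.out.println("total bytes = " + total + ", listStatus calls = " + nn.rpcCount);
    }
}
```

In this toy tree the client makes three calls for four files. The number of calls grows with the number of directories rather than the number of files, which is the same cost profile as 'ls -r'.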

> dfs list operation is too expensive
> -----------------------------------
>                 Key: HADOOP-713
>                 URL: https://issues.apache.org/jira/browse/HADOOP-713
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.8.0
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.1
>         Attachments: optimizeComputeContentLen.patch
> A list request to dfs returns an array of DFSFileInfo. A DFSFileInfo of a directory contains
> a field called contentsLen, indicating its size, which gets computed at the namenode side
> by recursively going through its subdirs. At the same time, the whole dfs directory tree is
> The list operation is used a lot by DFSClient for listing a directory, getting a file's
> size and # of replicas, and getting the size of dfs. Only the last operation needs the field
> contentsLen to be computed.
> To reduce its cost, we can add a flag to the list request. contentsLen is computed if
> the flag is set. By default, the flag is false.
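
The flag proposed in the issue description could look roughly like the sketch below. This is not the attached patch or the real DFSFileInfo class: `Node`, `FileInfo`, `list`, the `-1` "not computed" sentinel, and the `nodesVisited` counter are all hypothetical, chosen only to show that the recursive walk is skipped when the flag is false.

```java
import java.util.ArrayList;
import java.util.List;

public class ListFlagSketch {
    /** Hypothetical namespace node: a file with a length, or a directory with children. */
    static final class Node {
        final String name;
        final boolean isDir;
        final long length;
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean isDir, long length) {
            this.name = name;
            this.isDir = isDir;
            this.length = length;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Hypothetical listing result; contentsLen is -1 when it was not computed. */
    static final class FileInfo {
        final String name;
        final long contentsLen;

        FileInfo(String name, long contentsLen) {
            this.name = name;
            this.contentsLen = contentsLen;
        }
    }

    /** Counts nodes touched by the recursive size computation. */
    static int nodesVisited = 0;

    /** The expensive part: walks the whole subtree under n. */
    static long contentsLen(Node n) {
        nodesVisited++;
        if (!n.isDir) {
            return n.length;
        }
        long total = 0;
        for (Node c : n.children) {
            total += contentsLen(c);
        }
        return total;
    }

    /** Lists a directory; subtree sizes are computed only when the flag is set. */
    static List<FileInfo> list(Node dir, boolean computeContentsLen) {
        List<FileInfo> out = new ArrayList<>();
        for (Node c : dir.children) {
            long len = c.isDir ? (computeContentsLen ? contentsLen(c) : -1L) : c.length;
            out.add(new FileInfo(c.name, len));
        }
        return out;
    }

    public static void main(String[] args) {
        Node sub = new Node("sub", true, 0).add(new Node("x", false, 7));
        Node root = new Node("/", true, 0).add(sub).add(new Node("y", false, 3));

        list(root, false);
        System.out.println("flag=false, nodes visited: " + nodesVisited);
        list(root, true);
        System.out.println("flag=true, nodes visited: " + nodesVisited);
    }
}
```

With the flag left at its default of false, a listing touches only the immediate children, so the common callers (listing a directory, getting a file's size) never pay for the subtree walk.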

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
