hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2158) hdfsListDirectory in libhdfs does not scale
Date Tue, 06 Nov 2007 07:51:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540366 ]

Devaraj Das commented on HADOOP-2158:
-------------------------------------

So if the namenode is taking a long time to respond, and the task is not doing statusUpdates
in the interim, the task could time out at the tasktracker for lack of statusUpdates and
get killed. Could this be connected to HADOOP-2076 in some way?

> hdfsListDirectory in libhdfs does not scale
> -------------------------------------------
>
>                 Key: HADOOP-2158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2158
>             Project: Hadoop
>          Issue Type: Bug
>          Components: libhdfs
>    Affects Versions: 0.15.0
>            Reporter: Christian Kunz
>            Priority: Blocker
>         Attachments: 2158.patch
>
>
> hdfsListDirectory makes one rpc call using deprecated fs.FileSystem.listPaths, and then
> two rpc calls for every entry in the returned array. When running a job with more than 3000
> mappers each running a pipes application using libhdfs to scan a dfs directory with about
> 100-200 entries, this results in about 1M rpc calls to the namenode server overwhelming it.
> hdfsListDirectory should call fs.FileSystem.listStatus instead.
> I will submit a patch.
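
For reference, the "about 1M rpc calls" figure in the description can be reproduced with a little arithmetic. The sketch below is only an illustration, not code from Hadoop or from 2158.patch: the function names are hypothetical, it assumes each mapper lists the directory exactly once, and it assumes a listStatus-based listing would need a single rpc per directory with no per-entry calls.

```python
# Back-of-the-envelope count of namenode rpc calls for the scenario above.
# Function names are hypothetical; the numbers come from the issue text.

def rpc_calls_per_listing(entries, per_entry_calls=2, listing_calls=1):
    """Calls made by one directory listing: the listing call itself
    plus a fixed number of follow-up calls per returned entry."""
    return listing_calls + per_entry_calls * entries


def total_rpc_calls(mappers, entries, per_entry_calls=2):
    """Total calls if every mapper lists the directory exactly once (assumption)."""
    return mappers * rpc_calls_per_listing(entries, per_entry_calls)


if __name__ == "__main__":
    mappers = 3000
    for entries in (100, 200):
        current = total_rpc_calls(mappers, entries)
        # Assumption: a listStatus-based listing needs no per-entry calls.
        proposed = total_rpc_calls(mappers, entries, per_entry_calls=0)
        print(f"{entries} entries: current={current}, proposed={proposed}")
```

With 3000 mappers this gives roughly 0.6M calls for 100 entries and 1.2M for 200, in line with the reported figure, against 3000 calls under the single-call assumption.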

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

