hadoop-common-issues mailing list archives

From "Hadoop QA (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-6502) DistributedFileSystem#listStatus is very slow when listing a directory with a size of 1300
Date Tue, 14 Feb 2012 06:26:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207544#comment-13207544 ]

Hadoop QA commented on HADOOP-6502:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12514444/hadoop-6502-trunk.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/595//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/595//console

This message is automatically generated.
                
> DistributedFileSystem#listStatus is very slow when listing a directory with a size of 1300
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6502
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6502
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: Hairong Kuang
>            Assignee: Todd Lipcon
>            Priority: Critical
>         Attachments: 6502.patch, 6502_v2.patch, hadoop-6502-trunk.txt, hadoop-6502-trunk.txt
>
>
> Listing a directory with around 1300 children takes hundreds of milliseconds.
> The slowdown is caused by the change made in HADOOP-4187. The return value
> of listStatus is an array of FileStatus. When each element of the array is deserialized,
> ReflectionUtils#newInstance(Class<T>, Configuration) is called; it calls setConf, which
> calls setJobConf. setJobConf checks whether JobConf is on the class path by calling
> Configuration#getClassByName. Configuration#getClassByName tries to optimize the lookup
> with a cached map, but because JobConf is not on the class path, it never enters the
> cache. Every check therefore ends up calling Class.forName, which is very expensive.
> Deserializing an array of 1300 entries requires calling Class.forName 1300 times!
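
The behavior described above can be illustrated with a minimal sketch of a lookup cache that remembers misses as well as hits. This is not Hadoop code and not necessarily what the attached patch does; the class name ClassLookupCache, its methods, and the class name org.example.NotOnClassPath are hypothetical and exist only for this example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not part of Hadoop): caches failed class lookups too. */
public class ClassLookupCache {
    // Optional.empty() records "looked up once, not on the class path".
    private final Map<String, Optional<Class<?>>> cache = new ConcurrentHashMap<>();
    // Counts how often the expensive Class.forName path is actually taken.
    private final AtomicInteger slowLookups = new AtomicInteger();

    /** Returns the class if it can be loaded, or an empty Optional if it cannot. */
    public Optional<Class<?>> find(String name) {
        return cache.computeIfAbsent(name, this::load);
    }

    private Optional<Class<?>> load(String name) {
        slowLookups.incrementAndGet();
        try {
            Class<?> found = Class.forName(name);
            return Optional.<Class<?>>of(found);
        } catch (ClassNotFoundException e) {
            // Remember the miss so later calls skip Class.forName entirely.
            return Optional.empty();
        }
    }

    public int slowLookupCount() {
        return slowLookups.get();
    }

    public static void main(String[] args) {
        ClassLookupCache cache = new ClassLookupCache();
        // Simulate deserializing 1300 entries, each probing for a missing class.
        for (int i = 0; i < 1300; i++) {
            cache.find("org.example.NotOnClassPath");
        }
        System.out.println(cache.slowLookupCount()); // prints 1
    }
}
```

With a cache that stores only successful lookups, the same loop would reach Class.forName 1300 times. One trade-off of this sketch is that it holds strong references to loaded classes, which a real implementation may want to avoid.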

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
