Mailing-List: contact common-issues-help@hadoop.apache.org; run by ezmlm
Reply-To: common-issues@hadoop.apache.org
Date: Wed, 15 Feb 2012 18:29:00 +0000 (UTC)
From: "Hudson (Commented) (JIRA)"
To: common-issues@hadoop.apache.org
Message-ID: <1628115751.41724.1329330540269.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HADOOP-6502) DistributedFileSystem#listStatus is very slow when listing a directory with a size of 1300
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

[
https://issues.apache.org/jira/browse/HADOOP-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208642#comment-13208642 ]

Hudson commented on HADOOP-6502:
--------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #1741 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1741/])
    HADOOP-6502. Improve the performance of Configuration.getClassByName when the class is not found by caching negative results. Contributed by Sharad Agarwal and Todd Lipcon. (Revision 1244620)

     Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244620
Files :
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ReflectionUtils.java


> DistributedFileSystem#listStatus is very slow when listing a directory with a size of 1300
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6502
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6502
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.20.0
>            Reporter: Hairong Kuang
>            Assignee: Sharad Agarwal
>            Priority: Critical
>             Fix For: 0.24.0, 0.23.2
>
>         Attachments: 6502.patch, 6502_v2.patch, hadoop-6502-trunk.txt, hadoop-6502-trunk.txt
>
>
> When listing a directory of around 1300 children, it takes hundreds of milliseconds. It turns out the slowdown is caused by the change made by HADOOP-4187. The return value of listStatus is an array of FileStatus. When deserializing each element of the array, ReflectionUtils#newInstance(Class, Configuration) is called, which calls setConf, which in turn calls setJobConf. setJobConf checks whether JobConf is on the class path by calling Configuration#getClassByName.
> Configuration#getClassByName tries to optimize the lookup with a cached map, but because JobConf is not on the class path, it never enters the cache. Every lookup therefore ends up calling Class.forName, which is very expensive. Deserializing an array of 1300 entries requires calling Class#forName 1300 times!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
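
For illustration only, the idea named in the commit message (caching negative results of a class lookup) could be sketched in Java roughly as follows. This is not the committed patch: the class name NegativeClassCache, the method names getClassByNameOrNull and lookupCount, the sentinel type, and the missing class name used in main are all hypothetical and chosen for this sketch. The only library calls assumed are Class.forName(String, boolean, ClassLoader) and java.util.concurrent.ConcurrentHashMap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a class-lookup cache that also remembers misses. */
public class NegativeClassCache {
    /** Marker type stored in the cache for names that could not be loaded. */
    private static final class NotFoundSentinel {}

    // ConcurrentHashMap rejects null values, so a sentinel class marks a miss.
    private static final Class<?> NOT_FOUND = NotFoundSentinel.class;

    private final Map<String, Class<?>> cache = new ConcurrentHashMap<>();
    private final ClassLoader loader;
    // Counts real Class.forName calls, only to make the effect observable.
    private final AtomicInteger lookups = new AtomicInteger();

    public NegativeClassCache(ClassLoader loader) {
        this.loader = loader;
    }

    /** Returns the class, or null if it is not on the class path. */
    public Class<?> getClassByNameOrNull(String name) {
        Class<?> cached = cache.get(name);
        if (cached == NOT_FOUND) {
            return null;            // known miss: skip the expensive lookup
        }
        if (cached != null) {
            return cached;          // known hit
        }
        lookups.incrementAndGet();
        try {
            Class<?> found = Class.forName(name, true, loader);
            cache.put(name, found);
            return found;
        } catch (ClassNotFoundException e) {
            cache.put(name, NOT_FOUND);  // remember the miss
            return null;
        }
    }

    public int lookupCount() {
        return lookups.get();
    }

    public static void main(String[] args) {
        NegativeClassCache c =
            new NegativeClassCache(NegativeClassCache.class.getClassLoader());
        // Mimic deserializing 1300 entries, each probing for an absent class.
        for (int i = 0; i < 1300; i++) {
            c.getClassByNameOrNull("org.example.hypothetical.MissingJobConf");
        }
        System.out.println("Class.forName calls: " + c.lookupCount());
    }
}
```

With misses cached this way, the 1300 probes in main should reach Class.forName only once. A cache that stores only successful lookups would repeat the failing call on every probe, which is the behavior the issue describes.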