From: "Ron Bodkin (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Fri, 16 Feb 2007 11:23:05 -0800 (PST)
Subject: [jira] Commented: (HADOOP-1017) Optimization: Reduce Overhead from ReflectionUtils.newInstance

[ https://issues.apache.org/jira/browse/HADOOP-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473794 ]

Ron Bodkin commented on HADOOP-1017:
------------------------------------

How about using
a concurrent map if one is available (on Java 5+, or if the concurrent backport is on the classpath), but falling back to a synchronized map if it isn't? I've implemented code like that before (in our environment we run single-threaded Hadoop jobs, so I wasn't aware of the need for thread safety).

You are right about the values having a reference back to the Class. I think making the map a plain HashMap is probably the right approach, since most programs would cache only a handful of classes, and those classes won't need to be garbage-collected anyhow. It would also be possible to make the values SoftReferences instead, which would still allow Classes to be collected while making the cache less likely to lose useful data.

The cache member certainly could be final and named CACHE; good idea.

> Optimization: Reduce Overhead from ReflectionUtils.newInstance
> --------------------------------------------------------------
>
>                 Key: HADOOP-1017
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1017
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: util
>            Reporter: Ron Bodkin
>         Attachments: cacheCtor.patch, ReflectionUtils.patch.txt, TestReflectionUtils.java
>
>
> I found that a significant amount of time on my project was being spent in creating constructors for each row of data. I dramatically optimized this performance by creating a simple WeakHashMap to cache constructors by class. For example, in a sample job I find that ReflectionUtils.newInstance takes 200 ms (2% of total) with the cache enabled, but it uses 900 ms (6% of total) without the cache.
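A minimal Java sketch of the two cache designs discussed in the comment, assuming every cached class has a no-argument constructor. The class name ConstructorCache and its methods are hypothetical; this is not the code in the attached patches or in Hadoop's ReflectionUtils. It assumes a JDK with java.util.concurrent available (ReflectiveOperationException additionally needs Java 7+) and does not show runtime detection of the concurrent backport.

```java
import java.lang.ref.SoftReference;
import java.lang.reflect.Constructor;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of a per-class constructor cache (not Hadoop's actual
 * ReflectionUtils). Assumes every cached class has a no-argument constructor.
 */
final class ConstructorCache {

  // Option 1: strong cache, final and named CACHE. ConcurrentHashMap lets many
  // threads read without a global lock. Entries are never evicted, which is
  // fine when only a handful of classes are ever instantiated.
  private static final Map<Class<?>, Constructor<?>> CACHE =
      new ConcurrentHashMap<Class<?>, Constructor<?>>();

  // Option 2: weak keys plus SoftReference values, wrapped in a synchronized
  // map because WeakHashMap itself is not thread-safe. The value only softly
  // references the Constructor (which strongly references its Class), so the
  // Class can still be collected under memory pressure.
  private static final Map<Class<?>, SoftReference<Constructor<?>>> SOFT_CACHE =
      Collections.synchronizedMap(
          new WeakHashMap<Class<?>, SoftReference<Constructor<?>>>());

  private ConstructorCache() {}

  /** Looks up the no-arg constructor of a class and makes it accessible. */
  private static <T> Constructor<T> lookup(Class<T> theClass)
      throws NoSuchMethodException {
    Constructor<T> ctor = theClass.getDeclaredConstructor();
    ctor.setAccessible(true);
    return ctor;
  }

  /** Creates an instance using the strong, concurrent cache. */
  @SuppressWarnings("unchecked")
  static <T> T newInstance(Class<T> theClass) {
    try {
      Constructor<T> ctor = (Constructor<T>) CACHE.get(theClass);
      if (ctor == null) {
        // Two threads may race here; both store an equivalent Constructor,
        // so the race is harmless.
        ctor = lookup(theClass);
        CACHE.put(theClass, ctor);
      }
      return ctor.newInstance();
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException(e);
    }
  }

  /** Creates an instance using the weak-key, soft-value cache. */
  @SuppressWarnings("unchecked")
  static <T> T newInstanceSoft(Class<T> theClass) {
    try {
      SoftReference<Constructor<?>> ref = SOFT_CACHE.get(theClass);
      Constructor<T> ctor = (ref == null) ? null : (Constructor<T>) ref.get();
      if (ctor == null) {
        // Missing, or cleared by the GC: look it up again and re-cache it.
        ctor = lookup(theClass);
        SOFT_CACHE.put(theClass, new SoftReference<Constructor<?>>(ctor));
      }
      return ctor.newInstance();
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException(e);
    }
  }

  /** Number of entries in the strong cache (for illustration and testing). */
  static int cachedCount() {
    return CACHE.size();
  }
}
```

For example, ConstructorCache.newInstance(StringBuilder.class) looks up StringBuilder's no-arg constructor once and reuses it on later calls. The strong variant is the simpler choice when, as the comment suggests, only a handful of classes are involved; the soft variant trades an occasional re-lookup for letting unused Classes be collected.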