hadoop-common-dev mailing list archives

From "Ron Bodkin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1017) Optimization: Reduce Overhead from ReflectionUtils.newInstance
Date Fri, 16 Feb 2007 19:23:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473794 ]

Ron Bodkin commented on HADOOP-1017:

How about using a concurrent map if one is available (on Java 5+ or if the concurrent backport
is on the classpath), but falling back to a synchronized map if one is not? I've implemented
code like that before (in our environment we run single-threaded Hadoop jobs, so I wasn't aware
of the need for thread safety).
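
A minimal sketch of that fallback, written in current Java syntax for brevity. The CacheMaps
class and newThreadSafeMap method are hypothetical names, and the backport class name is an
assumption based on the backport-util-concurrent library; none of this is taken from the
attached patches.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: picks the best thread-safe Map available at runtime. */
final class CacheMaps {
    private CacheMaps() {}

    // Candidate implementations, in order of preference.
    // The second name assumes the backport-util-concurrent package layout.
    private static final String[] CONCURRENT_MAP_CLASSES = {
        "java.util.concurrent.ConcurrentHashMap",
        "edu.emory.mathcs.backport.java.util.concurrent.ConcurrentHashMap",
    };

    @SuppressWarnings("unchecked")
    static <K, V> Map<K, V> newThreadSafeMap() {
        for (String name : CONCURRENT_MAP_CLASSES) {
            try {
                // Load reflectively so this class still links when the
                // concurrent implementation is missing from the classpath.
                return (Map<K, V>) Class.forName(name)
                        .getDeclaredConstructor()
                        .newInstance();
            } catch (ReflectiveOperationException | LinkageError e) {
                // Not available here; try the next candidate.
            }
        }
        // Last resort: a plain HashMap behind a synchronized wrapper.
        return Collections.synchronizedMap(new HashMap<K, V>());
    }
}
```

The lookup happens once, when the cache is created, so the reflective cost is not paid on
each newInstance call.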

You are right about the values having a reference back to the Class. I think making the map
just a HashMap is probably the right approach, since most programs would use only a handful
of classes, and those won't need to be gc'd anyhow. It would also be possible to make the values
SoftReferences instead, which would still allow Classes to be collected while making the cache
less likely to lose useful data.
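
To illustrate the SoftReference alternative: the sketch below keeps weak keys (a WeakHashMap)
and wraps each cached Constructor in a SoftReference, so the value no longer pins its own key.
This reading of the suggestion is an assumption, and SoftCtorCache and get are hypothetical
names, not code from the patch.

```java
import java.lang.ref.SoftReference;
import java.lang.reflect.Constructor;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical cache: weak Class keys, softly referenced Constructor values. */
final class SoftCtorCache {
    private static final Map<Class<?>, SoftReference<Constructor<?>>> CACHE =
            new WeakHashMap<Class<?>, SoftReference<Constructor<?>>>();

    private SoftCtorCache() {}

    /** Returns the cached no-arg constructor for cls, looking it up if needed. */
    static synchronized <T> Constructor<T> get(Class<T> cls) {
        SoftReference<Constructor<?>> ref = CACHE.get(cls);
        Constructor<?> ctor = (ref == null) ? null : ref.get();
        if (ctor == null) {
            // Never cached, or the collector cleared the soft reference.
            try {
                ctor = cls.getDeclaredConstructor();
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException(
                        "No no-arg constructor: " + cls.getName(), e);
            }
            ctor.setAccessible(true);
            CACHE.put(cls, new SoftReference<Constructor<?>>(ctor));
        }
        @SuppressWarnings("unchecked")
        Constructor<T> typed = (Constructor<T>) ctor;
        return typed;
    }
}
```

Soft references are normally cleared only under memory pressure, so entries tend to survive
longer than they would if they were only weakly reachable.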

The cache member certainly could be final and named CACHE, good idea.

> Optimization: Reduce Overhead from ReflectionUtils.newInstance
> --------------------------------------------------------------
>                 Key: HADOOP-1017
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1017
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: util
>            Reporter: Ron Bodkin
>         Attachments: cacheCtor.patch, ReflectionUtils.patch.txt, TestReflectionUtils.java
> I found that a significant amount of time on my project was being spent in creating constructors
> for each row of data. I dramatically optimized this performance by creating a simple WeakHashMap
> to cache constructors by class. For example, in a sample job I find that ReflectionUtils.newInstance
> takes 200 ms (2% of total) with the cache enabled, but it uses 900 ms (6% of total) without
> the cache.
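
A minimal sketch of the kind of constructor cache described above. It is not the attached
patch: CtorCache is a hypothetical name, and the synchronized wrapper is an addition prompted
by the thread-safety discussion in the comments.

```java
import java.lang.reflect.Constructor;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical stand-in for a newInstance helper that caches constructors by class. */
final class CtorCache {
    private static final Map<Class<?>, Constructor<?>> CACHE =
            Collections.synchronizedMap(new WeakHashMap<Class<?>, Constructor<?>>());

    private CtorCache() {}

    static <T> T newInstance(Class<T> cls) {
        try {
            Constructor<?> ctor = CACHE.get(cls);
            if (ctor == null) {
                // The reflective lookup is the expensive step, so do it once per class.
                ctor = cls.getDeclaredConstructor();
                ctor.setAccessible(true);
                CACHE.put(cls, ctor);
            }
            return cls.cast(ctor.newInstance());
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Cannot instantiate " + cls.getName(), e);
        }
    }
}
```

Two threads may occasionally look up the same constructor at the same time; the duplicate
put is harmless because both lookups yield equivalent constructors.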

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
