hadoop-common-user mailing list archives

From Henning Blohm <henning.bl...@zfabrik.de>
Subject Problem with org.apache.hadoop.conf.Configuration.REGISTRY
Date Thu, 14 Oct 2010 08:34:05 GMT
This is a follow-up to an HBase mailing list discussion:


When a Configuration instance to which a resource was added with
addResource(InputStream) is reused, reloading the configuration fails
because the stream has already been read.
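To illustrate why the stream cannot be read a second time, here is a simplified, hypothetical model (StreamBackedConfig and StreamReloadDemo are illustrative names, not Hadoop classes, and java.util.Properties stands in for Hadoop's XML parsing). Like Configuration, it loads its resource lazily and a reload only drops the cached properties. In this model the second load silently yields no properties; the report above says the real reload fails.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical stand-in for a Configuration with one addResource(InputStream) resource.
class StreamBackedConfig {
    private final InputStream resource; // read lazily, on first access
    private Properties properties;      // null means "not loaded (yet)"

    StreamBackedConfig(InputStream resource) {
        this.resource = resource;
    }

    // Models the idea of reloadConfiguration(): drop the cache, reload on next access.
    void reload() {
        properties = null;
    }

    String get(String key) {
        if (properties == null) {
            Properties loaded = new Properties();
            try {
                loaded.load(resource); // a stream can only be consumed once
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            properties = loaded;
        }
        return properties.getProperty(key);
    }
}

class StreamReloadDemo {
    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                "my.property=value\n".getBytes(StandardCharsets.ISO_8859_1));
        StreamBackedConfig conf = new StreamBackedConfig(in);
        System.out.println(conf.get("my.property")); // first load reads the stream: value
        conf.reload();                               // e.g. triggered by addDefaultResource
        System.out.println(conf.get("my.property")); // stream is at EOF: null
    }
}
```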

The reload is triggered when addDefaultResource is called. That method
uses the static REGISTRY WeakHashMap to reach all reachable Configuration
instances and reset their state.

The method addDefaultResource is called, for example, by ConfigUtil in
org.apache.hadoop.mapreduce.util (Hadoop trunk) or by JobConf (Hadoop

The problem has been observed in Hadoop 0.20.2, but the code in trunk has
essentially the same structure.
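The registry mechanism, and how a class initializer running in another thread reaches into an instance it does not own, can be sketched as follows. This is a simplified, hypothetical model: RegistryConfig, JobConfLike and RegistryDemo are illustrative names, the reload counter is added only to make the side effect visible, and the real reloadConfiguration discards loaded properties rather than counting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical, simplified model of the static REGISTRY pattern.
class RegistryConfig {
    // Every new instance registers itself; weak keys let unreferenced ones be collected.
    private static final Map<RegistryConfig, Object> REGISTRY = new WeakHashMap<>();
    private static final List<String> DEFAULT_RESOURCES = new ArrayList<>();

    private int reloadCount = 0; // how often this instance was reset from outside

    RegistryConfig() {
        synchronized (RegistryConfig.class) {
            REGISTRY.put(this, null);
        }
    }

    static synchronized void addDefaultResource(String name) {
        if (!DEFAULT_RESOURCES.contains(name)) {
            DEFAULT_RESOURCES.add(name);
            // Side effect on every reachable instance, no matter who owns it.
            for (RegistryConfig conf : new ArrayList<>(REGISTRY.keySet())) {
                conf.reloadConfiguration();
            }
        }
    }

    synchronized void reloadConfiguration() {
        reloadCount++; // stands in for discarding the loaded properties
    }

    synchronized int reloadCount() {
        return reloadCount;
    }
}

// Hypothetical class whose static initializer registers a default resource,
// in the spirit of JobConf doing so for its own defaults.
class JobConfLike {
    static {
        RegistryConfig.addDefaultResource("mapred-default.xml");
    }

    static void touch() {
        // calling any static method triggers class initialization on first use
    }
}

class RegistryDemo {
    public static void main(String[] args) throws InterruptedException {
        RegistryConfig mine = new RegistryConfig();
        // An unrelated thread happens to initialize JobConfLike ...
        Thread other = new Thread(() -> JobConfLike.touch());
        other.start();
        other.join();
        // ... and our instance has been reset behind our back.
        System.out.println(mine.reloadCount()); // prints 1
    }
}
```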

There are a few problems here:

1. You cannot safely use addResource(InputStream) if Configuration
objects are to be reused (you can, however, use addResource(URL) instead).

2. Modifying the state of Configuration instances at some later point in
time, as a side effect of class initialization in a completely unrelated
thread, leads to unpredictable behavior (properties change under the
hood).

3. Configuration instances keep context classloaders to find resources.
After a redeployment, these classloaders may no longer be "valid". As long
as a Configuration instance has not been garbage-collected,
addDefaultResource will still invoke reloadConfiguration on it. While
that is harmless today (it only resets members), this looks like a
ticking time bomb.
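Regarding point 1, a URL-backed resource survives a reload because the URL can be reopened for every load, whereas a stream is consumed once. The following is a simplified, hypothetical model (UrlBackedConfig and UrlReloadDemo are illustrative names, and java.util.Properties stands in for Hadoop's XML parsing); with the real API, the corresponding change is calling addResource(URL) instead of addResource(InputStream).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical stand-in for a Configuration whose resource was added by URL.
class UrlBackedConfig {
    private final URL resource;
    private Properties properties; // null means "not loaded (yet)"

    UrlBackedConfig(URL resource) {
        this.resource = resource;
    }

    void reload() {
        properties = null;
    }

    String get(String key) {
        if (properties == null) {
            Properties loaded = new Properties();
            // Each (re)load opens a fresh stream from the URL and closes it afterwards.
            try (InputStream in = resource.openStream()) {
                loaded.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            properties = loaded;
        }
        return properties.getProperty(key);
    }
}

class UrlReloadDemo {
    // Writes content to a temporary file and returns its file: URL.
    static URL tempResource(String content) {
        try {
            Path file = Files.createTempFile("example-conf", ".properties");
            file.toFile().deleteOnExit();
            Files.write(file, content.getBytes(StandardCharsets.ISO_8859_1));
            return file.toUri().toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        UrlBackedConfig conf = new UrlBackedConfig(tempResource("my.property=value\n"));
        System.out.println(conf.get("my.property")); // value
        conf.reload();
        System.out.println(conf.get("my.property")); // value again: the URL is reopened
    }
}
```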


Define all default resources in Configuration once. Do not hold on to
other Configuration instances, and do not modify their state as a side
effect of some unrelated activity.
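A minimal sketch of what this suggestion could look like, assuming a design where the list of default resources is fixed at class initialization and instances are never published to a global registry. FixedDefaultsConfig is a hypothetical name, the resource names are only illustrative, and actual resource loading is omitted; this is not Hadoop's API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: defaults declared once, no static registry of instances.
final class FixedDefaultsConfig {
    // All default resources are declared here, once; the list cannot change later.
    static final List<String> DEFAULT_RESOURCES = Collections.unmodifiableList(
            Arrays.asList("core-default.xml", "core-site.xml",
                          "mapred-default.xml", "mapred-site.xml"));

    // Instance state is only modified through this instance's own methods.
    private final Map<String, String> overrides = new HashMap<>();

    // Constructing an instance does not register it anywhere.
    FixedDefaultsConfig() {
    }

    List<String> defaultResources() {
        return DEFAULT_RESOURCES;
    }

    synchronized void set(String key, String value) {
        overrides.put(key, value);
    }

    synchronized String get(String key) {
        return overrides.get(key);
    }
}
```

With this shape, a class initializer elsewhere in the JVM has no way to reach or reset an existing instance; any attempt to extend DEFAULT_RESOURCES later fails with UnsupportedOperationException.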

