xmlgraphics-fop-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Xmlgraphics-fop Wiki] Update of "PropertyHandling/PropertyCache" by AndreasDelmelle
Date Sun, 11 May 2008 10:10:06 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Xmlgraphics-fop Wiki" for change notification.

The following page has been changed by AndreasDelmelle:
http://wiki.apache.org/xmlgraphics-fop/PropertyHandling/PropertyCache

------------------------------------------------------------------------------
  
  and the generated instances only have to be moved from the stack to the heap if they are
really ''new'' instances to be added to the map, the overhead involved should be minimal in
a language like Java, especially with modern VMs. The well-known warnings about instantiation
rates from the early days now apply more to circumstances where every instantiation would trigger
a chain of constructors (e.g. if the second assignment were {{{this.referenceValue
= new OtherClass();}}}).
  
- 
- == Structure ==
- 
- As the number of properties to which this approach was applied, began to grow, the decision
was made to implement a dedicated class that could be used by any of the {{{Property}}} classes.
The concern for multi-session environments was first taken into account by wrapping the {{{WeakHashMap}}}
in a standard synchronized map. The {{{PropertyCache}}} itself would only have one or more
{{{fetch(Object obj)}}} methods as public access points, that took care of the operations
that were initially performed within the {{{Property}}} subclasses (see above), so the idiom
had become simply: 
+ As the number of objects to which this approach was applied began to grow, the decision
was made to implement a dedicated class that could be used by any of the {{{Property}}} classes.
The concern for multi-session environments was first taken into account by wrapping the {{{WeakHashMap}}}
in a standard synchronized map. The {{{PropertyCache}}} itself would only expose one or more
{{{fetch(Object obj)}}} methods as public access points, which took care of the operations
that were initially performed within the {{{Property}}} subclasses (see above), so the idiom
simply became:
  
  {{{
  public static Property getInstance(int someInt, Object someObj) {
@@ -58, +55 @@
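The snippet above is cut short by the diff, but the idiom it illustrates can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual FOP source: the names {{{IntProperty}}} and the single-{{{int}}} constructor are assumptions, and generics are used for brevity even though FOP targeted Java 1.4 at the time. Note that the value stored in the {{{WeakHashMap}}} is itself a {{{WeakReference}}}, so the entry does not keep its own key strongly reachable:

```java
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Minimal sketch of the early PropertyCache design described above.
final class PropertyCache {
    // weakly referenced keys; the value is itself a WeakReference, so the
    // entry does not strongly refer to its own key (see the WeakHashMap javadoc)
    private final Map<Object, WeakReference<Object>> map =
            Collections.synchronizedMap(new WeakHashMap<Object, WeakReference<Object>>());

    /** Return the canonical instance equal to obj, adding obj if absent. */
    public Object fetch(Object obj) {
        WeakReference<Object> ref = map.get(obj);
        if (ref != null) {
            Object cached = ref.get();
            if (cached != null) {
                return cached;           // an equal instance was already cached
            }
        }
        map.put(obj, new WeakReference<Object>(obj));
        return obj;                      // obj is now the canonical instance
    }
}

final class IntProperty {
    private static final PropertyCache CACHE = new PropertyCache();
    private final int value;

    private IntProperty(int value) { this.value = value; }

    /** The idiom: build a candidate on the stack, let the cache canonicalize it. */
    public static IntProperty getInstance(int someInt) {
        return (IntProperty) CACHE.fetch(new IntProperty(someInt));
    }

    public boolean equals(Object o) {
        return o instanceof IntProperty && ((IntProperty) o).value == this.value;
    }

    public int hashCode() { return value; }
}
```

If no equal instance exists yet, the freshly built candidate itself becomes the cached instance; otherwise it becomes garbage immediately after the call.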

  
  In the meantime, development continued on the trunk. After reading some more documentation
on multi-threading in Java and studying the concepts of concurrency and contention, it became
obvious that using {{{Collections.synchronizedMap()}}} introduced a possible performance bottleneck.
Java 1.5 has its own {{{java.util.concurrent}}} package, but FOP still targets 1.4 environments,
so it could not be used. Even then, the {{{java.util.concurrent}}} package only offers a {{{ConcurrentHashMap}}},
not a variant using {{{WeakReference}}}s, which would be needed to implement a cache that is
shared by many threads. Besides that, a {{{Map}}} always contains ''mappings''. That is: every
entry has a key ''and'' a value. But FOP needed only a weakly referenced key...
  
+ == Structure ==
+ 
  The following article at IBM gave a starting point for building our very own cache: 
  http://www.ibm.com/developerworks/java/library/j-jtp08223/index.html
  
+ As pointed out at the start of the article, this is not for the faint of heart. Maybe we
shouldn't have tried it at home, but we did anyway.
+ 
+ Thinking it over a bit more, it became apparent that the cache did not need to expose a
public {{{remove()}}} operation at all, which already made it a bit simpler. It did not need
to expose anything but the mentioned {{{fetch()}}}. Since the approach was based on {{{WeakReference}}}s,
all we needed to do was make sure the map never grew unnecessarily large; the weak references
would be cleared automatically. The reference objects themselves were removed from the cache
upon each {{{put()}}}. Initially, rather audaciously, this was done in separate threads (as
if implementing our own {{{ConcurrentWeakHashSet}}} was not yet enough... ;-)) Pretty soon,
it became obvious that using threading internally in the map as well was taking it one step too
far. The cleanup cycle was moved, the threading abandoned, and a {{{ReferenceQueue}}} introduced.
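The {{{ReferenceQueue}}} mechanism can be sketched as below. This is an illustrative toy, not the actual FOP implementation: it uses a single list as its only "bucket", and the names {{{WeakEntryCache}}} and {{{Entry}}} are assumptions. Cleared references turn up on the queue, and draining the queue on each {{{put()}}} removes the corresponding entries:

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Toy sketch of ReferenceQueue-based cleanup piggy-backed on put().
final class WeakEntryCache {
    static final class Entry extends WeakReference<Object> {
        Entry(Object referent, ReferenceQueue<Object> q) {
            super(referent, q);
        }
    }

    private final ReferenceQueue<Object> staleEntries = new ReferenceQueue<Object>();
    private final List<Entry> bucket = new ArrayList<Entry>();  // one bucket, for brevity

    Entry put(Object obj) {
        cleanup();                        // cleanup runs on each put(), no extra threads
        Entry e = new Entry(obj, staleEntries);
        bucket.add(e);
        return e;
    }

    int size() {
        return bucket.size();
    }

    /** Drain the queue and drop the corresponding entries from the bucket. */
    private void cleanup() {
        Reference<? extends Object> stale;
        while ((stale = staleEntries.poll()) != null) {
            bucket.remove(stale);
        }
    }
}
```

In the sketch, once the garbage collector clears an {{{Entry}}}'s referent, the entry is enqueued and will be purged on a subsequent {{{put()}}}.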
+ 
+ The private {{{cleanSegment()}}} method is triggered upon each {{{put()}}}, but only if
the map segment in question grows beyond twice the number of buckets in the map. Starting
with a map of 8 buckets, the cleanup will be attempted every time a map segment grows beyond
16 elements.
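The trigger condition amounts to a one-line check; the helper below is a hypothetical illustration ({{{CleanupTrigger}}} is not an actual FOP class), showing that with 8 buckets the cleanup fires once a segment exceeds 16 elements:

```java
// Hypothetical helper making the cleanup trigger condition explicit.
final class CleanupTrigger {
    /** True if this segment holds more than twice as many elements as the map has buckets. */
    static boolean needsCleanup(int segmentElementCount, int bucketCount) {
        return segmentElementCount > 2 * bucketCount;
    }
}
```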
+ 
+ A segment is simply an object that can be synchronized on to lock a portion of the buckets
(see also the IBM article), and whose element count corresponds to the total for all those
buckets. That element count is updated with each {{{put()}}} and during the cleanup. At first,
each segment corresponds to one bucket, since the map only contains 8, but once the number of
buckets grows beyond 32, there will be multiple buckets per segment.
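The bucket-to-segment mapping described above can be sketched as follows, assuming a cap of 32 segments (the names here are illustrative, not FOP's actual code):

```java
// Illustrative mapping from bucket index to segment index: one bucket per
// segment up to 32 buckets, several buckets per segment beyond that.
final class SegmentIndex {
    static final int MAX_SEGMENTS = 32;

    static int segmentFor(int bucketIndex, int bucketCount) {
        if (bucketCount <= MAX_SEGMENTS) {
            return bucketIndex;                    // one bucket per segment
        }
        int bucketsPerSegment = bucketCount / MAX_SEGMENTS;
        return bucketIndex / bucketsPerSegment;    // several buckets per segment
    }
}
```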
+ 
+ The cleanup does no more than check the reference queue of the segment in question for stale
entries, and if there are any, remove the corresponding references from the cache.
Upon return of the cleanup, there is a check on whether it had any effect, that is, whether
the number of elements for the segment has decreased. If not, a boolean switch is set, indicating
that the map segment in question votes for a redistribution of the elements. If 8 or more
of the map segments agree, a {{{rehash()}}} is performed, which doubles the number of buckets
(in the first instance, to 16). Since each bucket is initially its own segment, the first rehash
will only be triggered once all of the buckets grow beyond size 16, so in that case the cache
would contain more than 128 elements. The second rehash will be triggered once 8 of the 16
buckets grow beyond size 32, or a total size of over 256 elements. The third when 8 of 32 grow
beyond 64, so 512 elements. The fourth when 8 of the 32 segments (by then covering two buckets
each) grow beyond 128, or 1024.
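The threshold arithmetic above can be made explicit with a small hypothetical helper (the real code tracks counts incrementally; {{{minSizeForRehash}}} is not an actual FOP method). It combines the doubling bucket count, the 32-segment cap, the per-segment threshold of twice the bucket count, and the 8 votes needed:

```java
// Hypothetical helper reproducing the rehash-threshold arithmetic above.
final class RehashThresholds {
    /** Minimum total cache size that can trigger the n-th rehash (1-based). */
    static int minSizeForRehash(int rehashNumber) {
        int buckets = 8;                          // initial bucket count
        for (int i = 1; i < rehashNumber; i++) {
            buckets *= 2;                         // each rehash doubles the buckets
        }
        int segments = Math.min(buckets, 32);     // segment count is capped at 32
        int perSegmentThreshold = 2 * buckets;    // cleanup/vote threshold per segment
        int votesNeeded = Math.min(segments, 8);  // 8 votes, or all segments if fewer
        return votesNeeded * perSegmentThreshold;
    }
}
```

This reproduces the series in the text: 128, 256, 512, 1024, and 2048 for a fifth rehash.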
+ 
+ Note that 1024 property instances corresponding to distinct values is in most general use-cases
already more than sufficient to cover the distinct values for all those properties ''together''.
The caches are divided over the applicable {{{Property}}} subclasses, so it will likely never
get to a fifth, which would mean 2K distinct values for one property type.
+ 
+ Since a rehash needs exclusive access to the map, this is done by recursively obtaining
locks on all of the segments. The initial entry of the {{{rehash()}}} method obtains the first
lock, then calls itself, and keeps doing so until all locks are acquired. From that point on, no
other thread can get access to any of the segments, and will have to wait until the rehash
is completed. That is: any thread that would need to put a new instance in the cache. Calls
to {{{get()}}} can still be served if a corresponding instance already exists, since the lookup
is first tried the usual way, without synchronization.
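The recursive lock-acquisition idiom can be sketched like this; a minimal illustration with assumed names, not the actual {{{rehash()}}} code:

```java
// Sketch of acquiring every segment lock via recursion: each recursion level
// holds one monitor, so the action at the bottom runs with all locks held.
final class SegmentLocks {
    private final Object[] segments;

    SegmentLocks(int n) {
        segments = new Object[n];
        for (int i = 0; i < n; i++) {
            segments[i] = new Object();
        }
    }

    /** Run action while holding every segment lock; call with index 0. */
    void withAllLocks(int index, Runnable action) {
        if (index == segments.length) {
            action.run();                      // all locks held: safe to rebuild buckets
            return;
        }
        synchronized (segments[index]) {
            withAllLocks(index + 1, action);   // recurse to take the next lock
        }
    }
}
```

Unwinding the recursion releases the monitors again in reverse order, so waiting writers resume only after the exclusive action has completed.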
+ 
+ The {{{fetch()}}} method has been exposed in such a way that it only has public entry points
for those types that are useful in this context, mostly for convenience. Calls to those are
routed to the private generic method by widening the parameter instance to {{{Object}}}, which
triggers a {{{get()}}} on the internal map. If there is already a weak reference to an equal
instance, we get back a hard reference to that one.
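The typed-entry-point pattern looks roughly like the sketch below (hypothetical names; a plain {{{HashMap}}} stands in for the weak-reference machinery to keep the example short):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of typed public entry points routed to one private generic fetch().
final class TypedCache {
    private final Map<Object, Object> map = new HashMap<Object, Object>();

    public Integer fetch(Integer i) {
        return (Integer) fetch((Object) i);  // widen to Object, route to generic method
    }

    public String fetch(String s) {
        return (String) fetch((Object) s);
    }

    private Object fetch(Object obj) {
        Object cached = map.get(obj);        // hard reference if an equal instance exists
        if (cached == null) {
            map.put(obj, obj);
            return obj;
        }
        return cached;
    }
}
```

Callers only ever see the overloads for the supported types, and always get back the canonical instance for an equal value.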
+ 
+ The approach has been applied to a number of {{{Property}}} subtypes, roughly corresponding
to those types of property values that can be resolved at parse-time (absolute lengths, strings,
numbers, enums...), and to property-bundles consisting solely of such properties ({{{CommonFont}}}
and {{{CommonHyphenation}}}).
+ 
+ (to be continued)
+ 

---------------------------------------------------------------------
To unsubscribe, e-mail: fop-commits-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-commits-help@xmlgraphics.apache.org

