harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Simon Zhou <simon.harm...@gmail.com>
Subject [GSoC] Proposal of GC-1: implement WeakReference support in Harmony concurrent GC
Date Tue, 31 Mar 2009 10:06:57 GMT
Hi All,
Here is my proposal, any suggestion is welcome!

*Title/Summary:* GC-1: Implement WeakReference support in Harmony Concurent
GC

*Student:* Simon.Zhou

*Student e-mail:* simon.harmony@gmail.com

*Student Major:* Computer Science

*Student Degree:* Master

*Student Graduation:* 2009.7

*Organization:* Fudan University

*Assigned Mentor:*

*Abstract:*

weak reference allows a program to maintain special reference to objects,
which is a useful API for the program interacting with the garbage
collector to some extent. Apache Harmony has a concurrent garbage collector
(code name: Tick) which can perform the garbage collection without
suspending the application threads except for root set enumeration. Tick do
not have weak reference supporting now, it treats all the references object
just as  strong references, which may cause inefficiency for applications
using weak references. In this project, I will add the weak reference
support for Tick, which would be different from the implementation in
generational GC, because the consistency should be maintained for the heap
objects carefully. Read barrier of the get() method of reference object will
be used for this implementation, and the performance issue will be seriously
considered. before implement phantom reference, I will also implement the
finalizer feature for Tick, although it is not recommanded to be used except
it is necessary. Besides this, I am trying to reduce the floating garbage
for Tick’s concurrent algorithms.

*Detailed Description:*

Tick is a concurrent GC of Harmony DRLVM, it can do the garbage collection
without suspending application threads. For concurrent GC, the most
important issue that should be dealt with is the inconsistency problem, that
is if we trace the heap object graph while the mutator updating the same
graph, inconsistency problem occurs, some live objects may be mistaken
relaimed as garbage. In order to keep the consistency of objects, write
barriers is inserted when concurrent tracing starts, which intercept all the
write operations of object reference and record useful data for rescaning.
Now there are 3 different concurrent collection algorithms in Tick: DLG,
Sliding-View, Mostly-Concurrent; and 3 different write barriers are used
repectively. see more information [1].

Experiment result shows that Tick can significantly reduce or even eliminate
the pause time caused by garbage collection. However, one drawback of Tick
is that, it do not support weak reference mechanism, which is a important
API of Java programming language.

Weak reference is an useful tool for whom are much care about the
application memory usage, e.g. it can help developers to cache some
temporary large objects for later reusing; also, to do some cleanup
operations just when a object is acturally relaimed. There are 3 kinds of
weak reference class in Java language (the weakness is decreasingly): soft
reference, weak reference, phantom reference. Soft reference is usually used
to maintain better cache for large object, such as pictures load from web;
weak reference can help to reduce the frequency of creating short-live
objects; phantom reference can usully help programmer to do cleanup
operations just when an object is physically relaimed. (from now on, I will
use ‘weakref’ to represent the set of all the three kinds of weak
references) All the three kinds of reference classes have a get() method, if
the object has not been actually relaimed, the get() method will return a
reference of the referent object (referent object is the object which
reference object point to) except phantom reference (phantom reference’s
get() method always return null, since it is just used for cleanup
operations when object is actually relaimed). see more information [2,3]

Weakrefs are always processed in GC. For generational GC, weakref feature
can be implemented easily: when minor collection occurs, weak reference
objects are equeued and their referents are reclaimed. when major collection
occurs, soft reference objects are enqueued and their referents are
reclaimed. when finalizer are processed, phantom reference objects are
enqueued and their referents are reclaimed.

Unlike generational GC, Tick just simply treats all the references in the
heap as strong references, which make memory pressure become bigger for the
applications using much data structure based on weakrefs, such as
WeakHashMap. In this project, my job is to implement this weakref feature
for Tick, and this work should be independent to the concurrent algorithms
used in Tick.

For concurrent GC, weakref processing is different with generational GC,
because the tracing thread is running concurrently with application threads.
Application threads may invoke the get() method to get the reference of
referent object, it may:
1, put it to the thread local stack as local var or calling parameter, such
as
    Object localvar = weakref.get();
    functionA(weakref.get());
2, assign it to another object slot, such as
    obj.field = weakref.get();
these operations will change the reachability of referent object, which may
cause the inconsistency issues in concurrent GC. In order to deal with it,
what we should do is to intercept these operations while application thread
is running. Because all these operations use the get() method to get the
reference of referent object, inserting the read barrier in the get() is a
simple and effective approach. The main job of the read barrier is to add
the pointer of referent object to a remembered set, then the garbage
collection thread will pick it up and trace from it.

I divided the work into 3 major steps:

The first step is to implement the read barrier for get() method of
reference object. My approch is implementing the barrier using a VMHelper
method, which is a VMMagic based helper function writen in Java. That is,
using Java to write some hot funtions in JVM, then these Java methods can be
compilered and even inlined into the native code produced by JIT compiler
[4]. This is very useful approach to reduce the read barrier’s overhead. In
this step, I will write a Java version read barrier and add some code to the
JIT compiler (Jitrino and JET) to make it compile the barrier correctly and
effiently.

The second step is to implement the weakref processing module in Tick. This
module will collect all the weakrefs while concurrent marking and put them
to different queues depending on its reference type. When concurrent marking
finishes, GC threads pick up the reference object in the queues to see if
the referent object is marked, if so, the referent object is live and should
not be processed, otherwise, process the referent object according the type
of the reference object respectively. moreover, if the reference object is
intercepted by the read barrier in the get() method, its referent object
should be marked while tracing, so the consistency is well maintained.

After this step, I will test Tick with benchmark using weakrefs, or using
data structure based on the weakrefs (such as WeakHashMap). The experiment
result will be analyzed and compared to Tick without weakrefs supporting and
generational GC with weakrefs supporting. Then I will form the result data
to a document report.

The last step is trying to reduce the floating garabge [5] for concurrent
GC. This work is not related to weakrefs feature, but its effect may be
similar to it, since the major goal of reducing floating garbage is to ease
the memory pressure while using concurrent GC. Floating garbage always exist
and may not be totally eliminated in concurrent GC, my job is trying to
optimize its impact as much as possible.

*Additional Information:*

Personal Information:
  I am a master canidate of Computer Science in Parallel Processing
Institution, Fudan University. My interested area includes Compiler,
Computer Architecure, Managed Runtime etc. I started to follow Harmony with
interests 2 year ago, and I used to contribute to DRLVM and have submited
some patches to the community. As an intern, I improved Harmony concurrent
GC by using state-based phase design and implementing an efficient scheduler
in the summer of 2008, the patch of this work has been merged to the source
trunk [6].

Schedule:

  I can work at least 3 days per week for this work, my schedule this
project as follow:
  April 10 ~ April 26 Communicate with the mentor and design the interface
of the implementation, such as, the native interface of get() method, weak
reference processing interfaces in GC module. Write down the design
document.

  April 27 ~ May 24 implementing read barrier for get() method of reference
object.

  May 25 ~ June 30 implementing the weak and soft reference processing
features in GC module, add finalizer feature to Tick

  July 1 ~ July 15 implementing phantom reference processing features.

  July 16 ~ Aug 1 test and analyze Tick with weak reference related
benchmark. write document for implementing and testing.

  Aug 1 ~ Aug 15 try to improve Tick by reducing floating garbage.

*References:*

[1]
http://xiao-feng.blogspot.com/2007/03/comparison-between-snapshot-based-gc.html
[2] http://www.realjenius.com/node/377
[3] http://www.pawlan.com/Monica/refobjs/
[4]
http://mail-archives.apache.org/mod_mbox/harmony-dev/200505.mbox/%3C4293E6BD.1060303@anu.edu.au%3E
[5] http://www.memorymanagement.org/glossary/f.html#floating.garbage
[6] http://issues.apache.org/jira/browse/HARMONY-5989

-- 
>From : Simon.Zhou@PPI, Fudan University

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message