harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Xiao-Feng Li <xiaofeng...@gmail.com>
Subject Re: [GSoC] Proposal of GC-1: implement WeakReference support in Harmony concurrent GC
Date Tue, 14 Apr 2009 03:56:26 GMT
Simon, here is my understanding.

GC has following things with weakref in SATB:
1. to stop tracing when meeting a weakref object, and remember it in a
GC-weakref-queue for later processing;
2. to catch the referent of a weakref get(), and remember it as a
dirty object (or a normal reachable object);
3. During marking, the remembered dirty objects should be traced as roots.

4. When marking finishes, GC need to scan the GC-weakref-queue to find
unreachable referents, etc. Those weakref's referent field is cleared
and passed to VM-weakref-queue for further processing. (VM processes
them by executing enqueue() method of them.)
5. to sweep.

Step 4 must be executed after marking finishes, but step 4 can be
executed concurrently with mutators, and the write barrier can be
turned off.

Note, the above doesn't include finalizer processing. Finalization
process needs to trace the finalizable objects as normal scanning,
which might have race condition with mutators. So write barrier should
be kept for it.

So in my opinion, step 4 can be conducted in STW minor with finalizer
processing. Then it will be very similar to what Harmony GC already
has.

For step 1, check scan_weak_reference()'s call sites and implementation;
For step 4, check collector_identify_finref()'s call sites and implementation.

Thanks,
xiaofeng

On Tue, Apr 14, 2009 at 10:33 AM, Simon Zhou <simon.harmony@gmail.com> wrote:
> After some investigations, I am wondering when to processing WeakRef for
> concurrent GC. my understanding is as follows
>
> IMHO, we can process them, also concurrently with mutators, when the
> concurrent marking finishes (for 2 SATB alogorithms). the advantage is that,
> the data structure is easy to implement, I guess.
>
> Another choice is to process them along with the marking phase, that is, we
> should construct sync queues for 3 kinds of reference object, the read
> barrier will enqueue and GC threads will dequeue. The WeakRef processing
> will finishes with the concurrent marking phase.
> For mostly concurrent algorithm, we can just process it in the short STW
> phase.
> Would you like to give some comments on this? Thank you very much!
>
> Thanks
> Simon
>
> 2009/4/1 Xiao-Feng Li <xiaofeng.li@gmail.com>
>
>> Simon, I've got some time reviewing your proposal. It looks very
>> feasible. Probably you'd better read my blog entry on WeakRef and
>> Finalizer processing of Harmony [1].
>>
>> One question, will you implement weakref support for all the three
>> concurrent GC algorithm?
>>
>> Btw, please go ahead to submit/update your proposal to the GSoC
>> application web.
>>
>> [1]
>> http://xiao-feng.blogspot.com/2007/05/weak-reference-processing-in-apache.html
>>
>> Thanks,
>> xiaofeng
>>
>> On Tue, Mar 31, 2009 at 6:06 PM, Simon Zhou <simon.harmony@gmail.com>
>> wrote:
>>  > Hi All,
>> > Here is my proposal, any suggestion is welcome!
>> >
>> > *Title/Summary:* GC-1: Implement WeakReference support in Harmony
>> Concurent
>> > GC
>> >
>> > *Student:* Simon.Zhou
>> >
>> > *Student e-mail:* simon.harmony@gmail.com
>> >
>> > *Student Major:* Computer Science
>> >
>> > *Student Degree:* Master
>> >
>> > *Student Graduation:* 2009.7
>> >
>> > *Organization:* Fudan University
>> >
>> > *Assigned Mentor:*
>> >
>> > *Abstract:*
>> >
>> > weak reference allows a program to maintain special reference to objects,
>> > which is a useful API for the program interacting with the garbage
>> > collector to some extent. Apache Harmony has a concurrent garbage
>> collector
>> > (code name: Tick) which can perform the garbage collection without
>> > suspending the application threads except for root set enumeration. Tick
>> do
>> > not have weak reference supporting now, it treats all the references
>> object
>> > just as  strong references, which may cause inefficiency for applications
>> > using weak references. In this project, I will add the weak reference
>> > support for Tick, which would be different from the implementation in
>> > generational GC, because the consistency should be maintained for the
>> heap
>> > objects carefully. Read barrier of the get() method of reference object
>> will
>> > be used for this implementation, and the performance issue will be
>> seriously
>> > considered. before implement phantom reference, I will also implement the
>> > finalizer feature for Tick, although it is not recommanded to be used
>> except
>> > it is necessary. Besides this, I am trying to reduce the floating garbage
>> > for Tick’s concurrent algorithms.
>> >
>> > *Detailed Description:*
>> >
>> > Tick is a concurrent GC of Harmony DRLVM, it can do the garbage
>> collection
>> > without suspending application threads. For concurrent GC, the most
>> > important issue that should be dealt with is the inconsistency problem,
>> that
>> > is if we trace the heap object graph while the mutator updating the same
>> > graph, inconsistency problem occurs, some live objects may be mistaken
>> > relaimed as garbage. In order to keep the consistency of objects, write
>> > barriers is inserted when concurrent tracing starts, which intercept all
>> the
>> > write operations of object reference and record useful data for
>> rescaning.
>> > Now there are 3 different concurrent collection algorithms in Tick: DLG,
>> > Sliding-View, Mostly-Concurrent; and 3 different write barriers are used
>> > repectively. see more information [1].
>> >
>> > Experiment result shows that Tick can significantly reduce or even
>> eliminate
>> > the pause time caused by garbage collection. However, one drawback of
>> Tick
>> > is that, it do not support weak reference mechanism, which is a important
>> > API of Java programming language.
>> >
>> > Weak reference is an useful tool for whom are much care about the
>> > application memory usage, e.g. it can help developers to cache some
>> > temporary large objects for later reusing; also, to do some cleanup
>> > operations just when a object is acturally relaimed. There are 3 kinds of
>> > weak reference class in Java language (the weakness is decreasingly):
>> soft
>> > reference, weak reference, phantom reference. Soft reference is usually
>> used
>> > to maintain better cache for large object, such as pictures load from
>> web;
>> > weak reference can help to reduce the frequency of creating short-live
>> > objects; phantom reference can usully help programmer to do cleanup
>> > operations just when an object is physically relaimed. (from now on, I
>> will
>> > use ‘weakref’ to represent the set of all the three kinds of weak
>> > references) All the three kinds of reference classes have a get() method,
>> if
>> > the object has not been actually relaimed, the get() method will return a
>> > reference of the referent object (referent object is the object which
>> > reference object point to) except phantom reference (phantom reference’s
>> > get() method always return null, since it is just used for cleanup
>> > operations when object is actually relaimed). see more information [2,3]
>> >
>> > Weakrefs are always processed in GC. For generational GC, weakref feature
>> > can be implemented easily: when minor collection occurs, weak reference
>> > objects are equeued and their referents are reclaimed. when major
>> collection
>> > occurs, soft reference objects are enqueued and their referents are
>> > reclaimed. when finalizer are processed, phantom reference objects are
>> > enqueued and their referents are reclaimed.
>> >
>> > Unlike generational GC, Tick just simply treats all the references in the
>> > heap as strong references, which make memory pressure become bigger for
>> the
>> > applications using much data structure based on weakrefs, such as
>> > WeakHashMap. In this project, my job is to implement this weakref feature
>> > for Tick, and this work should be independent to the concurrent
>> algorithms
>> > used in Tick.
>> >
>> > For concurrent GC, weakref processing is different with generational GC,
>> > because the tracing thread is running concurrently with application
>> threads.
>> > Application threads may invoke the get() method to get the reference of
>> > referent object, it may:
>> > 1, put it to the thread local stack as local var or calling parameter,
>> such
>> > as
>> >    Object localvar = weakref.get();
>> >    functionA(weakref.get());
>> > 2, assign it to another object slot, such as
>> >    obj.field = weakref.get();
>> > these operations will change the reachability of referent object, which
>> may
>> > cause the inconsistency issues in concurrent GC. In order to deal with
>> it,
>> > what we should do is to intercept these operations while application
>> thread
>> > is running. Because all these operations use the get() method to get the
>> > reference of referent object, inserting the read barrier in the get() is
>> a
>> > simple and effective approach. The main job of the read barrier is to add
>> > the pointer of referent object to a remembered set, then the garbage
>> > collection thread will pick it up and trace from it.
>> >
>> > I divided the work into 3 major steps:
>> >
>> > The first step is to implement the read barrier for get() method of
>> > reference object. My approch is implementing the barrier using a VMHelper
>> > method, which is a VMMagic based helper function writen in Java. That is,
>> > using Java to write some hot funtions in JVM, then these Java methods can
>> be
>> > compilered and even inlined into the native code produced by JIT compiler
>> > [4]. This is very useful approach to reduce the read barrier’s overhead.
>> In
>> > this step, I will write a Java version read barrier and add some code to
>> the
>> > JIT compiler (Jitrino and JET) to make it compile the barrier correctly
>> and
>> > effiently.
>> >
>> > The second step is to implement the weakref processing module in Tick.
>> This
>> > module will collect all the weakrefs while concurrent marking and put
>> them
>> > to different queues depending on its reference type. When concurrent
>> marking
>> > finishes, GC threads pick up the reference object in the queues to see if
>> > the referent object is marked, if so, the referent object is live and
>> should
>> > not be processed, otherwise, process the referent object according the
>> type
>> > of the reference object respectively. moreover, if the reference object
>> is
>> > intercepted by the read barrier in the get() method, its referent object
>> > should be marked while tracing, so the consistency is well maintained.
>> >
>> > After this step, I will test Tick with benchmark using weakrefs, or using
>> > data structure based on the weakrefs (such as WeakHashMap). The
>> experiment
>> > result will be analyzed and compared to Tick without weakrefs supporting
>> and
>> > generational GC with weakrefs supporting. Then I will form the result
>> data
>> > to a document report.
>> >
>> > The last step is trying to reduce the floating garabge [5] for concurrent
>> > GC. This work is not related to weakrefs feature, but its effect may be
>> > similar to it, since the major goal of reducing floating garbage is to
>> ease
>> > the memory pressure while using concurrent GC. Floating garbage always
>> exist
>> > and may not be totally eliminated in concurrent GC, my job is trying to
>> > optimize its impact as much as possible.
>> >
>> > *Additional Information:*
>> >
>> > Personal Information:
>> >  I am a master canidate of Computer Science in Parallel Processing
>> > Institution, Fudan University. My interested area includes Compiler,
>> > Computer Architecure, Managed Runtime etc. I started to follow Harmony
>> with
>> > interests 2 year ago, and I used to contribute to DRLVM and have submited
>> > some patches to the community. As an intern, I improved Harmony
>> concurrent
>> > GC by using state-based phase design and implementing an efficient
>> scheduler
>> > in the summer of 2008, the patch of this work has been merged to the
>> source
>> > trunk [6].
>> >
>> > Schedule:
>> >
>> >  I can work at least 3 days per week for this work, my schedule this
>> > project as follow:
>> >  April 10 ~ April 26 Communicate with the mentor and design the interface
>> > of the implementation, such as, the native interface of get() method,
>> weak
>> > reference processing interfaces in GC module. Write down the design
>> > document.
>> >
>> >  April 27 ~ May 24 implementing read barrier for get() method of
>> reference
>> > object.
>> >
>> >  May 25 ~ June 30 implementing the weak and soft reference processing
>> > features in GC module, add finalizer feature to Tick
>> >
>> >  July 1 ~ July 15 implementing phantom reference processing features.
>> >
>> >  July 16 ~ Aug 1 test and analyze Tick with weak reference related
>> > benchmark. write document for implementing and testing.
>> >
>> >  Aug 1 ~ Aug 15 try to improve Tick by reducing floating garbage.
>> >
>> > *References:*
>> >
>> > [1]
>> >
>> http://xiao-feng.blogspot.com/2007/03/comparison-between-snapshot-based-gc.html
>> > [2] http://www.realjenius.com/node/377
>> > [3] http://www.pawlan.com/Monica/refobjs/
>> > [4]
>> >
>> http://mail-archives.apache.org/mod_mbox/harmony-dev/200505.mbox/%3C4293E6BD.1060303@anu.edu.au%3E
>> > [5] http://www.memorymanagement.org/glossary/f.html#floating.garbage
>> > [6] http://issues.apache.org/jira/browse/HARMONY-5989
>> >
>> > --
>> > From : Simon.Zhou@PPI, Fudan University
>> >
>>
>>
>>
>>  --
>> http://people.apache.org/~xli
>>
>
>
>
> --
> From : Simon.Zhou@PPI, Fudan University
>



-- 
http://people.apache.org/~xli

Mime
View raw message