harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiao-Feng Li" <xiaofeng...@gmail.com>
Subject Re: [harmony-gc-5] updates on Parrot GC [was Re: Comparison of Harmony GC_Gen vs. Parrot]
Date Mon, 14 Apr 2008 22:43:42 GMT
On Tue, Apr 15, 2008 at 4:23 AM, Alexei Fedotov
<alexei.fedotov@gmail.com> wrote:
> Hello Senaka,
>  That's a good progress in understanding how Parrot is built.
>
>  Generally, I would suggest keeping build systems of Parrot and GC as
>  is, with minimal changes. For example, you may try the following:
>
>  1. Build Parrot as is using a Parrot build system..
>  2. Build GC DLL using DRLVM build system.
>  3. Copy GC DLL to Parrot.
>
>  You may want to ask few questions:
>  1. How Parrot would know about this new DLL? You will need to change
>  Parrot command line parsing to understand a new option.
>  2. How it would know which functions to call to collect garbage?
>  <answer mentions header files>
>  3. How it would be possible to maintain all these changes in order?
>  <answer was in the correspondence>
>  4. Etc...

Good suggestions. We need know if Parrot can connect with a DLL GC. If
it can't, probably it's good for it to improve. To use DLL GC will
force the Parrot VM to consider modularity and interface clearly.

Btw, I had read the doc of Parrot GC. Algorithm-wise, I guess it is a
single-thread (sequential) stop-the-world mark-sweep GC. And I guess
it is trying to add incremental mark-sweep to it.

PMC looks like objects, and Buffer looks like variable-length stings.
The doc says PMC may contain PMC and Buffer, while Buffer contains
nothing. I don't know if the "contain" here means "reference".

To make GC into DLL, somebody really needs to answer following VM <->
GC interactions in Parrot as in Harmony:

Mainly the followings are agreed between VM and GC
1. Partially revealed obj and vtable definitions (object layout)
2. Object header bits left for GC usage
3. GC ↔ VM interfaces in open/gc.h, vm_gc.h
4. GC asks VM to suspend/resume mutators, Include GC safe-point support in VM
5. GC asks VM to enumerate root references, Include stack frame
unwinding support in VM
6. Misc (not critical): finalizer/weakref, class unloading, etc.

Basically they tell how GC works in the system
1. How VM asks GC to allocate objects
2. How VM triggers collection
3. How GC asks VM to suspend mutators
4. How GC asks VM to enumerate root references
5. How GC traces object connection graph

These are the key points for GC porting or developing

Thanks,
xiaofeng

>  Thanks.
>
>
>
>
>  On Tue, Apr 15, 2008 at 12:09 AM, Senaka Fernando <senakafdo@gmail.com> wrote:
>  > Hi all,
>  >
>  >  I have finally made it possible to build Parrot in a C++ environment. And,
>  >  have managed to uncover most differences between Parrot's C and C++ build
>  >  streams. I have also contacted Mark (from Parrot) regarding being my
>  >  co-mentor and he's interested. I also do get a great deal of support from
>  >  the Parrot community. The Parrot developer meeting will be held tomorrow at
>  >  18.30 GMT. I hope that there would be a discussion on the GC.
>  >
>  >  Regards,
>  >  Senaka
>  >
>  >  On Sun, Apr 13, 2008 at 5:41 PM, Senaka Fernando <senakafdo@gmail.com>
>  >
>  >
>  > wrote:
>  >
>  >  > Hi Xiao-Feng,
>  >  >
>  >  > Here is a detailed answer to your questions. I have copy-pasted some info
>  >  > from Parrot documentation on GC.
>  >  >
>  >  > Objects on the heap are laid out as PMCs (PObjects), and buffers.
>  >  > Allocation is pool based. The Arenas structure holds function pointers for
>  >  > the core defined interface of the currently active GC subsystem:
>  >  > "init_pool", "do_gc_mark",        "finalize_gc_system". It holds various
>  >  > accounting information for the GC subsystem, including how many GC runs have
>  >  > been completed, amount of memory allocated since the last run, and total
>  >  > memory allocated. This accounting information is updated by the GC system.
>  >  > The current block level for GC mark and sweep phases is stored in the Arenas
>  >  > structure.
>  >  >
>  >  > The Memory_Pool structure is a simple memory pool. It contains a pointer
>  >  > to the top block of the allocated pool, the total allocated size of the
>  >  > pool, the block size, and some details on the reclamation characteristics
of
>  >  > the pool.
>  >  >
>  >  > The Small_Object_Pool structure is a richer memory pool for object
>  >  > allocation.  It tracks details like the number of allocated and free objects
>  >  > in the pool, a list of free objects, and for the generational GC
>  >  > implementation maintains linked lists of white, black, and gray PMCs. It
>  >  > contains a pointer to a simple Memory_Pool (the base storage of the pool).
>  >  > It holds function pointers for adding and retrieving free objects in the
>  >  > pool, and for allocating objects.
>  >  >
>  >  > Each PMC/Buffer will contain a set of flags that will govern the behavior
>  >  > and state of it in the presence of the GC.
>  >  > Ex:- PObj_active_destroy_FLAG, PObj_data_is_PMC_array_FLAG, PObj_live_FLAG
>  >  > etc.
>  >  >
>  >  > Each PObject has a header which is of type UnionVal, a union of various
>  >  > fields, in addition to flags. A PMC has a Vtable in it. Thus, each allocated
>  >  > object will have header info within it.
>  >  >
>  >  > Each GC core defines 4 function pointers stored in the Small_Object_Pool
>  >  > structures. One to allocate new objects, another to add a freed object to
>  >  > the free list, another to retrieve a free object from the free list and one
>  >  > to reallocate for additional objects. If a Small_Object_Pool is full (no
>  >  > free objects) a new one will needed to be created. Thus, each object on a
>  >  > small object pool is like a place holder for a new instance.
>  >  >
>  >  > Heap is laid out as arenas, having two memory pools and six small object
>  >  > pools.
>  >  >
>  >  > There are two marking phases, for PMCs
>  >  >
>  >  > 1. Initial Marking
>  >  > Each PMC has a "flags" member which, among other things, facilitates
>  >  > garbage collection. At the beginning of the mark phase, the
>  >  > "PObj_is_live_FLAG" and "PObj_is_fully_marked_FLAG" are both unset, which
>  >  > flags the PMC as presumed dead (white). The initial mark phase of the
>  >  > collection cycle goes through each PMC in the root set and sets the
>  >  > Obj_is_live_FLAG" bit in the "flags" member (the PMC is gray).  It does not
>  >  > set the "PObj_is_fully_marked_FLAG" bit (changing the PMC to black), because
>  >  > in the initial mark, the PMCs or buffers contained by a PMC are not marked.
>  >  > It also appends the PMC to the end of a list used for further marking.
>  >  > However, if the PMC has already been marked as black, the current end of
>  >  > list is returned (instead of appending the already processed PMC) to prevent
>  >  > endless looping.
>  >  >
>  >  > 2. Incremental Marking
>  >  > After the root set of PMCs have been marked, a series of incremental mark
>  >  > runs are performed. These may be performed frequently, between other
>  >  > operations.  The incremental mark runs work to move gray PMCs to black. They
>  >  > take a PMC from the list for further marking, mark any PMCs or buffers it
>  >  > contains as gray (the "PObj_is_live_FLAG" is set and the
>  >  > "PObj_is_fully_marked_FLAG" is left unset), and add the contained PMCs or
>  >  > buffers to the list for further marking.  If the PMC has a custom mark
>  >  > function in its vtable, it is called at this point.
>  >  >
>  >  > For Buffers, no incremental marking is involved.
>  >  > The initial marking phase also marks the root set of buffers. Because
>  >  > buffers cannot contain other buffers, they are immediately marked as black
>  >  > and not added to the list for further marking. Because PMCs may contain
>  >  > buffers, the buffer collection phase can't run until the incremental marking
>  >  > of PMCs is completed.
>  >  >
>  >  > When the list for further marking is empty (all gray PMCs have changed to
>  >  > black), the collection stage is started. First, PMCs are collected, followed
>  >  > by buffers. In both cases (PMC and buffer), the "live" and "fully_marked"
>  >  > flags are reset after examination for reclamation.
>  >  >
>  >  > To collect PMCs, each PMC arena is examined from the most recently created
>  >  > backwards.  Each PMC is examined to see if it is live, already on the free
>  >  > list, or constant.  If it is not, then it is added to the free list and
>  >  > marked as being on the free list with the "PObj_on_free_list_FLAG".
>  >  >
>  >  > To collect buffers, each Buffer arena is examined from the most recently
>  >  > created backwards.  If the buffer is not live, not already on the free list
>  >  > and it is not a constant or copy on write, then it is added to the free pool
>  >  > for reuse and marked with the "PObj_on_free_list_FLAG".
>  >  >
>  >  > Thus, the objects are scanned during the mark phase and then identified as
>  >  > live, and collected. The collection process is triggered after the marking
>  >  > is complete.
>  >  >
>  >  > Allocation of objects is handled by pool structures.
>  >  >
>  >  > I can relate the necessary source code portions to this discussion if
>  >  > required. However, some of the above mentioned features are not fully
>  >  > implemented. But, according to the Parrot community, this will be their
>  >  > future direction.
>  >  >
>  >  > I also have fixed most build errors with Parrot & C++ and they are really
>  >  > happy about it. Now am in the process of resolving some linking conflicts
on
>  >  > Parrot's C++ build.
>  >  >
>  >  > Regards,
>  >  > Senaka
>  >  >
>  >  > On Mon, Apr 7, 2008 at 10:25 AM, Senaka Fernando <senakafdo@gmail.com>
>  >  > wrote:
>  >  >
>  >  > > Hi Xiao-Feng,
>  >  > >
>  >  > > Thanks for these questions. I believe that they'd be really helpful
in
>  >  > > understanding VM <-> GC assumptions on the Parrot end.
>  >  > >
>  >  > > Will work on answering these, and perhaps a comparison with Harmony.
>  >  > >
>  >  > > Regards,
>  >  > > Senaka
>  >  > >
>  >  > >
>  >  > > On Mon, Apr 7, 2008 at 5:11 AM, Xiao-Feng Li <xiaofeng.li@gmail.com>
>  >  > > wrote:
>  >  > >
>  >  > > > Senaka, thanks for the page. I think the most important things
are
>  >  > > > related to the VM <-> GC protocol. Some questions that may
help you:
>  >  > > > 1. How Parrot layout/encode an object/array? fields, size, object
>  >  > > > header info, etc.
>  >  > >
>  >  > >
>  >  > > > 2. How Parrot layout/arrange the heap? free list? pages?
>  >  > > > 3. What's the process of an object creation? When and how?
>  >  > > > 4. How is a collection process triggered?
>  >  > > > 5. How does Parrot GC trace live objects and collect them?
>  >  > > >
>  >  > > > Some of the questions above might be GC internals, so it's more
>  >  > > > desirable if you can understand the Parrot VM's assumptions on
GC.
>  >  > > > I.e., does it assume the heap is laid out in certain way, does
it
>  >  > > > assume the objects are encoded in certain way, does it assume the
>  >  > > > roots are enumerated in certain way, etc.? Depending on your progress,
>  >  > > > more details might be needed later on.
>  >  > > >
>  >  > > > For this project, you might have to understand the Parrot current
>  >  > > > status for the above questions. It helps you and us to identify
the
>  >  > > > key issues to resolve, and the main efforts to be focused on. For
GC
>  >  > > > porting over different VMs, it's not like porting an application
over
>  >  > > > different OSes, because of the implicit assumptions between VM
and GC.
>  >  > > > I would expect some redesign work required for GC porting, hence
you
>  >  > > > have to understand the Parrot design in certain depth.
>  >  > > >
>  >  > > > Thanks,
>  >  > > > xiaofeng
>  >  > > >
>  >  > > > On Mon, Apr 7, 2008 at 4:26 AM, Alexei Fedotov <
>  >  > > > alexei.fedotov@gmail.com> wrote:
>  >  > > > > Good job, Senaka.
>  >  > > > >
>  >  > > > >  The general perception was that internal and external GC
interfaces
>  >  > > > >  were mixed, which maked this document less usable for harmony-gc-5
>  >  > > > >  project than it could be. For example, sweeping, marking
and
>  >  > > > >  reclaiming are internal interfaces while allocation, stack
>  >  > > > enumeration
>  >  > > > >  (please take a look at vm_enumerate_root_set_all_threads)
and gc
>  >  > > > >  invocation are external interfaces. We should pay more attention
to
>  >  > > > >  external interfaces for harmony-gc-5 project.
>  >  > > > >
>  >  > > > >  Thanks.
>  >  > > > >
>  >  > > > >
>  >  > > > >
>  >  > > > >
>  >  > > > >
>  >  > > > >
>  >  > > > >
>  >  > > > >  On Sun, Apr 6, 2008 at 8:35 PM, Senaka Fernando <
>  >  > > > senakafdo@gmail.com> wrote:
>  >  > > > >  >
>  >  > > > >  > Hi all,
>  >  > > > >  >
>  >  > > > >  >  I have almost finished comparing the two interfaces
of Harmony
>  >  > > > and Parrot.
>  >  > > > >  >  However, I'm not 100% sure on whether I got everything
right,
>  >  > > > but I believe
>  >  > > > >  >  that most of it is. Therefore, it would be really great
if you
>  >  > > > could review
>  >  > > > >  >  the wiki page and let me know whether it is correct
and precise.
>  >  > > > >  >
>  >  > > > >  >  Sections marked as TBD are not yet finalized. And,
I have
>  >  > > > omitted some
>  >  > > > >  >  instances of where Harmony supports something and Parrot
doesn't
>  >  > > > for
>  >  > > > >  >  simplicity.
>  >  > > > >  >
>  >  > > > >  >  The wiki page is found at [1]
>  >  > > > >  >
>  >  > > > >  >  [1]
>  >  > > > http://wiki.apache.org/harmony/gc_comparison/gc_gen_harmony_vs_parrot
>  >  > > > >  >
>  >  > > > >  >  Regards,
>  >  > > > >  >  Senaka
>  >  > > > >  >
>  >  > > > >
>  >  > > > >
>  >  > > > >
>  >  > > > >  --
>  >  > > > >  With best regards,
>  >  > > > >  Alexei
>  >  > > > >
>  >  > > >
>  >  > > >
>  >  > > >
>  >  > > > --
>  >  > > > http://xiao-feng.blogspot.com
>  >  > > >
>  >  > >
>  >  > >
>  >  >
>  >
>
>
>
>  --
>  With best regards,
>  Alexei
>



-- 
http://xiao-feng.blogspot.com
Mime
View raw message