harmony-dev mailing list archives

From "Robin Garner" <robin.gar...@anu.edu.au>
Subject Re: [DRLVM][JIT] write barrier broken by new jit opts?
Date Fri, 12 Jan 2007 06:44:19 GMT
> On the 0x25B day of Apache Harmony Mikhail Fursov wrote:
>> On 11 Jan 2007 17:26:13 +0600, Egor Pasko <egor.pasko@gmail.com> wrote:
>> >
>> > On the 0x25B day of Apache Harmony Mikhail Fursov wrote:
>> > > IMO disabling GC for the inlined arraycopy IR region is a good idea
>> > > and it should work.
>> > > Other proposals from me here:
>> > > 1) Why report each object separately? Maybe calling the wb helper
>> > > once with an array of objects would be better?
>> >
>> > AFAIR, Robin proposed to report array updates by chunks, which is a
>> > good idea for fast thread suspension. WB once for small arrays, wb
>> > several times for large arrays.
>>
>>
>> The issues that I did not understand from Robin's post are:
>> 1) Why do we need chunks?
>
> 1. arraycopy is uninterruptible
> 2. big arrays may take a long time to copy
>
> => a thread copying a big array cannot be suspended for a long
> time (which is a performance problem)
>
> The first solution that came to mind is to copy in uninterruptible
> chunks, with a WB after each chunk and a check of the suspension flag
> after each chunk.
>
>> 2) What does object_write_barrier(dest) report in algorithm with chunks?
>
> it should report the chunk update. Of course, it should have more
> parameters to outline the exact chunk that has been updated.
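
For reference, the chunked scheme quoted above could be sketched roughly as
follows. This is only an illustration under stated assumptions: CHUNK_SLOTS,
chunk_write_barrier() and poll_suspension() are hypothetical names, not DRLVM
or JIT helpers, and the sketch assumes the source and destination ranges do
not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SLOTS 1024          /* hypothetical chunk size, in reference slots */

typedef void *ref_t;

/* Counters so the stand-in hooks below have an observable effect. */
static size_t barrier_calls = 0;
static size_t safepoint_polls = 0;

/* Hypothetical hook: a real VM would record the updated range here. */
static void chunk_write_barrier(ref_t *first, size_t count) {
    (void)first;
    (void)count;
    barrier_calls++;
}

/* Hypothetical hook: a real VM would test its suspension flag here. */
static void poll_suspension(void) {
    safepoint_polls++;
}

/* Copy n reference slots chunk by chunk (non-overlapping ranges assumed).
 * Each chunk is copied without interruption, then reported to the barrier,
 * then followed by a suspension check. */
static void chunked_ref_copy(ref_t *dst, const ref_t *src, size_t n) {
    assert(dst != NULL && src != NULL);
    size_t done = 0;
    while (done < n) {
        size_t len = (n - done < CHUNK_SLOTS) ? (n - done) : CHUNK_SLOTS;
        memcpy(dst + done, src + done, len * sizeof(ref_t));
        chunk_write_barrier(dst + done, len);
        poll_suspension();
        done += len;
    }
}
```

With these assumptions a 2500-slot copy is split into three chunks (1024,
1024 and 452 slots), so the barrier and the suspension check each run three
times.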

The barrier that has been measured in the literature is simply an object
remembering barrier.  The only information in it is the address of the
object.  At (nursery) GC time, the GC uses the object pointer as a root,
and scans the array as a source of pointers into the nursery.

This is an optimization because
a) scanning arrays is fast

  while (++slot < array_bound)
    if (*slot > NURSERY_BOUNDARY)
      *slot = evacuate(*slot);

  hardware prefetch kicks in and even large arrays are quickly scanned.

b) The barrier is fast.
  /* remember each array at most once between collections */
  if (!atomic_test_and_set(array_header->remembered_bit))
    *array_remset++ = array_address;
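
A slightly fuller sketch of (a) and (b) together, in C11, may make the idea
concrete. Everything named here is an assumption for illustration: the
ref_array layout, the fixed-size remset, NURSERY_BOUNDARY and the trivial
evacuate() are stand-ins, not the layout or API of any particular GC. The
remset append is not thread-safe as written; a real VM would typically use a
per-thread buffer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NURSERY_BOUNDARY ((uintptr_t)0x1000) /* hypothetical: nursery lies above */
#define REMSET_CAPACITY  64                  /* hypothetical fixed-size remset */
#define MAX_SLOTS        8

typedef struct {
    atomic_bool remembered;       /* true while the array is in the remset */
    size_t      length;           /* number of used slots, <= MAX_SLOTS */
    uintptr_t   slots[MAX_SLOTS]; /* reference slots, modelled as raw addresses */
} ref_array;

static ref_array *remset[REMSET_CAPACITY];
static size_t remset_size = 0;

/* Object-remembering barrier: record the array at most once, no matter
 * how many stores hit it before the next nursery collection. */
static void object_write_barrier(ref_array *a) {
    if (!atomic_exchange(&a->remembered, true)) {
        assert(remset_size < REMSET_CAPACITY);
        remset[remset_size++] = a;
    }
}

/* Stand-in for evacuation: a real collector would copy the object out of
 * the nursery and return its new address. */
static uintptr_t evacuate(uintptr_t obj) {
    return obj - NURSERY_BOUNDARY;
}

/* Nursery GC: treat each remembered array as a root and scan all its slots. */
static void scan_remembered_arrays(void) {
    for (size_t i = 0; i < remset_size; i++) {
        ref_array *a = remset[i];
        assert(a->length <= MAX_SLOTS);
        for (size_t s = 0; s < a->length; s++)
            if (a->slots[s] > NURSERY_BOUNDARY)
                a->slots[s] = evacuate(a->slots[s]);
        atomic_store(&a->remembered, false); /* ready to be remembered again */
    }
    remset_size = 0;
}
```

Note that the barrier carries no index or range: the cost of finding the
interesting slots is moved entirely to the linear scan at GC time.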

While a barrier that saved a sub-range of an array might save some time
during a nursery GC, it would be far more expensive at runtime,
particularly if you use a structured remset that allows multiple
overlapping writes to the same array to be coalesced.

To put it another way, I _can_ imagine workloads for which an array-range
remembering barrier might be faster, but I don't know of any right now.

It might be worthwhile to remember individual slots for small arrays,
e.g. fewer than 8 elements, and use object remembering for larger ones.
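
A minimal sketch of such a hybrid barrier, again with made-up names
(SMALL_ARRAY_LIMIT, slot_remset, object_remset) and a single-threaded
simplification in place of an atomic remembered bit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SMALL_ARRAY_LIMIT 8   /* arrays with fewer elements remember slots */
#define REMSET_CAPACITY   64  /* hypothetical fixed-size remsets */
#define MAX_SLOTS         16

typedef struct {
    bool   remembered;        /* plain bool: single-threaded simplification */
    size_t length;
    void  *slots[MAX_SLOTS];
} ref_array;

static void     **slot_remset[REMSET_CAPACITY];   /* addresses of written slots */
static size_t     slot_remset_size = 0;
static ref_array *object_remset[REMSET_CAPACITY]; /* whole arrays to rescan */
static size_t     object_remset_size = 0;

/* Called after a reference store into a->slots[index]. */
static void array_store_barrier(ref_array *a, size_t index) {
    assert(index < a->length);
    if (a->length < SMALL_ARRAY_LIMIT) {
        /* Small array: remember just the slot that was written. */
        assert(slot_remset_size < REMSET_CAPACITY);
        slot_remset[slot_remset_size++] = &a->slots[index];
    } else if (!a->remembered) {
        /* Larger array: remember the object once and scan it at GC time. */
        assert(object_remset_size < REMSET_CAPACITY);
        a->remembered = true;
        object_remset[object_remset_size++] = a;
    }
}
```

The extra length test sits on the barrier fast path, so whether this pays
off would have to be measured.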

>> > 2) Another solution could be if GC will provide a helper written with
>> > > vmmagic for array copying by itself?
>> >
>> > yes, that makes it a) more elegant/supportable than JIT magic b) has
>> > GC specifics in GC. But on the other hand we should limit this vmmagic
>> > functionality by the "optimized" part of arraycopy. All other
>> > exception throwing peculiarities are more natural to implement on JIT
>> > or VM side.
>>
>>
>> Dividing arraycopy into different parts implies an additional layer of
>> GC-JIT communication in this case. I see no reason not to do the null
>> pointer or bounds checks in the vmmagic helper.
>
> and type checks? IMHO, these checks are not GC-ish. It has to be one
> more contract if we want to tackle this performance issue. The
> question is whether it will be a new GC-JIT contract or a GC-VM contract
> (to throw exceptions). I do not see much difference here. Need to decide.

FWIW, JikesRVM makes a fast-syscall to the OS memcpy() function.

I think it would be very hard to beat an inlined barrier followed by an
out-of-line call to memcpy or a hand-coded equivalent.

> --
> Egor Pasko
>
>

cheers,
Robin

