harmony-dev mailing list archives

From "Weldon Washburn" <weldon...@gmail.com>
Subject Re: [DRLVM][JIT] write barrier broken by new jit opts?
Date Tue, 09 Jan 2007 16:37:07 GMT
On 1/9/07, Robin Garner <robin.garner@anu.edu.au> wrote:
>
> > On 09 Jan 2007 16:54:03 +0600, Egor Pasko <egor.pasko@gmail.com> wrote:
> >>
> >> On the 0x258 day of Apache Harmony Weldon Washburn wrote:
> >> > It looks like multiple interesting design topics.  My comments inlined
> >> > below.
> >> >
> >> [..snip..]
> >> >
> >> > > Hi, I found write barrier in DRLVM can't catch all reference fields
> >> > > updates, and the problem is identified to be caused by new jit opts
> >> > > that do not observe the write barrier invariant for fields updates.
> >> > > For example, JIT may generate code to copy consecutive fields of an
> >> > > object without invoking write barrier. In order to revive the write
> >> > > barrier functionality while not sacrificing the opt performance,
> >> >
> >> > Yes.  But first how much performance is being sacrificed?  .2%?  8%??
>
> >>
> >> JIT magic arraycopy gives up to 30% boost on a microbenchmark (see
> >> HARMONY-2247). I did not measure boosts on specific benchmarks, but I
> >> think users would expect System.arraycopy() to be quite a bit faster than
> >> manual array copying. We should care about performance numbers as
> >> such.
> >
> >
> > Let me re-phrase the question.  Arraycopy performance is important and
> > deserves the special case treatment it has always gotten.  Setting aside
> > arraycopy, how much performance gain can be expected by optimizing
> > consecutive writes to fields of an object for the benchmarks we care
> > about?
> > What about simply marking the consecutive writes regions as
> > "Uninterruptible"?  This would eliminate yet another API between the GC
> > and JIT.  I think this is basically the same as Robin's suggestion.
> >
> > Regarding arraycopy, is there a problem with making the entire arraycopy
> > loop "Uninterruptible"?  This will impact GC latency but is the impact a
> > big deal for workloads we care about?  If it is, why not have the compiler
> > unroll the loop a bunch and put WBs every, say, 10th write.  The body of
> > 10 writes would be Uninterruptible.
>
> With arraycopy, much of the saving is in barrier costs themselves.  Apart
> from the overhead on the write, there's a reduction in remset entries, and
> the cost of scanning the object at GC time is minimal for a reference
> array.
>
> The original motivating benchmark was jess iirc.
>
> Off the top of my head, if the barrier is called before any data is copied
> I think the arraycopy code is GC-safe, provided another barrier call is
> made after the GC and before the next pointer write.  It's probably better
> to make the arraycopy code uninterruptible.


Yes, I agree it's probably better to make the arraycopy code
uninterruptible.  The only caution is the impact on GC latency.  Um, does
this require a new API, or can we simply use what already exists?
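
To make the ordering problem concrete, here is a toy Python model.  It is
not DRLVM code: simulate() and all of its parameters are made up for
illustration.  It assumes an object-remembering barrier that marks one
object for scanning at the next GC, and that each GC clears the mark.  It
returns the slots that some collection ran past without being told about
them.

```python
def simulate(writes, gc_after, barrier_before, barrier_after,
             uninterruptible=False):
    """Toy model of one object receiving `writes` reference stores.

    A GC is requested after `gc_after` stores.  Returns the set of slot
    indices that were written but not covered by a barrier when a GC ran.
    """
    remembered = False   # has the object been reported to the GC?
    written = set()      # slots stored since the last collection
    missed = set()       # slots a collection failed to scan

    def collect():
        nonlocal remembered
        if not remembered:
            # The GC has no record of this object, so new slots go unseen.
            missed.update(written)
        written.clear()
        remembered = False   # the remembered mark is consumed by the GC

    if barrier_before:
        remembered = True

    for i in range(writes):
        written.add(i)
        # An uninterruptible region defers the GC until the region ends.
        if not uninterruptible and i + 1 == gc_after:
            collect()

    if barrier_after:
        remembered = True

    collect()   # the next collection after the region
    return missed


if __name__ == "__main__":
    print("before only:        ", simulate(4, 2, True, False))
    print("before and after:   ", simulate(4, 2, True, True))
    print("after only:         ", simulate(4, 2, False, True))
    print("before, uninterrupt:", simulate(4, 2, True, False, True))
```

In this model a barrier before the copy alone loses the slots written
after a mid-copy GC, and a barrier after the copy alone loses the slots
written before it.  Barriers on both sides, or a single barrier plus an
uninterruptible region, lose nothing.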

> My gut feel is that scalars don't generally have enough pointers to make
> the object remembering barrier worthwhile.


That's my hunch also.  However, if someone wants to spend time analyzing
enterprise workloads to discover if there is any cheese down that tunnel, I
won't get in the way.
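
For anyone who does go down that tunnel, a rough way to frame the
comparison is to count remembered-set entries under each barrier style.
The Python below is only a sketch with hypothetical names (ToyRemset and
the two helper functions); it says nothing about how the DRLVM GC actually
stores its remset, and it ignores the cost of scanning the whole object at
GC time.

```python
class ToyRemset:
    """Toy remembered set that can record single slots or whole objects."""

    def __init__(self):
        self.slots = set()     # (object name, slot index) pairs
        self.objects = set()   # object names

    def slot_barrier(self, obj, index):
        self.slots.add((obj, index))

    def object_barrier(self, obj):
        self.objects.add(obj)

    def entries(self):
        return len(self.slots) + len(self.objects)


def entries_with_slot_barriers(obj, ref_writes):
    """One slot-remembering barrier per reference store."""
    remset = ToyRemset()
    for index in range(ref_writes):
        remset.slot_barrier(obj, index)
    return remset.entries()


def entries_with_object_barrier(obj, ref_writes):
    """One object-remembering barrier, however many stores were made."""
    remset = ToyRemset()
    remset.object_barrier(obj)   # ref_writes does not affect the count
    return remset.entries()


if __name__ == "__main__":
    for name, writes in (("reference array", 1000), ("scalar object", 3)):
        print(name,
              entries_with_slot_barriers(name, writes),
              entries_with_object_barrier(name, writes))
```

With these made-up sizes the reference array drops from 1000 entries to 1,
while the scalar only drops from 3 to 1, which is the shape of the hunch
above.  Real numbers would have to come from measuring real workloads.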

> >> > > I'd suggest to introduce an object remember write barrier which will be
> >> > > invoked after the object is copied. So the JIT doesn't need to insert
> >> > > barrier for each field store.
> >> >
> >> > hmm.... what happens if some other app thread causes a GC to happen in the
> >> > middle of writing a bunch of fields of a given object?  If the
> >> > gc_heap_wrote_object() is called before JITed code scribbles on slots, then
> >> > the early slots will be scanned and handled properly by the GC.   But how do
> >> > we handle the slots written after the GC completes?  One approach would be
> >> > for the JIT to mark such regions of emitted code "Uninterruptible".
> >> > Another approach would be to emit a WB both before and after a region of
> >> > multiple ref field scribbles.   In any case, it looks like we need to patch up
> >> > the holes in the contract between jit and gc.  However, as I said above, is
> >> > there anything wrong with a real simple dumb contract for now?  That is, each
> >> > ref write has a matching WB with no intervening instructions.
> >> >
> >> > > GC has an interface
> >> > > gc_heap_wrote_object(p_obj) for this case.  I think it's ok to insert
> >> > > only the runtime native call at first. Then later we can consider to
> >> > > inline the object remembering barrier as well as the slot remembering
> >> > > barrier.
> >> > >
> >> > > Thanks,
> >> > > xiaofeng
> >> > >
> >> >
> >> > --
> >> > Weldon Washburn
> >> > Intel Enterprise Solutions Software Division
> >>
> >> --
> >> Egor Pasko
> >>
> >>
> >
> >
> > --
> > Weldon Washburn
> > Intel Enterprise Solutions Software Division
> >
>
>
>


-- 
Weldon Washburn
Intel Enterprise Solutions Software Division
