harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Egor Pasko <egor.pa...@gmail.com>
Subject Re: [performance] quick sort is 4x slower on Harmony
Date Fri, 11 Jan 2008 08:16:20 GMT
On the 0x3C7 day of Apache Harmony Aleksey Shipilev wrote:
> And update here. I have confirmed that the main contributor is ValueProfiler.
> RI measurement (again):
> === /localdisk/jdk1.6.0_02/bin/java -server GenericQuicksort2 ===
> iteration 0: elapsed: 4825ms
> iteration 1: elapsed: 4805ms
> iteration 2: elapsed: 5128ms
> iteration 3: elapsed: 5125ms
> iteration 4: elapsed: 5130ms
> Baseline measurement (again):
> === /nfs/pb/home/ashipile/jre-r610377-clean/bin/java -Xem:server
> GenericQuicksort2 ===
> iteration 0: elapsed: 178898ms
> iteration 1: elapsed: 5663ms
> iteration 2: elapsed: 5666ms
> iteration 3: elapsed: 5660ms
> iteration 4: elapsed: 5672ms
> Collapsing critical section in ValueProfiler::addNewValue to wrap only
> insert_into_tnv_table - that should be initial proof-of-concept for
> going to CAS increase, Note that first iteration time decreased
> significantly, so we might consider CAS as an option:
> === /nfs/pb/home/ashipile/jre-r610377-work/bin/java -Xem:server
> GenericQuicksort2 ===
> iteration 0: elapsed: 85127ms
> iteration 1: elapsed: 5665ms
> iteration 2: elapsed: 5665ms
> iteration 3: elapsed: 5667ms
> iteration 4: elapsed: 5679ms
> Removing synchronization from VP at all (replacing
> lockProfile/unlockProfile with empty stubs rather that hymutex_*),
> note more decrease in rampup time and *boost* on next stages
> (probably, no more locking for concurrent SD1_OPT methods profiling?):
> === /nfs/pb/home/ashipile/jre-r610377-work/bin/java -Xem:server
> GenericQuicksort2 ===
> iteration 0: elapsed: 79678ms
> iteration 1: elapsed: 5018ms
> iteration 2: elapsed: 5014ms
> iteration 3: elapsed: 5013ms
> iteration 4: elapsed: 5028ms
> The profile of this mode, FIRST iteration, after 30 seconds of run:
> 27% Other32
> 21% libem#addNewValue
> 10% libharmonyvm#helper_get_interface_vtable
> 17% libem#find
> 8% libem#value_profiler_add_value
> 3% libem#getVPC
> 5% libharmonyvm#rth_get_interface_vtable
> 6% libjitrino#add_value_profile_value
> The profile of this mode, LAST iteration:
> 99% Other32
> 1% libjitrino#<various>
> Note that locks are disappeared - that testifies the problem with VP
> locks. After rampup there seem to be just a little JRE activity, most
> of the time executing user code.

cool investigation, Alexey! I love you :)

> I'm going to propose the option that eliminates synchronization from
> VP completely sacrificing profile accuracy. Egor, Pavel, what do you
> think? Is synchronization removal too dangerous?

I think, synchronization removal in this place is not very
dangerous. No guarantee, of course, you will need to check on SPECjbb
before applying the changes. DaCapo impact is also interesting.

Hence, I vote to remove the synchronization in the form that exists
now when we find a better solution for this microbenchmark and a
couple you-know-which benchmarks. Don't want to keep this trash in the
code forcing people to guess why it is there.

Concurrent modifications of TNV table will make it imprecise, but that
'ideally' should be no danger. I say 'ideally' because am not sure
about this implementation, it may require to eliminate
flushInstProfile(...) call from the addNewValue method to make the TNV
table concurrently modifiable with not much danger. That might require
to separate implementation of two TNV algorithms (i.e. TNV_FIRST_N
(=used now) and TNV_DIVIDED (=enhanced)).

And, of course, we still need a lock in createProfile()

There is also another option that I am keen telling about many times:
skip modifying TNV table if it is occupied by another thread. That
could really make a good command-line option for further experiments.

One more option is to keep TNV tables in TLS, but, that looks like too
memory-consuming and overcomplicated. Just mentioning it to complete
the picture :)

Alexey, I would be glad to prepare the patch by myself, but I can only
start doing it at the end of the next week. So, if you feel like doing
it, please, take the task, I would gladly review it and commit.

> Just a thought: next thing we should consider is making VP to stop
> profiling after optimized version of code is available, since we don't
> care about profile information further.

yes, I agree, that is a promising TODO

P.S.: still I cannot understand why non-recompiled version works such
a long time in this test. I suspect another bug here.

Egor Pasko

View raw message