harmony-dev mailing list archives

From "Aleksey Shipilev" <aleksey.shipi...@gmail.com>
Subject Re: [performance] quick sort is 4x slower on Harmony
Date Thu, 10 Jan 2008 16:32:03 GMT
And an update: I have confirmed that the main contributor to the slowdown is the ValueProfiler (VP).

RI measurement (again):
=== /localdisk/jdk1.6.0_02/bin/java -server GenericQuicksort2 ===
iteration 0: elapsed: 4825ms
iteration 1: elapsed: 4805ms
iteration 2: elapsed: 5128ms
iteration 3: elapsed: 5125ms
iteration 4: elapsed: 5130ms

Baseline measurement (again):
=== /nfs/pb/home/ashipile/jre-r610377-clean/bin/java -Xem:server GenericQuicksort2 ===
iteration 0: elapsed: 178898ms
iteration 1: elapsed: 5663ms
iteration 2: elapsed: 5666ms
iteration 3: elapsed: 5660ms
iteration 4: elapsed: 5672ms

With the critical section in ValueProfiler::addNewValue narrowed to wrap
only insert_into_tnv_table (an initial proof of concept for moving to
CAS-based increments), the first-iteration time drops significantly
(178898ms to 85127ms), so we might consider CAS as an option:
=== /nfs/pb/home/ashipile/jre-r610377-work/bin/java -Xem:server GenericQuicksort2 ===
iteration 0: elapsed: 85127ms
iteration 1: elapsed: 5665ms
iteration 2: elapsed: 5665ms
iteration 3: elapsed: 5667ms
iteration 4: elapsed: 5679ms
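A minimal C++ sketch of the narrowed critical section, assuming a fixed-size top-N-value (TNV) table: a hit on a value already in the table is counted with a compare-and-swap loop outside any lock, and only the insertion of a new value (the insert_into_tnv_table step) takes the mutex. TnvTable, add_value, insert_locked and count_of are hypothetical names, not Harmony's actual code.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical top-N-value (TNV) table: N slots, each holding a profiled value
// and its hit count. An illustration only, not Harmony's ValueProfiler layout.
template <std::size_t N>
class TnvTable {
public:
    // Record one occurrence of value v.
    void add_value(std::uint64_t v) {
        // Fast path, no lock: if v is already tracked, bump its count with CAS.
        for (auto& s : slots_) {
            if (s.count.load(std::memory_order_relaxed) != 0 &&
                s.value.load(std::memory_order_relaxed) == v) {
                cas_increment(s.count);
                return;
            }
        }
        // Slow path: only the table insertion runs inside the critical section.
        std::lock_guard<std::mutex> guard(insert_lock_);
        insert_locked(v);
    }

    // Current count for v, or 0 if v is not in the table.
    std::uint32_t count_of(std::uint64_t v) const {
        for (const auto& s : slots_) {
            std::uint32_t c = s.count.load(std::memory_order_relaxed);
            if (c != 0 && s.value.load(std::memory_order_relaxed) == v) return c;
        }
        return 0;
    }

private:
    struct Slot {
        std::atomic<std::uint64_t> value{0};
        std::atomic<std::uint32_t> count{0};  // 0 means the slot is empty
    };

    // Lock-free increment: retry until no other thread changed the count in between.
    static void cas_increment(std::atomic<std::uint32_t>& c) {
        std::uint32_t old = c.load(std::memory_order_relaxed);
        // On failure compare_exchange_weak reloads `old`, so the loop just retries.
        while (!c.compare_exchange_weak(old, old + 1, std::memory_order_relaxed)) {
        }
    }

    // Caller holds insert_lock_.
    void insert_locked(std::uint64_t v) {
        Slot* victim = &slots_[0];
        for (auto& s : slots_) {
            std::uint32_t c = s.count.load(std::memory_order_relaxed);
            // Re-check: another thread may have inserted v before we got the lock.
            if (c != 0 && s.value.load(std::memory_order_relaxed) == v) {
                cas_increment(s.count);
                return;
            }
            if (c < victim->count.load(std::memory_order_relaxed)) victim = &s;
        }
        // Reuse an empty slot or evict the least frequent value.
        victim->value.store(v, std::memory_order_relaxed);
        victim->count.store(1, std::memory_order_relaxed);
    }

    std::array<Slot, N> slots_;
    std::mutex insert_lock_;
};
```

std::atomic::fetch_add would perform the same increment in a single call; the explicit CAS loop is shown because CAS is the option under discussion. The trade-off in this sketch: a hit that races with eviction of the same slot can be lost or credited to the new value, an inaccuracy a profiler can usually tolerate.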

With synchronization removed from VP altogether (lockProfile/unlockProfile
replaced with empty stubs instead of hymutex_* calls), note the further
decrease in ramp-up time and the *boost* in later iterations (probably
because concurrent profiling of SD1_OPT methods no longer contends on the lock?):
=== /nfs/pb/home/ashipile/jre-r610377-work/bin/java -Xem:server GenericQuicksort2 ===
iteration 0: elapsed: 79678ms
iteration 1: elapsed: 5018ms
iteration 2: elapsed: 5014ms
iteration 3: elapsed: 5013ms
iteration 4: elapsed: 5028ms
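To make the trade-off concrete, here is a hedged C++ sketch of swapping the lock policy, assuming the profiler brackets every update with lockProfile()/unlockProfile() as described above. MutexPolicy, NoLockPolicy and ProfileCounter are hypothetical names. The unsynchronized update is written as a separate relaxed load and store, so it stays well-defined C++ while still allowing lost updates, which is exactly the accuracy being given up.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Hypothetical lock policy backed by a real mutex (the hymutex_* case).
struct MutexPolicy {
    std::mutex m;
    void lockProfile() { m.lock(); }
    void unlockProfile() { m.unlock(); }
};

// Hypothetical lock policy with empty stubs: no synchronization at all.
struct NoLockPolicy {
    void lockProfile() {}
    void unlockProfile() {}
};

// Hypothetical single-counter profile site parameterized by lock policy.
template <typename LockPolicy>
class ProfileCounter {
public:
    void add() {
        lock_.lockProfile();
        // Read-then-write instead of fetch_add: without a real lock, two threads
        // can read the same old value and one increment is lost.
        std::uint64_t c = count_.load(std::memory_order_relaxed);
        count_.store(c + 1, std::memory_order_relaxed);
        lock_.unlockProfile();
    }

    std::uint64_t get() const { return count_.load(std::memory_order_relaxed); }

private:
    LockPolicy lock_;
    std::atomic<std::uint64_t> count_{0};
};
```

Single-threaded, both variants count exactly. Under contention the NoLockPolicy counter may undercount (a lost update drops an increment), which matters little if the profile is only used for relative frequencies.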

Profile in this mode, FIRST iteration, after 30 seconds of running:
27% Other32
21% libem#addNewValue
10% libharmonyvm#helper_get_interface_vtable
17% libem#find
8% libem#value_profiler_add_value
3% libem#getVPC
5% libharmonyvm#rth_get_interface_vtable
6% libjitrino#add_value_profile_value

The profile of this mode, LAST iteration:
99% Other32
1% libjitrino#<various>

Note that the locks have disappeared from the profile, which confirms
that the problem lies in the VP locks. After ramp-up there seems to be
very little JRE activity; most of the time is spent executing user code.

I'm going to propose an option that eliminates synchronization from
VP completely, at the cost of profile accuracy. Egor, Pavel, what do you
think? Is removing synchronization too dangerous?

Just a thought: the next thing we should consider is making VP stop
profiling once the optimized version of the code is available, since we
no longer need the profile information after that point.
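One possible shape for this, as a sketch only: a per-method flag that the JIT clears once optimized code is installed, checked at the top of the profiling entry point so threads still running the instrumented code return immediately. MethodProfile, add_value and on_optimized_code_installed are hypothetical names, and the counter stands in for the real TNV table update.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical per-method profile state; not Harmony's actual data structures.
struct MethodProfile {
    std::atomic<bool> collecting{true};
    std::atomic<std::uint64_t> samples{0};

    // Called from instrumented (profiling) code at a value-profiling site.
    void add_value(std::uint64_t /*value*/) {
        // Cheap early exit once optimized code exists: threads still executing
        // the old instrumented code stop paying for profile updates.
        if (!collecting.load(std::memory_order_relaxed)) return;
        samples.fetch_add(1, std::memory_order_relaxed);  // stand-in for the real update
    }

    // Called by a (hypothetical) JIT hook after the optimized code is published.
    void on_optimized_code_installed() {
        collecting.store(false, std::memory_order_relaxed);
    }
};
```

A relaxed flag should be sufficient here: a few samples recorded just after the switch are harmless, because the profile is no longer consulted.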

