harmony-dev mailing list archives

From Egor Pasko <egor.pa...@gmail.com>
Subject Re: [drlvm]several performance optimizaion can improve situation on Dacapo.jython bench
Date Thu, 26 Jul 2007 15:44:18 GMT
Vladimir,

this is a REALLY AWESOME analysis that you performed!!
We should definitely pick up all these items and optimise them, and,
I am sure, we will!

Many thanks!

On the exceptions: I wonder why lazyexc does not apply here. Maybe
this is a recompilation problem? Vladimir, did you try to run
tryRaiseExceptions(...) several times in a loop? Does it help DRLVM's
performance?
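
A minimal sketch of such a repeated run, assuming the same exception
pattern as the quoted testcase. The class name RepeatedRaiseTest, the
helper touch() and the round count are made up for illustration and do
not come from the thread. If recompilation is the issue, later rounds
would be expected to be noticeably faster than the first one.

```java
class RepeatedRaiseTest {

    // Stand-in for the TException class from the quoted testcase.
    static class TException extends RuntimeException {
    }

    // Stand-in for Test2.test(): uses the exception object in the catch block.
    static TException touch(TException e) {
        return e;
    }

    // Same loop as tryRaiseExceptions(n); returns how many exceptions were caught.
    static int tryRaiseExceptions(int n) {
        int caught = 0;
        for (int i = 0; i < n; i++) {
            try {
                throw new TException();
            } catch (TException e) {
                if (touch(e) != null) {
                    caught++;
                }
            }
        }
        return caught;
    }

    public static void main(String[] args) {
        int rounds = 5;          // assumed value, not from the thread
        int perRound = 1000000;  // same count as the quoted testcase
        for (int r = 0; r < rounds; r++) {
            long start = System.currentTimeMillis();
            tryRaiseExceptions(perRound);
            long elapsed = System.currentTimeMillis() - start;
            System.out.println("round " + r + " completed in " + elapsed + " msec");
        }
    }
}
```

Printing one line per round makes it easy to compare the first round
against the later ones on both VMs.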

On the 0x31F day of Apache Harmony Vladimir Strigun wrote:
> Hi all,
> 
> I've gathered statistics for the Dacapo.jython bench (the worst Dacapo
> bench from a performance point of view) and identified several places
> for optimization. For every hot spot a small testcase was created; you
> can find them below, along with the estimated speedup for each case. I
> believe that the optimizations below could significantly improve the
> current "horrible" situation for jython (7570 on DRLVM vs 2916 on Sun 1.6).
> 
> Throwing/catching exception (HARMONY-4549 was created to track the issue)
> Expected boost: 700 ms = ~5-7 % overall jython bench
> Description: Raising/catching exceptions is very slow in comparison
> with Sun. The TryRaiseExcept sub-bench of the jython bench throws and
> catches thousands of exceptions and, as you can see from the numbers
> below, it runs more than 3 times slower on drlvm. AFAIU, since there
> are some operations on the exception object in the catch block, the VM
> unwinds the stack every time an exception is caught.
> Small testcase:
> public class TestExceptions {
> 
>     public static void main(String[] args) {
>         //warmup VM first
>         tryRaiseExceptions(1);
>         long start = System.currentTimeMillis();
>         tryRaiseExceptions(1000000);
>         long res = System.currentTimeMillis() -start;
>         System.out.println("completed in "+res+" msec");
>     }
> 
>     public static void tryRaiseExceptions(int n) {
>         for(int i=0; i<n; i++)
>             try{
>                 throw new TException();
>             }catch(TException throwable){
>                 TException ts = Test2.test(throwable);
>             }
>     }
> }
> 
> 
> public class Test2  {
>    public static TException test(TException thr) {
>        return thr;
>    }
> }
> 
> public class TException  extends RuntimeException {
> }
> 
> 
> 
> System.identityHashCode re-implementation on magics (HARMONY-4551)
> Expected boost: 1000 ms = ~10% overall
> Description: The System.identityHashCode() method is frequently used
> in the jython bench (more than 22000000 invocations). The reason for so
> many invocations is the use of IdentityHashMap for storing ThreadLocal
> objects. I assume the method could be implemented through magics, and a
> small experiment with the following incorrect implementation shows a
> huge speedup on the small testcase (from 1609 msec for the un-patched
> version to 409 msec for the patched one):
> 
> return ObjectReference.fromObject(object).toAddress().toInt();
> 
> Small testcase:
> public class test {
>     public static void main(String[] args) {
>         runTest(1000, new Object());
>         long start = System.currentTimeMillis();
>         runTest(10000000, new Object());
>         long end = System.currentTimeMillis() - start;
>         System.out.println("completed in "+end);
>     }
> 
>     public static void runTest(int num, Object obj) {
>         for(int i=0; i<num; i++) {
>             System.identityHashCode(new Object());
>         }
>     }
> }
> 
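
A small sketch of why IdentityHashMap leans on identity hashing,
assuming a standard class library in which IdentityHashMap compares
keys by reference and hashes them with System.identityHashCode(). The
class name IdentityMapDemo and the helper distinctKeys() are
hypothetical. Two equal but distinct String keys collapse into one
entry in a HashMap and stay separate in an IdentityHashMap, so every
put or get on the latter needs an identity hash of the key.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityMapDemo {

    // Puts two equal-but-distinct keys into the map and reports its size.
    static int distinctKeys(Map<Object, String> map) {
        String a = new String("key");
        String b = new String("key"); // equals(a) is true, but a != b
        map.put(a, "first");
        map.put(b, "second");
        return map.size();
    }

    public static void main(String[] args) {
        // HashMap uses equals()/hashCode(): the second put overwrites the first.
        System.out.println("HashMap entries: "
                + distinctKeys(new HashMap<Object, String>()));
        // IdentityHashMap uses reference equality: both keys are kept.
        System.out.println("IdentityHashMap entries: "
                + distinctKeys(new IdentityHashMap<Object, String>()));
    }
}
```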
> 
> Instanceof modification (HARMONY-4552)
> Expected boost: 700 ms = ~5-7%
> Description: instanceof is used in many places in Dacapo, but the
> hottest places are arithmetic operations, in particular CompareFloats,
> CompareIntegers, SimpleFloatArithmetic, etc. The typical code for
> those benches is the following:
> PyInteger add(PyObject obj)
> if (obj instanceof PyInteger)
>   int v = ((PyInteger)obj).value;
> 
> It means that we have thousands of instanceof checks for the same
> type, i.e. PyInteger instanceof PyInteger. The small testcase
> illustrates the problem. I should mention that the test runs very fast
> on Sun 1.6 server: 15 msec, while in client mode it completes in 2600
> msec. On the Harmony VM in server mode the test completes in 2700 msec.
> 
> Small testcase:
> public class Test {
>     public static void main(String[] args) {
>         runTest(1000, new String());
>         long start = System.currentTimeMillis();
>         runTest(1000000000, new String());
>         long end = System.currentTimeMillis() - start;
>         System.out.println("completed in "+end);
>     }
> 
>     public static void runTest(int num, String obj) {
>         for(int i=0; i<num; i++) {
>             if(obj instanceof String){}
>         }
>     }
> }
> 
> String.compareTo and equals methods optimizations ( HARMONY-4553 )
> Expected boost: 700 ms = ~5-7%
> Description: The compareTo and equals methods are used in the
> CompareStrings and CompareInternedStrings sub-benches and in several
> places inside jython. The test below shows that DRLVM is significantly
> slower on these operations.
> 
> Small testcase:
> public class CompareToTest{
>     public static void main(String[] args){
>         String st1 = new String("0 1 2 3 4 5 6 7 8 9");
>         String st2 = new String("0 1 2 3 4 5 6 7 8 9");
>         //warmup VM
>         stringCompareTo(st1, st2, 100000);
>         long start = System.currentTimeMillis();
>         stringCompareTo(st1, st2, 20000000);
>         long end = System.currentTimeMillis() -start;
>         System.out.println("String compareTo for equals strings "
>                 + "completed in " + end + " msec");
>         st1 = new String("0 1 2 3 4 5 6 7 8 9abc");
>         //warmup VM
>         stringCompareTo(st1, st2, 100000);
>         long start1 = System.currentTimeMillis();
>         stringCompareTo(st1, st2, 20000000);
>         long end1 = System.currentTimeMillis() -start1;
>         System.out.println("String compareTo for non equals strings "
>                 + "completed in " + end1 + " msec");
> 
>         System.out.println("Total in "+(end1+end) +" msec");
> 
>     }
> 
>     public static void stringCompareTo(String st1, String st2, int num){
>         for(int x=0; x<num; x++) {
>             st1.compareTo(st2);
>         }
> 
>     }
> }
> 
> 
> Thread.currentThread() method optimization (HARMONY-4555)
> Expected boost: ~5%
> Description: Thread.currentThread() is also one of the hot methods for
> the jython bench. The method is invoked more than 7.5 million times
> during jython execution. Despite the fact that the method has already
> been optimized several times, it is still slower in comparison with
> the RI. I made some experiments with a magics implementation several
> weeks ago and got a good speedup for the small test and for the jython
> bench. Since the threading system is being redesigned at the moment, I
> think it would be great to add the currentThread() optimization to the
> plan.
> 
> Testcase:
> public class CurrentThreadTest {
>     public static void main(String[] args) {
>         long st = System.currentTimeMillis();
>         for(int i=0; i< 100000000; i++) {
>             Thread.currentThread();
>         }
>         long res = System.currentTimeMillis()-st;
>         System.out.println("res="+res);
>     }
> }
> 
> 
> Could JIT, GC and Thread gurus please have a look at the mentioned issues?
> 
> 
> Thanks.
> Vladimir.
> 
> Sub-bench statistics in milliseconds:
> 
>                  Sub-bench HARMONY JDK H vs JDK
> 
>       BuiltinFunctionCalls 63 78 0,807692
>        BuiltinMethodLookup 265 203 1,305419
>              CompareFloats 110 31 3,548387
>      CompareFloatsIntegers 94 63 1,492063
>            CompareIntegers 156 31 5,032258
>     CompareInternedStrings 187 31 6,032258
>               CompareLongs 94 32 2,9375
>             CompareStrings 125 31 4,032258
>             CompareUnicode 94 31 3,032258
>              ConcatStrings 797 656 1,214939
>              ConcatUnicode 562 188 2,989362
>            CreateInstances 203 62 3,274194
>         CreateNewInstances 344 204 1,686275
>    CreateStringsWithConcat 344 156 2,205128
>    CreateUnicodeWithConcat 141 78 1,807692
>               DictCreation 156 78 2
>          DictWithFloatKeys 328 141 2,326241
>        DictWithIntegerKeys 157 78 2,012821
>         DictWithStringKeys 62 62 1
>                   ForLoops 78 94 0,829787
>                 IfThenElse 172 234 0,735043
>                ListSlicing 63 32 1,96875
>             NestedForLoops 109 109 1
>       NormalClassAttribute 156 78 2
>    NormalInstanceAttribute 125 63 1,984127
>        PythonFunctionCalls 188 78 2,410256
>          PythonMethodCalls 250 109 2,293578
>                  Recursion 250 94 2,659574
>               SecondImport 141 109 1,293578
>        SecondPackageImport 156 141 1,106383
>      SecondSubmoduleImport 234 187 1,251337
>    SimpleComplexArithmetic 110 16 6,875
>     SimpleDictManipulation 156 94 1,659574
>      SimpleFloatArithmetic 109 62 1,758065
>   SimpleIntFloatArithmetic 78 16 4,875
>    SimpleIntegerArithmetic 63 31 2,032258
>     SimpleListManipulation 62 31 2
>       SimpleLongArithmetic 157 188 0,835106
>                 SmallLists 343 141 2,432624
>                SmallTuples 250 125 2
>      SpecialClassAttribute 141 93 1,516129
>   SpecialInstanceAttribute 125 63 1,984127
>             StringMappings 328 125 2,624
>           StringPredicates 219 109 2,009174
>              StringSlicing 140 78 1,794872
>                  TryExcept 16 0
>             TryRaiseExcept 1641 500 3,282
>               TupleSlicing 172 94 1,829787
>            UnicodeMappings 156 110 1,418182
>          UnicodePredicates 219 78 2,807692
>             UnicodeSlicing 140 62 2,258065
> 
> 
>                      Total 10829 5578
> 
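
For reference, the "H vs JDK" column in the table above appears to be
the Harmony time divided by the JDK time (printed with a decimal
comma). A small sketch of that arithmetic follows; the class name
RatioDemo and the method ratio() are hypothetical. It returns NaN when
the JDK time is 0, which matches the TryExcept row having no ratio.

```java
import java.util.Locale;

class RatioDemo {

    // Harmony/JDK time ratio; NaN when the JDK time is 0 (ratio undefined).
    static double ratio(long harmonyMs, long jdkMs) {
        if (jdkMs == 0) {
            return Double.NaN;
        }
        return (double) harmonyMs / jdkMs;
    }

    public static void main(String[] args) {
        // Values taken from the table: BuiltinFunctionCalls, TryExcept, totals.
        long[][] rows = { {63, 78}, {16, 0}, {10829, 5578} };
        for (long[] row : rows) {
            System.out.println(String.format(Locale.US, "%d / %d = %.6f",
                    row[0], row[1], ratio(row[0], row[1])));
        }
    }
}
```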

-- 
Egor Pasko

