harmony-dev mailing list archives

From "Vladimir Strigun" <vstri...@gmail.com>
Subject [drlvm] Several performance optimizations can improve the situation on the DaCapo jython bench
Date Thu, 26 Jul 2007 13:15:24 GMT
Hi all,

I've gathered statistics for the DaCapo jython bench (the worst DaCapo
bench from a performance point of view) and identified several places
for optimization. For every hot spot I created a small test case; you
can find them below, along with the estimated speedup for each case. I
believe that the optimizations below could significantly improve the
current "horrible" situation for jython (7570 on DRLVM vs 2916 on Sun
1.6).

Throwing/catching exceptions (HARMONY-4549 was created to track the issue)
Expected boost: 700 ms = ~5-7% of the overall jython bench
Description: Raising/catching exceptions is very slow in comparison
with Sun. The TryRaiseExcept sub-bench of the jython bench throws and
catches thousands of exceptions, and as the numbers below show, it
runs more than 3 times slower on DRLVM (1641 vs 500 msec). AFAIU,
because the catch block performs some operations on the exception
object, the VM unwinds the stack every time an exception is caught.
Small testcase:
public class TestExceptions {

    public static void main(String[] args) {
        //warmup VM first
        tryRaiseExceptions(1);
        long start = System.currentTimeMillis();
        tryRaiseExceptions(1000000);
        long res = System.currentTimeMillis() -start;
        System.out.println("completed in "+res+" msec");
    }

    public static void tryRaiseExceptions(int n) {
        for(int i=0; i<n; i++)
            try{
                throw new TException();
            }catch(TException throwable){
                TException ts = Test2.test(throwable);
            }
    }
}


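// Test2 and TException are package-private so that all three classes
// can be compiled together from TestExceptions.java.
class Test2 {
   public static TException test(TException thr) {
       return thr;
   }
}

class TException extends RuntimeException {
}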
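One way to check how much of this cost comes from capturing the stack
trace (Throwable's constructor calls fillInStackTrace(), which walks the
stack) rather than from the throw/catch dispatch itself is to compare
against an exception that overrides fillInStackTrace(). This is a
minimal sketch under that assumption; StacklessException and
ExceptionCostSketch are hypothetical names, and the sketch only
measures the difference, it does not model DRLVM's unwinder.

```java
// Hypothetical exception type that skips stack-trace capture.
class StacklessException extends RuntimeException {
    @Override
    public synchronized Throwable fillInStackTrace() {
        // Do not walk the stack; the stack trace stays empty.
        return this;
    }
}

class ExceptionCostSketch {
    // Throws and catches n exceptions; returns how many were caught.
    static int throwCatch(int n, boolean stackless) {
        int caught = 0;
        for (int i = 0; i < n; i++) {
            try {
                if (stackless) {
                    throw new StacklessException();
                }
                throw new RuntimeException();
            } catch (RuntimeException e) {
                caught++;
            }
        }
        return caught;
    }

    public static void main(String[] args) {
        // Warm up both paths before timing.
        throwCatch(10000, true);
        throwCatch(10000, false);
        long t0 = System.currentTimeMillis();
        throwCatch(1000000, false);
        long t1 = System.currentTimeMillis();
        throwCatch(1000000, true);
        long t2 = System.currentTimeMillis();
        System.out.println("with stack trace:    " + (t1 - t0) + " msec");
        System.out.println("without stack trace: " + (t2 - t1) + " msec");
    }
}
```

If most of the time disappears in the second measurement, the stack
walk is likely the dominant cost rather than the catch dispatch.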



System.identityHashCode() reimplementation with magics (HARMONY-4551)
Expected boost: 1000 ms = ~10% overall
Description: System.identityHashCode() is invoked very frequently in
the jython bench (more than 22,000,000 invocations). The reason for so
many invocations is that IdentityHashMap is used to store ThreadLocal
objects. I assume the method could be implemented through magics.
Small experiments with the following implementation show a huge
speedup on the small test case (from 1609 msec for the unpatched
version to 409 msec for the patched one), although the implementation
is incorrect, presumably because a moving GC can relocate an object and
change its address, while an identity hash code must stay the same for
the object's whole lifetime:

return ObjectReference.fromObject(object).toAddress().toInt();

Small testcase:
public class IdentityHashCodeTest {
    public static void main(String[] args) {
        //warmup VM
        runTest(1000, new Object());
        long start = System.currentTimeMillis();
        runTest(10000000, new Object());
        long end = System.currentTimeMillis() - start;
        System.out.println("completed in " + end + " msec");
    }

    public static void runTest(int num, Object obj) {
        for(int i=0; i<num; i++) {
            System.identityHashCode(new Object());
        }
    }
}
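The reason IdentityHashMap leads to so many identityHashCode() calls is
that it hashes keys by identity and compares them with == instead of
hashCode()/equals(). OpenJDK's implementation calls
System.identityHashCode() on every put and get; Harmony's is assumed to
behave similarly. A minimal sketch of that behaviour follows; the class
name is hypothetical and this is not Harmony's ThreadLocal code.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityMapSketch {
    // Puts two equal-but-distinct String keys into the map and returns
    // how many entries the map ends up with.
    static int distinctKeys(Map<Object, Integer> map) {
        String a = new String("key");
        String b = new String("key"); // a.equals(b), but a != b
        map.put(a, 1);
        map.put(b, 2);
        return map.size();
    }

    // Looks up the same key n times. With an IdentityHashMap each get()
    // hashes the key by identity (in OpenJDK: one identityHashCode call).
    static int lookups(Map<Object, Integer> map, Object key, int n) {
        int found = 0;
        for (int i = 0; i < n; i++) {
            if (map.get(key) != null) {
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(distinctKeys(new IdentityHashMap<Object, Integer>())); // 2
        System.out.println(distinctKeys(new HashMap<Object, Integer>()));         // 1

        Object key = new Object(); // stand-in for a ThreadLocal instance
        Map<Object, Integer> perThread = new IdentityHashMap<Object, Integer>();
        perThread.put(key, 42);
        System.out.println(lookups(perThread, key, 1000)); // 1000
    }
}
```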


Instanceof modification (HARMONY-4552)
Expected boost: 700 ms = ~5-7%
Description: instanceof is used in many places in DaCapo, but the
hottest places are arithmetic operations, in particular the
CompareFloats, CompareIntegers, SimpleFloatArithmetic, etc. sub-benches.
The typical code in those benches is the following:
PyInteger add(PyObject obj) {
    if (obj instanceof PyInteger) {
        int v = ((PyInteger) obj).value;
        // ...
    }
}

This means that we perform thousands of instanceof checks for objects
of the same type, i.e. PyInteger instanceof PyInteger. The small test
case below illustrates the problem. I should mention that the test runs
very fast on Sun 1.6 in server mode (15 msec), while in client mode it
completes in 2600 msec. On the Harmony VM in server mode the test
completes in 2700 msec.

Small testcase:
public class Test {
    public static void main(String[] args) {
        runTest(1000, new String());
        long start = System.currentTimeMillis();
        runTest(1000000000, new String());
        long end = System.currentTimeMillis() - start;
        System.out.println("completed in "+end);
    }

    public static void runTest(int num, String obj) {
        for(int i=0; i<num; i++) {
            if(obj instanceof String){}
        }
    }
}
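Because the common case here is an object whose exact class equals the
tested class, one possible approach is an exact-class fast path:
compare the object's class with the target class first and fall back
to the full subtype check only when that fails. The Java-level sketch
below only illustrates the idea; in DRLVM the change would presumably
live in the JIT's instanceof code, and the helper isInstance is
hypothetical.

```java
class InstanceofFastPathSketch {
    static int fastHits = 0; // how often the exact-class shortcut answered

    // Hypothetical helper equivalent to "obj instanceof target".
    static boolean isInstance(Object obj, Class<?> target) {
        if (obj == null) {
            return false;                // instanceof is always false for null
        }
        if (obj.getClass() == target) {  // fast path: one reference comparison
            fastHits++;
            return true;
        }
        return target.isInstance(obj);   // slow path: full subtype check
    }

    public static void main(String[] args) {
        System.out.println(isInstance("abc", String.class));              // true, fast path
        System.out.println(isInstance(Integer.valueOf(1), Number.class)); // true, slow path
        System.out.println(isInstance("abc", Integer.class));             // false
        System.out.println(isInstance(null, String.class));               // false
        System.out.println("fast-path hits: " + fastHits);                // 1
    }
}
```

For a final class such as String, the exact-class comparison is already
the complete answer, so a JIT could drop the slow path entirely;
PyInteger is not necessarily final, so the fallback is still needed
there.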

String.compareTo() and equals() optimizations (HARMONY-4553)
Expected boost: 700 ms = ~5-7%
Description: The compareTo() and equals() methods are used in the
CompareStrings and CompareInternedStrings sub-benches and in several
places inside jython. The test below (which exercises compareTo())
shows that DRLVM is significantly slower on these operations.

Small testcase:
public class CompareToTest{
    public static void main(String[] args){
        String st1 = new String("0 1 2 3 4 5 6 7 8 9");
        String st2 = new String("0 1 2 3 4 5 6 7 8 9");
        //warmup VM
        stringCompareTo(st1, st2, 100000);
        long start = System.currentTimeMillis();
        stringCompareTo(st1, st2, 20000000);
        long end = System.currentTimeMillis() -start;
        System.out.println("String compareTo for equal strings completed in "
                + end + " msec");
        st1 = new String("0 1 2 3 4 5 6 7 8 9abc");
        //warmup VM
        stringCompareTo(st1, st2, 100000);
        long start1 = System.currentTimeMillis();
        stringCompareTo(st1, st2, 20000000);
        long end1 = System.currentTimeMillis() -start1;
        System.out.println("String compareTo for non-equal strings completed in "
                + end1 + " msec");

        System.out.println("Total in "+(end1+end) +" msec");

    }

    public static void stringCompareTo(String st1, String st2, int num){
        for(int x=0; x<num; x++) {
            st1.compareTo(st2);
        }

    }
}
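The equals() half of the issue could be measured the same way. The
sketch below is a hypothetical companion test, not part of
HARMONY-4553. It counts the true results so the calls are not dead
code, and it also times interned strings: equal interned strings are
the same object, so an equals() that checks the reference first can
return immediately.

```java
class StringEqualsTest {
    static int sink; // consumes results so the JIT cannot drop the loop

    // Calls st1.equals(st2) num times and returns how many calls were true.
    static int stringEquals(String st1, String st2, int num) {
        int hits = 0;
        for (int x = 0; x < num; x++) {
            if (st1.equals(st2)) {
                hits++;
            }
        }
        return hits;
    }

    // Warms up, then returns the time in msec for num equals() calls.
    static long time(String st1, String st2, int num) {
        sink += stringEquals(st1, st2, 100000);
        long start = System.currentTimeMillis();
        sink += stringEquals(st1, st2, num);
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        String st1 = new String("0 1 2 3 4 5 6 7 8 9");
        String st2 = new String("0 1 2 3 4 5 6 7 8 9");
        // Distinct objects with equal contents: equals() must compare chars.
        System.out.println("equals, distinct objects: "
                + time(st1, st2, 20000000) + " msec");
        // Interned copies of equal strings are one and the same object.
        String in1 = st1.intern();
        String in2 = st2.intern();
        System.out.println("equals, interned strings: "
                + time(in1, in2, 20000000) + " msec");
        System.out.println("same object after intern(): " + (in1 == in2));
    }
}
```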


Thread.currentThread() method optimization (HARMONY-4555)
Expected boost: ~5%
Description: Thread.currentThread() is also one of the hot methods in
the jython bench: it is invoked more than 7.5 million times during
jython execution. Although the method has already been optimized
several times, it is still slower than on the RI. Several weeks ago I
experimented with a magics-based implementation and got a good speedup
on both the small test and the jython bench. Since the threading system
is being redesigned at the moment, I think it would be great to add the
currentThread() optimization to the plan.

Testcase:
public class CurrentThreadTest {
    public static void main(String[] args) {
        long st = System.currentTimeMillis();
        for(int i=0; i< 100000000; i++) {
            Thread.currentThread();
        }
        long res = System.currentTimeMillis()-st;
        System.out.println("res="+res);
    }
}
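The test above discards the result of Thread.currentThread(), so an
aggressive JIT might remove the call. A variant that consumes the
result, and compares it against a baseline with the call hoisted out of
the loop, gives a rough per-call cost. This is a minimal sketch; the
class and method names are hypothetical.

```java
class CurrentThreadCostSketch {
    // Calls Thread.currentThread() on every iteration; returns how many
    // calls returned the expected thread so the call cannot be discarded.
    static int perIteration(int n, Thread expected) {
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (Thread.currentThread() == expected) {
                hits++;
            }
        }
        return hits;
    }

    // Baseline: the same loop with the call hoisted out.
    static int hoisted(int n, Thread expected) {
        Thread t = Thread.currentThread();
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (t == expected) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Thread me = Thread.currentThread();
        int n = 100000000;
        perIteration(100000, me); // warm up both loops
        hoisted(100000, me);
        long t0 = System.currentTimeMillis();
        int a = perIteration(n, me);
        long t1 = System.currentTimeMillis();
        int b = hoisted(n, me);
        long t2 = System.currentTimeMillis();
        System.out.println("per-iteration: " + (t1 - t0) + " msec (" + a + ")");
        System.out.println("hoisted:       " + (t2 - t1) + " msec (" + b + ")");
    }
}
```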


Could the JIT, GC and threading gurus please have a look at the issues
mentioned above?


Thanks.
Vladimir.

Sub-bench statistics in milliseconds (H/JDK = HARMONY time divided by
JDK time; n/a where the JDK time is 0):

Sub-bench                 HARMONY   JDK  H/JDK
BuiltinFunctionCalls           63    78  0.807692
BuiltinMethodLookup           265   203  1.305419
CompareFloats                 110    31  3.548387
CompareFloatsIntegers          94    63  1.492063
CompareIntegers               156    31  5.032258
CompareInternedStrings        187    31  6.032258
CompareLongs                   94    32  2.9375
CompareStrings                125    31  4.032258
CompareUnicode                 94    31  3.032258
ConcatStrings                 797   656  1.214939
ConcatUnicode                 562   188  2.989362
CreateInstances               203    62  3.274194
CreateNewInstances            344   204  1.686275
CreateStringsWithConcat       344   156  2.205128
CreateUnicodeWithConcat       141    78  1.807692
DictCreation                  156    78  2
DictWithFloatKeys             328   141  2.326241
DictWithIntegerKeys           157    78  2.012821
DictWithStringKeys             62    62  1
ForLoops                       78    94  0.829787
IfThenElse                    172   234  0.735043
ListSlicing                    63    32  1.96875
NestedForLoops                109   109  1
NormalClassAttribute          156    78  2
NormalInstanceAttribute       125    63  1.984127
PythonFunctionCalls           188    78  2.410256
PythonMethodCalls             250   109  2.293578
Recursion                     250    94  2.659574
SecondImport                  141   109  1.293578
SecondPackageImport           156   141  1.106383
SecondSubmoduleImport         234   187  1.251337
SimpleComplexArithmetic       110    16  6.875
SimpleDictManipulation        156    94  1.659574
SimpleFloatArithmetic         109    62  1.758065
SimpleIntFloatArithmetic       78    16  4.875
SimpleIntegerArithmetic        63    31  2.032258
SimpleListManipulation         62    31  2
SimpleLongArithmetic          157   188  0.835106
SmallLists                    343   141  2.432624
SmallTuples                   250   125  2
SpecialClassAttribute         141    93  1.516129
SpecialInstanceAttribute      125    63  1.984127
StringMappings                328   125  2.624
StringPredicates              219   109  2.009174
StringSlicing                 140    78  1.794872
TryExcept                      16     0  n/a
TryRaiseExcept               1641   500  3.282
TupleSlicing                  172    94  1.829787
UnicodeMappings               156   110  1.418182
UnicodePredicates             219    78  2.807692
UnicodeSlicing                140    62  2.258065

Total                       10829  5578
