harmony-dev mailing list archives

From "Vladimir Strigun" <vstri...@gmail.com>
Subject Re: [drlvm]several performance optimizaion can improve situation on Dacapo.jython bench
Date Thu, 26 Jul 2007 16:00:16 GMT
Hi Egor,

On 26 Jul 2007 19:44:18 +0400, Egor Pasko <egor.pasko@gmail.com> wrote:
> Vladimir,
>
> this is a REALLY AWESOME analysis that you performed!!
> we should definitely pick all these items and optimise them out, and,
> I am sure, we will!

Thanks for your interest in that area.

> Greater thanks!
>
> on the exceptions: I wonder why lazyexc does not apply here.. Maybe,
> this is a recompilation problem? Vladimir, did you try to run
> tryRaiseExceptions(...) several times in a loop? does it help DRLVM's
> performance?

Yes, the lazyexc optimization works well in the simple case (several times
faster in comparison with server mode) for the following example:
            try{
                throw new Exception();
            }catch(Throwable throwable){
            }

With the provided test I tried to emulate the exception processing
inside the TryRaiseExcept jython subtest. I've tried removing several
side-effect checks in lazyexc.cpp, but unfortunately didn't get any
speedup for the test or for jython. Possibly lazyexc is not
applied on the bench because there are some operations on the exception
object in the catch block.
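
To make the comparison easier to reproduce, here is a minimal sketch that
times both shapes of catch block side by side. It is only an illustration:
the class name LazyExcCompare and the methods throwAndIgnore, throwAndUse
and use are made up for this mail, with use() standing in for Test2.test
from the testcase. I have not included any numbers, since they depend on
the VM and its settings.

```java
final class LazyExcCompare {
    // Same shape as TException in the original testcase.
    static class TException extends RuntimeException {}

    // Hypothetical stand-in for Test2.test: the exception object is
    // passed out of the catch block, so it cannot be treated as unused.
    static TException use(TException t) {
        return t;
    }

    // Catch block never touches the exception object (the simple case).
    static int throwAndIgnore(int n) {
        int caught = 0;
        for (int i = 0; i < n; i++) {
            try {
                throw new TException();
            } catch (TException e) {
                caught++;
            }
        }
        return caught;
    }

    // Catch block operates on the exception object (the jython-like case).
    static int throwAndUse(int n) {
        int caught = 0;
        for (int i = 0; i < n; i++) {
            try {
                throw new TException();
            } catch (TException e) {
                if (use(e) != null) {
                    caught++;
                }
            }
        }
        return caught;
    }

    public static void main(String[] args) {
        // Warm up both variants before timing.
        throwAndIgnore(10000);
        throwAndUse(10000);

        long t0 = System.nanoTime();
        throwAndIgnore(1000000);
        long t1 = System.nanoTime();
        throwAndUse(1000000);
        long t2 = System.nanoTime();

        System.out.println("ignored in catch: " + (t1 - t0) / 1000000 + " msec");
        System.out.println("used in catch:    " + (t2 - t1) / 1000000 + " msec");
    }
}
```

If the gap between the two lines is large on DRLVM and small on the RI,
that would be consistent with the guess above, though it would not prove it.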

Thanks.
Vladimir.

> On the 0x31F day of Apache Harmony Vladimir Strigun wrote:
> > Hi all,
> >
> > I've gathered statistics for the Dacapo.jython bench (the worst Dacapo
> > bench from a performance point of view), and identified several places for
> > optimization. For every hot spot a small testcase was created - you can
> > find them below, along with the estimated speedup for each case. I believe
> > that the optimizations below could significantly improve the current
> > "horrible" situation for jython (7570 on DRLVM vs 2916 on Sun 1.6).
> >
> > Throwing/catching exception (HARMONY-4549 was created to track the issue)
> > Expected boost: 700 ms = ~5-7 % overall jython bench
> > Description: Raising/catching exceptions is very slow in comparison
> > with Sun. The TryRaiseExcept sub-bench of the jython bench throws and
> > catches thousands of exceptions and, as you can see from the numbers
> > below, it runs more than 3 times slower on drlvm. AFAIU, since there
> > are some operations on the exception object in the catch block, the VM
> > unwinds the stack every time an exception is caught.
> > Small testcase:
> > public class TestExceptions {
> >
> >     public static void main(String[] args) {
> >         //warmup VM first
> >         tryRaiseExceptions(1);
> >         long start = System.currentTimeMillis();
> >         tryRaiseExceptions(1000000);
> >         long res = System.currentTimeMillis() -start;
> >         System.out.println("completed in "+res+" msec");
> >     }
> >
> >     public static void tryRaiseExceptions(int n) {
> >         for(int i=0; i<n; i++)
> >             try{
> >                 throw new TException();
> >             }catch(TException throwable){
> >                 TException ts = Test2.test(throwable);
> >             }
> >     }
> > }
> >
> >
> > public class Test2  {
> >    public static TException test(TException thr) {
> >        return thr;
> >    }
> > }
> >
> > public class TException  extends RuntimeException {
> > }
> >
> >
> >
> > System.identityHashCode re-implementation on magics (HARMONY-4551)
> > Expected boost: 1000 ms = ~10% overall
> > Description: The System.identityHashCode() method is frequently used in
> > the jython bench (more than 22000000 invocations). The reason for so many
> > invocations is the IdentityHashMap usage for storing ThreadLocal objects.
> > I assume the method could be implemented through magics, and a small
> > experiment with the following incorrect implementation shows a huge
> > speedup on a small testcase (from 1609 msec for the un-patched version
> > to 409 msec on the patched one):
> >
> > return ObjectReference.fromObject(object).toAddress().toInt();
> >
> > Small testcase:
> > public class test {
> >     public static void main(String[] args) {
> >         runTest(1000, new Object());
> >         long start = System.currentTimeMillis();
> >         runTest(10000000, new Object());
> >         long end = System.currentTimeMillis() - start;
> >         System.out.println("completed in "+end);
> >     }
> >
> >     public static void runTest(int num, Object obj) {
> >         for(int i=0; i<num; i++) {
> >             System.identityHashCode(new Object());
> >         }
> >     }
> > }
> >
> >
> > Instanceof modification (HARMONY-4552)
> > Expected boost: 700 ms = ~5-7%
> > Description: instanceof is used in many places in Dacapo, but the hottest
> > places are the arithmetic operations, in particular CompareFloats,
> > CompareIntegers, SimpleFloatArithmetic, etc. The typical code for
> > those benches is the following:
> > PyInteger add(PyObject obj)
> > if (obj instanceof PyInteger)
> >   int v = ((PyInteger)obj).value
> >
> > It means that we have thousands of instanceof checks where the object
> > is already of the tested type, i.e. PyInteger instanceof PyInteger. The
> > small testcase illustrates the problem. I should mention that the test
> > runs very fast on Sun 1.6 server: 15 msec, while in client mode it
> > completes in 2600 msec. On Harmony VM in server mode the test completes
> > in 2700 msec.
> >
> > Small testcase:
> > public class Test {
> >     public static void main(String[] args) {
> >         runTest(1000, new String());
> >         long start = System.currentTimeMillis();
> >         runTest(1000000000, new String());
> >         long end = System.currentTimeMillis() - start;
> >         System.out.println("completed in "+end);
> >     }
> >
> >     public static void runTest(int num, String obj) {
> >         for(int i=0; i<num; i++) {
> >             if(obj instanceof String){}
> >         }
> >     }
> > }
> >
> > String.compareTo and equals methods optimizations ( HARMONY-4553 )
> > Expected boost: 700 ms = ~5-7%
> > Description: The compareTo and equals methods are used in the
> > CompareStrings and CompareInternedStrings sub-benches and in several
> > places inside jython. The test below shows that DRLVM is significantly
> > slower on these operations.
> >
> > Small testcase:
> > public class CompareToTest{
> >     public static void main(String[] args){
> >         String st1 = new String("0 1 2 3 4 5 6 7 8 9");
> >         String st2 = new String("0 1 2 3 4 5 6 7 8 9");
> >         //warmup VM
> >         stringCompareTo(st1, st2, 100000);
> >         long start = System.currentTimeMillis();
> >         stringCompareTo(st1, st2, 20000000);
> >         long end = System.currentTimeMillis() -start;
> >         System.out.println("String compareTo for equal strings "
> >             + "completed in "+end +" msec");
> >         st1 = new String("0 1 2 3 4 5 6 7 8 9abc");
> >         //warmup VM
> >         stringCompareTo(st1, st2, 100000);
> >         long start1 = System.currentTimeMillis();
> >         stringCompareTo(st1, st2, 20000000);
> >         long end1 = System.currentTimeMillis() -start1;
> >         System.out.println("String compareTo for non-equal strings "
> >             + "completed in "+end1 +" msec");
> >
> >         System.out.println("Total in "+(end1+end) +" msec");
> >
> >     }
> >
> >     public static void stringCompareTo(String st1, String st2, int num){
> >         for(int x=0; x<num; x++) {
> >             st1.compareTo(st2);
> >         }
> >
> >     }
> > }
> >
> >
> > Thread.currentThread() method optimization (HARMONY-4555)
> > Expected boost: ~5%
> > Description: Thread.currentThread() is also one of the hot methods for
> > the jython bench. The method is invoked more than 7.5 million times
> > during jython execution. Despite the fact that the method was already
> > optimized several times, it is still slower in comparison with the RI.
> > I made some experiments with a magics implementation several weeks
> > ago and got a good speedup for the small test and for the jython bench.
> > Since the threading system is being redesigned at the moment, I think it
> > would be great to add the currentThread() optimization to the plan.
> >
> > Testcase:
> > public class CurrentThreadTest {
> >     public static void main(String[] args) {
> >         long st = System.currentTimeMillis();
> >         for(int i=0; i< 100000000; i++) {
> >             Thread.currentThread();
> >         }
> >         long res = System.currentTimeMillis()-st;
> >         System.out.println("res="+res);
> >     }
> > }
> >
> >
> > Could JIT, GC and Thread gurus please have a look at the mentioned issues?
> >
> >
> > Thanks.
> > Vladimir.
> >
> > Sub-bench statistics in milliseconds:
> >
> >                  Sub-bench HARMONY JDK H vs JDK
> >
> >       BuiltinFunctionCalls 63 78 0,807692
> >        BuiltinMethodLookup 265 203 1,305419
> >              CompareFloats 110 31 3,548387
> >      CompareFloatsIntegers 94 63 1,492063
> >            CompareIntegers 156 31 5,032258
> >     CompareInternedStrings 187 31 6,032258
> >               CompareLongs 94 32 2,9375
> >             CompareStrings 125 31 4,032258
> >             CompareUnicode 94 31 3,032258
> >              ConcatStrings 797 656 1,214939
> >              ConcatUnicode 562 188 2,989362
> >            CreateInstances 203 62 3,274194
> >         CreateNewInstances 344 204 1,686275
> >    CreateStringsWithConcat 344 156 2,205128
> >    CreateUnicodeWithConcat 141 78 1,807692
> >               DictCreation 156 78 2
> >          DictWithFloatKeys 328 141 2,326241
> >        DictWithIntegerKeys 157 78 2,012821
> >         DictWithStringKeys 62 62 1
> >                   ForLoops 78 94 0,829787
> >                 IfThenElse 172 234 0,735043
> >                ListSlicing 63 32 1,96875
> >             NestedForLoops 109 109 1
> >       NormalClassAttribute 156 78 2
> >    NormalInstanceAttribute 125 63 1,984127
> >        PythonFunctionCalls 188 78 2,410256
> >          PythonMethodCalls 250 109 2,293578
> >                  Recursion 250 94 2,659574
> >               SecondImport 141 109 1,293578
> >        SecondPackageImport 156 141 1,106383
> >      SecondSubmoduleImport 234 187 1,251337
> >    SimpleComplexArithmetic 110 16 6,875
> >     SimpleDictManipulation 156 94 1,659574
> >      SimpleFloatArithmetic 109 62 1,758065
> >   SimpleIntFloatArithmetic 78 16 4,875
> >    SimpleIntegerArithmetic 63 31 2,032258
> >     SimpleListManipulation 62 31 2
> >       SimpleLongArithmetic 157 188 0,835106
> >                 SmallLists 343 141 2,432624
> >                SmallTuples 250 125 2
> >      SpecialClassAttribute 141 93 1,516129
> >   SpecialInstanceAttribute 125 63 1,984127
> >             StringMappings 328 125 2,624
> >           StringPredicates 219 109 2,009174
> >              StringSlicing 140 78 1,794872
> >                  TryExcept 16 0
> >             TryRaiseExcept 1641 500 3,282
> >               TupleSlicing 172 94 1,829787
> >            UnicodeMappings 156 110 1,418182
> >          UnicodePredicates 219 78 2,807692
> >             UnicodeSlicing 140 62 2,258065
> >
> >
> >                      Total 10829 5578
> >
>
> --
> Egor Pasko
>
>