harmony-dev mailing list archives

From "Mikhail Fursov" <mike.fur...@gmail.com>
Subject Re: [drlvm] several performance optimizations can improve situation on Dacapo.jython bench
Date Fri, 27 Jul 2007 12:02:33 GMT
Vladimir,
You did great work. Let's now fix these issues one by one.

I checked the example with exceptions and found that there is a bug in our
inlining code: code regions that are reachable only via exception paths seem
not to be analyzed by the inliner at all.
Here is a JIRA issue with the description:
https://issues.apache.org/jira/browse/HARMONY-4561

I think we will have better performance after this issue is fixed.
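
To make the pattern concrete, here is a minimal sketch of the kind of code I mean. The class and method names are made up for illustration and are not taken from the JIRA; the only point is that the call to identity() is reachable solely through an exception edge, so it is a call site the inliner would have to look at inside a catch region.

```java
public class CatchPathInlining {
    // Hypothetical exception type, mirroring TException from the testcase.
    static class TException extends RuntimeException {}

    // Tiny method: a natural inlining candidate.
    static TException identity(TException e) {
        return e;
    }

    // The call to identity() sits inside the catch block, so control
    // reaches it only via the exception path out of the try block.
    static int run(int n) {
        int caught = 0;
        for (int i = 0; i < n; i++) {
            try {
                throw new TException();
            } catch (TException e) {
                if (identity(e) == e) {
                    caught++;
                }
            }
        }
        return caught;
    }

    public static void main(String[] args) {
        System.out.println("caught " + run(1000));
    }
}
```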


On 7/26/07, Vladimir Strigun <vstrigun@gmail.com> wrote:
>
> Hi Egor,
>
> On 26 Jul 2007 19:44:18 +0400, Egor Pasko <egor.pasko@gmail.com> wrote:
> > Vladimir,
> >
> > this is a REALLY AWESOME analysis that you performed!!
> > we should definitely pick all these items and optimise them out, and,
> > I am sure, we will!
>
> Thanks for your interest in that area.
>
> > Great thanks!
> >
> > on the exceptions: I wonder why lazyexc does not apply here.. Maybe,
> > this is a recompilation problem? Vladimir, did you try to run
> > tryRaiseExceptions(...) several times in a loop? does it help DRLVM's
> > performance?
>
> Yes, the lazyexc optimization works well in the simple case (several times
> faster compared with server mode) for the following example:
>             try{
>                 throw new Exception();
>             }catch(Throwable throwable){
>             }
>
> Within the provided test I tried to emulate the exception processing
> inside the TryRaiseExcept jython subtest. I tried removing several
> side-effect checks in lazyexc.cpp, but unfortunately got no
> speedup for the test as well as for jython. Possibly lazyexc is not
> working on the bench because there are some operations on the exception
> object in the catch block.
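
The two shapes being contrasted here can be put side by side in one small harness. This is only a sketch under assumptions: the class name, the `sink` field and the iteration counts are invented for illustration, and whether lazyexc really handles the first shape and gives up on the second is the hypothesis from this thread, not something the code itself proves.

```java
public class LazyExcShapes {
    static class TException extends RuntimeException {}

    // Hypothetical escape point: storing the exception here makes the
    // exception object observable outside the catch block.
    static TException sink;

    // Shape 1: the exception object is never touched in the handler.
    static int unusedException(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            try {
                throw new TException();
            } catch (TException e) {
                count++;
            }
        }
        return count;
    }

    // Shape 2: the handler operates on the exception object.
    static int usedException(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            try {
                throw new TException();
            } catch (TException e) {
                sink = e;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Warm up both shapes before timing.
        unusedException(10000);
        usedException(10000);

        long t0 = System.nanoTime();
        unusedException(1000000);
        long t1 = System.nanoTime();
        usedException(1000000);
        long t2 = System.nanoTime();

        System.out.println("unused exception: " + (t1 - t0) / 1000000 + " msec");
        System.out.println("used exception:   " + (t2 - t1) / 1000000 + " msec");
    }
}
```

Running both shapes on the same VM and comparing the two timings would show whether the handler's use of the exception object is what makes the difference.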
>
> Thanks.
> Vladimir.
>
> > On the 0x31F day of Apache Harmony Vladimir Strigun wrote:
> > > Hi all,
> > >
> > > I've gathered statistics for the Dacapo.jython bench (the worst Dacapo
> > > bench from a performance point of view) and identified several places for
> > > optimization. For every hot place a small testcase was created; you can
> > > find them below, along with the estimated speedup for every case. I believe
> > > that the optimizations below could significantly improve the current
> > > "horrible" situation for jython (7570 on DRLVM vs 2916 on Sun 1.6).
> > >
> > > Throwing/catching exceptions (HARMONY-4549 was created to track the issue)
> > > Expected boost: 700 ms = ~5-7% overall jython bench
> > > Description: Raising/catching exceptions is very slow in comparison
> > > with Sun. The TryRaiseExcept sub-bench of the jython bench throws and
> > > catches thousands of exceptions and, as you can see from the numbers
> > > below, it runs more than 3 times slower on DRLVM. AFAIU, since there
> > > are some operations on the exception object in the catch block, the VM
> > > unwinds the stack every time an exception is caught.
> > > Small testcase:
> > > public class TestExceptions {
> > >
> > >     public static void main(String[] args) {
> > >         //warmup VM first
> > >         tryRaiseExceptions(1);
> > >         long start = System.currentTimeMillis();
> > >         tryRaiseExceptions(1000000);
> > >         long res = System.currentTimeMillis() -start;
> > >         System.out.println("completed in "+res+" msec");
> > >     }
> > >
> > >     public static void tryRaiseExceptions(int n) {
> > >         for(int i=0; i<n; i++)
> > >             try{
> > >                 throw new TException();
> > >             }catch(TException throwable){
> > >                 TException ts = Test2.test(throwable);
> > >             }
> > >     }
> > > }
> > >
> > >
> > > public class Test2  {
> > >    public static TException test(TException thr) {
> > >        return thr;
> > >    }
> > > }
> > >
> > > public class TException  extends RuntimeException {
> > > }
> > >
> > >
> > >
> > > System.identityHashCode re-implementation on magics (HARMONY-4551)
> > > Expected boost: 1000 ms = ~10% overall
> > > Description: The System.identityHashCode() method is frequently used in
> > > the jython bench (more than 22000000 invocations). The reason for so many
> > > invocations is the IdentityHashMap usage for storing ThreadLocal objects.
> > > I assume the method could be implemented through magics, and small
> > > experiments with the following incorrect implementation show a huge speedup
> > > on a small testcase (from 1609 msec for the un-patched version to 409 msec
> > > on the patched one):
> > >
> > > return ObjectReference.fromObject(object).toAddress().toInt();
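
One likely reason the address-based one-liner counts as "incorrect" (my assumption; the original message does not say why) is that an identity hash must stay the same for the whole lifetime of the object, while a raw address can change if the collector moves the object. The sketch below checks that property on whatever VM runs it; the class and method names are made up for illustration.

```java
public class IdentityHashStability {
    // Returns true if the identity hash of 'o' is unchanged after
    // allocating some garbage and requesting a collection.
    // System.gc() is only a hint, so this cannot force the object to move.
    static boolean stableAcrossGc(Object o) {
        int before = System.identityHashCode(o);
        for (int i = 0; i < 100000; i++) {
            byte[] garbage = new byte[128]; // short-lived allocation churn
        }
        System.gc();
        return before == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        Object o = new Object();
        System.out.println("stable across GC: " + stableAcrossGc(o));
    }
}
```

A magics-based implementation would need to keep this check passing, for example by remembering the first computed value with the object rather than recomputing it from the current address.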
> > >
> > > Small testcase:
> > > public class test {
> > >     public static void main(String[] args) {
> > >         runTest(1000, new Object());
> > >         long start = System.currentTimeMillis();
> > >         runTest(10000000, new Object());
> > >         long end = System.currentTimeMillis() - start;
> > >         System.out.println("completed in "+end);
> > >     }
> > >
> > >     public static void runTest(int num, Object obj) {
> > >         for(int i=0; i<num; i++) {
> > >             System.identityHashCode(new Object());
> > >         }
> > >     }
> > > }
> > >
> > >
> > > Instanceof modification (HARMONY-4552)
> > > Expected boost: 700 ms = ~5-7%
> > > Description: instanceof is used in many places in Dacapo, but the hottest
> > > places are arithmetic operations, in particular CompareFloats,
> > > CompareIntegers, SimpleFloatArithmetic, etc. The typical code for
> > > those benches is the following:
> > > PyInteger add(PyObject obj) {
> > >     if (obj instanceof PyInteger) {
> > >         int v = ((PyInteger) obj).value;
> > >         // ... arithmetic on v
> > >     }
> > > }
> > >
> > > It means that we have thousands of instanceof checks for the same
> > > object, i.e. PyInteger instanceof PyInteger. The small testcase illustrates
> > > the problem. I should mention that the test runs very fast on Sun 1.6
> > > server: 15 msec, while in client mode it completes in 2600 msec. On
> > > Harmony VM in server mode the test completes in 2700 msec.
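
For reference, the PyInteger pattern described above can be written out as a self-contained program. The classes here are simplified stand-ins with the same names, not jython's real PyObject/PyInteger, and the null fallback is a placeholder for whatever jython does with other operand types.

```java
public class InstanceofPattern {
    // Simplified stand-ins for the jython classes of the same name.
    static class PyObject {}

    static class PyInteger extends PyObject {
        final int value;

        PyInteger(int value) {
            this.value = value;
        }

        // Type check followed by a cast to the very same class:
        // the hot pattern from the arithmetic sub-benches.
        PyObject add(PyObject obj) {
            if (obj instanceof PyInteger) {
                return new PyInteger(value + ((PyInteger) obj).value);
            }
            return null; // placeholder for the non-integer case
        }
    }

    // Adds 1 to an accumulator n times, so every iteration performs
    // one instanceof check against an operand that is always a PyInteger.
    static int sum(int n) {
        PyObject one = new PyInteger(1);
        PyInteger acc = new PyInteger(0);
        for (int i = 0; i < n; i++) {
            acc = (PyInteger) acc.add(one);
        }
        return acc.value;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int result = sum(10000000);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("sum=" + result + " completed in " + elapsed + " msec");
    }
}
```

Unlike the empty-body loop in the small testcase, this version uses the result of each check, which makes it harder for a JIT to delete the loop entirely.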
> > >
> > > Small testcase:
> > > public class Test {
> > >     public static void main(String[] args) {
> > >         runTest(1000, new String());
> > >         long start = System.currentTimeMillis();
> > >         runTest(1000000000, new String());
> > >         long end = System.currentTimeMillis() - start;
> > >         System.out.println("completed in "+end);
> > >     }
> > >
> > >     public static void runTest(int num, String obj) {
> > >         for(int i=0; i<num; i++) {
> > >             if(obj instanceof String){}
> > >         }
> > >     }
> > > }
> > >
> > > String.compareTo and equals method optimizations (HARMONY-4553)
> > > Expected boost: 700 ms = ~5-7%
> > > Description: The compareTo and equals methods are used in the
> > > CompareStrings and CompareInternedStrings sub-benches and in several
> > > places inside jython. The test below shows that DRLVM is significantly
> > > slower on these operations.
> > >
> > > Small testcase:
> > > public class CompareToTest{
> > >     public static void main(String[] args){
> > >         String st1 = new String("0 1 2 3 4 5 6 7 8 9");
> > >         String st2 = new String("0 1 2 3 4 5 6 7 8 9");
> > >         //warmup VM
> > >         stringCompareTo(st1, st2, 100000);
> > >         long start = System.currentTimeMillis();
> > >         stringCompareTo(st1, st2, 20000000);
> > >         long end = System.currentTimeMillis() -start;
> > >         System.out.println("String compareTo for equals strings completed in "+end +" msec");
> > >         st1 = new String("0 1 2 3 4 5 6 7 8 9abc");
> > >         //warmup VM
> > >         stringCompareTo(st1, st2, 100000);
> > >         long start1 = System.currentTimeMillis();
> > >         stringCompareTo(st1, st2, 20000000);
> > >         long end1 = System.currentTimeMillis() -start1;
> > >         System.out.println("String compareTo for non equals strings completed in "+end1 +" msec");
> > >
> > >         System.out.println("Total in "+(end1+end) +" msec");
> > >
> > >     }
> > >
> > >     public static void stringCompareTo(String st1, String st2, int num){
> > >         for(int x=0; x<num; x++) {
> > >             st1.compareTo(st2);
> > >         }
> > >
> > >     }
> > > }
> > >
> > >
> > > Thread.currentThread() method optimization (HARMONY-4555)
> > > Expected boost: ~5%
> > > Description: Thread.currentThread() is also one of the hot methods for
> > > the jython bench. The method is invoked more than 7.5 million times during
> > > jython execution. Despite the fact that the method has already been
> > > optimized several times, it still runs slower in comparison with the RI.
> > > I made some experiments with a magics implementation several weeks
> > > ago and got a good speedup for a small test and for the jython bench. Since
> > > the threading system is being redesigned at the moment, I think it would be
> > > great to add the currentThread() optimization to the plan.
> > >
> > > Testcase:
> > > public class CurrentThreadTest {
> > >     public static void main(String[] args) {
> > >         long st = System.currentTimeMillis();
> > >         for(int i=0; i< 100000000; i++) {
> > >             Thread.currentThread();
> > >         }
> > >         long res = System.currentTimeMillis()-st;
> > >         System.out.println("res="+res);
> > >     }
> > > }
> > >
> > >
> > > Could JIT, GC and Thread gurus please have a look at the mentioned issues?
> > >
> > >
> > > Thanks.
> > > Vladimir.
> > >
> > > Sub-bench statistics in milliseconds:
> > >
> > > Sub-bench HARMONY JDK HARMONY/JDK ratio
> > >
> > >       BuiltinFunctionCalls 63 78 0,807692
> > >        BuiltinMethodLookup 265 203 1,305419
> > >              CompareFloats 110 31 3,548387
> > >      CompareFloatsIntegers 94 63 1,492063
> > >            CompareIntegers 156 31 5,032258
> > >     CompareInternedStrings 187 31 6,032258
> > >               CompareLongs 94 32 2,9375
> > >             CompareStrings 125 31 4,032258
> > >             CompareUnicode 94 31 3,032258
> > >              ConcatStrings 797 656 1,214939
> > >              ConcatUnicode 562 188 2,989362
> > >            CreateInstances 203 62 3,274194
> > >         CreateNewInstances 344 204 1,686275
> > >    CreateStringsWithConcat 344 156 2,205128
> > >    CreateUnicodeWithConcat 141 78 1,807692
> > >               DictCreation 156 78 2
> > >          DictWithFloatKeys 328 141 2,326241
> > >        DictWithIntegerKeys 157 78 2,012821
> > >         DictWithStringKeys 62 62 1
> > >                   ForLoops 78 94 0,829787
> > >                 IfThenElse 172 234 0,735043
> > >                ListSlicing 63 32 1,96875
> > >             NestedForLoops 109 109 1
> > >       NormalClassAttribute 156 78 2
> > >    NormalInstanceAttribute 125 63 1,984127
> > >        PythonFunctionCalls 188 78 2,410256
> > >          PythonMethodCalls 250 109 2,293578
> > >                  Recursion 250 94 2,659574
> > >               SecondImport 141 109 1,293578
> > >        SecondPackageImport 156 141 1,106383
> > >      SecondSubmoduleImport 234 187 1,251337
> > >    SimpleComplexArithmetic 110 16 6,875
> > >     SimpleDictManipulation 156 94 1,659574
> > >      SimpleFloatArithmetic 109 62 1,758065
> > >   SimpleIntFloatArithmetic 78 16 4,875
> > >    SimpleIntegerArithmetic 63 31 2,032258
> > >     SimpleListManipulation 62 31 2
> > >       SimpleLongArithmetic 157 188 0,835106
> > >                 SmallLists 343 141 2,432624
> > >                SmallTuples 250 125 2
> > >      SpecialClassAttribute 141 93 1,516129
> > >   SpecialInstanceAttribute 125 63 1,984127
> > >             StringMappings 328 125 2,624
> > >           StringPredicates 219 109 2,009174
> > >              StringSlicing 140 78 1,794872
> > >                  TryExcept 16 0
> > >             TryRaiseExcept 1641 500 3,282
> > >               TupleSlicing 172 94 1,829787
> > >            UnicodeMappings 156 110 1,418182
> > >          UnicodePredicates 219 78 2,807692
> > >             UnicodeSlicing 140 62 2,258065
> > >
> > >
> > >                      Total 10829 5578
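
The third column of the table is simply the Harmony time divided by the JDK time. A minimal sketch of that calculation follows; the class and method names are made up, and returning NaN for a zero JDK time is my own choice to cover the TryExcept row, which has no ratio in the table.

```java
public class SubBenchRatio {
    // Harmony time divided by JDK time; NaN when the JDK time is 0,
    // as in the TryExcept row (16 vs 0).
    static double ratio(int harmonyMs, int jdkMs) {
        if (jdkMs == 0) {
            return Double.NaN;
        }
        return (double) harmonyMs / jdkMs;
    }

    public static void main(String[] args) {
        // Values copied from the table above.
        System.out.println("TryRaiseExcept:  " + ratio(1641, 500));
        System.out.println("CompareIntegers: " + ratio(156, 31));
        System.out.println("TryExcept:       " + ratio(16, 0));
        System.out.println("Total:           " + ratio(10829, 5578));
    }
}
```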
> > >
> >
> > --
> > Egor Pasko
> >
> >
>



-- 
Mikhail Fursov