harmony-dev mailing list archives

From Egor Pasko <egor.pa...@gmail.com>
Subject Re: [perf] Comparative benchmarking
Date Tue, 06 Nov 2007 21:27:25 GMT
On the 0x386 day of Apache Harmony Vladimir Strigun wrote:
> On 06 Nov 2007 18:32:01 +0300, Egor Pasko <egor.pasko@gmail.com> wrote:
> > On the 0x386 day of Apache Harmony Aleksey Shipilev wrote:
> > > On 11/6/07, Tim Ellison <t.p.ellison@gmail.com> wrote:
> > > > > > It seems to me that it could be improved further if some magic
> > > > > > implementation of sqrt() is used instead of a native call.
> > > >
> > > > Looks like we have to go this path, since the hacked intrinsics are
> > > > still 4x slower if I'm reading this properly.
> > > You're reading right. We are still 4x slower than Sun 1.6.0_02
> > >
> > > > > Moreover, AFAIU the (3) approach is safe since IEEE754 compatibility
> > > > > must be preserved only for strict mode, whereas (3) approach
> > > > > implements fastpath for non-strict mode.
> > > >
> > > > Yes, I modified my microbench to use both strict and non-strict in the
> > > > same run and there is a noticeable difference on Sun 6.0:
> > > >       Math Result = 6.666661664588418E8 in 30ms
> > > > StrictMath Result = 6.666661664588418E8 in 1012ms
> > >
> > > That's weird :S Here's what I've got on this modified microtest:
> > > ========================================================
> > > public class testSqrt {
> > >
> > > final static int count = 10000000;
> > >
> > > public static void main(String[] args) {
> > >
> > >    // warm-up
> > >    double result = 0.0d;
> > >    for (long i = 0; i < 1024*1024*10; i++) {
> > >        result += Math.sqrt((double) i);
> > >        result += StrictMath.sqrt((double) i);
> > >    }
> > >    System.out.println("Warmup finished: " + result);
> > >
> > >    long start;
> > >
> > >    // Timed run
> > >    result = 0.0d;
> > >    start = System.currentTimeMillis();
> > >    for (int i = 0; i < count; i++) {
> > >        result += Math.sqrt((double) i);
> > >    }
> > >
> > >    System.out.println("Math Result = " + result + " in "
> > >               + (System.currentTimeMillis() - start) + "ms");
> > >
> > >    // Timed run
> > >    result = 0.0d;
> > >    start = System.currentTimeMillis();
> > >    for (int i = 0; i < count; i++) {
> > >        result += StrictMath.sqrt((double) i);
> > >    }
> > >
> > >    System.out.println("StrictMath Result = " + result + " in "
> > >               + (System.currentTimeMillis() - start) + "ms");
> > >
> > >
> > > }
> > >
> > > }
> > > ========================================================
> > >
> > > java version "1.5.0_10"
> > > Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_10-b03)
> > > Java HotSpot(TM) Client VM (build 1.5.0_10-b03, mixed mode, sharing)
> > >
> > > Warmup finished: 4.527292719905836E10
> > > Math Result = 2.1081849486439312E10 in 140ms
> > > StrictMath Result = 2.1081849486439312E10 in 9235ms
> > >
> > > ----------------
> > >
> > > java version "1.6.0"
> > > Java(TM) SE Runtime Environment (build 1.6.0-b105)
> > > Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode, sharing)
> > >
> > > Warmup finished: 4.527292719905836E10
> > > Math Result = 2.1081849486439312E10 in 141ms
> > > StrictMath Result = 2.1081849486439312E10 in 141ms
> > >
> > > ---------------
> > >
> > > java version "1.6.0_02"
> > > Java(TM) SE Runtime Environment (build 1.6.0_02-b06)
> > > Java HotSpot(TM) Client VM (build 1.6.0_02-b06, mixed mode)
> > >
> > > Warmup finished: 4.527292719905836E10
> > > Math Result = 2.1081849486439312E10 in 156ms
> > > StrictMath Result = 2.1081849486439312E10 in 140ms
> > >
> > > ---------------
> > > Harmony (clean)
> > >
> > > Apache Harmony Launcher : (c) Copyright 1991, 2006 The Apache Software
> > > Foundation or its licensors, as applicable.
> > > java version "1.5.0"
> > > pre-alpha : not complete or compatible
> > > svn = r589548, (Nov  6 2007), Windows/ia32/msvc 1310, release build
> > > http://harmony.apache.org
> > >
> > > Warmup finished: 4.527292719905836E10
> > > Math Result = 2.1081849486439312E10 in 12078ms
> > > StrictMath Result = 2.1081849486439312E10 in 12016ms
> > >
> > > ----------------
> > > Harmony (patched)
> > >
> > > Apache Harmony Launcher : (c) Copyright 1991, 2006 The Apache Software
> > > Foundation or its licensors, as applicable.
> > > java version "1.5.0"
> > > pre-alpha : not complete or compatible
> > > svn = r589548, (Nov  6 2007), Windows/ia32/msvc 1310, release build
> > > http://harmony.apache.org
> > >
> > > Warmup finished: 4.5272927199206406E10
> > > Math Result = 2.1081849486508232E10 in 3406ms
> > > StrictMath Result = 2.1081849486439312E10 in 12058ms
> > >
> > > ----------------
> > > You see, Sun 1.6.0 behaves fast even in strict mode.
> >
> > Alexey, thanks, very useful!
> >
> > my 2c is that both Math and StrictMath require "correctly rounded" (to
> > nearest) results for sqrt().
> >
> > BTW, the _mm_sqrt_s intrinsic looks like it invokes the hardware
> > implementation of sqrt, which is known to be slower than some modern
> > software implementations.
> 
> Egor,
> 
> could you please provide a link to the best software implementation?
> AFAIK, with the provided patch we could get a good speedup for the
> Dacapo.lusearch bench. If I remember correctly, with the _mm_sqrt_s
> intrinsic the lusearch time decreased from ~2800 msec to 2100 msec. So, I
> think it would be great to use this implementation until we have a
> good software implementation.

Vladimir, guess what? :) I actually mixed several things up.

[1] discusses the Itanium 2 implementation that takes 28 cycles to
compute the single precision sqrt(). The paper is interesting, but it
would take a long time to make a good implementation, with no speedup
guaranteed on x86 :)

Another memory that helped me mix things up is the fact that
trigonometric functions in hardware work slower than the tricky (less
accurate) software implementations in Intel MKL (but trig functions do
not have the IEEE 754 constraint).

So we are left with SSE asm that can be inlined by the JIT, and AFAICS
it is still not as fast as HotSpot? Weird :)
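
Since both Math.sqrt() and StrictMath.sqrt() are specified to return the
correctly rounded result, a quick sanity check for any intrinsic is to
compare the two bit for bit. In the quoted numbers the patched Harmony
run prints a different sum for Math (2.1081849486508232E10) than for
StrictMath (2.1081849486439312E10), so a check of this kind looks worth
running. The following is only a minimal sketch: the class and method
names are made up for illustration and are not part of the patch.

```java
// Hypothetical helper, not part of the Harmony patch discussed in this thread.
final class SqrtConsistencyCheck {

    // Counts the arguments in [0, n) for which Math.sqrt and StrictMath.sqrt
    // return doubles with different raw bit patterns.
    static int countMismatches(int n) {
        int mismatches = 0;
        for (int i = 0; i < n; i++) {
            long fast = Double.doubleToLongBits(Math.sqrt((double) i));
            long strict = Double.doubleToLongBits(StrictMath.sqrt((double) i));
            if (fast != strict) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        int n = 10000000; // same iteration count as the quoted microtest
        System.out.println("Mismatches: " + countMismatches(n));
    }
}
```

On a VM that follows the spec for sqrt() this should report no
mismatches; a non-zero count would point at a fast path that is not
correctly rounded for doubles.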

[1] Software implementations of division and square root operations
    for Intel® Itanium® processors. Proceedings of the 2004 Workshop
    on Computer Architecture Education, held in conjunction with the
    31st International Symposium on Computer Architecture

-- 
Egor Pasko

