On 06 Nov 2007 18:32:01 +0300, Egor Pasko wrote:
> On the 0x386 day of Apache Harmony Aleksey Shipilev wrote:
> > On 11/6/07, Tim Ellison wrote:
> > > > It seems to me that it could be improved further if some magic
> > > > implementing sqrt() will be used instead on native call.
> > >
> > > Looks like we have to go this path, since the hacked intrinsics are
> > > still 4x slower if I'm reading this properly.
> > You're reading right. We are still 4x slower than Sun 1.6.0_02
> >
> > > > Moreover, AFAIU the (3) approach is safe since IEEE754 compatibility
> > > > must be preserved only for strict mode, whereas (3) approach
> > > > implements fastpath for non-strict mode.
> > >
> > > Yes, I modified my microbench to use both strict and non-strict in the
> > > same run and there is a noticeable difference on Sun 6.0:
> > > Math Result = 6.666661664588418E8 in 30ms
> > > StrictMath Result = 6.666661664588418E8 in 1012ms
> >
> > That's weird :S Here's what I've got on this modified microtest:
> > ========================================================
> > public class testSqrt {
> >
> >     final static int count = 10000000;
> >
> >     public static void main(String[] args) {
> >
> >         // warm-up
> >         double result = 0.0d;
> >         for (long i = 0; i < 1024*1024*10; i++) {
> >             result += Math.sqrt((double) i);
> >             result += StrictMath.sqrt((double) i);
> >         }
> >         System.out.println("Warmup finished: " + result);
> >
> >         long start;
> >
> >         // Timed run
> >         result = 0.0d;
> >         start = System.currentTimeMillis();
> >         for (int i = 0; i < count; i++) {
> >             result += Math.sqrt((double) i);
> >         }
> >
> >         System.out.println("Math Result = " + result + " in "
> >                 + (System.currentTimeMillis() - start) + "ms");
> >
> >         // Timed run
> >         result = 0.0d;
> >         start = System.currentTimeMillis();
> >         for (int i = 0; i < count; i++) {
> >             result += StrictMath.sqrt((double) i);
> >         }
> >
> >         System.out.println("StrictMath Result = " + result + " in "
> >                 + (System.currentTimeMillis() - start) + "ms");
> >
> >     }
> >
> > }
> > ========================================================
> >
> > java version "1.5.0_10"
> > Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_10-b03)
> > Java HotSpot(TM) Client VM (build 1.5.0_10-b03, mixed mode, sharing)
> >
> > Warmup finished: 4.527292719905836E10
> > Math Result = 2.1081849486439312E10 in 140ms
> > StrictMath Result = 2.1081849486439312E10 in 9235ms
> >
> > ----------------
> >
> > java version "1.6.0"
> > Java(TM) SE Runtime Environment (build 1.6.0-b105)
> > Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode, sharing)
> >
> > Warmup finished: 4.527292719905836E10
> > Math Result = 2.1081849486439312E10 in 141ms
> > StrictMath Result = 2.1081849486439312E10 in 141ms
> >
> > ---------------
> >
> > java version "1.6.0_02"
> > Java(TM) SE Runtime Environment (build 1.6.0_02-b06)
> > Java HotSpot(TM) Client VM (build 1.6.0_02-b06, mixed mode)
> >
> > Warmup finished: 4.527292719905836E10
> > Math Result = 2.1081849486439312E10 in 156ms
> > StrictMath Result = 2.1081849486439312E10 in 140ms
> >
> > ---------------
> > Harmony (clean)
> >
> > Apache Harmony Launcher : (c) Copyright 1991, 2006 The Apache Software
> > Foundation or its licensors, as applicable.
> > java version "1.5.0"
> > pre-alpha : not complete or compatible
> > svn = r589548, (Nov 6 2007), Windows/ia32/msvc 1310, release build
> > http://harmony.apache.org
> >
> > Warmup finished: 4.527292719905836E10
> > Math Result = 2.1081849486439312E10 in 12078ms
> > StrictMath Result = 2.1081849486439312E10 in 12016ms
> >
> > ----------------
> > Harmony (patched)
> >
> > Apache Harmony Launcher : (c) Copyright 1991, 2006 The Apache Software
> > Foundation or its licensors, as applicable.
> > java version "1.5.0"
> > pre-alpha : not complete or compatible
> > svn = r589548, (Nov 6 2007), Windows/ia32/msvc 1310, release build
> > http://harmony.apache.org
> >
> > Warmup finished: 4.5272927199206406E10
> > Math Result = 2.1081849486508232E10 in 3406ms
> > StrictMath Result = 2.1081849486439312E10 in 12058ms
> >
> > ----------------
> > You see, Sun 1.6.0 behaves fast even in strict mode.
>
> Alexey, thanks, very useful!
>
> my 2c is that both Math and StrictMath require "correctly rounded" (to
> nearest) results for sqrt().
>
> BTW, the _mm_sqrt_s intrinsic looks like it invokes the hardware
> implementation of sqrt, which is known to be slower than some modern
> software implementations.

Egor, could you please provide a link to the best software
implementation?

AFAIK, the provided patch gives a good speedup on the Dacapo lusearch
benchmark. If I remember correctly, with the _mm_sqrt_s intrinsic the
lusearch time decreased from ~2800 msec to ~2100 msec. So I think it
would be worth using this implementation until we have a good software
implementation.

Thanks.
Vladimir

> --
> Egor Pasko
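
P.S. A minimal sketch of how the "correctly rounded" point above could be checked. Both Math.sqrt and StrictMath.sqrt are specified to return the double nearest to the true square root, so the two should agree bit for bit on every input. The class name SqrtCheck, the helper countMismatches, the input range and the seed are all illustrative assumptions and are not part of the patch discussed in this thread.

```java
import java.util.Random;

// Hypothetical helper, not part of Harmony or of the patch under discussion.
final class SqrtCheck {

    // Counts sampled inputs for which Math.sqrt and StrictMath.sqrt
    // return results that differ in at least one bit.
    static int countMismatches(int samples, long seed) {
        Random rnd = new Random(seed);
        int mismatches = 0;
        for (int i = 0; i < samples; i++) {
            // Roughly the same range as the microtest: [0, 1e7)
            double x = rnd.nextDouble() * 1.0e7;
            long fast = Double.doubleToLongBits(Math.sqrt(x));
            long strict = Double.doubleToLongBits(StrictMath.sqrt(x));
            if (fast != strict) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("Mismatches: " + countMismatches(1000000, 42L));
    }
}
```

On a VM whose Math.sqrt is correctly rounded, the count should be 0. In the figures quoted above, patched Harmony prints different sums for Math and StrictMath (2.1081849486508232E10 vs 2.1081849486439312E10), so I would expect a non-zero count there, which is worth looking at before the fast path is enabled.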