harmony-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: [performance] a few early benchmarks
Date Tue, 21 Nov 2006 17:00:18 GMT
Sergey Kuksenko wrote:
> Stefano,
> Trying to get the potential of Harmony I've quickly checked SciMark on a tuned
> Harmony release build and compared it with BEA & SUN.

Sergey,

many thanks for doing this.

> Hardware: P4 Xeon 3GHz
> Windows XP SP2 (It's another platform, but I hope the key things are still
> the same).
> 
> BEA -
> java version "1.5.0_06"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
> BEA JRockit(R) (build R26.3.0-32-58710-1.5.0_06-20060308-2022-win-ia32, )
> 
> SUN -
> java version "1.5.0_06"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
> Java HotSpot(TM) Client VM (build 1.5.0_06-b05, mixed mode)
> 
> 
> Harmony -
> Apache Harmony Launcher : (c) Copyright 1991, 2006 The Apache Software
> Foundation or its licensors, as applicable.
> java version "1.5.0"
> pre-alpha : not complete or compatible
> svn = 475925, (Nov 17 2006), Windows/ia32/msvc 1310, release build
> 
> I've got the following results
> 
> BEA (out of the box):
> 
> Composite Score: 435.9674695335291
> FFT (1024): 295.33366058958575
> SOR (100x100):   474.15229982839213
> Monte Carlo : 111.56918839504195
> Sparse matmult (N=1000, nz=5000): 551.8821052631578
> LU (100x100): 746.9000935914679
> ----
> 
> Sun (out of the box):
> 
> Composite Score: 229.70779446543412
> FFT (1024): 104.92303791891565
> SOR (100x100):   400.44785722405015
> Monte Carlo : 13.257380552894444
> Sparse matmult (N=1000, nz=5000): 160.07814989061512
> LU (100x100): 469.8325467406951
> 
> ---
> 
> Harmony (out of the box):
> 
> Composite Score: 109.43208528481887
> FFT (1024): 51.30119529411764
> SOR (100x100):   257.9591618631154
> Monte Carlo : 17.04568642272773
> Sparse matmult (N=1000, nz=5000): 129.4666069618598
> LU (100x100): 91.38777588227376
> ----
> 
> Harmony (tuned options, server path):
> 
> Composite Score: 181.54555681031619
> FFT (1024): 91.22597999162443
> SOR (100x100):   329.8450882375011
> Monte Carlo : 42.51432538579417
> Sparse matmult (N=1000, nz=5000): 260.58050602943024
> LU (100x100): 183.56188440723088

that's pretty good.

> ------
> 
> When I looked into Harmony OOB I found that all hot methods of SciMark
> are compiled by JET (not recompiled by the optimizing JIT compiler). The
> way our DRLVM currently recognises hot paths is not suitable for SciMark
> because of the short run.

Hmmm, what do you mean by "short run"? Does the entire app run for a short
amount of time in total, or does each hot method run for too short a time
to be recognized as "hot"?

> We need to tune DRLVM options to get better results.
> Tuned options give good SciMark score improvement (109->181).

Well, to be fair, all the other JVMs could probably be tuned the same way.
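
To put numbers on that: the composite score in the listings above works out
to the plain arithmetic mean of the five kernel scores, so the figures are
easy to sanity-check. A minimal sketch in Python, using only the scores
quoted in this message (the dictionary keys and variable names are mine, not
anything SciMark prints):

```python
from statistics import mean

# Kernel scores copied from the listings above, in the order
# FFT, SOR, Monte Carlo, Sparse matmult, LU. Keys are illustrative labels.
kernels = {
    "harmony_oob": [51.30119529411764, 257.9591618631154, 17.04568642272773,
                    129.4666069618598, 91.38777588227376],
    "harmony_tuned": [91.22597999162443, 329.8450882375011, 42.51432538579417,
                      260.58050602943024, 183.56188440723088],
    "sun_oob": [104.92303791891565, 400.44785722405015, 13.257380552894444,
                160.07814989061512, 469.8325467406951],
}

# Composite scores as reported in the same listings.
reported = {
    "harmony_oob": 109.43208528481887,
    "harmony_tuned": 181.54555681031619,
    "sun_oob": 229.70779446543412,
}

# Recompute each composite as the mean of its five kernels.
composite = {name: mean(scores) for name, scores in kernels.items()}

# Relative figures: tuned vs. out-of-the-box Harmony, and tuned Harmony vs. Sun.
speedup = composite["harmony_tuned"] / composite["harmony_oob"]
vs_sun = composite["harmony_tuned"] / composite["sun_oob"]

for name, value in composite.items():
    print(f"{name}: computed {value:.4f}, reported {reported[name]:.4f}")
print(f"tuned / OOB Harmony: {speedup:.2f}")
print(f"tuned Harmony / Sun OOB: {vs_sun:.2f}")
```

By that measure the tuned build scores roughly 1.66 times the out-of-the-box
one, and reaches about 79% of the Sun client VM's out-of-the-box composite.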

> Which moves Harmony performance close to what Sun OOB shows.

excuse my ignorance, but what's OOB? (google define says "out of
business" or "order of battle"... not sure they apply here ;-)

> Our client (default) compilation path was tuned a long time ago and it
> probably makes sense to have another round. What we initially did was
> running some script executing the given set of workloads trying to find the
> best configuration for our VM. Having said that I suggest we choose the
> right set of applications/benchmarks, so we can start our tuning once
> again.

Maybe it's the analog microelectronics guy in me talking, but every time I
hear something like "let's get reasonable defaults", I think of
introducing a variation and a feedback loop to reach a local minimum and
stabilize the system.

I know very little about how DRLVM works, but would it be feasible to
start with such "reasonable defaults" and introduce a random variability
to the way the JIT works alongside a very simple method profiler and see
if the performance increases? Think of it as you trying out different
things and seeing if they work better... but done by the JVM as it runs.
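
To make the idea concrete, here is a toy sketch in Python of "random
variation plus feedback": perturb one tuning parameter, measure, and keep the
change only if the measurement improved. Everything in it is an assumption
for illustration: the parameter (a hypothetical recompilation threshold), the
synthetic cost function standing in for a profiler, and the numbers. It says
nothing about how DRLVM actually selects hot methods.

```python
import random


def measured_cost(threshold: float) -> float:
    """Stand-in for a profiler measurement (lower is better).

    Purely synthetic: a bowl whose minimum sits at threshold == 3000.
    A real VM would measure run time or throughput instead.
    """
    return (threshold - 3000.0) ** 2 / 1e6 + 1.0


def adapt(start: float, steps: int, jitter: float, rng: random.Random) -> float:
    """Random-perturbation hill climbing on a single parameter."""
    current = start
    current_cost = measured_cost(current)
    for _ in range(steps):
        # Introduce a random variation around the current setting.
        candidate = current + rng.uniform(-jitter, jitter)
        cost = measured_cost(candidate)
        # Feedback: keep the variation only if the measurement improved.
        if cost < current_cost:
            current, current_cost = candidate, cost
    return current


rng = random.Random(42)  # fixed seed so the run is repeatable
default_threshold = 10000.0  # the hypothetical "reasonable default"
tuned = adapt(start=default_threshold, steps=2000, jitter=500.0, rng=rng)
print(f"default cost: {measured_cost(default_threshold):.3f}")
print(f"tuned threshold: {tuned:.1f}, cost: {measured_cost(tuned):.3f}")
```

Note that this only ever finds a local minimum of whatever is being measured,
and a noisy measurement would need averaging before the comparison means
anything.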

Keep in mind I'm a total newbie in virtual machine design (or CPU
architectures for that matter, despite my degree in microelectronics..
well, to be fair, I was doing analog not digital circuits) so bear with
me if I'm saying stupid things :-)

> Currently we have in mind the following list:
> - HWA (Hello World Application)
> - SciMark
> - Dacapo (reasonable set of benchmarks, like fop, hsqldb, chart and xalan)
> - Anything else?
> 
> What do you think about this? Any additions to the list? Comments?
> Questions?

The problem I have with this is that I feel that each of these
scenarios might require different tuning parameters... and if that is the
case, you end up with the 'short blanket' problem: you improve here and
you regress there.

An 'adaptive' scenario, on the other hand, would allow us to:

 1) avoid trying to find the optimal defaults (since we can't possibly
test every scenario that will be useful in a way that is consistent with
real world usage)

 2) avoid the blanket problem: each VM can adapt to its scenario of use

 3) avoid the 'stiffness' problem: each VM can adapt to machine resource
changes and 'retune' itself if the environment changes.

Of course, there is a price to pay in such 'feedback variability' systems
since they have to find the minima over and over again.

So, another solution is to have a JVM "tuning parameters discovery mode":
you run the JVM with such "parameter finding" autoprofiling turned on, and
it dumps the tuning results to disk, which you can later use to initialize
the JVM on your own.
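
As a rough illustration of that workflow, here is a small Python sketch:
try a grid of candidate settings against a workload, write the best
combination to a file, and read it back on a later run. The parameter names
(`recompile_threshold`, `inline_budget`), the file name, and the synthetic
scoring function are all made up for the example; they are not real DRLVM
options.

```python
import itertools
import json
import os
import tempfile


def benchmark_score(params: dict) -> float:
    """Synthetic stand-in for running a workload (higher is better)."""
    return (-abs(params["recompile_threshold"] - 5000)
            - 10 * abs(params["inline_budget"] - 40))


def discover(grid: dict) -> dict:
    """'Discovery mode': try every combination in the grid, keep the best."""
    names = sorted(grid)
    best, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = benchmark_score(params)
        if score > best_score:
            best, best_score = params, score
    return best


def dump_profile(params: dict, path: str) -> None:
    """Write the discovered settings to disk."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(params, handle, indent=2, sort_keys=True)


def load_profile(path: str) -> dict:
    """Read the settings back, as a later run would at startup."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical candidate values for two hypothetical options.
grid = {
    "recompile_threshold": [1000, 5000, 10000],
    "inline_budget": [20, 40, 80],
}
path = os.path.join(tempfile.mkdtemp(), "tuning-profile.json")
dump_profile(discover(grid), path)
loaded = load_profile(path)
print(loaded)
```

The discovery cost is paid once, up front, rather than on every run, at the
price of the profile going stale if the workload or the machine changes.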

Not sure how feasible or complicated this is to write, but how does this
sound on paper?


> Thanks,
> ---
> Sergey Kuksenko
> Intel Enterprise Solutions Software Division
> 
> 
> On 11/17/06, Stefano Mazzocchi <stefano@apache.org> wrote:
>>
>> Alexey Varlamov wrote:
>> > Stefano,
>> >
>> > It is a bit unfair to compare *debug* build of Harmony with other
>> > release versions :)
>>
>> I'm simulating what a journalist with a developer could do.
>>
>> If there is a way to make it compile in 'release mode' (if such a thing
>> exists), I'll be very glad to redo the benchmarks.
>>
>> > I suppose all VMs were run in default mode (i.e. no special cmd-line
>> > switches)?
>>
>> Right. No switches. I'm simulating what users do when they get the JVM:
>> they run "java"... and if it's not fast enough they buy a new box.
>>
>> Having command line tuning parameters is mostly useless since most
>> people don't know the internals of a JVM well enough to guess what
>> parameters to tune anyway.
>>
>> So, what people will do once they get a Harmony snapshot is "java
>> my.class.Name" and see the results.
>>
>> I want to simulate that and compare it to the same exact experience they
>> will get with other virtual machines for a variety of common scenarios
>> (number crunching, xml processing, http serving, database load, etc...)
>>
>> I will focus on the server because that's where the apache action (and
>> my personal interest) is.
>>
>> So, like I said, if there are 'compile time' switches that I can use to
>> turn 'release mode' on, please tell me and I'll re-do the tests.
>>
>> > 2006/11/17, Stefano Mazzocchi <stefano@apache.org>:
>> >> There are lies, damn lies and benchmarks.... which don't really tell you
>> >> if an implementation of a program is *faster* but at least it tells you
>> >> where you're at.
>> >>
>> >> So, as Geir managed to get the DSO linking problem go away in DRLVM, I
>> >> was able to start running some benchmarks.
>> >>
>> >> The machine is the following:
>> >>
>> >> Linux harmony-em64t 2.6.15-27-amd64-generic #1 SMP PREEMPT Sat Sep 16
>> >> 01:50:50 UTC 2006 x86_64 GNU/Linux
>> >>
>> >> dual Intel(R) Pentium(R) D CPU 3.20GHz
>> >> bogomips 6410.31 (per CPU)
>> >>
>> >> There is nothing else running on the machine (load is 0.04 at the time
>> >> of testing).
>> >>
>> >> The various virtual machines tested are:
>> >>
>> >> harmony
>> >> -------
>> >> Apache Harmony Launcher : (c) Copyright 1991, 2006 The Apache Software
>> >> Foundation or its licensors, as applicable.
>> >> java version " 1.5.0"
>> >> pre-alpha : not complete or compatible
>> >> svn = r476006, (Nov 16 2006), Linux/em64t/gcc 4.0.3, debug build
>> >>
>> >> sun5
>> >> ---
>> >> java version "1.5.0_09 "
>> >> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_09-b03)
>> >> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_09-b03, mixed mode)
>> >>
>> >> sun6
>> >> ----
>> >> java version " 1.6.0-rc"
>> >> Java(TM) SE Runtime Environment (build 1.6.0-rc-b104)
>> >> Java HotSpot(TM) 64-Bit Server VM (build 1.6.0-rc-b104, mixed mode)
>> >>
>> >> ibm
>> >> ---
>> >> java version " 1.5.0"
>> >> Java(TM) 2 Runtime Environment, Standard Edition (build
>> >> pxa64dev-20061002a (SR3) )
>> >> IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux amd64-64
>> >> j9vmxa6423-20061001 (JIT enabled)
>> >> J9VM - 20060915_08260_LHdSMr
>> >> JIT  - 20060908_1811_r8
>> >> GC   - 20060906_AA)
>> >> JCL  - 20061002
>> >>
>> >> bea
>> >> ---
>> >> java version "1.5.0_06 "
>> >> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
>> >> BEA JRockit(R) (build
>> >> R26.4.0-63-63688-1.5.0_06-20060626-2259-linux-x86_64, )
>> >>
>> >>
>> >>
>> >> --------------------------------------------------------------------------
>> >>
>> >>
>> >>
>> >> Test #1: java scimark2 (http://math.nist.gov/scimark2/)
>> >>
>> >> command: java jnt.scimark2.commandline
>> >>
>> >> NOTE: bigger number is better
>> >>
>> >> Sun6
>> >> Composite Score: 364.5832265230057
>> >> FFT (1024): 220.8458713892794
>> >> SOR (100x100):   696.1542342357722
>> >> Monte Carlo : 149.37978088875656
>> >> Sparse matmult (N=1000, nz=5000): 326.37451873283845
>> >> LU (100x100): 430.1617273683819
>> >>
>> >> BEA
>> >> Composite Score: 359.13480378697835
>> >> FFT (1024): 303.8746880751562
>> >> SOR (100x100):   454.25628897202307
>> >> Monte Carlo : 93.23913192138497
>> >> Sparse matmult (N=1000, nz=5000): 530.44112637391
>> >> LU (100x100): 413.8627835924175
>> >>
>> >> Sun5
>> >> Composite Score: 332.84987587548574
>> >> FFT (1024): 216.5144595799027
>> >> SOR (100x100):   689.429322146947
>> >> Monte Carlo : 25.791262124978065
>> >> Sparse matmult (N=1000, nz=5000): 317.5193965699373
>> >> LU (100x100): 414.99493895566377
>> >>
>> >> IBM
>> >> Composite Score: 259.8249218693683
>> >> FFT (1024): 296.8415012789055
>> >> SOR (100x100):   428.974881649179
>> >> Monte Carlo : 89.15159857584082
>> >> Sparse matmult (N=1000, nz=5000): 144.3524241203982
>> >> LU (100x100): 339.8042037225181
>> >>
>> >> Harmony
>> >> Composite Score: 113.65082278962575
>> >> FFT (1024): 203.76641991778123
>> >> SOR (100x100):   224.37761309236748
>> >> Monte Carlo : 9.063866256533116
>> >> Sparse matmult (N=1000, nz=5000): 65.4051866327227
>> >> LU (100x100): 65.6410280487242
>> >>
>> >> In this test harmony is clearly lagging behind... at about 30%
>> >> performance of the best JVM, it's a little crappy. Please note how FFT's
>> >> performance is not so bad while monte carlo is pretty bad compared to
>> >> BEA or IBM.
>> >>
>> >> Overall, it seems like there is some serious work to do here to catch up.
>> >>
>> >>
>> >> --------------------------------------------------------------------------
>> >>
>> >>
>> >>
>> >> Test 2: Dhrystones (http://www.c-creators.co.jp/okayan/DhrystoneApplet/)
>> >>
>> >> command: java dhry 100000000
>> >>
>> >> NOTE: bigger is better
>> >>
>> >> NB: I modified the code to accept the count at input from the command
>> >> line!
>> >>
>> >> sun6:     8552856 dhrystones/sec
>> >> sun5:     6605892
>> >> bea:      5678914
>> >> harmony:   669734
>> >> ibm:       501562
>> >>
>> >> The performance here is horrific but what's surprising is that J9 is
>> >> even worse. No idea what's going on but it seems like something is not
>> >> working as it should (in both harmony and J9)
>> >>
>> >>
>> >> --------------------------------------------------------------------------
>> >>
>> >>
>> >>
>> >> Test 3: Sieve (part of http://www.sax.de/~adlibit/tya18.tgz)
>> >>
>> >> command: java Sieve 30
>> >>
>> >> NB: I modified the test to run for a configurable amount of seconds.
>> >>
>> >> sun6     8545 sieves/sec
>> >> sun5     8364
>> >> bea      6174
>> >> harmony  1836
>> >> ibm       225
>> >>
>> >> IBM J9 clearly has something wrong on x86_64 but harmony is clearly
>> >> lagging behind.
>> >>
>> >> Stay tuned for more tests.
>> >>
>> >> --
>> >> Stefano.
>> >>
>> >>
>>
>>
>> -- 
>> Stefano.
>>
>>
> 


-- 
Stefano.

