harmony-dev mailing list archives

From "Sergey Kuksenko" <sergey.kukse...@gmail.com>
Subject Re: [performance] a few early benchmarks
Date Wed, 22 Nov 2006 14:01:50 GMT
Stefano,



On 11/21/06, Stefano Mazzocchi <stefano@apache.org> wrote:
>
> Sergey Kuksenko wrote:
> > Stefano,
> > To gauge Harmony's potential, I quickly ran SciMark on a tuned Harmony
> > release build and compared it with BEA and Sun.
>
> > When I looked into Harmony OOB, I found that all hot methods of SciMark
> > are compiled by JET (not recompiled by the optimizing JIT compiler). The
> > way our DRLVM currently recognizes hot paths is not suitable for SciMark
> > because of its short run.
>
> Hmmm, what do you mean by "short run"? the entire app runs for a short
> amount of time total or each hot method runs for a short amount of time
> not enough to have it recognized as "hot"?


By default, SciMark measures each sub-benchmark for about 2 seconds.
The top methods of the sub-benchmarks were not called frequently:
FFT was called ~8200 times.
SOR was called ~15 times.
Monte Carlo was called ~26 times.
Sparse matmult was called ~17 times.
LU was called ~2000 times.

That is not enough for the current DRLVM to recognize them as "hot".
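
To illustrate why call counts this low matter, here is a minimal sketch of counter-based hot-method detection, assuming a single per-method invocation threshold. The class name `HotMethodCounter` and the threshold of 10,000 are hypothetical. The thread does not state DRLVM's actual recompilation threshold, and the value is chosen only so that the call counts above fall short of it. Some VMs also count loop back-edges, which this sketch omits.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch of invocation-counter hot-method detection.
 * Hypothetical: single threshold, invocation counts only (no back-edge counting).
 * Not DRLVM's implementation.
 */
public class HotMethodCounter {
    private final int threshold;                           // calls needed before recompilation
    private final Map<String, Integer> calls = new HashMap<>();

    public HotMethodCounter(int threshold) {
        this.threshold = threshold;
    }

    /** Records one invocation; returns true exactly when the method crosses the threshold. */
    public boolean onInvoke(String method) {
        int n = calls.merge(method, 1, Integer::sum);
        return n == threshold;                             // signal "recompile with the optimizing JIT" once
    }

    /** Simulates `count` calls of a method and reports whether it was ever flagged as hot. */
    public boolean becomesHot(String method, int count) {
        boolean hot = false;
        for (int i = 0; i < count; i++) {
            hot |= onInvoke(method);
        }
        return hot;
    }

    public static void main(String[] args) {
        // Hypothetical threshold, for illustration only.
        HotMethodCounter counter = new HotMethodCounter(10_000);
        // Approximate call counts reported above for the default 2-second run.
        System.out.println("FFT hot? " + counter.becomesHot("FFT", 8200));   // false
        System.out.println("SOR hot? " + counter.becomesHot("SOR", 15));     // false
    }
}
```
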
The default measurement time of SciMark is 2 seconds, but it can be set as a
parameter. I set it to 30 seconds and got the following results:

BEA (OOB):
Composite Score (default, 2 secs): 435
Composite Score (30 secs): 438

Sun (OOB):
Composite Score (default, 2 secs): 229
Composite Score (30 secs): 225

Harmony (OOB):
Composite Score (default, 2 secs): 109
Composite Score (30 secs): 189

So the first problem here is Harmony's default configuration: we should
extend recompilation to more methods, and only after that can we look into
real performance.
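A longer measurement window helps because each kernel is timed repeatedly. Below is a sketch of a time-bounded measurement loop, assuming (as we understand SciMark's harness) that the cycle count is doubled until one timed run lasts at least the minimum time. `TimedKernel` and `measure` are hypothetical names, not SciMark's source. With doubling, a kernel that takes the cycle count as an argument and loops internally is invoked only once per round, so its call count stays small even when it dominates the run time.

```java
import java.util.function.LongConsumer;

/**
 * Sketch of a time-bounded benchmark loop (hypothetical, not SciMark's code):
 * double the number of cycles until one timed run takes at least minSeconds.
 */
public class TimedKernel {
    /** Cycles in the accepted run, number of rounds (= kernel calls), and elapsed seconds. */
    public record Result(long cycles, int rounds, double seconds) {}

    public static Result measure(LongConsumer kernel, double minSeconds) {
        long cycles = 1;
        int rounds = 0;
        while (true) {
            long start = System.nanoTime();
            kernel.accept(cycles);                      // one call per round; kernel loops `cycles` times
            double elapsed = (System.nanoTime() - start) / 1e9;
            rounds++;
            if (elapsed >= minSeconds) {
                return new Result(cycles, rounds, elapsed);
            }
            if (cycles > Long.MAX_VALUE / 2) {          // guard against overflow for a no-op kernel
                throw new IllegalStateException("kernel never reached the minimum time");
            }
            cycles *= 2;                                // too short: double the work and retry
        }
    }

    private static volatile double sink;                // keeps the JIT from discarding the work

    public static void main(String[] args) {
        LongConsumer kernel = cycles -> {
            double acc = 0;
            for (long i = 0; i < cycles; i++) {
                acc += Math.sqrt(i);
            }
            sink = acc;
        };
        for (double minSeconds : new double[] {2.0, 30.0}) {
            Result r = measure(kernel, minSeconds);
            System.out.printf("min %.0fs: %d kernel calls, final cycles = %d%n",
                    minSeconds, r.rounds(), r.cycles());
        }
    }
}
```

Running `main` takes over a minute because of the 30-second window.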


> > We need to tune DRLVM options to get better results.
> > Tuned options give a good SciMark score improvement (109->181).
>
> Well, to be fair, all the other JVMs could probably do the same.


The other VMs have already done that. :)
The first problem is recompilation.

> > This moves Harmony's performance close to what Sun shows OOB.
>
> Excuse my ignorance, but what's OOB? (Google's define says "out of
> business" or "order of battle"... not sure they apply here ;-)


"Out of the box": default settings, no VM options.


> > Our client (default) compilation path was tuned a long time ago, and it
> > probably makes sense to do another round. What we initially did was run
> > a script executing a given set of workloads to find the best
> > configuration for our VM. That said, I suggest we choose the right set
> > of applications/benchmarks so we can start tuning again.
>
> Maybe it's the analog microelectronics guy in me talking, but every time I
> hear something like "let's get reasonable defaults", I think of
> introducing a variation and a feedback loop to reach a local minimum and
> stabilize the system.
>
> I know very little about how DRLVM works, but would it be feasible to
> start with such "reasonable defaults" and introduce random variability
> into the way the JIT works, alongside a very simple method profiler, and
> see if performance increases? Think of you trying out different things
> and seeing if they work better... but done by the JVM as it runs.


You are absolutely right.
But it will have little impact on short benchmarks (like SciMark) because
a profiler needs time.
Also, my current suggestion is to change (improve) the "starting point",
which may also help further profiling.
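
Stefano's variation-and-feedback idea is, in its simplest form, a stochastic hill climb: perturb a setting at random and keep the change only if the measured score improves. The sketch below assumes a single integer knob (for example a hypothetical recompilation threshold) and a caller-supplied score where higher is better. The toy objective in `main` stands in for running a real workload, which is where the profiling time mentioned above would be spent.

```java
import java.util.Random;
import java.util.function.IntToDoubleFunction;

/**
 * Sketch of feedback-driven tuning by random perturbation (stochastic hill climbing).
 * Hypothetical: one integer knob, higher score is better. Not DRLVM code.
 */
public class RandomTuner {
    public static int tune(IntToDoubleFunction score, int start, int min, int max,
                           int maxStep, int iterations, long seed) {
        Random rnd = new Random(seed);
        int best = start;
        double bestScore = score.applyAsDouble(best);
        for (int i = 0; i < iterations; i++) {
            // Propose a random neighbour in [best - maxStep, best + maxStep], clamped to [min, max].
            int candidate = best + rnd.nextInt(2 * maxStep + 1) - maxStep;
            candidate = Math.max(min, Math.min(max, candidate));
            double s = score.applyAsDouble(candidate);
            if (s > bestScore) {                    // feedback: keep the change only if it helped
                best = candidate;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy objective peaking at 1000; a real tuner would run and time a workload here.
        IntToDoubleFunction toy = t -> -Math.abs(t - 1000);
        int tuned = tune(toy, 10_000, 1, 20_000, 500, 2_000, 42L);
        System.out.println("tuned threshold = " + tuned);
    }
}
```
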
I suggest specifying a set of workloads that look "reasonable" and finding
a common set of DRLVM options. These options can't be optimal for each
separate application, but they will improve overall performance (for the
selected set).
I think an overall improvement is possible because the current default
DRLVM options were set a long time ago.
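One way to make "improve overall performance for the selected set" concrete is to rank candidate option sets by the geometric mean of their per-workload speedups over the current defaults. This scoring rule is an assumption, not something the thread specifies, and the option-set names and ratios in `main` are made up purely to illustrate the arithmetic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: choose one option set for a whole workload suite by the geometric mean
 * of per-workload speedups over a baseline. Scoring rule and names are assumptions.
 */
public class SuiteTuning {
    /** Geometric mean of (score / baseline) across workloads; higher is better. */
    public static double geoMeanSpeedup(double[] scores, double[] baseline) {
        if (scores.length != baseline.length || scores.length == 0) {
            throw new IllegalArgumentException("need one score per workload");
        }
        double logSum = 0;
        for (int i = 0; i < scores.length; i++) {
            logSum += Math.log(scores[i] / baseline[i]);
        }
        return Math.exp(logSum / scores.length);
    }

    /** Returns the name of the candidate option set with the best geometric-mean speedup. */
    public static String pickBest(Map<String, double[]> candidates, double[] baseline) {
        String best = null;
        double bestGm = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : candidates.entrySet()) {
            double gm = geoMeanSpeedup(e.getValue(), baseline);
            if (gm > bestGm) {
                bestGm = gm;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] baseline = {1.0, 1.0};                   // current defaults, normalized
        Map<String, double[]> candidates = new LinkedHashMap<>();
        candidates.put("A", new double[] {2.0, 0.5});     // illustrative ratios, not measurements
        candidates.put("B", new double[] {1.2, 1.1});
        System.out.println("best = " + pickBest(candidates, baseline));   // best = B
    }
}
```

The geometric mean is used here because a large win on one workload cannot fully mask a large loss on another: set A has an arithmetic mean of 1.25 but a geometric mean of 1.0, so B wins. That is the 'short blanket' trade-off in numeric form.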


> Keep in mind I'm a total newbie in virtual machine design (or CPU
> architectures for that matter, despite my degree in microelectronics...
> well, to be fair, I was doing analog, not digital circuits), so bear with
> me if I'm saying stupid things :-)
>
> > Currently we have in mind the following list:
> > - HWA (Hello World Application)
> > - SciMark
> > - DaCapo (a reasonable set of benchmarks, like fop, hsqldb, chart and xalan)
> > - Anything else?
> >
> > What do you think about this? Any additions to the list? Comments?
> > Questions?
>
> The problem I have with this is that I feel that each such scenario
> might require different tuning parameters... and if that is the case,
> you end up with the 'short blanket' problem: you improve here and you
> lose there.


You are right.
It is impossible to find the best options for all applications.
But I am afraid that the current DRLVM default options are obsolete, and I
hope there is room for overall improvement.


> An 'adaptive' scenario, on the other hand, would allow us to:
>
> 1) avoid trying to find the optimal defaults (since we can't possibly
> test every scenario that will be useful in a way that is consistent with
> real-world usage)
>
> 2) avoid the blanket problem: each VM can adapt to the scenario of use
>
> 3) avoid the 'stiffness' problem: each VM can adapt to machine resource
> changes and 'retune' itself if the environment changes.
>
> Of course, there is a price to pay in such 'feedback variability' systems,
> since they have to find the minima over and over again.
>
> So, another solution is to have a JVM "tuning parameters discovery mode"
> that you run with such "parameter finding" autoprofiling turned on...
> and the JVM dumps the tuning results to disk, which you can later use to
> initialize the JVM on your own.
>
> Not sure how feasible or complicated this is to write, but how does this
> sound on paper?


Yes. Adaptive tuning is a mandatory part of a VM, and it is more important
than my suggestion. But it is not as easy as changing the starting point.
Let's do the simplest thing first. :)
We can do it. We only need to specify a set of workloads.
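
Stefano's "tuning parameters discovery mode" (tune once, dump the results to disk, reuse them at startup) could be as simple as a properties file layered over built-in defaults. A sketch using `java.util.Properties`; the keys `recompile.threshold` and `inline.maxSize` and their values are hypothetical, not real DRLVM options.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/**
 * Sketch of "discover once, reuse later": a tuning run saves what it found,
 * later runs load it over the built-in defaults. Key names are hypothetical.
 */
public class TuningProfile {
    public static void save(Path file, Properties tuned) throws IOException {
        try (Writer w = Files.newBufferedWriter(file)) {
            tuned.store(w, "parameters found by a tuning-discovery run");
        }
    }

    /** Loads saved parameters; keys missing from the file fall back to `defaults`. */
    public static Properties load(Path file, Properties defaults) throws IOException {
        Properties p = new Properties(defaults);
        if (Files.exists(file)) {
            try (Reader r = Files.newBufferedReader(file)) {
                p.load(r);
            }
        }
        return p;
    }

    public static void main(String[] args) throws IOException {
        Properties defaults = new Properties();
        defaults.setProperty("recompile.threshold", "10000");   // hypothetical key and value
        defaults.setProperty("inline.maxSize", "32");           // hypothetical key and value

        Properties tuned = new Properties();
        tuned.setProperty("recompile.threshold", "1000");       // pretend a discovery run found this

        Path file = Files.createTempFile("jvm-tuning", ".properties");
        save(file, tuned);
        Properties active = load(file, defaults);
        System.out.println(active.getProperty("recompile.threshold"));  // 1000 (from the file)
        System.out.println(active.getProperty("inline.maxSize"));       // 32 (default)
    }
}
```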

Best regards,
 ---
Sergey Kuksenko.
Intel Enterprise Solutions Software Division.


> > Thanks,
> > ---
> > Sergey Kuksenko
> > Intel Enterprise Solutions Software Division
> >
> >
> > On 11/17/06, Stefano Mazzocchi <stefano@apache.org> wrote:
> >>
> >> Alexey Varlamov wrote:
> >> > Stefano,
> >> >
> >> > It is a bit unfair to compare *debug* build of Harmony with other
> >> > release versions :)
> >>
> >> I'm simulating what a journalist with a developer could do.
> >>
> >> If there is a way to make it compile in 'release mode' (if such a thing
> >> exists), I'll be very glad to redo the benchmarks.
> >>
> >> > I suppose all VMs were run in default mode (i.e. no special cmd-line
> >> > switches)?
> >>
> >> Right. No switches. I'm simulating what users do when they get the JVM:
> >> they run "java"... and if it's not fast enough they buy a new box.
> >>
> >> Having command line tuning parameters is mostly useless since most
> >> people don't know the internals of a JVM well enough to guess what
> >> parameters to tune anyway.
> >>
> >> So, what people will do once they get a Harmony snapshot is "java
> >> my.class.Name" and see the results.
> >>
> >> I want to simulate that and compare it to the same exact experience they
> >> will get with other virtual machines for a variety of common scenarios
> >> (number crunching, XML processing, HTTP serving, database load, etc...)
> >>
> >> I will focus on the server because that's where the Apache action (and
> >> my personal interest) is.
> >>
> >> So, like I said, if there are 'compile time' switches that I can use to
> >> turn 'release mode' on, please tell me and I'll re-do the tests.
> >>
> >> > 2006/11/17, Stefano Mazzocchi <stefano@apache.org>:
> >> >> There are lies, damn lies and benchmarks... which don't really tell you
> >> >> if an implementation of a program is *faster*, but at least they tell
> >> >> you where you're at.
> >> >>
> >> >> So, as Geir managed to make the DSO linking problem in DRLVM go away, I
> >> >> was able to start running some benchmarks.
> >> >>
> >> >> The machine is the following:
> >> >>
> >> >> Linux harmony-em64t 2.6.15-27-amd64-generic #1 SMP PREEMPT Sat Sep 16 01:50:50 UTC 2006 x86_64 GNU/Linux
> >> >>
> >> >> dual Intel(R) Pentium(R) D CPU 3.20GHz
> >> >> bogomips 6410.31 (per CPU)
> >> >>
> >> >> There is nothing else running on the machine (load is 0.04 at the time
> >> >> of testing).
> >> >>
> >> >> The various virtual machines tested are:
> >> >>
> >> >> harmony
> >> >> -------
> >> >> Apache Harmony Launcher : (c) Copyright 1991, 2006 The Apache Software
> >> >> Foundation or its licensors, as applicable.
> >> >> java version "1.5.0"
> >> >> pre-alpha : not complete or compatible
> >> >> svn = r476006, (Nov 16 2006), Linux/em64t/gcc 4.0.3, debug build
> >> >>
> >> >> sun5
> >> >> ---
> >> >> java version "1.5.0_09"
> >> >> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_09-b03)
> >> >> Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_09-b03, mixed mode)
> >> >>
> >> >> sun6
> >> >> ----
> >> >> java version "1.6.0-rc"
> >> >> Java(TM) SE Runtime Environment (build 1.6.0-rc-b104)
> >> >> Java HotSpot(TM) 64-Bit Server VM (build 1.6.0-rc-b104, mixed mode)
> >> >>
> >> >> ibm
> >> >> ---
> >> >> java version "1.5.0"
> >> >> Java(TM) 2 Runtime Environment, Standard Edition (build
> >> >> pxa64dev-20061002a (SR3) )
> >> >> IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux amd64-64
> >> >> j9vmxa6423-20061001 (JIT enabled)
> >> >> J9VM - 20060915_08260_LHdSMr
> >> >> JIT  - 20060908_1811_r8
> >> >> GC   - 20060906_AA)
> >> >> JCL  - 20061002
> >> >>
> >> >> bea
> >> >> ---
> >> >> java version "1.5.0_06"
> >> >> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
> >> >> BEA JRockit(R) (build
> >> >> R26.4.0-63-63688-1.5.0_06-20060626-2259-linux-x86_64, )
> >> >>
> >> >>
> >> >>
> >> >> --------------------------------------------------------------------------
> >> >>
> >> >>
> >> >> Test #1: java scimark2 (http://math.nist.gov/scimark2/)
> >> >>
> >> >> command: java jnt.scimark2.commandline
> >> >>
> >> >> NOTE: bigger number is better
> >> >>
> >> >> Sun6
> >> >> Composite Score: 364.5832265230057
> >> >> FFT (1024): 220.8458713892794
> >> >> SOR (100x100):   696.1542342357722
> >> >> Monte Carlo : 149.37978088875656
> >> >> Sparse matmult (N=1000, nz=5000): 326.37451873283845
> >> >> LU (100x100): 430.1617273683819
> >> >>
> >> >> BEA
> >> >> Composite Score: 359.13480378697835
> >> >> FFT (1024): 303.8746880751562
> >> >> SOR (100x100):   454.25628897202307
> >> >> Monte Carlo : 93.23913192138497
> >> >> Sparse matmult (N=1000, nz=5000): 530.44112637391
> >> >> LU (100x100): 413.8627835924175
> >> >>
> >> >> Sun5
> >> >> Composite Score: 332.84987587548574
> >> >> FFT (1024): 216.5144595799027
> >> >> SOR (100x100):   689.429322146947
> >> >> Monte Carlo : 25.791262124978065
> >> >> Sparse matmult (N=1000, nz=5000): 317.5193965699373
> >> >> LU (100x100): 414.99493895566377
> >> >>
> >> >> IBM
> >> >> Composite Score: 259.8249218693683
> >> >> FFT (1024): 296.8415012789055
> >> >> SOR (100x100):   428.974881649179
> >> >> Monte Carlo : 89.15159857584082
> >> >> Sparse matmult (N=1000, nz=5000): 144.3524241203982
> >> >> LU (100x100): 339.8042037225181
> >> >>
> >> >> Harmony
> >> >> Composite Score: 113.65082278962575
> >> >> FFT (1024): 203.76641991778123
> >> >> SOR (100x100):   224.37761309236748
> >> >> Monte Carlo : 9.063866256533116
> >> >> Sparse matmult (N=1000, nz=5000): 65.4051866327227
> >> >> LU (100x100): 65.6410280487242
> >> >>
> >> >> In this test Harmony is clearly lagging behind... at about 30% of the
> >> >> performance of the best JVM, it's a little crappy. Please note how FFT's
> >> >> performance is not so bad, while Monte Carlo is pretty bad compared to
> >> >> BEA or IBM.
> >> >>
> >> >> Overall, it seems like there is some serious work to do here to catch
> >> >> up.
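
The composite scores above are consistent with SciMark's composite being the arithmetic mean of the five kernel scores. The check below recomputes two of them from the quoted kernel numbers and confirms the "about 30%" figure (Harmony comes out at roughly 31% of Sun6 here). `CompositeCheck` is a hypothetical helper, not part of SciMark.

```java
/**
 * Recomputes SciMark composite scores from the kernel scores in Test #1,
 * assuming the composite is the plain arithmetic mean of the five kernels.
 */
public class CompositeCheck {
    public static double composite(double... kernels) {
        double sum = 0;
        for (double k : kernels) {
            sum += k;
        }
        return sum / kernels.length;
    }

    public static void main(String[] args) {
        // Kernel order: FFT, SOR, Monte Carlo, Sparse matmult, LU (values quoted above).
        double sun6 = composite(220.8458713892794, 696.1542342357722, 149.37978088875656,
                                326.37451873283845, 430.1617273683819);
        double harmony = composite(203.76641991778123, 224.37761309236748, 9.063866256533116,
                                   65.4051866327227, 65.6410280487242);
        System.out.printf("Sun6 composite:    %.2f%n", sun6);                // 364.58
        System.out.printf("Harmony composite: %.2f%n", harmony);             // 113.65
        System.out.printf("Harmony / Sun6:    %.0f%%%n", 100 * harmony / sun6);  // 31%
    }
}
```
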
> >> >>
> >> >>
> >> >> --------------------------------------------------------------------------
> >> >>
> >> >>
> >> >> Test 2: Dhrystones (http://www.c-creators.co.jp/okayan/DhrystoneApplet/)
> >> >>
> >> >> command: java dhry 100000000
> >> >>
> >> >> NOTE: bigger is better
> >> >>
> >> >> NB: I modified the code to accept the iteration count from the command
> >> >> line!
> >> >>
> >> >> sun6:     8552856 dhrystones/sec
> >> >> sun5:     6605892
> >> >> bea:      5678914
> >> >> harmony:   669734
> >> >> ibm:       501562
> >> >>
> >> >> The performance here is horrific, but what's surprising is that J9 is
> >> >> even worse. No idea what's going on, but it seems like something is not
> >> >> working as it should (in both Harmony and J9).
> >> >>
> >> >>
> >> >> --------------------------------------------------------------------------
> >> >>
> >> >>
> >> >> Test 3: Sieve (part of http://www.sax.de/~adlibit/tya18.tgz)
> >> >>
> >> >> command: java Sieve 30
> >> >>
> >> >> NB: I modified the test to run for a configurable amount of seconds.
> >> >>
> >> >> sun6     8545 sieves/sec
> >> >> sun5     8364
> >> >> bea      6174
> >> >> harmony  1836
> >> >> ibm       225
> >> >>
> >> >> IBM J9 clearly has something wrong on x86_64, but Harmony is clearly
> >> >> lagging behind.
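
The modified Sieve source isn't included in the message. As a hypothetical stand-in, this sketch runs a Sieve of Eratosthenes repeatedly for a given number of seconds and reports sieves per second; the class name `TimedSieve` and the sieve size of 8192 are assumptions. Because the rate covers the whole window, it also includes any time spent before the JIT has optimized `sieve`.

```java
/**
 * Hypothetical time-bounded sieve benchmark (not the tya18 code): repeat a
 * Sieve of Eratosthenes for a fixed number of seconds, report sieves/sec.
 */
public class TimedSieve {
    private static volatile int sink;          // keeps the JIT from discarding the work

    /** Counts the primes below n with a Sieve of Eratosthenes. */
    public static int sieve(int n) {
        boolean[] composite = new boolean[n];
        int count = 0;
        for (int i = 2; i < n; i++) {
            if (!composite[i]) {
                count++;
                for (long j = (long) i * i; j < n; j += i) {
                    composite[(int) j] = true;
                }
            }
        }
        return count;
    }

    /** Runs sieve(n) until `seconds` have elapsed and returns completed sieves per second. */
    public static double sievesPerSecond(int n, double seconds) {
        long start = System.nanoTime();
        long deadline = start + (long) (seconds * 1e9);
        long iterations = 0;
        int acc = 0;
        do {
            acc += sieve(n);
            iterations++;
        } while (System.nanoTime() < deadline);
        double elapsed = (System.nanoTime() - start) / 1e9;
        sink = acc;
        return iterations / elapsed;
    }

    public static void main(String[] args) {
        double seconds = args.length > 0 ? Double.parseDouble(args[0]) : 30;
        System.out.printf("%.0f sieves/sec%n", sievesPerSecond(8192, seconds));
    }
}
```

Usage mirrors the command in the test above, e.g. `java TimedSieve 30`.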
> >> >>
> >> >> Stay tuned for more tests.
> >> >>
> >> >> --
> >> >> Stefano.
> >> >>
> >> >>
> >>
> >>
> >> --
> >> Stefano.
> >>
> >>
> >
>
>
> --
> Stefano.
>
>
