harmony-dev mailing list archives

From "Aleksey Shipilev" <aleksey.shipi...@gmail.com>
Subject Re: [VM] On-demand class library parsing is ready to commit
Date Mon, 22 Dec 2008 07:21:04 GMT
Nathan:

Wenlong's definition is "startup time is the time required for the VM to
initialize before entering the Java main() method". This one is measured
by tapping into VM internals.

Aleksey's definition is "startup time is the time needed for the VM to
fully execute the main() method for the first time". This one is measured
by standard Java benchmarks.
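The two definitions can be contrasted from inside a Java program. This is a minimal sketch, not the method either author used: Wenlong instrumented JNI_CreateJavaVM inside the VM, while SPECjvm2008 times whole runs. It assumes the VM implements java.lang.management, whose RuntimeMXBean.getStartTime() returns a timestamp the VM records during its own initialization, so the "until main()" number only approximates VM creation time. The class name StartupProbe is hypothetical.

```java
import java.lang.management.ManagementFactory;

// Hypothetical probe contrasting the two "startup" definitions.
// Assumes the VM implements the java.lang.management API.
class StartupProbe {

    // Approximates the "time before entering main()" definition: wall time
    // elapsed since the start time the VM recorded for itself. The VM takes
    // that timestamp partway through initialization, so this undercounts.
    static long msSinceVmStart() {
        long vmStart = ManagementFactory.getRuntimeMXBean().getStartTime();
        return System.currentTimeMillis() - vmStart;
    }

    public static void main(String[] args) {
        long untilMain = msSinceVmStart();

        // The "first full execution of main()" definition also counts the
        // workload itself, including any class loading and JIT compilation
        // it triggers.
        long t0 = System.nanoTime();
        System.out.println("Hello, world!");
        long workloadMs = (System.nanoTime() - t0) / 1000000L;

        System.out.println("VM start -> main():      ~" + untilMain + " ms");
        System.out.println("VM start -> end of main: ~" + (untilMain + workloadMs) + " ms");
    }
}
```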

My impression (backed by Cachegrind/Callgrind profiles) is that much
more time is spent in the initial compilation of Java bytecode than in
the VM initialization sequence. This is also demonstrated by running a
real workload through the startup scenario: even though there is a boost
on HWA under "freezing-cold cache" conditions, it disappears when a real
workload starts up.

The VM initialization sequence can become a bottleneck, but that would
only matter once all the other issues are resolved. IMO, a 0.1 msec
reduction in startup time is negligible and not worth messing with the
bootclasspath.

Are there any other (non-performance) advantages to doing that? If
not, I'd rather postpone it.

I'm not saying Wenlong did a bad job; he did great. But sometimes
consistency gains outweigh performance, especially if the
performance boost is not general.

Thanks,
Aleksey.

On Mon, Dec 22, 2008 at 7:58 AM, Wenlong Li <wenlong@gmail.com> wrote:
> By startup I mean the computation needed before executing the user's
> code (in the main method) (see [1][2]), while Aleksey's notion is the
> startup benchmark in SPECjvm2008.
> [1] http://www.oracle.com/technology/pub/articles/dev2arch/2004/01/jrockit.html
> [2] http://www.ibm.com/developerworks/java/library/os-ecspy1/
>
> On Mon, Dec 22, 2008 at 8:38 AM, Nathan Beyer <ndbeyer@apache.org> wrote:
>> Can someone give a quick summary of the two different definitions of
>> "startup" being discussed?
>>
>> -Nathan
>>
>> On Sun, Dec 21, 2008 at 6:22 PM, Wenlong Li <wenlong@gmail.com> wrote:
>>> Aleksey,
>>>
>>> Thx for testing this patch and sharing your experimental results.
>>> Yes, I think your results are reasonable. The performance gain of
>>> this patch varies across systems.
>>>
>>> Again, I would like to say we have different definitions of "startup".
>>> Maybe I should move the change from the classlib module to the VM
>>> module, so that the dependency can be minimized.
>>>
>>> Thx again for the discussion. :)
>>> wenlong
>>>
>>> On Mon, Dec 22, 2008 at 4:04 AM, Aleksey Shipilev
>>> <aleksey.shipilev@gmail.com> wrote:
>>>> Hi Wenlong,
>>>>
>>>> I ran some performance experiments with your patch. The test system is:
>>>>  - Pentium D 820 2.8 GHz / 2 GB DDR2-667
>>>>  - WD 3200KS, 320 GB, 16 MB cache
>>>>  - Gentoo Linux x86, 2.6.23
>>>>  - Harmony r728459
>>>>  - SPECjvm2008
>>>>
>>>> To recreate the stressful conditions over and over, I wrote a simple
>>>> script [1]. The script invalidates the caches before actually starting
>>>> the workload: it re-reads the same 64 MB file several times to fill
>>>> the on-HDD cache, dropping the VFS block caches first to make sure the
>>>> data is really requested from the disk.
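Each "(mean +- spread)" result in this message summarizes the script's ten iterations. The following is a minimal sketch of such a summary, assuming the spread is the sample standard deviation (the message does not say which statistic was used). RunSummary is a hypothetical name, and the per-iteration wall times would be taken from the /usr/bin/time output.

```java
import java.util.Locale;

// Hypothetical helper: summarizes per-iteration wall times (in seconds) as
// "(mean +- sample standard deviation) secs". Treating the thread's "+-"
// values as standard deviations is an assumption.
class RunSummary {

    static double mean(double[] secs) {
        double sum = 0;
        for (double s : secs) sum += s;
        return sum / secs.length;
    }

    // Sample standard deviation (n - 1 denominator); needs at least two runs.
    static double stdDev(double[] secs) {
        if (secs.length < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        double m = mean(secs);
        double sq = 0;
        for (double s : secs) sq += (s - m) * (s - m);
        return Math.sqrt(sq / (secs.length - 1));
    }

    static String format(double[] secs) {
        // Locale.ROOT keeps '.' as the decimal separator in every locale.
        return String.format(Locale.ROOT, "(%.2f +- %.2f) secs", mean(secs), stdDev(secs));
    }
}
```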
>>>>
>>>> On HWA [2] these performance results were produced:
>>>>
>>>> "cold-start" (invalidate caches):
>>>> clean: (5.24 +- 0.28) secs
>>>> ondemand: (4.49 +- 0.17) secs
>>>>
>>>> "warm-start" (don't invalidate caches):
>>>> clean: (2.82 +- 0.01) secs
>>>> ondemand: (2.80 +- 0.02) secs
>>>>
>>>> That is, the on-demand patch brings a +17% (+-9%) improvement on HWA
>>>> when running with flushed caches, and no performance improvement in
>>>> warm mode.
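The +17% figure follows from the cold-start means as 5.24 / 4.49 - 1, about 0.167. The sketch below reproduces it. The +-9% is matched by summing the two relative errors (0.28/5.24 + 0.17/4.49, about 0.091), but that propagation rule is an assumption, since the message does not say how the uncertainty was combined. SpeedupEstimate is a hypothetical name.

```java
import java.util.Locale;

// Sketch of the arithmetic behind "+17% (+-9%)". Assumption: the "+-" is the
// sum of the two relative errors (worst-case propagation for a ratio).
class SpeedupEstimate {

    // Relative speedup of "ondemand" over "clean", from mean wall times.
    static double speedup(double cleanSecs, double ondemandSecs) {
        return cleanSecs / ondemandSecs - 1.0;
    }

    // Relative uncertainty of the ratio, summing the two relative errors.
    static double relError(double clean, double dClean, double ondemand, double dOndemand) {
        return dClean / clean + dOndemand / ondemand;
    }

    public static void main(String[] args) {
        double cold = speedup(5.24, 4.49);
        double coldErr = relError(5.24, 0.28, 4.49, 0.17);
        double warm = speedup(2.82, 2.80);
        double warmErr = relError(2.82, 0.01, 2.80, 0.02);
        System.out.println(String.format(Locale.ROOT,
                "cold: +%.0f%% (+-%.0f%%)", 100 * cold, 100 * coldErr));
        System.out.println(String.format(Locale.ROOT,
                "warm: +%.1f%% (+-%.1f%%)", 100 * warm, 100 * warmErr));
    }
}
```

Running it prints "cold: +17% (+-9%)" and "warm: +0.7% (+-1.1%)"; the warm-start difference is smaller than its own uncertainty, which is consistent with reading it as no improvement.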
>>>>
>>>> As I have mentioned several times, this test does not reflect the real
>>>> performance an end user would perceive, so I took two SPECjvm2008:startup
>>>> benchmarks and ran each of them 10x10 times.
>>>>
>>>> SPECjvm2008:startup.helloworld, "cold start":
>>>> clean: (8.93 +- 0.21) ops/min
>>>> ondemand: (9.04 +- 0.03) ops/min
>>>>
>>>> SPECjvm2008:startup.compiler.compiler, "cold start":
>>>> clean: (1.44 +- 0.05) ops/min
>>>> ondemand: (1.42 +- 0.04) ops/min
>>>>
>>>> As you can see, even in this very stressful situation there's no boost.
>>>> I find these results unconvincing as grounds for changing the
>>>> infrastructure of bootclasspath resolution. Am I missing something
>>>> important?
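One crude way to check the "no boost" reading is to test whether each pair of error bars overlaps, i.e. whether |m1 - m2| <= e1 + e2. This is a rule of thumb rather than a proper significance test, and ErrorBarOverlap is a hypothetical name; the inputs are the SPECjvm2008 startup scores quoted above.

```java
// Crude check that a difference lies within the quoted error bars: results
// "m1 +- e1" and "m2 +- e2" are treated as indistinguishable when the
// intervals [m - e, m + e] overlap. Not a statistical significance test.
class ErrorBarOverlap {

    static boolean overlaps(double m1, double e1, double m2, double e2) {
        return Math.abs(m1 - m2) <= e1 + e2;
    }

    public static void main(String[] args) {
        // SPECjvm2008 startup scores in ops/min: clean vs. ondemand.
        System.out.println("startup.helloworld overlap: "
                + overlaps(8.93, 0.21, 9.04, 0.03));
        System.out.println("startup.compiler.compiler overlap: "
                + overlaps(1.44, 0.05, 1.42, 0.04));
    }
}
```

Both comparisons print true: the 0.11 ops/min helloworld difference is within the combined 0.24, and the 0.02 compiler.compiler difference is within 0.09.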
>>>>
>>>> Thanks,
>>>> Aleksey.
>>>>
>>>> [1] run.sh
>>>> #!/bin/bash
>>>>
>>>> R=$(pwd)
>>>>
>>>> JAVA=$R/platforms/builds/harmony-release-clean/jdk/jre/bin/java
>>>> #JAVA=$R/platforms/builds/harmony-release-ondemand/jdk/jre/bin/java
>>>> JAVA_OPTS="-Xmx1024M -Xms1024M"
>>>>
>>>> # Run as root: writing to /proc/sys/vm/drop_caches requires it.
>>>> # $R/cachekiller.file is the 64 MB file re-read to evict the HDD cache;
>>>> # absolute paths keep it working after the SPECjvm2008 "cd" below.
>>>>
>>>> for T in $(seq 1 10); do
>>>>
>>>>        echo "*************** EXECUTING ITERATION $T ****************"
>>>>
>>>>        # invalidate HDD caches
>>>>        #   - need to replace all entries in LRU HDD cache
>>>>        #   - flush the kernel VFS cache first to ensure the data
>>>>        #     would be read from disk
>>>>
>>>>        echo "Flushing caches"
>>>>        for I in $(seq 1 5); do
>>>>                sync
>>>>                echo 3 > /proc/sys/vm/drop_caches
>>>>
>>>>                dd if="$R/cachekiller.file" of=/dev/null > /dev/null 2>&1
>>>>        done
>>>>
>>>>        echo "Executing."
>>>>
>>>>        # HelloWorld ($JAVA_OPTS is unquoted on purpose, to split the flags)
>>>>        /usr/bin/time "$JAVA" $JAVA_OPTS -cp "$R/benchmarks/" HelloWorld 2>&1
>>>>
>>>>        # SPECjvm2008
>>>>        #cd $R/benchmarks/storage/SPECjvm2008
>>>>        #/usr/bin/time "$JAVA" $JAVA_OPTS -Djava.awt.headless=true -jar SPECjvm2008.jar -ikv -i 10 startup.compiler.compiler 2>&1
>>>>
>>>>        echo ""
>>>> done
>>>>
>>>> [2] HelloWorld.java
>>>> public class HelloWorld {
>>>>     public static void main(String[] args) {
>>>>         System.out.println("Hello, world!");
>>>>     }
>>>> }
>>>>
>>>>
>>>> On Sun, Dec 21, 2008 at 6:02 AM, Wenlong Li <wenlong@gmail.com> wrote:
>>>>> On Sat, Dec 20, 2008 at 7:10 PM, Alexei Fedotov
>>>>> <alexei.fedotov@gmail.com> wrote:
>>>>>> Wenlong,
>>>>>> Thanks for removing the commented code.
>>>>>>
>>>>>> There are several VMs which make use of the Harmony class library,
>>>>>> e.g. Harmony VM, J9, Android Dalvik, etc. Your change is Harmony VM
>>>>>> specific, isn't it? If it is, then it's better to keep the related
>>>>>> changes in the VM module. If it is not, then it might be a good idea
>>>>>> to keep the changes in the class library module unless other VMs
>>>>>> already have such an optimization in their code.
>>>>> [Wenlong] At the moment, you may see on-demand class parsing as a
>>>>> VM-specific optimization, but I believe it could be a general
>>>>> technique, e.g., it can easily be deployed in other runtime systems.
>>>>> The current VM also depends on luniglob.c in working_classlib to get
>>>>> all class libraries/modules, i.e., there is already a cross-module
>>>>> dependence between classlib and VM. When users want to add a new
>>>>> module, they must manually change bootclasspath.properties; with this
>>>>> patch applied, they would edit the property file I added instead.
>>>>> I understand that modifying the bootclasspath file may affect a
>>>>> specification.
>>>>>>
>>>>>> In any case, crossing the module boundary would make class library
>>>>>> users think more than once or even write some code. Is it technically
>>>>>> possible to prepare a patch which does not change module boundaries?
>>>>>> What do you think?
>>>>> [Wenlong] Yes, it is technically possible, but a little complicated.
>>>>> I can think about it. :)
>>>>>
>>>>>>
>>>>>> As for your performance experiments, which particular test are you
>>>>>> measuring? It is the bootclasspath-unpretentious "Hello, world",
>>>>>> isn't it?
>>>>> [Wenlong] By startup I mean the work executed before running the
>>>>> user's computation, that is, the VM creation time. I manually added
>>>>> instrumentation code to measure execution time in JNI_CreateJavaVM in
>>>>> JNI.cpp. This startup work is common to all benchmarks. My experiments
>>>>> were conducted on both Windows and Linux. Please see my previous
>>>>> message about the performance gain from this optimization.
>>>>>
>>>>> Thx,
>>>>> Wenlong
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> On Sat, Dec 20, 2008 at 2:19 AM, Wenlong Li <wenlong@gmail.com> wrote:
>>>>>>> On Sat, Dec 20, 2008 at 12:42 AM, Alexei Fedotov
>>>>>>> <alexei.fedotov@gmail.com> wrote:
>>>>>>>> Wenlong,
>>>>>>>> Have I missed a discussion of the proposed design? I see that you
>>>>>>>> expose a new public interface:
>>>>>>>>  /**
>>>>>>>>  * @map the jar with exported package in the pending jar list for on-demand jar parsing
>>>>>>>>  *   Key is the jar, and value is the package exported by this jar
>>>>>>>>  */
>>>>>>>> DECLARE_OPEN(void, vm_properties_set_pending_jar, (const char* key, const char* value));
>>>>>>>>
>>>>>>>> Did you mean "Maps" instead of "@map"? Strangely, the word "pending"
>>>>>>>> disappeared from the name of the wrapping VMI interface,
>>>>>>>> SetJarPackageMapping. Why should we extend both OPEN and VMI
>>>>>>>> interfaces with the same function? Why did you put your code into
>>>>>>>> working_classlib/modules/luni/src/main/native/luni/shared/luniglob.c,
>>>>>>>> thus introducing another dependency between VM and class library?
>>>>>>> [Wenlong] The boot class path is defined in luniglob.c in Harmony,
>>>>>>> which already has a dependence on the VM. As I understand it, my
>>>>>>> patch is related to boot class path determination, so I also put my
>>>>>>> code in luniglob.c and used the VMI interface to communicate with
>>>>>>> the VM.
>>>>>>>
>>>>>>>>
>>>>>>>> +            //rcSetProperty = (*vmInterface)->SetJarPackageMapping
>>>>>>>> (vmInterface, jarName, jarValue);
>>>>>>>> +            /*
>>>>>>>> +            hymem_free_memory(jarName);
>>>>>>>> +            hymem_free_memory(jarValue);
>>>>>>>> +            */
>>>>>>>> Should we really commit the commented code?
>>>>>>>> Thanks.
>>>>>>>
>>>>>>> [Wenlong] Please see the latest version of my patch on the list.
>>>>>>> Such commented-out code has been removed.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Dec 19, 2008 at 6:59 PM, Tim Ellison <t.p.ellison@gmail.com> wrote:
>>>>>>>>> I was hoping that somebody else would comment first, so I don't
>>>>>>>>> have to be the grumpy one all the time :-)
>>>>>>>>>
>>>>>>>>> As I said before, this is good prototyping work...
>>>>>>>>>
>>>>>>>>> Wenlong Li wrote:
>>>>>>>>>> I ran the pre-commit test on the patch for on-demand class library
>>>>>>>>>> parsing (https://issues.apache.org/jira/browse/HARMONY-6039), and
>>>>>>>>>> it works well now.
>>>>>>>>>> Can Harmony incorporate this feature?
>>>>>>>>>
>>>>>>>>> I'm not sure it is ready for committing to the head stream yet.
>>>>>>>>>
>>>>>>>>>> With on-demand class parsing, we can reduce startup time from 20+
>>>>>>>>>> seconds to 3 seconds for cold runs, and from 170 ms to 140 ms for
>>>>>>>>>> warm runs, on a Core 2 Duo with Windows.
>>>>>>>>>
>>>>>>>>> Can you tell me how to reproduce the 20+ sec cold start-up? I
>>>>>>>>> haven't seen anything like that in my simple tests.
>>>>>>>>>
>>>>>>>>>> After applying the patch, please note that the procedure for adding
>>>>>>>>>> new modules changes.
>>>>>>>>>> (1) If you want to add new modules/libraries, please don't put them
>>>>>>>>>> in the bootclasspath.properties file. This file now only lists the
>>>>>>>>>> modules needed during startup (VM startup only accesses class
>>>>>>>>>> libraries in eight modules).
>>>>>>>>>
>>>>>>>>> That would break too much. How about creating a new file rather
>>>>>>>>> than re-purposing an existing file with different semantics? This
>>>>>>>>> file is used by Jikes, IBM VME, and the Eclipse plug-in, at least.
>>>>>>>>>
>>>>>>>>>> (2) For new modules/libraries, please put them in the
>>>>>>>>>> modulelibrarymapping.properties file. You should specify the module
>>>>>>>>>> name and the class libraries it exports. Here is one example:
>>>>>>>>>> math.jar=java.math, where "math.jar" is the module name and
>>>>>>>>>> "java.math" is the class library this module exports.
>>>>>>>>>
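The mapping format described above (jar name as key, exported package as value, e.g. math.jar=java.math) can be read with java.util.Properties. The sketch below shows one way a loader might consult such a map on demand, so that only the jar exporting a requested class's package needs to be opened. It is not the HARMONY-6039 implementation: the names PackageJarIndex and jarFor are hypothetical, and allowing a comma-separated list of packages per jar is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical on-demand lookup built from a modulelibrarymapping.properties-
// style file: each entry maps a jar to the package(s) it exports, e.g.
//   math.jar=java.math
// Comma-separated package lists are an assumption, not the patch's format.
class PackageJarIndex {
    private final Map<String, String> jarByPackage = new HashMap<String, String>();

    PackageJarIndex(Reader mapping) throws IOException {
        Properties props = new Properties();
        props.load(mapping);
        for (String jar : props.stringPropertyNames()) {
            for (String pkg : props.getProperty(jar).split(",")) {
                jarByPackage.put(pkg.trim(), jar);
            }
        }
    }

    // Returns the jar exporting the class's package, or null if unmapped;
    // only that jar would then need to be opened and parsed.
    String jarFor(String className) {
        int dot = className.lastIndexOf('.');
        if (dot < 0) return null; // default package: never mapped
        return jarByPackage.get(className.substring(0, dot));
    }

    public static void main(String[] args) throws IOException {
        PackageJarIndex index = new PackageJarIndex(new StringReader("math.jar=java.math\n"));
        System.out.println(index.jarFor("java.math.BigInteger")); // math.jar
        System.out.println(index.jarFor("java.util.List"));       // null
    }
}
```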
>>>>>>>>> As we discussed on another thread, it's unclear whether the time is
>>>>>>>>> spent slowly indexing through the classpath/JAR directories, or in
>>>>>>>>> loading the bytes once we know what we need. I think that it is
>>>>>>>>> premature to abandon the JAR manifest data as the principal source
>>>>>>>>> of metadata until we understand the problem this solves.
>>>>>>>>>
>>>>>>>>> Can we measure where the time is spent in the current
>>>>>>>>> implementation? I think it will help guide this approach to a
>>>>>>>>> better solution. What tools do you recommend for profiling start-up?
>>>>>>>>>
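One rough way to separate the two costs Tim mentions, for a single jar, is to time opening the archive and looking up an entry separately from reading that entry's bytes. The sketch below does this with java.util.jar.JarFile. JarTiming is a hypothetical name, a single System.nanoTime() sample is noisy (and the OS file cache matters a lot, as the cold/warm numbers in this thread show), and it does not observe the VM's own bootclasspath handling, so a profiler would still be needed for the full picture.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarFile;
import java.util.zip.ZipEntry;

// Hypothetical micro-measurement: splits the cost of fetching one class file
// from a jar into (a) opening the archive and finding the entry, and
// (b) reading the entry's bytes. Returns nanoseconds {openAndFind, read}.
class JarTiming {

    static long[] time(File jar, String entryName) throws IOException {
        long t0 = System.nanoTime();
        JarFile jf = new JarFile(jar);
        try {
            ZipEntry entry = jf.getEntry(entryName); // central-directory lookup
            long t1 = System.nanoTime();
            if (entry == null) throw new FileNotFoundException(entryName);
            InputStream in = jf.getInputStream(entry);
            try {
                byte[] buf = new byte[8192];
                while (in.read(buf) != -1) {
                    // discard the bytes: only the read time matters here
                }
            } finally {
                in.close();
            }
            long t2 = System.nanoTime();
            return new long[] { t1 - t0, t2 - t1 };
        } finally {
            jf.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java JarTiming <jar> <entry, e.g. java/lang/Object.class>");
            return;
        }
        long[] t = time(new File(args[0]), args[1]);
        System.out.println("open+find: " + t[0] / 1000 + " us, read: " + t[1] / 1000 + " us");
    }
}
```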
>>>>>>>>> Regards
>>>>>>>>> Tim
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards,
>>>>>>>> Alexei Fedotov,
>>>>>>>> ZAO "Telecom Express"
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>