Message-ID: <4400DC1C.7010407@kippdata.de>
Date: Sat, 25 Feb 2006 23:37:16 +0100
From: Rainer Jung
To: Tomcat Developers List
Subject: Re: DTrace analysis of TC-Cluster Multicast
In-Reply-To: <4400D4B9.8060202@hanik.com>

I agree, the heartbeat is not that important with respect to performance.
I used it as a simple test of how DTrace can help, and even in that
nearly trivial test case the results are already interesting. Another
thing DTrace can check is lock contention, so there's more to come :)

Do you have a working build of groupcom? I could trace some of the group
communication. I hope you will not sacrifice synchronization correctness
for performance ...
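
To make the buffering idea from my first mail concrete, here is roughly
what I mean (only a sketch: the class name, field names and payload
layout are made up, not the actual McastMember code):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.InetAddress;

/**
 * Sketch only: the serialized heartbeat payload is built once and reused,
 * because name, domain, host and port do not change between sends.
 * Names and layout are assumptions, not the real McastMember API.
 */
public class CachedMemberPayload {

    private final String name;
    private final String domain;
    private final String host;
    private final int port;

    // built lazily on the first send, then reused for every heartbeat
    private volatile byte[] cachedData;

    public CachedMemberPayload(String name, String domain, String host, int port) {
        this.name = name;
        this.domain = domain;
        this.host = host;
        this.port = port;
    }

    /** Returns the heartbeat payload without re-encoding strings on every call. */
    public byte[] getData() {
        byte[] data = cachedData;
        if (data == null) {
            data = serialize();
            cachedData = data; // benign race: at worst we serialize twice
        }
        return data;
    }

    private byte[] serialize() {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] nameBytes = name.getBytes("UTF-8");
            byte[] domainBytes = domain.getBytes("UTF-8");
            byte[] addr = InetAddress.getByName(host).getAddress(); // 4 bytes for IPv4
            // 4-byte big-endian port, then address, then length-prefixed strings
            out.write(port >>> 24); out.write(port >>> 16);
            out.write(port >>> 8);  out.write(port);
            out.write(addr);
            out.write(nameBytes.length);   out.write(nameBytes);   // 1-byte length is
            out.write(domainBytes.length); out.write(domainBytes); // enough for a sketch
            return out.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("could not build member payload", e);
        }
    }
}

The only point is that the String-to-byte conversions and the
InetAddress lookup then happen once instead of on every heartbeat send.
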
Rainer

Filip Hanik - Dev Lists wrote:
> Cool Rainer, interesting numbers. Can you run the same numbers on the
> new module, groupcom? I have done some optimizations there; for
> example, the name isn't broadcast anymore, and I am about to remove
> the domain in favor of a default one. Also, I agree, let's not
> translate the 4-byte IP and the 4-byte port when receiving them;
> instead, member comparisons can be done at the byte-array level.
> Excellent suggestion.
>
> After that I won't spend too much focus on this issue, as the
> heartbeat does not really have to be optimized. It runs *very*
> infrequently and is not a critical piece.
>
> Instead I am going to start reworking the data senders; there are way
> too many synchronized calls locking down the sockets, and the code is
> somewhat cluttered. Then I'll move on to serialization. In the current
> cluster module too much gets serialized; in the new module "ha" I have
> already removed a lot of serialization and will continue stripping it
> out.
>
> Filip
>
> Rainer Jung wrote:
>> Hi,
>>
>> I started to trace the Tomcat cluster with DTrace. I did a first
>> simple examination of the mcast subsystem. CPU usage and especially
>> elapsed times might be slightly larger than expected, because I used
>> the more intrusive extended DTrace probes.
>>
>> All tests were done with JDK 1.6.0 build 72 on Solaris 10 SPARC,
>> using a T2000 system (8 cores with 4 threads each).
>>
>> Method                            allocation   elapsed    cpu
>> mcast/McastServiceImpl.send()     1328B        40ms       6.7ms
>> mcast/McastServiceImpl.receive()  744-760B     23-510ms   3.5ms
>>
>> So this does not look too bad, but I looked into where the CPU time
>> is spent and where the objects get allocated:
>>
>> 1) send: cpu time, elapsed time and allocation, per send
>>
>>  cpu     elapsed   alloc
>>  6.7ms   40.3ms    1328B   McastServiceImpl.send()
>>
>>  6.3ms   37.6ms    1296B     McastMember.getData()
>>
>>  1.3ms    7.6ms     136B       ... getName().getBytes();
>>  0.9ms    5.3ms     120B       ... getDomain().getBytes();
>>  3.6ms   21.2ms     880B       ... java.net.InetAddress.getByName(host)...
>>
>> It is interesting that the last three statements are responsible for
>> 85% of the allocation and 90% of the CPU usage. This seems
>> unnecessary, because for sending the multicast heartbeat, name,
>> domain and host do not change, so they could easily be buffered.
>>
>> 2) receive: cpu time, elapsed time and allocation, per receive
>>
>>  cpu     elapsed    alloc
>>  3.5ms   23-510ms   744-760B  McastServiceImpl.receive()
>>
>>  0.1ms   3-485ms       0B       socket.receive()
>>  2.8ms   17.9ms      640B       McastMember.getMember()
>>
>>  1.4ms    8.1ms      160B         new String(named),
>>  1.0ms    5.6ms      128B         new String(domaind),
>>  0.1ms    0.6ms      112B         addressToString(addr),
>>
>>  0.3ms    2.7ms       16B       ... membership.memberAlive(m) [only 50%]
>>  0.2ms    2.3ms       32B       ... membership.expire(timeToExpiration);
>>  ...
>>
>> The calls to memberAlive are much faster in about 50% of the cases.
>>
>> Most interesting here is the fact that the String construction takes
>> 70% of the total CPU time. I looked into why that is. For the next
>> trace I had to increase the traced stack depth, so the elapsed times
>> go up a little. The columns are:
>>
>> - ">" or "<": ">" marks the start of a method, "<" its end
>> - stack depth relative to the receive method
>> - cpu time
>> - elapsed time
>> - allocated bytes
>> - class and method
>>
>> > 3  -      -        -    java/lang/String.<init>
>> > 4  -      -        -    java/lang/String.<init>
>> > 5  -      -        -    java/lang/StringCoding.decode
>> > 6  -      -        -    java/lang/StringCoding.decode
>> > 7  -      -        -    java/lang/StringCoding.deref
>> < 7  0.1ms  0.7ms    0    java/lang/StringCoding.deref
>> > 7  -      -        -    java/lang/StringCoding$StringDecoder.decode
>> > 8  -      -        -    java/nio/ByteBuffer.wrap
>> < 8  0.1ms  0.5ms    48   java/nio/ByteBuffer.wrap
>> > 8  -      -        -    java/nio/CharBuffer.wrap
>> < 8  0.1ms  0.5ms    48   java/nio/CharBuffer.wrap
>> > 8  -      -        -    java/nio/charset/CharsetDecoder.decode
>> < 8  0.7ms  4.4ms    0    java/nio/charset/CharsetDecoder.decode
>> < 7  1.1ms  7.8ms    160  java/lang/StringCoding$StringDecoder.decode
>> < 6  1.2ms  9.1ms    160  java/lang/StringCoding.decode
>> < 5  1.2ms  9.5ms    160  java/lang/StringCoding.decode
>> < 4  1.3ms  10.1ms   160  java/lang/String.<init>
>> < 3  1.3ms  10.2ms   160  java/lang/String.<init>
>>
>> So we can see that most of the CPU time is needed inside:
>>
>> - java/nio/charset/CharsetDecoder.decode: 0.7ms
>> - java/lang/StringCoding.deref: 0.1ms
>> - java/nio/ByteBuffer.wrap: 0.1ms
>> - java/nio/CharBuffer.wrap: 0.1ms
>>
>> Maybe it would help to use a more byte-oriented format for the
>> heartbeat messages and not to convert from it at all. The String
>> conversions could be restricted to the case of logging in debug mode.
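
Something like the following is what I mean by staying on the byte level
on the receive side (again only a sketch with assumed class and method
names and an assumed payload layout; the real membership code differs):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch only: keep the raw identity bytes of known members and compare
 * incoming heartbeats with Arrays.equals(), so Strings are only built for
 * genuinely new members (or for debug logging). Names are assumptions,
 * not the actual Membership/McastMember code.
 */
public class RawMembership {

    // identity payloads (address, port, name, domain) of members seen so far
    private final List<byte[]> known = new ArrayList<byte[]>();

    /**
     * Returns true if the received payload belongs to an already known
     * member. Only when this returns false would the expensive String
     * decoding have to run.
     */
    public synchronized boolean memberAliveRaw(byte[] packet, int offset, int length) {
        byte[] identity = new byte[length];
        System.arraycopy(packet, offset, identity, 0, length);
        for (byte[] member : known) {
            if (Arrays.equals(member, identity)) {
                return true; // heartbeat from a known member, no decoding needed
            }
        }
        known.add(identity); // new member: the caller decodes name/domain once here
        return false;
    }
}

That would keep CharsetDecoder.decode out of the steady-state receive
path entirely.
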
>>
>> Cheers,
>>
>> Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org