From: Steve Poole
To: kato-dev@incubator.apache.org
Subject: Re: Processing huge heap dump.
Date: Fri, 15 Jan 2010 08:33:37 +0000

I've run some performance tests on various configurations for HProfFile
caching. See here:

http://spreadsheets.google.com/pub?key=tes0VC-Wyz-lTe9sKNen2Hw&output=html

Our current setting for cache size and bucket size is 1000,25. Bucket size
does affect the speed of analysis, but it's the cache size that really makes
the difference. We need to configure the cache size based on some heuristic
about the size of the dump file.
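A minimal sketch of the kind of size-based heuristic being suggested; the
chooseCacheSize() helper is hypothetical (it is not part of the current
HProfFile API), and the bytes-per-record estimate and clamping bounds are
purely illustrative:

import java.io.File;

public class CacheSizeHeuristic {

    /**
     * Pick a cache size roughly proportional to the dump size, clamped to a
     * sensible range. Assumes around 50 bytes per heap sub-record on average,
     * which is only a rough guess, not a measured figure.
     */
    public static int chooseCacheSize(File dump) {
        long estimatedRecords = dump.length() / 50;   // rough average record size
        long slots = estimatedRecords / 1000;         // one indexed slot per ~1000 records
        return (int) Math.max(1000L, Math.min(slots, 10000000L));
    }

    public static void main(String[] args) {
        System.out.println(chooseCacheSize(new File(args[0])));
    }
}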
On Thu, Jan 14, 2010 at 5:21 PM, Lukasz wrote:

> Hi Steve, Stuart,
>
> In the last two days I have been trying to move further.
>
> I tried to prepare a workaround for the incorrect length. My first idea
> was to exploit the fact that an integer overflow had probably taken place,
> so I searched for 0x2c only at the overflow points (probably quite a
> brittle approach). But it turned out that in my heap dumps (both the real
> one and the one prepared for development purposes) there is no 0x2c byte
> at or after the end of the heap dump record.
> I've noticed, however, that in both dumps the heap record ends at the end
> of the file. It's probably not a general rule, but for my purposes it
> should be enough, so I changed the Kato sources slightly to take advantage
> of this and moved on.
>
> So far I have some code which extracts the static data from the dump
> (strings, classes) and searches for instances of a given class.
> I have run it against the real dump. It took around 40 minutes to get the
> static data and 50 minutes to find the instances, which is not bad, I
> think. I haven't done a careful performance analysis, but CPU (not I/O)
> looked to be the bottleneck. Performance also looked quite stable across
> the whole dump.
>
> The 60GB dump was extracted from a core file using jmap (I don't remember
> exactly which version of Java this jmap came from, but it was certainly
> Java 1.6).
> The following JVM version was dumped to the core file:
> java version "1.6.0_10"
> Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
> Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
>
> I also suspected that heap dump segments were introduced to handle heaps
> larger than 4GB, but it looks like that is not always the case.
>
> Anyway, it is nice that I was able to go through the whole dump without
> any error.
>
> Regards
> Lukasz
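A minimal sketch of the end-of-file fallback described above, written as a
standalone reader rather than a patch to the actual HProfFile internals; the
header layout (1-byte tag, 4-byte timestamp delta, 4-byte length) follows the
hprof record format, and the fallback is only as safe as the observation that
the heap dump is the last record in the file:

import java.io.IOException;
import java.io.RandomAccessFile;

public class HeapDumpLengthWorkaround {

    private static final int HEAP_DUMP = 0x0c;

    /**
     * Read one top-level record header and return the length of its body.
     * The 4-byte length field overflows for a heap dump larger than 4GB, so
     * when the declared length of a HEAP DUMP record disagrees with what is
     * actually left in the file, assume the dump runs to the end of the file.
     */
    public static long recordBodyLength(RandomAccessFile in) throws IOException {
        int tag = in.readUnsignedByte();
        in.readInt();                                   // timestamp delta, unused here
        long declared = in.readInt() & 0xFFFFFFFFL;     // treat the 4 bytes as unsigned
        long remaining = in.length() - in.getFilePointer();
        if (tag == HEAP_DUMP && declared != remaining) {
            return remaining;                           // end-of-file fallback
        }
        return declared;
    }
}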
>
> Steve Poole wrote:
>
>> Hi Lukasz,
>>
>> You say your hprof format is 1.0.1, so, according to the information I
>> have about the hprof format, you won't have any heap dump segments in
>> your file. Apparently they are in the 1.0.2 format, which is a Java 6
>> addition and is only used when the heap is too big for the old format to
>> cope with.
>>
>> What JVM and OS are you using to produce this dump?
>>
>> On Tue, Jan 12, 2010 at 6:17 PM, Stuart Monteith wrote:
>>
>>> Hi,
>>>
>>> You are quite right, 4 bytes for the record length. If we had the file
>>> format wrong for 64-bit hprof dumps we'd not be able to read any dump,
>>> so it must be an overflow of the record length.
>>> In that case, the only possible solution I can think of would be for us
>>> to change the code to search for the HEAP DUMP END record (tagged 0x2c)
>>> to detect the end of the HEAP DUMP.
>>>
>>> Of course, there was some puzzling over the difference between HEAP DUMP
>>> SEGMENT (0x1c) and HEAP DUMP (0x0c) before.
>>>
>>> I'd expect we'd get:
>>>
>>> HEAP DUMP
>>> heap dump records....
>>> HEAP DUMP END
>>>
>>> or
>>>
>>> HEAP DUMP SEGMENT
>>> heap dump records
>>> HEAP DUMP SEGMENT
>>> heap dump records
>>> :    :
>>> HEAP DUMP END
>>>
>>> If we ignore the record size for the HEAP DUMP and calculate it
>>> ourselves, we might have more luck.
>>>
>>> Regards,
>>> Stuart
>>>
>>> Lukasz wrote:
>>>
>>>> Hello,
>>>>
>>>> Since performance doesn't look bad after changing the array size, today
>>>> I tried to process my 60GB heap dump. Unfortunately, I very quickly
>>>> ended up with an exception:
>>>>
>>>> Exception in thread "main" java.lang.IllegalArgumentException: range
>>>> -1293366370 is less than 0
>>>>   at org.apache.kato.common.SubsetDataProvider.<init>(SubsetDataProvider.java:29)
>>>>   at org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:487)
>>>>   at org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:437)
>>>>   at org.apache.kato.common.BitMaskMappingArray.get(BitMaskMappingArray.java:56)
>>>>   at org.apache.kato.hprof.datalayer.HProfFile.getRecord(HProfFile.java:784)
>>>>   ... (my code) ...
>>>>
>>>> It looks like the range of the HeapDumpHProfRecord had the value
>>>> -1293366370.
>>>>
>>>> The following is the part of the dump which I believe is responsible
>>>> for that (B2 E8 CB 9E):
>>>>
>>>> 0055DD84  25 00 00 2A │ B7 AE 22 06 │ 80 00 00 00 │ 01 00 00 2A │ B7 AE 20 0B │ 88 02 00 00 │ 00 00 00 00 │ 00 18 00 00 │ 23 26 00 00
>>>> 0055DDA8  2A B7 B0 A1 │ 64 28 00 00 │ 00 01 00 00 │ 2A B7 B0 33 │ EA F8 0C 00 │ 00 00 00 B2 │ E8 CB 9E 20 │ 00 00 2A B7 │ B0 DE AF C8
>>>> 0055DDCC  00 00 00 01 │ 00 00 2A B7 │ AE 22 0F 20 │ 00 00 2A AA │ BE 21 3D 28 │ 00 00 00 00 │ 00 00 00 00 │ 00 00 2A AA │ BE 25 B7 D8
>>>>
>>>> What is interesting is that the hprof file format reserves only 4 bytes
>>>> for the length of a record, which allows it to express a record whose
>>>> size is at most 4GB (assuming the 4 bytes are unsigned).
>>>>
>>>> My dump's version is: JAVA PROFILE 1.0.1
>>>>
>>>> Regards
>>>> Lukasz
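For what it's worth, those four bytes decode to exactly that negative value
when assembled as a signed 32-bit int; a quick standalone check (nothing
Kato-specific):

public class LengthOverflowCheck {
    public static void main(String[] args) {
        // The 4-byte length field B2 E8 CB 9E from the hex dump above,
        // assembled big-endian as a signed 32-bit int.
        int raw = (0xB2 << 24) | (0xE8 << 16) | (0xCB << 8) | 0x9E;
        System.out.println(raw);               // -1293366370, as in the exception
        System.out.println(raw & 0xFFFFFFFFL); // 3001600926 when treated as unsigned
    }
}

Even the unsigned interpretation, about 2.8GB, is far short of a 60GB dump,
so the declared length has simply wrapped modulo 2^32.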
>>>>
>>>> Lukasz wrote:
>>>>
>>>>> Hi Stuart, Steve,
>>>>>
>>>>> I've taken a deeper look into the code. I still haven't carefully
>>>>> traced the index calculation in the BitMaskMappingArray and
>>>>> ArrayBitMaskMappingStrategy classes, but I managed to improve
>>>>> performance by increasing the array size in those classes (which is
>>>>> set in the HProfFile class).
>>>>>
>>>>> If I understand the code correctly, when the capacity of
>>>>> BitMaskMappingArray is exhausted the bucketSize is doubled, which in
>>>>> turn means more reads (even if cached) are required to set the
>>>>> position of the IDataProvider/IArrayEntryProvider.
>>>>>
>>>>> Following are the loading time results for the default array size
>>>>> (1000) and an increased size (1000000). The test ran against a
>>>>> generated dump file (5000000 instances of Data).
>>>>>
>>>>> Default (1000):
>>>>> HeapSubRecord: 100000 (866ms, 4215kB)
>>>>> HeapSubRecord: 200000 (1716ms, 7879kB)
>>>>> HeapSubRecord: 300000 (2833ms, 11263kB)
>>>>> HeapSubRecord: 400000 (3889ms, 14283kB)
>>>>> HeapSubRecord: 500000 (3893ms, 17319kB)
>>>>> HeapSubRecord: 600000 (7248ms, 20479kB) (here the bucketSize was probably doubled)
>>>>> HeapSubRecord: 700000 (7721ms, 23531kB)
>>>>> HeapSubRecord: 800000 (7729ms, 26567kB)
>>>>> HeapSubRecord: 900000 (7731ms, 29671kB)
>>>>> HeapSubRecord: 1000000 (7704ms, 32731kB)
>>>>> ... (I didn't wait until the end)
>>>>>
>>>>> Increased (1000000):
>>>>> HeapSubRecord: 100000 (622ms, 17809kB)
>>>>> HeapSubRecord: 200000 (309ms, 20345kB)
>>>>> HeapSubRecord: 300000 (283ms, 23861kB)
>>>>> HeapSubRecord: 400000 (274ms, 27921kB)
>>>>> HeapSubRecord: 500000 (269ms, 29957kB)
>>>>> HeapSubRecord: 600000 (264ms, 31993kB)
>>>>> HeapSubRecord: 700000 (272ms, 36097kB)
>>>>> HeapSubRecord: 800000 (288ms, 37739kB)
>>>>> HeapSubRecord: 900000 (263ms, 39835kB)
>>>>> HeapSubRecord: 1000000 (259ms, 41931kB)
>>>>> HeapSubRecord: 1100000 (300ms, 44773kB)
>>>>> HeapSubRecord: 1200000 (283ms, 46901kB)
>>>>> HeapSubRecord: 1300000 (291ms, 49029kB)
>>>>> HeapSubRecord: 1400000 (328ms, 53801kB)
>>>>> HeapSubRecord: 1500000 (259ms, 53801kB)
>>>>> HeapSubRecord: 1600000 (272ms, 58125kB)
>>>>> HeapSubRecord: 1700000 (264ms, 60293kB)
>>>>> HeapSubRecord: 1800000 (264ms, 62473kB)
>>>>> HeapSubRecord: 1900000 (361ms, 61373kB)
>>>>> HeapSubRecord: 2000000 (274ms, 63105kB)
>>>>> ...
>>>>> HeapSubRecord: 9000000 (284ms, 231969kB)
>>>>> HeapSubRecord: 9100000 (272ms, 233597kB)
>>>>> HeapSubRecord: 9200000 (281ms, 236357kB)
>>>>> HeapSubRecord: 9300000 (274ms, 240469kB)
>>>>> HeapSubRecord: 9400000 (279ms, 244541kB)
>>>>> HeapSubRecord: 9500000 (269ms, 246549kB)
>>>>> HeapSubRecord: 9600000 (279ms, 250565kB)
>>>>> HeapSubRecord: 9700000 (265ms, 252573kB)
>>>>> HeapSubRecord: 9800000 (279ms, 256629kB)
>>>>> HeapSubRecord: 9900000 (265ms, 258669kB)
>>>>> HeapSubRecord: 10000000 (463ms, 263997kB)
>>>>> (end)
>>>>>
>>>>> For comparison, my 60GB dump file contains more than 1,100,000,000
>>>>> objects (if I remember correctly).
>>>>>
>>>>> Regards
>>>>> Lukasz
>>>>>
>>>>> Stuart Monteith wrote:
>>>>>
>>>>>> The hprof dump reader spends a lot of time reading the whole file,
>>>>>> for various reasons. The indices it holds in memory are constructed
>>>>>> through an initial read, and this is also the source of the memory
>>>>>> usage. In addition, there is some correlation to be done, which also
>>>>>> takes up time and induces yet more reading.
>>>>>>
>>>>>> I'm sure some work could be done to improve the performance further,
>>>>>> but we'll have to look at the tradeoff between disk space and memory
>>>>>> usage. The hprof file format itself is what it is, however, and we
>>>>>> have no influence over that. The CJVMTI agent has lots of room for
>>>>>> improvement, but I suspect it is unlikely to end up much better than
>>>>>> the existing hprof implementations. The built-in JVM hprof dumper
>>>>>> will probably be a hard act to follow.
>>>>>>
>>>>>> The HProf implementation is not thread-safe. Realistically, I think
>>>>>> it is something that ought to be considered once things are more
>>>>>> mature. There will be algorithms that can deal with the JVM
>>>>>> structure sensibly.
>>>>>>
>>>>>> And thanks Lukasz, it's great to have your input.
>>>>>>
>>>>>> Regards,
>>>>>> Stuart
>>>
>>> --
>>> Stuart Monteith
>>> http://blog.stoo.me.uk/

--
Steve
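To make concrete why the array size tuned above matters so much, here is a
toy model of a bucketed offset index in the spirit of what the thread
describes; it is not the actual BitMaskMappingArray/ArrayBitMaskMappingStrategy
code, and the class and method names are invented for illustration. Only every
bucketSize-th record offset is kept, so a lookup has to re-read up to
bucketSize - 1 records from the file, and each time the fixed-capacity index
fills up the bucket size doubles, doubling that worst case:

import java.util.ArrayList;
import java.util.List;

public class BucketedOffsetIndex {

    private final int capacity;                    // how many offsets we keep (the "array size")
    private int bucketSize = 1;                    // keep every bucketSize-th record offset
    private final List<Long> offsets = new ArrayList<Long>();
    private long recordsSeen = 0;

    public BucketedOffsetIndex(int capacity) {
        this.capacity = capacity;
    }

    /** Called for each record, in order, during the initial read of the file. */
    public void recordOffset(long fileOffset) {
        if (recordsSeen % bucketSize == 0) {
            if (offsets.size() == capacity) {
                doubleBucketSize();                // index is full: keep every other entry
            }
            offsets.add(fileOffset);
        }
        recordsSeen++;
    }

    /**
     * Nearest indexed file offset at or before record i. The caller must then
     * re-read and skip up to bucketSize - 1 records to reach record i, which
     * is where the slowdown comes from once buckets grow.
     */
    public long seekBase(long i) {
        return offsets.get((int) (i / bucketSize));
    }

    private void doubleBucketSize() {
        List<Long> halved = new ArrayList<Long>();
        for (int i = 0; i < offsets.size(); i += 2) {
            halved.add(offsets.get(i));
        }
        offsets.clear();
        offsets.addAll(halved);
        bucketSize *= 2;
    }
}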