Message-ID: <4D396FFF.7070804@epimorphics.com>
Date: Fri, 21 Jan 2011 11:37:35 +0000
From: Andy Seaborne
To: clerezza-dev@incubator.apache.org
Subject: Re: leak but where after parsing rdf files?
On 20/01/11 19:43, Reto Bachmann-Gmuer wrote:
> Hi Andy
>
> I've committed an application that uses Jena directly, without Clerezza
> stuff in the middle, that demonstrates the problem.
>
> Starting it with
>
> MAVEN_OPTS="-Xmx256m -Xms128m" mvn clean install exec:java -o -e
>
> it will fail at one of the files. However, if I change the order in which
> the files are parsed and put the file it was failing at at the beginning,
> it succeeds in parsing that file and fails at another one.
>
> The app is here:
> http://svn.apache.org/viewvc/incubator/clerezza/issues/CLEREZZA-384/turtlememory

Not entirely without Clerezza stuff: the POM does not work standalone. After some POM hacking, I got it working. I take it the test is "TestWithFiles".

It's not using RIOT because that's not in the Jena download yet. Add the ARQ dependency:

  <dependency>
    <groupId>com.hp.hpl.jena</groupId>
    <artifactId>arq</artifactId>
    <version>2.8.7</version>
  </dependency>

and call either:

  com.hp.hpl.jena.query.ARQ.init() ;

or

  org.openjena.riot.SysRIOT.wireIntoJena() ;

With this the test passes (and much faster as well).

The test is not just parsing. It's storing the results in a model, so the space needed includes complete storage of the model. Only a small increase in -Xmx (e.g. to 350m) is needed for the test to pass.

The test fails in the first pass over the files if it's going to fail at all. I suspect that one or more internal systems have fixed-size caches. Jena does. JavaCC has expanding buffering (and you have some very large literals).
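To illustrate why a cache with a fixed number of slots can look like a leak early on, here is a minimal sketch. It is a hypothetical LRU cache built on java.util.LinkedHashMap, not Jena's actual cache implementation; the class name SlotCache and the character-count accounting are assumptions for illustration only. Because the bound is on the number of entries rather than their size, retained memory grows while the slots fill with large literals and then levels off instead of growing without limit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical slot-bounded LRU cache; NOT Jena's real cache code.
class SlotCache<K> extends LinkedHashMap<K, String> {
    private final int maxSlots;

    SlotCache(int maxSlots) {
        super(16, 0.75f, true); // access order, so the eldest entry is least recently used
        this.maxSlots = maxSlots;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, String> eldest) {
        // The bound is on the NUMBER of entries, not on their size in bytes.
        return size() > maxSlots;
    }

    // Rough proxy for retained memory: total characters held in cached values.
    long retainedChars() {
        long total = 0;
        for (String v : values()) {
            total += v.length();
        }
        return total;
    }

    public static void main(String[] args) {
        SlotCache<Integer> cache = new SlotCache<>(100);
        String bigLiteral = "x".repeat(10_000); // stands in for a very large literal
        for (int pass = 1; pass <= 3; pass++) {
            for (int i = 0; i < 500; i++) {
                cache.put(pass * 1000 + i, bigLiteral + i);
            }
            // Slot count stays at 100; retained size depends on literal length.
            System.out.println("pass " + pass + ": slots=" + cache.size()
                + " retainedChars=" + cache.retainedChars());
        }
    }
}
```

The same slot limit holds a few kilobytes with short literals but megabytes with very long ones, which is why the heap needed depends on the data rather than on how many files have been processed.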
Jena's caches are bounded by number of slots, so churn caused by large literals needs to settle down before any conclusion about a memory leak can be drawn. Hence failing on the first pass is not suggestive of a memory leak. This is backed up by the fact that file order matters.

JavaCC, used by the old parser, has expanding buffers; your long literals force those buffers to grow, so the runtime working space needed to parse a single file is higher. RIOT uses a fixed-size buffer and builds large literals directly into the string used for the RDF node.

Since increasing the heap lets the test run, and the test fails in the first pass over the files if it is going to fail at all, I conclude that various caches are filling up and just not fitting. I guess it passes at 256m with RIOT by chance: slightly less overhead means the caches just happen to fit.

There is a streaming interface to RIOT in org.openjena.riot.RiotReader.

Andy
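As an illustration of the streaming suggestion (the test otherwise keeps every triple in a model), here is a minimal sketch that uses only the JDK. The names TripleSink, CountingSink and StreamingDemo.streamLines are hypothetical, and treating each non-blank, non-comment line as one statement is a simplification of N-Triples, not RIOT's parser. For the real entry points, check the RiotReader javadoc in ARQ 2.8.x. The point is only that a callback-style sink sees one statement at a time, so memory use does not grow with the size of the input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical callback interface: receives one statement at a time.
interface TripleSink {
    void send(String statementLine);
}

// Keeps only a counter, so memory use does not grow with input size.
class CountingSink implements TripleSink {
    long count = 0;

    @Override
    public void send(String statementLine) {
        count++;
    }
}

class StreamingDemo {
    // Simplified N-Triples-style reader: one statement per non-blank,
    // non-comment line. Only the current line is held in memory.
    static void streamLines(BufferedReader in, TripleSink sink) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue; // skip blank lines and comments
                }
                sink.send(trimmed);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String data = "# example\n"
            + "<http://example.org/s> <http://example.org/p> \"a long literal\" .\n"
            + "\n"
            + "<http://example.org/s> <http://example.org/q> <http://example.org/o> .\n";
        CountingSink sink = new CountingSink();
        streamLines(new BufferedReader(new StringReader(data)), sink);
        System.out.println("statements seen: " + sink.count); // prints "statements seen: 2"
    }
}
```

Swapping CountingSink for a sink that forwards or aggregates statements keeps the same constant-memory shape; only a sink that stores everything (like a model) brings back the full storage cost.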