From: Paolo Castagna <castagna.lists@googlemail.com>
To: jena-dev@incubator.apache.org
Date: Wed, 28 Sep 2011 16:55:01 +0100
Subject: Re: TxTDB - com.hp.hpl.jena.tdb.base.file.FileException: Impossibly large object
Message-ID: <4E834355.5000908@googlemail.com>
In-Reply-To: <4E833DD6.2020102@googlemail.com>

Paolo Castagna wrote:
> Hi,
> I might be doing something stupid, but I think I produced a minimal
> example which shows the problem.
>
> I have two files: 1.ttl and 2.ttl.
> Here they are:
>
> ---------[ 1.ttl ]---------
>  .
> ---------------------------
>
> ---------[ 2.ttl ]---------
>  .
> ---------------------------
>
> This is what I do:
>
>   public static void dumpObjectFile(Location location) {
>     ObjectFile objects = FileFactory.createObjectFileDisk(
>       location.getPath(Names.indexId2Node, Names.extNodeData)) ;
>     Iterator<Pair<Long, ByteBuffer>> iter = objects.all() ;
>     while ( iter.hasNext() ) {
>       System.out.println(iter.next()) ;
>     }
>   }
>
>   public static void load(Location location, String filename) {
>     StoreConnection sc = StoreConnection.make(location) ;
>     DatasetGraphTxn dsg = sc.begin(ReadWrite.WRITE) ;
>     TDBLoader.load(dsg, filename) ;
>     dsg.commit() ;
>     TDB.sync(dsg) ;
>     dsg.close() ;
>     StoreConnection.release(location) ;
>   }
>
>   public static void main(String[] args) {
>     String path = "/home/castagna/Desktop/" ;
>     Location location = new Location(path + "tdb") ;
>     // 1
>     load(location, path + "1.ttl") ;
>     // 2
>     // dumpObjectFile(location) ;
>     // 3
>     // replay(location, path + "2.ttl") ;
>     //    ^
>     //    |
>     //   load
>     // 4
>     // dumpObjectFile(location) ;
>   }
>
> I first load the first file (i.e. 1.ttl).
> Then I comment step 1 and uncomment step 2: dumpObjectFile.
> I then comment step 2 and uncomment step 3 to load the second file (i.e.
> 2.ttl). This time on an existing TDB location.
> Comment step 3, uncomment step 4 to dump the object file out again. This
> time the nodes.dat file is corrupted.
>
> I tried to use the Model read(...) method instead of TDBLoader load(...).
> The effect is the same.
>
> I tried to remove the TDB.sync(dsg), since I don't think it is necessary
> there.
> The effect is the same.
>
> Am I missing something obvious here?
>
> Paolo
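For what it's worth, the dump above can double as a quick pass/fail check if you count records until the iterator gives up. Below is a minimal sketch along those lines, reusing the classes from the code above plus com.hp.hpl.jena.tdb.base.file.FileException (the exception in the subject of this thread); checkObjectFile is just a hypothetical helper, not TDB code, and it assumes the iterator throws FileException when it hits a bad length rather than handing back a garbled pair. If it is the latter, the sizes printed by dumpObjectFile are still the thing to look at.

  public static void checkObjectFile(Location location) {
    ObjectFile objects = FileFactory.createObjectFileDisk(
      location.getPath(Names.indexId2Node, Names.extNodeData)) ;
    Iterator<Pair<Long, ByteBuffer>> iter = objects.all() ;
    long count = 0 ;
    try {
      while ( iter.hasNext() ) {
        iter.next() ;   // (id, encoded node) pair, as printed by dumpObjectFile
        count++ ;
      }
      System.out.println("OK: " + count + " records") ;
    } catch (FileException e) {
      // FileException is unchecked, so no throws clause is needed
      System.out.println("Corrupted after " + count + " records: " + e.getMessage()) ;
    }
  }

It can be run in place of dumpObjectFile at steps 2 and 4 when the full dump is too noisy.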
> Simon Helsen wrote:
>> Paolo,
>>
>> In our tests, we are not using TDBLoader.load directly. But we do use
>> public Model add( Model m ) which in its turn calls
>> getBulkUpdateHandler().add( m.getGraph(), !suppressReifications );
>>
>> Not sure if that helps in the analysis
>>
>> Simon
>>
>> From: Paolo Castagna
>> To: jena-dev@incubator.apache.org
>> Date: 09/28/2011 08:46 AM
>> Subject: Re: TxTDB - com.hp.hpl.jena.tdb.base.file.FileException: Impossibly
>> large object
>>
>> Hi,
>> I am currently investigating the issue.
>>
>> So far, I managed to get an initial copy of the TDB indexes which is not
>> corrupted (~2.6GB). We then applied ~635 updates to it (and for each
>> transaction I have the data which has been submitted). I then
>> re-applied the changes with a little program which uses TxTDB only
>> (via TDBLoader.load(...)). At the end of this, the nodes.dat file is
>> corrupted.
>>
>> This is just doing:
>>
>>   StoreConnection sc = StoreConnection.make(location) ;
>>   for ( int i = 1; i < 636; i++ ) {
>>     System.out.println(i);
>>     DatasetGraphTxn dsg = sc.begin(ReadWrite.WRITE) ;
>>     TDBLoader.load(dsg, "/tmp/updates/" + i + ".ttl") ;
>>     dsg.commit() ;
>>     dsg.close() ;
>>   }
>>
>> I tried to apply the same changes to an initially empty TDB database and
>> there are no problems.
>>
>> Now, I am double checking the integrity of my initial TDB indexes.
>> I then proceed applying one change at a time and verify integrity
>> (via dump).
>>
>> Paolo
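That "one change at a time" pass could look like the sketch below. It reuses load(...) from the first message and the hypothetical checkObjectFile(...) above; the update path and the 1..635 range are the ones mentioned in the message, and replayAndVerify is just a made-up name.

  public static void replayAndVerify(Location location) {
    for ( int i = 1 ; i < 636 ; i++ ) {
      System.out.println("applying update " + i) ;
      load(location, "/tmp/updates/" + i + ".ttl") ;
      // stop at the first update after which the node table reports corruption
      checkObjectFile(location) ;
    }
  }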
>> Simon Helsen wrote:
>>> thanks Paolo,
>>>
>>> this is related to jena-91. In fact, that is how our problems started.
>>>
>>> Glad someone else was able to reproduce it.
>>>
>>> Simon
>>>
>>> From: Paolo Castagna
>>> To: jena-dev@incubator.apache.org
>>> Date: 09/28/2011 06:47 AM
>>> Subject: Re: TxTDB - com.hp.hpl.jena.tdb.base.file.FileException: Impossibly
>>> large object
>>>
>>> The object file of the node table (i.e. nodes.dat) is corrupted.
>>>
>>> I tried to read it sequentially; I get:
>>>   (318670, java.nio.HeapByteBuffer[pos=0 lim=22 cap=22])
>>> But, after that, the length of the next ByteBuffer is: 909129782 (*).
>>>
>>> Paolo
>>>
>>> (*) Running a simple program to iterate through all the Pair<Long,
>>> ByteBuffer> in the ObjectFile and debugging it: ObjectFileDiskDirect,
>>> line 176.
>>>
>>> Paolo Castagna wrote:
>>>> Hi,
>>>> we are using/testing TxTDB.
>>>>
>>>> In this case, we just perform a series of WRITE transactions (sequentially
>>>> one after the other) and then issue a SPARQL query (as a READ transaction).
>>>> There are no exceptions during the WRITE transactions.
>>>>
>>>> This is the exception we see when we issue the SPARQL query:
>>>>
>>>> com.hp.hpl.jena.tdb.base.file.FileException:
>>>>   ObjectFile.read(9863)[119398665][119079969]: Impossibly large object : 1752462448 bytes
>>>>   at com.hp.hpl.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:282)
>>>>   at com.hp.hpl.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:60)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:164)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:88)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:59)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:89)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:60)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:44)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:56)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:44)
>>>>   at com.hp.hpl.jena.tdb.solver.BindingTDB.get1(BindingTDB.java:92)
>>>>   at com.hp.hpl.jena.sparql.engine.binding.BindingBase.get(BindingBase.java:106)
>>>>   at com.hp.hpl.jena.sparql.core.ResultBinding._get(ResultBinding.java:44)
>>>>   at com.hp.hpl.jena.sparql.core.QuerySolutionBase.get(QuerySolutionBase.java:20)
>>>>   at com.hp.hpl.jena.sparql.resultset.ResultSetApply.apply(ResultSetApply.java:35)
>>>>   at com.hp.hpl.jena.sparql.resultset.JSONOutput.format(JSONOutput.java:23)
>>>>   at com.hp.hpl.jena.query.ResultSetFormatter.outputAsJSON(ResultSetFormatter.java:584)
>>>>   [...]
>>>>
>>>> This was with an Oracle JVM, 1.6.0_25 64-bit, on a VM (on EC2) with a
>>>> 64-bit Ubuntu OS. We are using a TxTDB packaged directly from SVN
>>>> (r1176416).
>>>>
>>>> This seems to be a similar (or related) issue to:
>>>> https://issues.apache.org/jira/browse/JENA-91
>>>>
>>>> Paolo
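One more observation on the numbers above: 1752462448 is 0x68747470, i.e. the ASCII bytes "http", which is what you would expect if a length prefix is being read from inside a stored URI rather than from a record boundary. For scanning a nodes.dat file without going through the TDB API at all, here is a stand-alone sketch; it assumes each record is a 4-byte big-endian length followed by that many bytes of encoded node, which fits the "Impossibly large object" check above but is an assumption about the on-disk format, not something taken from the TDB sources (ScanNodesDat is a made-up class name).

  import java.io.RandomAccessFile ;
  import java.nio.ByteBuffer ;
  import java.nio.channels.FileChannel ;

  public class ScanNodesDat {
    public static void main(String[] args) throws Exception {
      RandomAccessFile raf = new RandomAccessFile(args[0], "r") ;   // e.g. .../tdb/nodes.dat
      FileChannel ch = raf.getChannel() ;
      long size = ch.size() ;
      long pos = 0 ;
      ByteBuffer lenBuf = ByteBuffer.allocate(4) ;
      while ( pos + 4 <= size ) {
        lenBuf.clear() ;
        ch.read(lenBuf, pos) ;
        lenBuf.flip() ;
        int len = lenBuf.getInt() ;   // assumed: big-endian 4-byte length prefix
        if ( len < 0 || pos + 4 + len > size ) {
          System.out.println("Suspicious length " + len + " at offset " + pos
                             + " (file size " + size + ")") ;
          raf.close() ;
          return ;
        }
        pos += 4 + len ;   // skip over the record body
      }
      System.out.println("Scanned " + pos + " bytes, no impossible lengths found.") ;
      raf.close() ;
    }
  }

Usage: java ScanNodesDat /path/to/tdb/nodes.dat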