From: Paolo Castagna <castagna.lists@googlemail.com>
To: jena-dev@incubator.apache.org
Date: Wed, 28 Sep 2011 16:55:01 +0100
Subject: Re: TxTDB - com.hp.hpl.jena.tdb.base.file.FileException: Impossibly large object
Message-ID: <4E834355.5000908@googlemail.com>
In-Reply-To: <4E833DD6.2020102@googlemail.com>

Paolo Castagna wrote:
> Hi,
> I might be doing something stupid, but I think I produced a minimal
> example which shows the problem.
>
> I have two files: 1.ttl and 2.ttl.
> Here they are:
>
> ---------[ 1.ttl ]---------
>  .
> ---------------------------
>
> ---------[ 2.ttl ]---------
>  .
> ---------------------------
>
> This is what I do:
>
>   public static void dumpObjectFile(Location location) {
>     ObjectFile objects = FileFactory.createObjectFileDisk(
>       location.getPath(Names.indexId2Node, Names.extNodeData)) ;
>     Iterator<Pair<Long, ByteBuffer>> iter = objects.all() ;
>     while ( iter.hasNext() ) {
>       System.out.println(iter.next()) ;
>     }
>   }
>
>   public static void load(Location location, String filename) {
>     StoreConnection sc = StoreConnection.make(location) ;
>     DatasetGraphTxn dsg = sc.begin(ReadWrite.WRITE) ;
>     TDBLoader.load(dsg, filename) ;
>     dsg.commit() ;
>     TDB.sync(dsg) ;
>     dsg.close() ;
>     StoreConnection.release(location) ;
>   }
>
>   public static void main(String[] args) {
>     String path = "/home/castagna/Desktop/" ;
>     Location location = new Location(path + "tdb") ;
>     // 1
>     load(location, path + "1.ttl") ;
>     // 2
>     // dumpObjectFile(location) ;
>     // 3
>     // replay(location, path + "2.ttl") ;
>     //    ^
>     //    |
>     //   load
>     // 4
>     // dumpObjectFile(location) ;
>   }
>
> I first load the first file (i.e. 1.ttl).
> Then I comment step 1 and uncomment step 2: dumpObjectFile.
> I then comment step 2 and uncomment step 3 to load the second file (i.e.
> 2.ttl). This time on an existing TDB location.
> Comment step 3, uncomment step 4 to dump the object file out again. This
> time the nodes.dat file is corrupted.
>
> I tried to use the Model read(...) method instead of TDBLoader load(...).
> The effect is the same.
>
> I tried to remove the TDB.sync(dsg), since I don't think it is necessary
> there.
> The effect is the same.
>
> Am I missing something obvious here?
>
> Paolo
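For what it's worth, the dump above can double as a quick pass/fail check if you count records until the iterator gives up. Below is a minimal sketch along those lines, reusing the classes from the code above plus com.hp.hpl.jena.tdb.base.file.FileException (the exception in the subject of this thread); checkObjectFile is just a hypothetical helper, not TDB code, and it assumes the iterator throws FileException when it hits a bad length rather than handing back a garbled pair. If it is the latter, the sizes printed by dumpObjectFile are still the thing to look at.

  public static void checkObjectFile(Location location) {
    ObjectFile objects = FileFactory.createObjectFileDisk(
      location.getPath(Names.indexId2Node, Names.extNodeData)) ;
    Iterator<Pair<Long, ByteBuffer>> iter = objects.all() ;
    long count = 0 ;
    try {
      while ( iter.hasNext() ) {
        iter.next() ;   // (id, encoded node) pair, as printed by dumpObjectFile
        count++ ;
      }
      System.out.println("OK: " + count + " records") ;
    } catch (FileException e) {
      // FileException is unchecked, so no throws clause is needed
      System.out.println("Corrupted after " + count + " records: " + e.getMessage()) ;
    }
  }

It can be run in place of dumpObjectFile at steps 2 and 4 when the full dump is too noisy.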
> Simon Helsen wrote:
>> Paolo,
>>
>> In our tests, we are not using TDBLoader.load directly. But we do use
>> public Model add( Model m ) which in its turn calls
>> getBulkUpdateHandler().add( m.getGraph(), !suppressReifications );
>>
>> Not sure if that helps in the analysis
>>
>> Simon
>>
>> From: Paolo Castagna
>> To: jena-dev@incubator.apache.org
>> Date: 09/28/2011 08:46 AM
>> Subject: Re: TxTDB - com.hp.hpl.jena.tdb.base.file.FileException: Impossibly
>> large object
>>
>> Hi,
>> I am currently investigating the issue.
>>
>> So far, I managed to get an initial copy of the TDB indexes which is not
>> corrupted (~2.6GB). We then applied ~635 updates to it (and for each
>> transaction I have the data which has been submitted). I then
>> re-applied the changes with a little program which uses TxTDB only
>> (via TDBLoader.load(...)). At the end of this, the nodes.dat file is
>> corrupted.
>>
>> This is just doing:
>>
>>   StoreConnection sc = StoreConnection.make(location) ;
>>   for ( int i = 1; i < 636; i++ ) {
>>     System.out.println(i);
>>     DatasetGraphTxn dsg = sc.begin(ReadWrite.WRITE) ;
>>     TDBLoader.load(dsg, "/tmp/updates/" + i + ".ttl") ;
>>     dsg.commit() ;
>>     dsg.close() ;
>>   }
>>
>> I tried to apply the same changes to an initially empty TDB database and
>> there are no problems.
>>
>> Now, I am double checking the integrity of my initial TDB indexes.
>> I then proceed applying one change at a time and verify integrity
>> (via dump).
>>
>> Paolo
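That "one change at a time" pass could look like the sketch below. It reuses load(...) from the first message and the hypothetical checkObjectFile(...) above; the update path and the 1..635 range are the ones mentioned in the message, and replayAndVerify is just a made-up name.

  public static void replayAndVerify(Location location) {
    for ( int i = 1 ; i < 636 ; i++ ) {
      System.out.println("applying update " + i) ;
      load(location, "/tmp/updates/" + i + ".ttl") ;
      // stop at the first update after which the node table reports corruption
      checkObjectFile(location) ;
    }
  }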
>> Simon Helsen wrote:
>>> thanks Paolo,
>>>
>>> this is related to jena-91. In fact, that is how our problems started.
>>>
>>> Glad someone else was able to reproduce it.
>>>
>>> Simon
>>>
>>> From: Paolo Castagna
>>> To: jena-dev@incubator.apache.org
>>> Date: 09/28/2011 06:47 AM
>>> Subject: Re: TxTDB - com.hp.hpl.jena.tdb.base.file.FileException: Impossibly
>>> large object
>>>
>>> The object file of the node table (i.e. nodes.dat) is corrupted.
>>>
>>> I tried to read it sequentially; I get:
>>>   (318670, java.nio.HeapByteBuffer[pos=0 lim=22 cap=22])
>>> But, after that, the length of the next ByteBuffer is: 909129782 (*).
>>>
>>> Paolo
>>>
>>> (*) Running a simple program to iterate through all the Pair<Long,
>>> ByteBuffer> in the ObjectFile and debugging it: ObjectFileDiskDirect,
>>> line 176.
>>>
>>> Paolo Castagna wrote:
>>>> Hi,
>>>> we are using/testing TxTDB.
>>>>
>>>> In this case, we just perform a series of WRITE transactions (sequentially
>>>> one after the other) and then issue a SPARQL query (as a READ transaction).
>>>> There are no exceptions during the WRITE transactions.
>>>>
>>>> This is the exception we see when we issue the SPARQL query:
>>>>
>>>> com.hp.hpl.jena.tdb.base.file.FileException:
>>>>   ObjectFile.read(9863)[119398665][119079969]: Impossibly large object : 1752462448 bytes
>>>>   at com.hp.hpl.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:282)
>>>>   at com.hp.hpl.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:60)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:164)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:88)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:59)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:89)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:60)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:44)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:56)
>>>>   at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:44)
>>>>   at com.hp.hpl.jena.tdb.solver.BindingTDB.get1(BindingTDB.java:92)
>>>>   at com.hp.hpl.jena.sparql.engine.binding.BindingBase.get(BindingBase.java:106)
>>>>   at com.hp.hpl.jena.sparql.core.ResultBinding._get(ResultBinding.java:44)
>>>>   at com.hp.hpl.jena.sparql.core.QuerySolutionBase.get(QuerySolutionBase.java:20)
>>>>   at com.hp.hpl.jena.sparql.resultset.ResultSetApply.apply(ResultSetApply.java:35)
>>>>   at com.hp.hpl.jena.sparql.resultset.JSONOutput.format(JSONOutput.java:23)
>>>>   at com.hp.hpl.jena.query.ResultSetFormatter.outputAsJSON(ResultSetFormatter.java:584)
>>>>   [...]
>>>>
>>>> This was with an Oracle JVM, 1.6.0_25 64-bit, on a VM (on EC2) with a
>>>> 64-bit Ubuntu OS. We are using a TxTDB packaged directly from SVN
>>>> (r1176416).
>>>>
>>>> This seems to be a similar (or related) issue to:
>>>> https://issues.apache.org/jira/browse/JENA-91
>>>>
>>>> Paolo
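One more observation on the numbers above: 1752462448 is 0x68747470, i.e. the ASCII bytes "http", which is what you would expect if a length prefix is being read from inside a stored URI rather than from a record boundary. For scanning a nodes.dat file without going through the TDB API at all, here is a stand-alone sketch; it assumes each record is a 4-byte big-endian length followed by that many bytes of encoded node, which fits the "Impossibly large object" check above but is an assumption about the on-disk format, not something taken from the TDB sources (ScanNodesDat is a made-up class name).

  import java.io.RandomAccessFile ;
  import java.nio.ByteBuffer ;
  import java.nio.channels.FileChannel ;

  public class ScanNodesDat {
    public static void main(String[] args) throws Exception {
      RandomAccessFile raf = new RandomAccessFile(args[0], "r") ;   // e.g. .../tdb/nodes.dat
      FileChannel ch = raf.getChannel() ;
      long size = ch.size() ;
      long pos = 0 ;
      ByteBuffer lenBuf = ByteBuffer.allocate(4) ;
      while ( pos + 4 <= size ) {
        lenBuf.clear() ;
        ch.read(lenBuf, pos) ;
        lenBuf.flip() ;
        int len = lenBuf.getInt() ;   // assumed: big-endian 4-byte length prefix
        if ( len < 0 || pos + 4 + len > size ) {
          System.out.println("Suspicious length " + len + " at offset " + pos
                             + " (file size " + size + ")") ;
          raf.close() ;
          return ;
        }
        pos += 4 + len ;   // skip over the record body
      }
      System.out.println("Scanned " + pos + " bytes, no impossible lengths found.") ;
      raf.close() ;
    }
  }

Usage: java ScanNodesDat /path/to/tdb/nodes.dat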