incubator-clerezza-dev mailing list archives

From Andy Seaborne
Subject Re: leak but where after parsing rdf files?
Date Mon, 24 Jan 2011 21:11:39 GMT

On 24/01/11 18:03, Hasan Hasan wrote:
> Hi Andy
> attached I provide a bundle that when run can
> throw java.lang.OutOfMemoryError exception.
> I don't do any parsing in the code. I merely read triples from the graph
> generated in the previous or current execution.
> Invoked with:
> MAVEN_OPTS="-Xmx512m -Xms128m"  mvn clean install exec:java -o -e
> -Dexec.args="300 2"
> You can play with the arguments. You can generate some triples in
> current execution and retrieve them
> You can also only retrieve triples, in which case you need not
> specify -Dexec.args
> In the above example, 300 is the number of triples to be generated and
> added to the graph
> 2 is the type of literal used: xsd:base64Binary, if you specify 1, the
> type used is rdf:XMLLiteral
> Not all objects in the graph are of typed literals.
> Could you please check? Thanks.

Yes, it does.  But I can tell from the pom alone, and from reading 
between the lines (and the test cases), that it's about large literals.

This uses TDB, and TDB has various caches in the JVM.

Note that on 64-bit hardware, TDB will also use memory-mapped I/O, which 
counts towards the process size but not the heap.

There are two 100K-slot caches in front of the node table, both in the 
heap: one for node->NodeId and one for NodeId->Node (the latter is more 
important at query time, the former at update time).  The policy is LRU.
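
To illustrate what a slot-counted LRU cache does, here is a minimal sketch 
built on java.util.LinkedHashMap.  It is not TDB's actual cache class; the 
names SlotLruCache and SlotLruDemo are made up for this example, and the 
tiny capacity of 2 stands in for the 100K slots.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the cache implementation used by TDB.
class SlotLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSlots;

    SlotLruCache(int maxSlots) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxSlots = maxSlots;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Eviction looks only at the number of entries, never at their size.
        return size() > maxSlots;
    }
}

class SlotLruDemo {
    public static void main(String[] args) {
        SlotLruCache<Long, String> idToNode = new SlotLruCache<>(2);
        idToNode.put(1L, "a");
        idToNode.put(2L, "b");
        idToNode.get(1L);        // touch 1, so 2 becomes least recently used
        idToNode.put(3L, "c");   // exceeds 2 slots, so 2 is evicted
        System.out.println(idToNode.keySet()); // prints [1, 3]
    }
}
```

The point to notice is that the bound is on entries, so the heap held by 
the cache depends entirely on how big each cached value is.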

This has an implicit assumption that nodes are small and of comparable 
size - yours are 1.5 MBytes each (3 MBytes in Java!).
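
A back-of-the-envelope sketch of that arithmetic, assuming each character 
of the literal is held as a 2-byte Java char and ignoring object headers 
and everything else on the heap.  The class and method names are made up 
for this example, and the 512 MB figure is the -Xmx512m from the quoted 
invocation.

```java
// Hypothetical helper for rough estimates; not part of TDB or Jena.
class LiteralFootprint {
    // Assumption: 2 bytes per character (UTF-16 chars), no per-object overhead.
    static long heapBytesForChars(long charCount) {
        return charCount * 2L;
    }

    // How many literals of the given length fit in a heap of heapBytes,
    // if nothing else were stored there.
    static long literalsThatFit(long heapBytes, long charsPerLiteral) {
        return heapBytes / heapBytesForChars(charsPerLiteral);
    }

    public static void main(String[] args) {
        long chars = 1_500_000L;            // a 1.5 MByte base64 literal
        long heap = 512L * 1024 * 1024;     // -Xmx512m
        System.out.println(heapBytesForChars(chars));      // prints 3000000
        System.out.println(literalsThatFit(heap, chars));  // prints 178
    }
}
```

Under those assumptions, fewer than 200 such literals fill the whole heap, 
far short of the 100K slots a cache is allowed to hold.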

If you want to store multimegabyte base64-encoded literals, you might 
wish to consider using a blob store and storing the reference in the RDF 
database.  Even if this all worked naturally, you might want to do this 
because keeping such literals in the database is an inefficient use of a 
valuable system resource (memory).
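
A minimal sketch of the blob-store idea, assuming a plain directory on 
disk is good enough as the store.  FileBlobStore and the "urn:sha256:" 
reference scheme are made up for this example; the returned string is 
what would go into the graph in place of the large literal.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical content-addressed blob store backed by a directory.
class FileBlobStore {
    private static final String PREFIX = "urn:sha256:"; // assumed reference scheme
    private final Path dir;

    FileBlobStore(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    // Writes the bytes to a file named after their SHA-256 hash and
    // returns a short reference to store in the RDF database.
    String put(byte[] content) throws IOException {
        String key = sha256Hex(content);
        Path target = dir.resolve(key);
        if (!Files.exists(target)) {
            Files.write(target, content);
        }
        return PREFIX + key;
    }

    // Resolves a reference produced by put() back to the original bytes.
    byte[] get(String ref) throws IOException {
        return Files.readAllBytes(dir.resolve(ref.substring(PREFIX.length())));
    }

    private static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

The graph then holds a reference of well under 100 characters per blob, 
so the node caches stay small regardless of how large the binary data is.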

((Or submit a patch to Jena JIRA for a separate storage area and policy 
for large literals. Or a size-sensitive cache implementation :-)))
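
One way a size-sensitive cache could look, as a rough sketch only: an LRU 
map bounded by an approximate byte budget rather than a slot count.  The 
class name, the String value type, and the 2-bytes-per-char cost estimate 
are all assumptions made for this example, not anything that exists in 
TDB.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache bounded by estimated bytes instead of entry count.
class ByteBudgetLruCache<K> {
    private final long maxBytes;
    private long usedBytes = 0;
    // accessOrder=true: iteration runs from least to most recently used.
    private final LinkedHashMap<K, String> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    ByteBudgetLruCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Rough cost estimate: 2 bytes per char, ignoring object overhead.
    private static long cost(String value) {
        return 2L * value.length();
    }

    String get(K key) {
        return entries.get(key);
    }

    void put(K key, String value) {
        String old = entries.remove(key);
        if (old != null) {
            usedBytes -= cost(old);
        }
        if (cost(value) > maxBytes) {
            return; // larger than the whole budget: do not cache it at all
        }
        entries.put(key, value);
        usedBytes += cost(value);
        // Evict least recently used entries until back under budget.
        Iterator<Map.Entry<K, String>> it = entries.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            usedBytes -= cost(it.next().getValue());
            it.remove();
        }
    }

    int size() {
        return entries.size();
    }

    long usedBytes() {
        return usedBytes;
    }
}
```

With a policy like this, a handful of multimegabyte literals would push 
out other entries or bypass the cache, instead of each occupying one slot 
of unbounded size.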

In theory, the caches are tunable because the sizes are all constants in 
SystemTDB, but the performance impact of changing them is untested. The 
settings should be in the appropriate .info file as well, but they are not.


> Hasan
