incubator-clerezza-dev mailing list archives

From Andy Seaborne <andy.seabo...@epimorphics.com>
Subject Re: leak but where after parsing rdf files?
Date Mon, 24 Jan 2011 21:11:39 GMT


On 24/01/11 18:03, Hasan Hasan wrote:
> Hi Andy
>
> attached I provide a bundle that when run can
> throw java.lang.OutOfMemoryError exception.
> I don't do any parsing in the code. I merely read triples from the graph
> generated in the previous or current execution.
>
> Invoked with:
> MAVEN_OPTS="-Xmx512m -Xms128m"  mvn clean install exec:java -o -e
> -Dexec.args="300 2"
>
> You can play with the arguments. You can generate some triples in
> current execution and retrieve them
> You can also only retrieve triples, in which case you need not
> specify -Dexec.args
> In the above example, 300 is the number of triples to be generated and
> added to the graph
> 2 is the type of literal used: xsd:base64Binary, if you specify 1, the
> type used is rdf:XMLLiteral
> Not all objects in the graph are of typed literals.
>
> Could you please check? Thanks.

Yes, it does.  But I can tell from the pom alone, and from reading
between the lines (and the test cases), that it's about large literals.

This uses TDB, and TDB keeps various caches in the JVM.

Note that on 64-bit hardware, TDB will also use memory-mapped I/O, which
counts towards the process size but not the heap.

There are two 100K-slot caches in front of the node table in the heap,
one for Node->NodeId and one for NodeId->Node (the latter is more
important at query time, the former at update time).  The policy is LRU.

This implicitly assumes that nodes are small, because the caches are
bounded by slot count, not by size - you have 1.5 MByte literals
(3 MBytes in Java, at two bytes per character!).
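
A minimal sketch of why a slot-count bound is a problem here.  The class
below is hypothetical and is not TDB's cache code; it only shows an LRU
cache that evicts by entry count, plus the worst-case arithmetic.  The
3 MByte figure is the one quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT TDB's cache implementation.
class SlotLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSlots;

    SlotLruCache(int maxSlots) {
        // accessOrder = true: iteration order is least recently used first.
        super(16, 0.75f, true);
        this.maxSlots = maxSlots;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Eviction looks only at the number of entries, never at their size.
        return size() > maxSlots;
    }

    // Heap held by a cache with `slots` entries of `bytesPerEntry` each.
    static long worstCaseBytes(long slots, long bytesPerEntry) {
        return slots * bytesPerEntry;
    }
}
```

With 100K slots nothing is evicted until 100,000 nodes have been seen.
If every one of the 300 generated literals were 3 MBytes in memory, one
such cache would hold about 900 MBytes before any eviction, well over
the -Xmx512m limit used in the test run.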

If you want to store multimegabyte base64-encoded literals, you might
wish to consider using a blob store and storing the reference in the RDF
database.  Even if this all worked naturally, you might want to do this
anyway, because holding such literals in the database is an inefficient
use of a valuable system resource (memory space).
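
The blob store idea can be sketched as follows.  Everything here is an
assumption made for illustration: the BlobStore class, the "blob:"
reference prefix and the content-hash file naming are hypothetical, and
no Jena or Clerezza API is involved.  The short string returned by put()
is what would be stored as the literal in the graph.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical file-backed blob store; names and layout are illustrative.
class BlobStore {
    private static final String PREFIX = "blob:";
    private final Path dir;

    BlobStore(Path dir) {
        try {
            this.dir = Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write the bytes to a file named after their SHA-256 digest and
    // return a short reference to keep in the RDF graph instead.
    String put(byte[] content) {
        String key = sha256Hex(content);
        Path target = dir.resolve(key);
        try {
            if (!Files.exists(target)) {
                Files.write(target, content);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return PREFIX + key;
    }

    // Resolve a reference produced by put() back to the original bytes.
    byte[] get(String ref) {
        if (!ref.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a blob reference: " + ref);
        }
        try {
            return Files.readAllBytes(dir.resolve(ref.substring(PREFIX.length())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The node table and its caches then only ever see a reference of about
70 characters per literal, whatever the size of the binary content.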

((Or submit a patch to Jena JIRA for a separate storage area and policy
for large literals. Or a size-sensitive cache implementation :-)))
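
For what a size-sensitive cache could look like, here is a rough sketch.
It is not a proposed Jena patch: the class name is hypothetical, values
are plain strings, and the cost estimate assumes two bytes per character
and ignores object overhead.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache bounded by estimated bytes, not by slot count.
class ByteBoundedLruCache<K> {
    private final long maxBytes;
    private long currentBytes = 0;
    // accessOrder = true: iteration order is least recently used first.
    private final LinkedHashMap<K, String> map =
            new LinkedHashMap<>(16, 0.75f, true);

    ByteBoundedLruCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Rough estimate, assuming two bytes per character.
    private static long cost(String value) {
        return 2L * value.length();
    }

    String get(K key) {
        return map.get(key);
    }

    void put(K key, String value) {
        String old = map.remove(key);
        if (old != null) {
            currentBytes -= cost(old);
        }
        if (cost(value) > maxBytes) {
            return; // larger than the whole budget: do not cache it
        }
        map.put(key, value);
        currentBytes += cost(value);
        // Evict least recently used entries until the budget is met.
        Iterator<Map.Entry<K, String>> it = map.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            currentBytes -= cost(it.next().getValue());
            it.remove();
        }
    }

    int size() {
        return map.size();
    }

    long bytes() {
        return currentBytes;
    }
}
```

A single oversized literal is simply never cached, so it cannot push out
thousands of small nodes or pin megabytes of heap in one slot.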

In theory, the caches are tunable because the sizes are all constants in
SystemTDB, but the performance impact of changing them is untested.  The
setting should be in the appropriate .info file as well, but it's not.

	Andy

>
> Hasan
