clerezza-dev mailing list archives

From Minto van der Sluis <mi...@xup.nl>
Subject Re: Is Clerezza leaking memory?
Date Fri, 29 Nov 2013 12:31:28 GMT
Andy Seaborne schreef op 29-11-2013 9:39:
> On 28/11/13 13:17, Minto van der Sluis wrote:
>> Hi,
>>
>> I just ran into some peculiar behavior.
>>
>> For my current project I have to import 633 files each containing
>> approx 20 MB of xml data (a total of 13 GB). When importing this data
>> into a single graph I hit an out of memory exception on the 7th file.
>>
>> Looking at the heap I noticed that after restarting the application I
>> could load a few more files. So I started looking for the bundle that
>> consumed all the memory. It happened to be the Clerezza TDB Storage
>> provider. See the following image (GC = garbage collection):
>> [image: heap usage graph, not included in the archive]
>>
>> Looking more closely I noticed that Apache Jena is able to close a
>> graph (graph.close()) But Clerezza is not using this feature and is
>> keeping the graph open all the time.
>
> Jena graphs backed by TDB are simply views of the dataset - they don't
> have any state associated with them directly.  If the references become
> inaccessible, GC should clean them up.
Hi Andy,

The problem, as far as I can tell, is not in Jena TDB itself. The Jena
TDB bundle is still active/running; only the Clerezza TDB Provider
bundle was stopped (by me). As my image shows, a normal GC does not
release all of the memory. Only after stopping the Clerezza TDB Provider
is the memory allocated during import actually released. Stopping that
particular bundle makes all Jena data structures inaccessible and
therefore eligible for GC, just as the image shows.

My reasoning is that the Clerezza TDB Provider keeps a map with weak
references to Jena models, yet these references never actually get
garbage collected. Since I use the same graph all the time, all imported
data accumulates there, eventually resulting in the out-of-memory error.
Looking at a memory dump, most of the space is occupied by byte arrays
containing the imported data.
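To illustrate the pattern I suspect (purely a sketch - the class, method
and cache names below are made up for the example, not Clerezza's actual
code): a strongly referenced cache map pins every imported chunk, so
reusing the same graph just keeps growing the heap, no matter what the
caller does with its own references:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LeakSketch {
    // Hypothetical stand-in for the provider's internal cache:
    // graph name -> data accumulated by each import.
    static final Map<String, List<byte[]>> cache = new HashMap<>();

    // Simulate importing one file into a named graph. Because the cache
    // holds a strong reference to every chunk, the data stays reachable
    // even after the caller discards its own reference.
    static void importFile(String graphName, byte[] data) {
        cache.computeIfAbsent(graphName, k -> new ArrayList<>()).add(data);
    }

    // How many imported chunks the cache still pins for a graph.
    static int chunksHeld(String graphName) {
        return cache.getOrDefault(graphName, List.of()).size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 7; i++) {
            importFile("myGraph", new byte[1024]); // caller keeps no reference
        }
        // All seven chunks are still pinned -> memory only grows.
        System.out.println(chunksHeld("myGraph")); // prints 7
    }
}
```

A true WeakHashMap would let entries go once nothing else references the
key, which is exactly what does not seem to happen here.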

I use a nasty hack to prevent this dreaded out-of-memory error: after
every import I restart the Clerezza TDB Provider bundle programmatically
(hail OSGi, for I wouldn't know how to do this without it). This way I
have been able to import more than 300 files in a row (still running).
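Roughly, the restart hack looks like this (a sketch only - the bundle
symbolic name is my assumption, so check what your framework actually
reports; the BundleContext would come from your activator or component):

```java
import org.osgi.framework.Bundle;
import org.osgi.framework.BundleContext;
import org.osgi.framework.BundleException;

public class TdbProviderRestarter {

    // Assumed symbolic name of the Clerezza TDB storage provider bundle;
    // verify against the output of your framework's bundle listing.
    private static final String PROVIDER_BSN =
            "org.apache.clerezza.rdf.jena.tdb.storage";

    /** Stop and start the provider bundle so its caches become GC-eligible. */
    public static void restartProvider(BundleContext context) throws BundleException {
        for (Bundle bundle : context.getBundles()) {
            if (PROVIDER_BSN.equals(bundle.getSymbolicName())) {
                bundle.stop();   // releases the provider's internal map
                bundle.start();  // bring it back for the next import
                return;
            }
        }
    }
}
```

Calling restartProvider(context) after each import is what keeps my heap
flat between files.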

Regards,

Minto



