jackrabbit-users mailing list archives

From "Stefan Guggisberg" <stefan.guggisb...@gmail.com>
Subject Re: Jackrabbit performance with large binaries
Date Fri, 08 Dec 2006 09:16:49 GMT
hi joe,

On 12/8/06, jdente@21technologies.com <jdente@21technologies.com> wrote:
> Hi,
> I've been storing binary files of different sizes using
> SimpleDbPersistenceManager configured to use PostgreSQL.  I have
> successfully added files from 2.5 MB (around 1 second to save) up through
> 103 MB (around 80 seconds to save).  I am storing the binary files by
> modelling a file system with nt:folder, nt:file, and nt:resource nodes.
> The binary files are then streamed into the jcr:data property of the
> appropriate resource node:
>
>     Node resourceNode = fileNode.addNode("jcr:content", "nt:resource");
>     resourceNode.setProperty("jcr:mimeType", typeHandler.getMimeType());
>     resourceNode.setProperty("jcr:encoding", typeHandler.getTextEncoding());
>     resourceNode.setProperty("jcr:data", resourceInput);
>
>     resourceInput is defined as
>       new BufferedInputStream(new FileInputStream(binaryFile), 16384);
>     I then save the session.
>
> I have been getting a lot of out-of-memory exceptions running these tests.
> The amount of memory needed to successfully save a file increases
> linearly with the size of the file.  In order to avoid an out-of-memory
> exception I need to set aside at least 7.5 times as much memory in the VM
> as the size of the file I want to save.  I have a similar problem when
> deleting files, since the entire node is brought into transient memory
> before it is deleted.  Is there a better way to save binary content that
> doesn't need memory proportional to the file size?  Is there any way to
> avoid bringing the entire file into memory before it's saved (or bringing
> the entire node into memory again when it's deleted)?

this sounds really bad. jackrabbit should be able to store e.g. a 500MB file
with 64MB of JVM heap without any problems.

large binary data in jackrabbit is always streamed, never materialized.
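
to illustrate the pattern, here's a rough sketch against the jcr 1.0 api:
jcr:data is written from an InputStream and read back via Property.getStream(),
so neither side ever has to hold the whole file in the heap. the helper class
and its parameter names below are made up for this example, they're not taken
from your code:

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Calendar;

import javax.jcr.Node;
import javax.jcr.RepositoryException;

public class BinaryStreamingSketch {

    // write: hand the InputStream straight to jcr:data; jackrabbit spools it
    // to the persistence layer in chunks instead of loading it into the heap
    public static void storeFile(Node fileNode, File binaryFile, String mimeType)
            throws RepositoryException, IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(binaryFile), 16384);
        try {
            Node resource = fileNode.addNode("jcr:content", "nt:resource");
            resource.setProperty("jcr:mimeType", mimeType);
            resource.setProperty("jcr:lastModified", Calendar.getInstance());
            resource.setProperty("jcr:data", in);
            fileNode.getSession().save();
        } finally {
            in.close();
        }
    }

    // read: the stream returned by getStream() is backed by the blob store,
    // so the file can be copied out in small buffers
    public static void readFile(Node fileNode, File target)
            throws RepositoryException, IOException {
        InputStream data = fileNode.getNode("jcr:content").getProperty("jcr:data").getStream();
        OutputStream out = new FileOutputStream(target);
        try {
            byte[] buf = new byte[16384];
            int n;
            while ((n = data.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            out.close();
            data.close();
        }
    }
}

whether the bytes actually stay off the heap end to end then depends on the
persistence manager and the jdbc driver underneath it.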

i guess the postgres jdbc driver does materialize the binary stream,
hence the increase in memory you experience. i don't know much about
postgres; i've only verified that the postgres schema works in general.

when you use jackrabbit's default persistence (i.e. embedded derby)
you shouldn't have this problem.

i'd say you have 3 options:
1. store binary data in the fs rather than in the db (externalBLOBs=true;
    see the repository.xml sketch below); however, the fs is not
    transactional, so if you experience a power loss in the middle of a
    transaction you might end up with inconsistent binary data (e.g. the
    file has been updated although the tx never succeeded).
2. use another db (e.g. derby)
3. do some research regarding this issue in the postgres mailing lists;
    maybe there's a configuration option or something similar
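
for option 1 the switch is a parameter on the persistence manager entry in
repository.xml / workspace.xml. this is only a sketch: the connection values
are placeholders for your own setup, and the exact package of
SimpleDbPersistenceManager and the externalBLOBs parameter name should be
double-checked against the docs of your jackrabbit version:

<PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager">
  <!-- placeholder connection settings -->
  <param name="driver" value="org.postgresql.Driver"/>
  <param name="url" value="jdbc:postgresql://localhost:5432/jackrabbit"/>
  <param name="user" value="..."/>
  <param name="password" value="..."/>
  <param name="schema" value="postgresql"/>
  <param name="schemaObjectPrefix" value="${wsp.name}_"/>
  <!-- keep the blobs in the local file system instead of the db -->
  <param name="externalBLOBs" value="true"/>
</PersistenceManager>

for option 2 the same block would simply point at derby instead, i.e. what
the stock repository.xml already ships with.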

cheers
stefan

>
> Thanks for the help,
> Joe.
>
