jackrabbit-users mailing list archives

From "David Moss" <mos...@googlemail.com>
Subject Re: Jackrabbit performance when adding many documents to repository.
Date Tue, 12 Dec 2006 09:36:59 GMT
On 12/11/06, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> Hi,
> On 12/11/06, David Moss <mossdo@googlemail.com> wrote:
> > I'm looking for some tips to improve performance when adding several
> > documents to the repository.
> > [...]
> > Can anyone advise on the best way to approach this task?
> The save() operation is expensive but so is having a too large
> transient space. The best way to do bulk imports for now is to save()
> the transient changes every now and then, like once every 100 added
> nodes. This should give you a nice performance boost.

Hmm, I've tried this, but I didn't see a noticeable difference.  I
wonder if it's simply the additional cost of indexing the documents that
takes so long?
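For what it's worth, the "save every N nodes" pattern Jukka describes can be factored into a small helper.  This is just a sketch of the batching logic itself; the `flush` action would be `session.save()` in real JCR code (the class name and the batch size of 100 are my own choices, not anything from the Jackrabbit API):

```java
import java.util.function.Supplier;

/**
 * Minimal batching helper: runs a flush action (e.g. session.save() in
 * JCR code) once every batchSize additions instead of after every node,
 * keeping the transient space from growing too large.
 */
class BatchSaver {
    private final int batchSize;
    private final Runnable flush;   // e.g. () -> session.save()
    private int pending = 0;

    BatchSaver(int batchSize, Runnable flush) {
        this.batchSize = batchSize;
        this.flush = flush;
    }

    /** Call after each node is added to the transient space. */
    void added() {
        if (++pending >= batchSize) {
            flush.run();
            pending = 0;
        }
    }

    /** Call once at the end to persist any remaining unsaved nodes. */
    void finish() {
        if (pending > 0) {
            flush.run();
            pending = 0;
        }
    }
}
```

In an import loop you would call `added()` after each `addNode()` and `finish()` once at the end, so at most one save covers a partial batch.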

> Note also that the RMI layer is not a very efficient way to access the
> repository. For best performance with bulk operations over the RMI
> layer I would definitely recommend using the XML import/export
> operations since they simply stream the XML data over the network.

Thanks.  How would I go about doing this?  I need to be able to add non-XML
documents to the repository in a way that allows them to be indexed and
searched through Lucene.  Can I simply wrap the document binary in a minimal
XML document?

> Jukka Zitting
