jackrabbit-users mailing list archives

From "Sridhar Raman" <sridhar.ra...@gmail.com>
Subject Re: Help with SearchIndex parameters
Date Thu, 07 Feb 2008 11:32:35 GMT
>
> yes, most probably. Because Jackrabbit stores any pending modification in
> memory, the heap is probably used up and the GC runs very often in your
> import.
> try saving after 1000 nodes.
>

How do I go about doing this?  I import 20000 nodes in one go, and these
elements are all same-level children of a node (let's call it BooksNode).
Currently, I am doing a BooksNode.save() that results in the long commit
time.  How do I actually split this save into groups of 1000 nodes?

Is there any save on an iterator?  Or do I need to modify the NodeImpl.save()
of jackrabbit to do a group-wise save when the node count is above a certain
level?
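
For illustration, one way to split the commit is to count the children as they are added and call save() every time the count reaches the batch size, with a final save() for any remainder. The sketch below is only that, a sketch: BatchedImport, ChildAdder and Saver are hypothetical names, not part of Jackrabbit or the JCR API, and the two callbacks stand in for whatever the import code really does to add a child and persist pending changes.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of Jackrabbit or the JCR API. */
final class BatchedImport {

    /** Adds one child; with JCR this could wrap booksNode.addNode(name). */
    interface ChildAdder {
        void add(String name) throws Exception;
    }

    /** Persists pending changes; with JCR this could wrap booksNode.save() or session.save(). */
    interface Saver {
        void save() throws Exception;
    }

    /**
     * Adds every name through the adder and calls the saver after each full
     * batch, plus once more if a partial batch is left over at the end.
     * Returns the number of save calls that were made.
     */
    static int importInBatches(List<String> names, int batchSize,
                               ChildAdder adder, Saver saver) throws Exception {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0; // children added since the last save
        int saves = 0;
        for (String name : names) {
            adder.add(name);
            pending++;
            if (pending == batchSize) {
                saver.save();
                saves++;
                pending = 0;
            }
        }
        if (pending > 0) { // flush the trailing partial batch
            saver.save();
            saves++;
        }
        return saves;
    }
}
```

Assuming booksNode is a javax.jcr.Node, a call might look like BatchedImport.importInBatches(names, 1000, name -> booksNode.addNode(name), () -> booksNode.save()). Each save() is then a separate commit, so a failure partway through leaves the earlier batches persisted rather than rolling back the whole import.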

On Feb 6, 2008 1:48 PM, Marcel Reutegger <marcel.reutegger@gmx.net> wrote:

> Sridhar Raman wrote:
> > I am not too sure whether the problem we are facing can be solved by
> > tweaking the SearchIndex parameters, but I want to give it a shot. The
> > gist of the problem is that our import of nodes is very, very slow.
>
> how is your content structured? how many properties do your nodes have on
> average? are there any binary properties?
>
> > We have around 25000 nodes that are being imported and then committed
> > by a single session.save(). This particular operation takes a long time.
> > The index folder showed no activity for almost an hour before it began
> > creating the indexes. Could this be because of some faulty SearchIndex
> > parameters? I haven't changed the parameters from the default values.
>
> no, I don't think so. nodes are only indexed at commit time. in a first
> step the nodes are stored using the configured persistence manager and in
> a second step indexed by the query handler.
>
> > Also, would the import process be faster if I did the save() in multiple
> > steps?
>
> yes, most probably. Because Jackrabbit stores any pending modification in
> memory, the heap is probably used up and the GC runs very often in your
> import.
> try saving after 1000 nodes.
>
> regards
>  marcel
>
