lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mark harwood <markharw...@yahoo.co.uk>
Subject Re: Indexing becomes slow with time
Date Thu, 30 Apr 2009 17:10:28 GMT

If you're CPU-bound - I've had issues before with GC in long-running indexing tasks loading
very large volumes (100s of millions) of docs. I was seeing lots of CPU usage tied up in GC.

I solved all these problems by firing batches of indexing activity off in seperate processes
then immediately killing them after use. This is the most effective form of garbage collection
you can get...
I don't see any advantage (other than hotspotting) of keeping any of the resources created
during an indexing batch after that batch has been committed. Why burn CPU cycles garbage
collecting when you know there is nothing of value to be retained?

Clearly this may not be an approach where batches of updates are committed very frequently
(i.e. when measured in seconds not minutes) but for heavy ingest this approach can sit quite
happily alongside a seperate long-running process used for servicing searches without loading
the search process with the task of garbage collecting the indexing debris. The search process
can periodically reopen IndexReader to see the new segments created in other indexing processes.

Cheers
Mark




----- Original Message ----
From: Erick Erickson <erickerickson@gmail.com>
To: java-user@lucene.apache.org
Sent: Thursday, 30 April, 2009 14:46:21
Subject: Re: Indexing becomes slow with time

This is surprising behavior, which is another way of saying that,
given what you've said so far, this shouldn't be happening. I'd
really look at system metrics, like whether you're swapping
etc. In particular you might want to try varying how big you
allow your memory footprint to grow before you flush, this is
in the doc Ian pointed out under *
Flush by RAM usage instead of document count*

There's no need to periodically optimize, just do that at the end
if you must.

Best
Erick

On Thu, Apr 30, 2009 at 6:23 AM, liat oren <oren.liat@gmail.com> wrote:

> Yes, I do run optimize...
>
> I did start looking at these tips in the last few days, but didn't think
> the
> optimize makes it so slow.
>
> Thanks!
>
> 2009/4/30 Ian Lea <ian.lea@gmail.com>
>
> > Are you maybe running optimize after every n documents?  There are
> > lots of tips in
> > http://wiki.apache.org/lucene-java/ImproveIndexingSpeed.
> >
> >
> > --
> > Ian.
> >
> >
> > On Thu, Apr 30, 2009 at 8:29 AM, liat oren <oren.liat@gmail.com> wrote:
> > > Hi,
> > >
> > > I noticed that when I start to index, it indexes 7 documents a second.
> > After
> > > 30 minutes it goes down to 3 documents a second.
> > > After two hours it becomes very slow (I stopped it when it arrived to
> > 320MB
> > > and did 1 document in almost a minute)
> > >
> > > As you can see, it happens only after 2000, 3000 documnet.
> > > Should I split them into more indexes?
> > >
> > >
> > > Thanks,
> > > Liat
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>



      

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message