lucene-java-user mailing list archives

From Otis Gospodnetic <>
Subject Re: Performance question
Date Tue, 06 Jan 2004 11:51:49 GMT

--- Scott Smith <> wrote:
> I have an application that is reading in XML files and indexing them.
> Each XML file is 3K-6K bytes.  This application preloads a database
> that I will add to "on the fly" later.  However, all I want it to do
> initially is take some existing files and create the initial index as
> quick as I can.  Since I want to index "on the fly" later, I set the
> merge factor to 10.  I'm assuming that I can't create the index
> initially with one merge factor (e.g., 100) and then change the merge
> factor later (true?).

I believe that assumption is wrong.  You can change the merge factor at
any time, so you could build the initial index with a high value and
lower it afterwards.  I haven't tested this, though.

> What I see is that it takes 1-3 seconds per xml file to do the index.
> This means I'm indexing around 150k bytes per minute.  I also notice
> that the CPU utilization rarely exceeds 5% (looking at task manager on
> a Windows box).  I use Xerces to read in the files (SAX interface) and
> I don't close or optimize the index between stories nor do I sleep
> anyplace.  I've looked at the page fault numbers and they aren't
> changing much.  I guess I would have expected that I would have pretty
> much pegged the CPU and seen much faster indexing.
> Any ideas/suggestions?

Measure how much time goes to XML parsing and how much to the actual
indexing.  Lucene indexing is I/O bound, not CPU bound, so the 5% CPU
usage you are seeing suggests that Lucene may be the bottleneck.  But
check your XML parsing code as well.
Post the code, if you want.
In version 1.3 there are two other indexing parameters, besides the
merge factor, that you can use for tuning.  Try experimenting with
those.  You can also give the JVM more memory.  One of my articles on
the Resources page of Lucene's site covers this kind of tuning.
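
One way to split parsing time from indexing time is to wrap each phase
in its own timer.  The following is only a rough sketch, not the
original poster's code: the PhaseTimer class and its Indexer interface
are hypothetical names, and the Indexer callback is a stand-in for
whatever call actually adds the document to the Lucene index.  It uses
the JDK's built-in SAX parser rather than a separately installed Xerces.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative sketch: accumulates time spent parsing vs. indexing. */
public class PhaseTimer {

    /** Hypothetical stand-in for the real indexing step. */
    public interface Indexer {
        void index(String text) throws Exception;
    }

    public long parseNanos = 0;  // total time spent in SAX parsing
    public long indexNanos = 0;  // total time spent in the indexing callback
    public int files = 0;        // number of documents processed

    private final SAXParserFactory factory = SAXParserFactory.newInstance();

    /** Parses one XML document, hands its text to the indexer, times both. */
    public String process(String xml, Indexer indexer) throws Exception {
        // Parser creation is kept outside the timed region.
        SAXParser parser = factory.newSAXParser();

        // Collect all character data; a real handler would map elements
        // to index fields instead.
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };

        long t0 = System.nanoTime();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        long t1 = System.nanoTime();
        indexer.index(text.toString());
        long t2 = System.nanoTime();

        parseNanos += t1 - t0;
        indexNanos += t2 - t1;
        files++;
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        PhaseTimer timer = new PhaseTimer();
        String sample =
            "<story><title>Hello</title><body>Some text</body></story>";
        for (int i = 0; i < 100; i++) {
            // Replace this no-op with the real indexing call.
            timer.process(sample, text -> { });
        }
        System.out.println("files:    " + timer.files);
        System.out.println("parse ms: " + timer.parseNanos / 1_000_000.0);
        System.out.println("index ms: " + timer.indexNanos / 1_000_000.0);
    }
}
```

If the parse total dominates, the XML handling deserves attention
first.  If the index total dominates while CPU usage stays low, that
points at disk I/O in the indexing step.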

