lucene-java-user mailing list archives

From "Murdoch, Paul" <>
Subject RE: Batch Indexing - best practice?
Date Mon, 15 Mar 2010 15:02:22 GMT
Thanks.  I'll try lowering the merge factor and see if speed increases.
The indexing is threaded, similar to the utility class in Listing 10.1
from Lucene in Action.  Search speed is great once the index is
built, close to real time, so my main problem is fixing the indexing
speed.  I do use the StandardAnalyzer for most of my fields.  What
indexing rate (docs/sec) should I be trying to hit, just to give me an
idea of what to shoot for?


-----Original Message-----
On Behalf Of Mark Miller
Sent: Monday, March 15, 2010 10:48 AM
Subject: Re: Batch Indexing - best practice?

On 03/15/2010 10:41 AM, Murdoch, Paul wrote:
> Hi,
> I'm using Lucene 2.9.2.  Currently, when creating my index, I'm calling
> indexWriter.addDocument(doc) for each Document I want to index.  The
> Documents aren't large and I'm averaging about 500 documents indexed
> every 90 seconds.  I'd like to speed this up, unless 90
> seconds for 500 Documents is reasonable.  I have the merge factor set
> to 1000.  Do you have any suggestions for batch indexing?  Is there
> something like indexWriter.addDocuments(Document[] docs) in the API?
> Thanks.
> Paul
You should lower that merge factor - that's *really* high.

You shouldn't really need much more than 50 or so ... and for search
speed you're going to want fewer segments anyway -
if you're just going to end up optimizing at the end, there is no reason
for such a large merge factor - you will pay for most of what
you saved when you optimize.
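
To make the segment-count point concrete, here is a rough sketch of an idealized log-style merge policy. It is a simplified model, not Lucene code: it assumes every flush writes one equal-sized segment and that segments merge as soon as mergeFactor of them sit at the same level. The class name SegmentModel and the figure of 500 flushes are made up for illustration.

```java
// Simplified model of a log-style merge policy (illustration only, not Lucene code).
final class SegmentModel {

    /**
     * Number of segments left after `flushes` flushes, assuming that whenever
     * `mergeFactor` segments share a level they are merged into one segment
     * at the next level. Under that assumption the count equals the digit sum
     * of `flushes` written in base `mergeFactor`.
     */
    static int segmentsAfter(int flushes, int mergeFactor) {
        if (flushes < 0 || mergeFactor < 2) {
            throw new IllegalArgumentException("flushes must be >= 0 and mergeFactor >= 2");
        }
        int segments = 0;
        for (int n = flushes; n > 0; n /= mergeFactor) {
            segments += n % mergeFactor; // segments still waiting at this level
        }
        return segments;
    }

    public static void main(String[] args) {
        int flushes = 500; // hypothetical: one flush per indexed document
        for (int mergeFactor : new int[] {10, 50, 1000}) {
            System.out.println("mergeFactor=" + mergeFactor + " -> "
                    + segmentsAfter(flushes, mergeFactor) + " segments");
        }
    }
}
```

In this model a merge factor of 1000 never triggers a merge within 500 flushes, so all 500 segments are still there for the final optimize to combine, while a merge factor of 10 or 50 leaves only a handful.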

That is very slow by the way. Should be much faster - especially if you 
are using multiple threads.
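
For reference, the figures quoted above (500 documents in 90 seconds) work out to roughly 5.6 docs/sec. A small helper along these lines could be wrapped around an indexing run to track the rate while tuning. The class and method names are hypothetical, and this only measures throughput, it does not index anything.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for reporting indexing throughput in docs/sec.
final class Throughput {

    /** Documents per second for `docs` documents processed in `elapsedNanos` nanoseconds. */
    static double docsPerSecond(long docs, long elapsedNanos) {
        if (docs < 0 || elapsedNanos <= 0) {
            throw new IllegalArgumentException("docs must be >= 0 and elapsedNanos > 0");
        }
        return docs / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        // The numbers reported in the thread: 500 documents every 90 seconds.
        double rate = docsPerSecond(500, TimeUnit.SECONDS.toNanos(90));
        System.out.printf("%.1f docs/sec%n", rate);

        // In a real run, take System.nanoTime() before and after the indexing
        // loop and pass the difference as elapsedNanos.
    }
}
```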

- Mark

To unsubscribe, e-mail:
For additional commands, e-mail:
