lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Taylor <paul_t...@fastmail.fm>
Subject Performance tips when creating a large index from database.
Date Thu, 22 Oct 2009 12:45:43 GMT
I'm building a lucene index from a database, creating 1 about 1 million 
documents, unsuprisingly this takes quite a long time.
I do this by sending a query  to the db over a range of ids , (10,000) 
records
Add these results in Lucene
Then get next 10,0000 and so on.
When completed indexing I then call optimize()
I also set  indexWriter.setMaxBufferedDocs(1000) and  
indexWriter.setMergeFactor(3000) but don't fully understand these values.
Each document contains about 10 small fields

I'm looking for some ways to improve performance.

This index writing is single threaded, is there a way I can multi-thread 
writing to the indexing ?
I only call optimize() once at the end, is the best way to do it.
I'm going to run a profiler over the code, but are there any rules of 
thumbs on the best values to set for MaxBufferedDocs and Mergefactor()

thanks Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message