lucene-java-user mailing list archives

From Marcelo Ochoa <>
Subject Re: Performance tips when creating a large index from database.
Date Thu, 22 Oct 2009 13:59:38 GMT
Hi Paul:
   Most of the time spent indexing big tables goes to the table full
scan and network data transfer.
   Please take a quick look at my OOW08 presentation about Oracle
Lucene integration:

    especially slides 13 and 14, which show the time spent indexing a
Wikipedia dump inside an Oracle database.
    Best regards, Marcelo.
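Marcelo's point about network transfer can be made concrete with a back-of-the-envelope round-trip count. This is a hypothetical illustration, not from the thread; the figure of 10 rows per round trip is the commonly cited default fetch size for the Oracle JDBC driver, and the exact default is driver-specific:

```java
// Hypothetical illustration: estimate the number of network round trips
// needed to stream `rows` rows from the database at a given JDBC fetch size.
public class FetchSizeMath {
    static long roundTrips(long rows, int fetchSize) {
        // ceiling division: each round trip transfers at most fetchSize rows
        return (rows + fetchSize - 1) / fetchSize;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        System.out.println(roundTrips(rows, 10));      // default-ish fetch size
        System.out.println(roundTrips(rows, 10_000));  // one trip per 10k-row batch
    }
}
```

Raising `Statement.setFetchSize()` is the standard JDBC knob for this; at a fetch size of 10, a million rows costs 100,000 round trips, versus 100 at a fetch size of 10,000.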
On Thu, Oct 22, 2009 at 9:45 AM, Paul Taylor <> wrote:
> I'm building a Lucene index from a database, creating about 1 million
> documents; unsurprisingly, this takes quite a long time.
> I do this by sending a query to the db over a range of ids (10,000
> records),
> add these results to Lucene,
> then get the next 10,000, and so on.
> When indexing completes, I then call optimize().
> I also set indexWriter.setMaxBufferedDocs(1000) and
> indexWriter.setMergeFactor(3000), but don't fully understand these values.
> Each document contains about 10 small fields
> I'm looking for some ways to improve performance.
> This index writing is single-threaded; is there a way I can multi-thread
> writing to the index?
> I only call optimize() once at the end; is that the best way to do it?
> I'm going to run a profiler over the code, but are there any rules of thumb
> on the best values to set for setMaxBufferedDocs() and setMergeFactor()?
> thanks Paul
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
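On the tuning and threading questions in the quoted message, a hedged sketch is possible, assuming the Lucene 2.x API current at the time. The `Document`, `IndexWriter`, and the `fetchDocs()` DB helper signature below are placeholders/assumptions, not from the thread; `IndexWriter` is documented as safe for concurrent `addDocument()` calls, so several worker threads can share one writer, and a single `optimize()` at the end remains the right pattern:

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

// Sketch only: requires Lucene on the classpath and a real DB-backed
// fetchDocs() implementation to run.
public class ParallelIndexer {
    public void index(final IndexWriter writer) throws Exception {
        // Prefer flushing by RAM usage over a fixed document count; a very
        // large mergeFactor (e.g. 3000) mostly defers merge cost to optimize().
        writer.setRAMBufferSizeMB(64);

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int lo = 1; lo <= 1_000_000; lo += 10_000) {
            final int first = lo, last = lo + 9_999;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        // fetchDocs is a hypothetical helper that runs the
                        // ranged DB query and maps rows to Lucene Documents.
                        for (Document doc : fetchDocs(first, last)) {
                            writer.addDocument(doc);  // thread-safe call
                        }
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);

        writer.optimize();  // one optimize() at the very end, as Paul does
        writer.close();
    }

    List<Document> fetchDocs(int firstId, int lastId) {
        throw new UnsupportedOperationException("DB access not shown");
    }
}
```

A reasonable design here is one fetch-and-index task per id range, so the DB round trips overlap with analysis and buffering; the RAM buffer size and thread count above are illustrative values, not recommendations from the thread.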

Marcelo F. Ochoa
Want to integrate Lucene and Oracle?
Is Oracle 11g REST ready?
