lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: Performance tips when creating a large index from database.
Date Thu, 22 Oct 2009 13:14:53 GMT
Besides the other suggestions, I'd really, really, really put
some instrumentationin the code and see where you're spending your time. For
a fast hint, put
a cumulative timer around your indexing part only. This will indicate
whether
the time is consumed in querying your database or indexing......

I'd also just use the default merge factor etc. until you answer
this question. They weren't just chosen at random <G>.

Something like this.
long elapsed = 0;
while (more database records) {
    get database record
    long temp = get time
    index Lucene doc
    elapsed += (get time) - temp
}
print time


It's unclear from your message whether you're calling optimize after
every 10,000 docs or not, but don't.

But I really suspect you're spending time interacting with the DB.

If these responses aren't all that helpful, some code samples
would help...

Best
Erick


On Thu, Oct 22, 2009 at 8:45 AM, Paul Taylor <paul_t100@fastmail.fm> wrote:

> I'm building a lucene index from a database, creating 1 about 1 million
> documents, unsuprisingly this takes quite a long time.
> I do this by sending a query  to the db over a range of ids , (10,000)
> records
> Add these results in Lucene
> Then get next 10,0000 and so on.
> When completed indexing I then call optimize()
> I also set  indexWriter.setMaxBufferedDocs(1000) and
>  indexWriter.setMergeFactor(3000) but don't fully understand these values.
> Each document contains about 10 small fields
>
> I'm looking for some ways to improve performance.
>
> This index writing is single threaded, is there a way I can multi-thread
> writing to the indexing ?
> I only call optimize() once at the end, is the best way to do it.
> I'm going to run a profiler over the code, but are there any rules of
> thumbs on the best values to set for MaxBufferedDocs and Mergefactor()
>
> thanks Paul
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message