lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Performance tips when creating a large index from database.
Date Thu, 22 Oct 2009 13:08:24 GMT
See also http://wiki.apache.org/lucene-java/ImproveIndexingSpeed.
That includes some info on merge and buffer factors, and recommends
multiple threads.  When I've done this sort of thing in the past it
has tended to be the database that is the problem, but maybe your
database is faster than mine.  Calling optimize only once, at the end,
is correct, though you don't need to call it at all.
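
To make the multi-threading suggestion concrete, here is a minimal sketch of
splitting the id range into batches and indexing them from a fixed thread
pool. It is not taken from the wiki page or from anyone's actual code:
DocumentSink and BatchSource are hypothetical stand-ins for the shared
IndexWriter (which can be called from several threads) and for the per-batch
database query, and the batch size and thread count are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexSketch {

    /** Hypothetical stand-in for the shared IndexWriter; must be safe to call from several threads. */
    public interface DocumentSink {
        void add(String document);
    }

    /** Hypothetical stand-in for the database query over ids in [fromId, toId). */
    public interface BatchSource {
        List<String> fetch(long fromId, long toId) throws Exception;
    }

    /** Splits [minId, maxId) into batches, indexes them on a fixed pool, and returns the number of documents added. */
    public static int indexAll(long minId, long maxId, int batchSize, int threads,
                               BatchSource source, DocumentSink sink) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> pending = new ArrayList<>();
        try {
            for (long from = minId; from < maxId; from += batchSize) {
                final long lo = from;
                final long hi = Math.min(from + batchSize, maxId);
                // Each task fetches one batch and feeds it to the shared sink.
                pending.add(pool.submit(() -> {
                    int n = 0;
                    for (String doc : source.fetch(lo, hi)) {
                        sink.add(doc);
                        n++;
                    }
                    return n;
                }));
            }
            int total = 0;
            for (Future<Integer> f : pending) {
                total += f.get(); // waits for the batch and rethrows any worker failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger added = new AtomicInteger();
        // Fake "database" that returns one string per id in the requested range.
        BatchSource fakeDb = (lo, hi) -> {
            List<String> rows = new ArrayList<>();
            for (long id = lo; id < hi; id++) {
                rows.add("doc-" + id);
            }
            return rows;
        };
        int total = indexAll(0, 25_000, 10_000, 4, fakeDb, doc -> added.incrementAndGet());
        System.out.println(total + " documents indexed");
    }
}
```

In real code the sink would wrap a single IndexWriter shared by all threads,
and optimize(), if called at all, would run once after indexAll returns. If
the database is the bottleneck, as it has been for me, extra threads mostly
help by overlapping the queries.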


--
Ian.


On Thu, Oct 22, 2009 at 1:52 PM, Glen Newton <glen.newton@gmail.com> wrote:
> You might want to consider using LuSql, which is a high performance,
> multithreaded, well documented tool designed specifically for moving
> data from a JDBC database into Lucene (you didn't say if it was a
> JDBC-accessible db...)
>  http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
>
> Disclosure: I am the author of LuSql.
>
> -Glen Newton
>  http://zzzoot.blogspot.com/
>  http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/Glen_Newton
>
>
> 2009/10/22 Paul Taylor <paul_t100@fastmail.fm>:
>> I'm building a Lucene index from a database, creating about 1 million
>> documents; unsurprisingly, this takes quite a long time.
>> I do this by sending a query to the db over a range of ids (10,000
>> records),
>> adding these results to Lucene,
>> then getting the next 10,000 and so on.
>> When indexing is complete I then call optimize().
>> I also set indexWriter.setMaxBufferedDocs(1000) and
>> indexWriter.setMergeFactor(3000) but don't fully understand these values.
>> Each document contains about 10 small fields.
>>
>> I'm looking for some ways to improve performance.
>>
>> This index writing is single threaded; is there a way I can multi-thread
>> writing to the index?
>> I only call optimize() once at the end; is that the best way to do it?
>> I'm going to run a profiler over the code, but are there any rules of thumb
>> on the best values to set for MaxBufferedDocs and MergeFactor?
>>
>> thanks Paul
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
